
IkarosBD

Admin
  • Content count

    304
  • Days Won

    2

IkarosBD last won the day on April 19

IkarosBD had the most liked content!

Community Reputation

520 Trusted

About IkarosBD

  • Rank
    LostYears Distro/Xertion IRC Network Root Administrator

Profile Information

  • Gender
    Male
  • Location
    Texas USA

Recent Profile Visitors

10,022 profile views
  1. We sincerely apologize for that prolonged outage the other day. After monitoring the server for a few days, I am ready to post a detailed report about what happened this time.

    What happened?!

    At approximately 4:00 PM CDT (GMT-5) on April 19th, all of the Kametsu websites became completely unresponsive. Active troubleshooting began at 8 PM CDT. I was unable to respond to the incident immediately because I had been asleep from 3 PM to 8 PM CDT.

    Initial investigation revealed that the load average on the Kametsu server had nearly exceeded 7 times the CPU capacity available, and this in turn caused the web server to effectively freeze: under that abnormally heavy load it could neither accept nor process requests. A check of the process list (after a few failed attempts to obtain it, since resources were obviously strained) showed that the web server processes were consuming almost all of that CPU time. The web server processes were all forcibly killed to ease the load and give the system time to return to idle, and the main web server process was then restarted.

    Within minutes, the CPU load began to spike again. Another check of the process list showed one of the web server processes consistently consuming well over 100% of the CPU, without abating. Troubleshooting immediately shifted to the web server configuration and its various modules and libraries, tracing each one to determine whether it was causing the problem. Unfortunately, this took a lot of time: there are a lot of these to go through and trace, and the effect wasn't always immediate; sometimes it took a while to appear after each process restart. Eventually I determined that none of the loaded modules or libraries was responsible.
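For anyone wondering what "7 times the available CPU" means in practice: on Linux the load average counts tasks that are running or waiting to run (plus tasks stuck in uninterruptible I/O), so dividing it by the number of CPU cores gives a rough saturation ratio, where about 1.0 means every core is busy. The sketch below is only an illustration of that arithmetic, not something that was run on the Kametsu server; the 2.0 alert threshold is an arbitrary assumption.

```python
import os


def load_per_cpu(load_avg=None, cpu_count=None):
    """Return the 1-, 5- and 15-minute load averages divided by the CPU count.

    If no values are passed in, read the live numbers from the OS
    (os.getloadavg() is Unix-only and raises OSError if unavailable).
    """
    if load_avg is None:
        load_avg = os.getloadavg()
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None
    return tuple(round(value / cpu_count, 2) for value in load_avg)


def is_overloaded(ratio, threshold=2.0):
    """Flag a per-CPU load ratio above an (arbitrary) alert threshold."""
    return ratio > threshold


if __name__ == "__main__":
    one_min, five_min, fifteen_min = load_per_cpu()
    print(f"load per CPU: {one_min} / {five_min} / {fifteen_min}")
    if is_overloaded(one_min):
        print("warning: 1-minute load is well above the core count")
```

For example, `load_per_cpu((28.0, 20.0, 12.0), 4)` returns `(7.0, 5.0, 3.0)`: a 1-minute load of 28 on a hypothetical 4-core machine is the kind of 7x overload described above.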
    Given that the server had already thrown errors the day before (see the other RFO for details on that), I opted to reboot the entire server to see if that would fix anything. While this provided a cleaner environment, it unfortunately did not fix the issue.

    Troubleshooting then turned to the server configuration and the server logs themselves. The log files hinted at a possible recursion problem, but did not indicate where it might be, if there was one. I began looking at the redirect rules we use in various places across the server to see if one of them was causing it. After nearly 2 hours of testing at this stage, I found the problem: an old rule redirecting forums.kametsu.com (which is no longer valid) to the base domain, kametsu.com, had mistakenly been left in place when it should have been removed. Because the supporting rules for that redirect had already been removed, this rule caused a recursion error for search bots that were still relying on the old URL. Given how frequently search bots crawl our site, this generated so many recursing requests that it completely overtaxed the web server. I had to observe this over a period of about an hour to confirm it. Once it was confirmed, the old redirect rule was disabled, the web server was restarted, and the server was monitored for a day or two to ensure there were no recurrences. Thankfully, there were none, and the server returned to normal soon after.

    What was done to resolve this?

    As stated above, once the cause was nailed down, the old redirect rule was fully disabled, as it should have been in the first place. Once it was disabled, the recursion errors went away and the server went back to normal load. Normal service was confirmed restored at approximately 4:25 AM CDT on April 20th.
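The post doesn't show the actual rewrite rules, so the sketch below uses a made-up rule table (a plain host-to-host mapping) purely to illustrate how a leftover rule can bounce requests around, and how following redirects with a hop limit exposes the loop. It is not the web server's real rewrite engine or configuration; the second rule in the "broken" table is hypothetical.

```python
def follow_redirects(rules, start, max_hops=10):
    """Follow host-to-host redirect rules starting from `start`.

    `rules` maps a host to the host it redirects to. Returns
    (final_host, chain) when the chain ends at a host with no rule.
    Raises RuntimeError if a host repeats (a redirect loop) or if the
    chain needs more than `max_hops` redirects.
    """
    chain = [start]
    seen = {start}
    host = start
    while host in rules:
        host = rules[host]
        chain.append(host)
        if host in seen:
            raise RuntimeError("redirect loop: " + " -> ".join(chain))
        if len(chain) - 1 > max_hops:
            raise RuntimeError("too many redirects: " + " -> ".join(chain))
        seen.add(host)
    return host, chain


if __name__ == "__main__":
    # Hypothetical rule sets, for illustration only.
    healthy = {"forums.kametsu.com": "kametsu.com"}
    broken = {
        "forums.kametsu.com": "kametsu.com",
        "kametsu.com": "forums.kametsu.com",  # hypothetical leftover rule
    }
    print(follow_redirects(healthy, "forums.kametsu.com"))
    try:
        follow_redirects(broken, "forums.kametsu.com")
    except RuntimeError as err:
        print(err)
```

With the broken table, the second call reports `redirect loop: forums.kametsu.com -> kametsu.com -> forums.kametsu.com`. Each pass through such a loop is extra work for the server, which is why a crawler repeatedly hitting a looping rule adds up quickly.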
    Last words

    Once again, I sincerely apologize for the trouble this caused everyone, and especially for the extensive downtime. Had I been able to respond sooner, it probably wouldn't have taken as long. From this point forward, we'll have a better system in place for responding to incidents like this. I'll also make sure to be thorough with my work on the server from now on, and it's probably a good idea not to touch things when I'm drunk or drowsy. Many thanks again to our wonderful staff for keeping everyone updated as I progressed through troubleshooting. I spent nearly 12 hours straight troubleshooting the server, even sacrificing my dinner, to get this community back up and running. It was not easy, but I love this community so much that I'm always willing to go the extra mile.
  2. If you haven't been receiving mail from the forum since the outage, please let me know and I'll take a look at the mail logs.

     

    But I can tell you ahead of time that if there is any mail failure, it's certainly not on our end. Every single message that has gone out was handed off successfully (the receiving mail server returned "Accepted for delivery" or similar). Nothing sent from our servers has been rejected.

     

    Again, though, if you know you should be receiving mail but you aren't (e.g. a thread you're subscribed to that you aren't getting emailed about when it's updated), let me know. Thank you.

    1. Pollux

      Just got a notification of this status update via email, so yay! :D

       

  3. Alright guys, sorry about that. I'm continuing to monitor the server but so far everything appears to be OK again.

     

    Provided everything continues to work for the next 12 hours or so, I'll have an RFO up for this outage as well at that time.

     

    Thanks for bearing with me as I troubleshot this newest problem.

     

    And sadly...I missed my dinner and my jRPG game night because of this :(:(:(:(:(

    1. LeStig

      I initially thought that our moronic government blocked Kametsu intentionally, or while "chasing Telegram".

    2. spaceman99

      Thanks for getting things going again and for keeping the site up and running. You were at it for hours; I periodically checked and saw a few of your updates on the backup page. Once again, thank you for all your hard work!

  4. I am SO, SO sorry about that everyone. I REALLY should have been watching the server's disk space much, MUCH more closely than I was, especially knowing how it was set up.


    I've posted details about what happened here:

     

    Again, really, super sorry about that. :(

    1. Gogeta-Blue

      No need for apologies.

    2. NeutralHatred

      I'm not sure how servers work, but Windows warns you periodically if a disk is getting full. Does the server not run on Windows?

      Again. Dumb. Not a server person.

    3. IkarosBD

      @NeutralHatred It does not, lol. About the only time you'll know you're out of disk space on Linux is when shit starts throwing weird errors (like the database software, for example). Some other programs will throw the obvious message at you, which is the comment on the line below. For a list of all the basic error codes Linux can throw at you, see https://github.com/torvalds/linux/blob/master/include/uapi/asm-generic/errno-base.h (more are defined in https://github.com/torvalds/linux/blob/master/include/uapi/asm-generic/errno.h, and you might see some familiar ones). The one that matters here is:

      #define	ENOSPC	28	/* No space left on device */

      In other words...the only time you'll know it's actually out of space is when it's already too late and shit's hitting the fan, lol (unless you have some 3rd-party monitoring tool installed that warns you ahead of time).
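To make the errno point concrete: a program only learns the disk is full when a write actually fails, and in Python that surfaces as an `OSError` whose `errno` equals `errno.ENOSPC` (28 on Linux, matching the `#define` above). The helper below is a hypothetical illustration, not code from our server; its name and the choice to return False instead of raising are assumptions.

```python
import errno
import os


def append_line(path, line):
    """Append one line to `path`.

    Returns True on success and False if the disk was full (ENOSPC).
    Any other OSError is re-raised unchanged.
    """
    try:
        # Buffered data may only reach the disk when the file is closed,
        # so the ENOSPC can come from the implicit close at the end of
        # the with-block rather than from write() itself.
        with open(path, "a", encoding="utf-8") as handle:
            handle.write(line + "\n")
        return True
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            print(f"disk full while writing {path}: {exc.strerror}")
            return False
        raise


if __name__ == "__main__":
    # On Linux, every write to /dev/full fails with ENOSPC.
    if os.path.exists("/dev/full"):
        print(append_line("/dev/full", "test entry"))
```

The special device /dev/full is a convenient way to exercise the error path without actually filling a disk.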

       

      In our case, some people reported an "Error" page showing up last night (we admins get that as well as the cause, which was a dump of the SQL query that caused the exception). I didn't know it at the time, but that exception had been thrown because the system thought one of the database tables was 'crashed', which you usually fix by running a check command and then a repair command on the table and going about your business. I did not find any crashed tables. What I should have realized then was that the problem was not in fact the database itself, but the underlying disk device its data files were located on...which just so happened to be completely full. As in..."0 bytes free" full. Thank you, you damn webserver :/
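Since a full disk gives no warning on its own, the usual remedy is a small check that runs on a schedule (cron or similar) and alerts before the partition fills. Below is a minimal sketch using the standard library's `shutil.disk_usage`; the 10% threshold and simply printing the result are assumptions, and a real setup would send the alert somewhere it actually gets noticed.

```python
import shutil


def check_free_space(path="/", min_free_fraction=0.10):
    """Check the filesystem that contains `path`.

    Returns (ok, free_fraction), where ok is False once less than
    `min_free_fraction` of the filesystem's space is free. The 10%
    default is an arbitrary example threshold.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return free_fraction >= min_free_fraction, free_fraction


if __name__ == "__main__":
    ok, free = check_free_space("/")
    status = "OK" if ok else "WARNING: low disk space"
    print(f"{status}: / has {free:.1%} free")
```

Checking the root partition and the partition holding the database files separately would have flagged this situation well before it reached "0 bytes free".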

  5. We apologize for that unexpected, prolonged outage. Here is a brief failure report.

    What happened?!

    During the overnight hours of April 18th-19th (US time), we experienced a database crash. The database was checked and restarted, and this initially appeared to fix the problem. Unfortunately, unbeknownst to me at the time, the root cause had been left unaddressed, and as a result the entire web infrastructure became unusable minutes later. I was forced to redirect the site to my emergency backup server in order to troubleshoot.

    Upon inspecting the server, I discovered that the web server's log files had grown so large that they had eaten up every last byte of remaining disk space on the server's root partition. Our usual log rotation processes were not working correctly, which is why the log files had grown so large. This also turned out to be the cause of the earlier database issue, since the database's data files are stored on the same root partition.

    What was done to resolve this?

    To resolve this problem, I first had to move the overgrown log files out of the way. We retain a subset of these logs for diagnostic purposes in case of internal web server errors. Since the server processes that normally rotate old log files out of the way were not functioning properly, I had to quickly determine exactly which of these log files to keep, then compress them and move them to the larger partition. This took time to complete due to the sheer size of some of these files. After this was done, the entire web server log directory was wiped clean and the disk space freed.

    Since the database files were also on the root partition, I wanted to avoid any future problems with them. The data directory, along with the existing data files, was moved to the larger partition.
    The original copies on the root partition were left untouched. From this point forward, we will run the database from the larger partition and keep its backups on the root partition. This should keep the database from being constrained by the limits of the root partition itself. Once all these operations were complete, the primary infrastructure was restarted and the site was moved back from the backup server to the primary server. All login sessions were reset as a result, so you will have to log in again, but everything APPEARS to be in order again.

    Last words

    I sincerely apologize for the trouble this caused everyone. It certainly caught me off guard, when I should have been watching the disk space more closely than I had been. I feel wholly responsible for allowing this to occur in the first place. The good news is that nothing appears to be damaged, and everything does appear to be working again. But if anyone encounters a forum error again, please let me know ASAP. Many thanks to @Renzourin for catching and reporting the initial issue to me, and to @Koby for jumping on IRC and keeping everyone informed while I worked on the server.
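Log rotation on Linux servers is commonly handled by a dedicated tool such as logrotate, and the report doesn't say which mechanism failed here. The sketch below only illustrates the idea behind size-based rotation (compress the current log, keep a few old copies, drop the oldest); the function name, the size limit, the number of copies kept, and the copy-then-truncate approach are all assumptions, not our server's actual setup.

```python
import gzip
import os
import shutil


def rotate_if_large(path, max_bytes, keep=3):
    """Rotate `path` once it grows beyond `max_bytes`.

    Old copies are kept as path.1.gz (newest) up to path.<keep>.gz
    (oldest); anything older is deleted. Assumes keep >= 1.
    Returns True if a rotation happened.
    """
    if not os.path.exists(path) or os.path.getsize(path) <= max_bytes:
        return False

    # Drop the oldest copy, then shift the remaining copies up by one.
    oldest = f"{path}.{keep}.gz"
    if os.path.exists(oldest):
        os.remove(oldest)
    for i in range(keep - 1, 0, -1):
        older = f"{path}.{i}.gz"
        if os.path.exists(older):
            os.replace(older, f"{path}.{i + 1}.gz")

    # Compress the current log into path.1.gz ...
    with open(path, "rb") as src, gzip.open(f"{path}.1.gz", "wb") as dst:
        shutil.copyfileobj(src, dst)
    # ... then truncate it in place so a writer holding it open
    # (in append mode) keeps logging to the same file.
    with open(path, "wb"):
        pass
    return True
```

Run from a scheduled job, `rotate_if_large("/var/log/example/access.log", 100 * 1024 * 1024)` would rotate that (hypothetical) log once it passes 100 MiB. Lines written between the copy and the truncate are lost, the same trade-off logrotate's `copytruncate` option makes; the alternative is to rename the file and have the web server reopen its logs.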
  6. Proof that @Scyrous is weird, brought to you by Kametsu's IRC staff channel:

     

    Quote

    02:17:42 AM <&Scyrous> IkarosBD, do you consider mayonnaise to be an instrument?
    02:18:03 AM <&IkarosBD> ...if that isn't the most random thing...
    02:19:49 AM <&Scyrous> please answer the question
    02:20:20 AM <&IkarosBD> I'm afraid I cannot
    02:20:36 AM <&Scyrous> Now that´s a shame
    02:20:44 AM <&IkarosBD> No
    02:21:23 AM <&Scyrous> Oh yes it is
    02:26:04 AM <&IkarosBD> I was answering your question.
    02:26:23 AM <&Scyrous> No you weren´t
    02:26:49 AM <&IkarosBD> How do you know
    02:26:55 AM <&Scyrous> I know all
    02:27:02 AM <&IkarosBD> Bullshit.

     

    1. Saf

      A TANKER FULL OF SPAGHETTI SAUCE

    2. Nabull
    3. Saf

      Well, I'm usually into girls, but...okay. I'll give it a chance.

  7. Rules for Kametsu's official Discord server

    Suggestion implemented; I like that idea. Thanks, @Moodkiller!
  8. Kametsu is now on Discord!

    UPDATE: The old invite link is no longer valid. The server structure was changed slightly, and the new invite link is https://discord.gg/wTMknvP. When you join with this link, the first thing you'll see is that you've been dropped into a #readme channel. This channel contains a pinned message with our rules for the Discord server. They are the same rules as posted above, but you'll still need to read them again to be sure you understand them thoroughly. You can then join other channels using the links at the bottom of the pinned message. Any questions or concerns, please let me know. Thank you.
  9. Kametsu is now on Discord!

    I have posted official rules for the Discord server. They aren't that much different than our usual community guidelines but they do have a few specific to Discord itself that must be read and understood. Please be sure to read these BEFORE joining our Discord server!
  10. Now that we've had a Discord server for a while, I got to thinking that it's probably time to lay out some official, specific rules for it. This is largely due to a recent incident that was reported to me by Discord's Trust & Safety team, but also partly because we don't have a solid ruleset for it. These rules will now be shown to you when you join the server via the invite link, as well! Without further ado, here they are:

    1. Do not ask to be assigned a role. The one exception: if you are marked as belonging to certain usergroups here (Donators and Uploaders, to name a few), please let me know via PM. Once connected to Discord, you'll need to provide me your Discord handle and ID so I can grant you the appropriate role.
    2. Do not ask if you can mod the server. I only select specific staff members I can personally entrust with the upkeep of the server, and only if/when there is a need for more.
    3. ALWAYS comply with the Discord Community Guidelines, which are located here: https://discordapp.com/guidelines
    4. Don't use this server as a means to insult others, nor to troll others with the intention of starting a flame war. Violators will be removed permanently on the first offense.
    5. Have respect for each other's opinions. EVERYONE is entitled to their own, of course, but do not attempt to impose yours upon others.
    6. NSFW content goes into the NSFW channel. What's considered NSFW? Anything with any degree of nudity, sexual references, references to pornography, or excessive swearing, to name some things. You MUST be 18 years or older to enter the NSFW channel.
       IMPORTANT NOTE REGARDING THE ABOVE: Under NO circumstances are you to post ANY remotely pornographic depiction involving minors; this includes lolicon and shotacon. DO NOT POST IT. Doing so will likely result in removal of the post, removal from the server, and/or being reported to Discord Trust & Safety.
    7. Racism is not welcome here. Everyone on this server is an equal, and they can do without racist bigots.
    8. No spamming. This includes (but is not limited to) advertising other servers that haven't been authorized by staff.
    9. No flooding. That means don't paste huge amounts of text. While Discord can handle such things much better than IRC can, nobody likes having to read an entire wall of text in the middle of a conversation. Use an online paste tool.
    10. If you have any questions, comments, or concerns, please feel free to DM IkarosBD or anyone in one of the admin or mod roles. They'll be happy to lend an ear should you need it.

    These rules are subject to change at any time, with or without prior notice. By joining one of our channels, you acknowledge that you will fully abide by these rules. Failing to read and understand these rules does not excuse you if you break them.

    If you have anything else you'd like me to add to these rules, or any modifications you think should be made, please post in here and let me know. I originally wanted to include this with the original Discord post, but that one was already lengthy as it was. I opted instead to make a new thread and simply link to it from the official Discord thread, which allows for more proper discussion.
  11. Just use the damn sheet, lol, and stop your whining about how it was created.
  12. Sorry about that, folks. A security upgrade was applied to the webserver, and somehow it completely borked the server due to a broken upgrade script on the maintainer's end. I've fixed it though, thank GOD. I was about to have a damn heart attack or something. :(

  13. Nobody panic! I triggered those last few errors on purpose, in an effort to track down an apparent gremlin that's been causing grief for some people (in the form of Internal Server Error messages). It's proving really tough to track down. I've made some emergency adjustments in the meantime; hopefully these will hold.

    1. Gogeta-Blue

      Otsukaresama desu.

      thanks-for-all-your-hard-work.png

  14. [UPCOMING] Multi-stage server maintenance

    Good. So no errors (e.g. the 504 error pages some people were getting on occasion), no unusual lag beyond the norm, etc.?
  15. Just a heads-up: I'm going to be making a few emergency tweaks to the backend to see if they help with the "504" errors some of you have been getting. Expect some interruptions here and there over the next several hours while I work on the server.

    1. Pikanet128

      OK

      Just FYI, this is no longer an issue for me.

    2. IkarosBD

      Great, that's good news. It means whatever I'm doing is working. Let me know if anything else breaks. This is a relatively new configuration I'm running, and I still need to evaluate its performance and stability.
