Outage

Started by klondike, June 21, 2022, 09:17:31 AM

klondike

Sorry about the server outage. Not sure when it stopped responding or why. Fixed by rebooting the server.

It's good to moan.

Raven

Hadn't noticed any outage. :smiley: 

-Oy-

Spoiled my breakfast, that did! Glad we're back!
"If a man does not keep pace with his companions, perhaps it is because he hears a different drummer."

klondike

It highlighted some problems for me which I'll need to sort out. I store and maintain all my bookmarks in a database and present them using a web page. When the server stopped I lost access to them all, including the link to the AWS account from which I control the server.

I had already thought of that ages ago and save a static copy of my Favourites page in Dropbox. Unfortunately the last time I refreshed it was some months ago, before I switched over to using an AWS Lightsail server, so I had to retrieve the links manually from a database archive. Doh!
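
As an illustration only, regenerating that kind of static copy could be scripted so it never goes stale. The real database and page layout aren't described in this thread, so the sketch below assumes a SQLite file with a hypothetical `bookmarks` table holding `title` and `url` columns.

```python
import html
import sqlite3


def render_bookmarks(conn):
    """Return a standalone HTML page listing every row of the bookmarks table."""
    # Table and column names are assumptions made for this sketch.
    rows = conn.execute(
        "SELECT title, url FROM bookmarks ORDER BY title"
    ).fetchall()
    items = "\n".join(
        '<li><a href="{}">{}</a></li>'.format(
            html.escape(url, quote=True), html.escape(title)
        )
        for title, url in rows
    )
    return (
        "<!DOCTYPE html>\n<html><body>\n<ul>\n"
        + items
        + "\n</ul>\n</body></html>\n"
    )


def write_snapshot(db_path, out_path):
    """Write the rendered page to out_path, e.g. a file in a synced folder."""
    conn = sqlite3.connect(db_path)
    try:
        page = render_bookmarks(conn)
    finally:
        conn.close()
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(page)
```

Calling `write_snapshot` from a daily scheduled job, with the output path inside the Dropbox folder, would keep the offline copy no more than a day old. Both paths are whatever suits the setup; none are given here.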

It also highlighted something else which I had never tested. I just set up the AWS account to do daily system snapshots. When I used the reboot option the server didn't restart correctly, and I assumed that the disk was corrupted, so I created a fresh instance from a snapshot. That came up but the database wasn't working. A Google search shows that others have had the same problem. I deleted that new instance and set up another from an earlier snapshot. Same problem. I thought I was going to have to set up a vanilla instance and build it from scratch from my own automatic backup routines. Then I saw there was an option to stop the instance and another to start it. Luckily doing that got it going again, otherwise it might have been out for a couple of hours or more and I'd have been a very unhappy bunny.
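
For reference, the stop-then-start sequence can also be driven from a script rather than the console. This is a rough sketch, not what was actually done here: it assumes a client object that behaves like boto3's Lightsail client (`stop_instance`, `get_instance_state`, `start_instance`), and the polling interval and timeout are arbitrary choices.

```python
import time


def stop_then_start(client, instance_name, poll_seconds=5.0, timeout=300.0):
    """Stop an instance, wait until it reports 'stopped', then start it.

    `client` is assumed to look like boto3.client("lightsail").
    """
    client.stop_instance(instanceName=instance_name)
    deadline = time.monotonic() + timeout
    while True:
        reply = client.get_instance_state(instanceName=instance_name)
        state = reply["state"]["name"]
        if state == "stopped":
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{instance_name} still '{state}' after {timeout} seconds"
            )
        time.sleep(poll_seconds)
    # Only issue the start once the instance has fully stopped.
    client.start_instance(instanceName=instance_name)
```

With boto3 installed and credentials configured, the call would be along the lines of `stop_then_start(boto3.client("lightsail"), "my-forum")`, where `my-forum` is a made-up instance name.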

It's good to moan.

Scrumpy

I thought you had 'shut up shop'. 
Nearly went over to the other side..    :angry:
Don't ask me.. I know nuffink..

klondike

I think I found out what the problem was. Not the server the site runs on but a Cloudflare outage.

https://blog.cloudflare.com/cloudflare-outage-on-june-21-2022/

That explains why the recovery required nothing more than turning it off and on and why I could find no errors in the logs to explain the problem. In fact it required nothing doing at all. By the time I'd finished naffing about the real problem had been fixed by Cloudflare and web traffic was able to get through again.
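
With hindsight, a quick check of the site through its public address and again directly at the server would have pointed at the real culprit sooner. A minimal sketch of that comparison follows. It assumes the server can be reached at some direct URL that bypasses Cloudflare, which may not be true of every setup, and the rule that any status below 500 counts as healthy is a simplification.

```python
import urllib.error
import urllib.request


def probe(url, timeout=10.0):
    """Return the HTTP status code for url, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, just with an error status.
        return err.code
    except OSError:
        # Covers URLError, refused connections and timeouts.
        return None


def healthy(status):
    """Treat any answer below 500 as a working server (a simplification)."""
    return status is not None and status < 500


def diagnose(public_status, origin_status):
    """Classify an outage from a public probe and a direct-to-origin probe."""
    if healthy(public_status):
        return "ok"
    if healthy(origin_status):
        # Server is fine; the fault sits in front of it (proxy, CDN or DNS).
        return "upstream"
    return "origin"
```

Feeding it `probe(...)` results for the public hostname and for the direct address gives `"upstream"` when only the public route is failing, which is the situation described above.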

It's good to moan.

-Oy-

I enabled Cloudflare on my forum a couple of years ago and got nothing but problems. Promptly turned it off.
"If a man does not keep pace with his companions, perhaps it is because he hears a different drummer."

klondike

Normally they are reliable - a sizeable portion of the internet uses them. The main advantage, apart from the increased speed from their caching, is that it cuts out any DNS propagation issues if you move server.

Recovery on AWS can mean starting up a fresh instance which has a different IP address, and the switch to it through Cloudflare is instant as soon as you specify the new endpoint IP. Without it, it could take anything up to a day for everybody to reach the recovered forum. Hopefully I'll never need that advantage of course.
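
Specifying the new endpoint IP is normally done in the Cloudflare dashboard, but it can also be scripted. The sketch below only builds the request. It assumes the Cloudflare v4 API's DNS record update endpoint with a bearer token, and the zone ID, record ID, token and IP address are all placeholders that would come from the account.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_dns_update(zone_id, record_id, new_ip, api_token):
    """Build (but do not send) a request that repoints a DNS record at new_ip."""
    url = f"{API_BASE}/zones/{zone_id}/dns_records/{record_id}"
    # PATCH changes only the supplied fields, here the record's target address.
    body = json.dumps({"content": new_ip}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it. Keeping the build step separate makes it easy to check the request before anything is changed for real.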

It's good to moan.