When the *status* page is returning a 500 error... not a good sign.

traviscj · on Oct 31, 2017

On the other hand, it makes it sound more likely to be a routing/reverse-proxy issue than (say) a database issue. Those sound easier to deal with via a rollback than something like "oops, we dropped a critical index on the `messages` table".

nyrikki · on Nov 1, 2017

Given the length of the outage, it seems more like a DB issue. And if this case study is still accurate, there appear to be some fragile dependencies.

https://aws.amazon.com/solutions/case-studies/slack/

The API isn't even returning valid error codes.

  logging error: {"subtype":"api_call_error","message":"{\"ok\":false,\"error\":\"_http_error\",\"status\":0,\"retry_after\":null}","stack":"Error\n

Since you can still reach their API servers, my bet is some replication error in MySQL, but that is just a guess based on that case study.

I am betting they are saying 'connectivity' because that is the error the client logs.
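The `message` field in the log line above is itself a JSON document encoded as a string, and the `status` it carries is `0`, which is not an HTTP status code at all (those run from 100 to 599). If the client is browser-based, a status of 0 from `XMLHttpRequest` usually means no HTTP response arrived, which would fit the "connectivity" wording. A minimal Python sketch of decoding the nested payload and flagging the status, using only the `message` value shown in the log (the truncated `stack` field is left out, and the function names are hypothetical):

```python
import json

# Outer log record with the "message" value copied from the client log.
# The inner payload is JSON encoded as a string, hence the escaped quotes.
LOG_LINE = (
    '{"subtype":"api_call_error",'
    '"message":"{\\"ok\\":false,\\"error\\":\\"_http_error\\",'
    '\\"status\\":0,\\"retry_after\\":null}"}'
)


def parse_api_error(line: str) -> dict:
    """Decode the outer log record, then the JSON string nested in "message"."""
    outer = json.loads(line)
    return json.loads(outer["message"])


def is_valid_http_status(status: int) -> bool:
    """HTTP status codes are three-digit values in the 100-599 range."""
    return 100 <= status <= 599


inner = parse_api_error(LOG_LINE)
print(inner["error"], inner["status"], is_valid_http_status(inner["status"]))
# prints: _http_error 0 False
```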

nyrikki · on Nov 1, 2017

It is back up for me, but now I am annoyed they can't follow specs at all.

On Linux, they ignore:

https://standards.freedesktop.org/basedir-spec/basedir-spec-...

And they kitchen-sink everything under the XDG config dir:

  ~$ ls .config/Slack/
  Cache/                Cookies-journal       dictionaries/
  installation          Local Storage/        Preferences
  QuotaManager-journal  Cookies               databases/
  GPUCache/             local-settings.json   logs/
  QuotaManager          storage/
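The basedir spec splits per-user files by purpose: configuration under `$XDG_CONFIG_HOME` (default `~/.config`), cached data under `$XDG_CACHE_HOME` (default `~/.cache`), and other data under `$XDG_DATA_HOME` (default `~/.local/share`), with unset, empty, or relative values ignored in favour of the default. A minimal Python sketch of resolving those directories; the `app_dirs` helper and the mapping of Slack's folders onto them (for example `Cache/` and `GPUCache/` to the cache dir) are hypothetical illustrations, not anything Slack documents:

```python
import os
from pathlib import Path

# Defaults from the XDG Base Directory spec, relative to $HOME.
XDG_DEFAULTS = {
    "XDG_CONFIG_HOME": ".config",
    "XDG_CACHE_HOME": ".cache",
    "XDG_DATA_HOME": ".local/share",
}


def xdg_dir(var, env=None, home=None):
    """Return the base dir for `var`, falling back to the spec default.

    Unset, empty, or relative values are ignored, as the spec requires.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else home
    value = env.get(var, "")
    if value and os.path.isabs(value):
        return Path(value)
    return home / XDG_DEFAULTS[var]


def app_dirs(app, env=None, home=None):
    """Hypothetical per-app layout (illustrative placement only)."""
    return {
        # e.g. Preferences, local-settings.json
        "config": xdg_dir("XDG_CONFIG_HOME", env, home) / app,
        # e.g. Cache/, GPUCache/: safe to delete, so not config
        "cache": xdg_dir("XDG_CACHE_HOME", env, home) / app,
        # e.g. databases/, Local Storage/
        "data": xdg_dir("XDG_DATA_HOME", env, home) / app,
    }
```

With no variables set and `home=Path("/home/alice")`, `app_dirs("Slack", env={}, home=...)["cache"]` resolves to `/home/alice/.cache/Slack`, so clearing caches would no longer mean digging through the config directory.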

munk-a · on Oct 31, 2017

It's an even worse sign when the page half-loads with some stylesheets missing.

Florin_Andrei · on Oct 31, 2017

Maybe they should not serve the static content with "Cache-Control: max-age=1". That's rarely a good idea.
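With `max-age=1`, browsers and caches treat every static asset as stale after one second, so nearly every request goes back to the origin (at best as a conditional revalidation), which hurts most during an incident. A common alternative is to give fingerprinted assets a long lifetime and keep HTML short-lived. A minimal Python sketch of choosing the header by path; the one-year and 60-second values, the extension list, and the `cache_control_for`/`StaticHandler` names are assumptions for illustration:

```python
from http.server import SimpleHTTPRequestHandler

# Assumed: files with these extensions have content hashes in their names
# (e.g. app.3f9a1c.css), so a changed file always gets a new URL.
LONG_LIVED = (".css", ".js", ".png", ".svg", ".woff2")


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control value based on the request path."""
    clean = path.split("?", 1)[0]  # ignore any query string
    if clean.endswith(LONG_LIVED):
        # One year; "immutable" tells browsers not to revalidate at all.
        return "public, max-age=31536000, immutable"
    # HTML and everything else: short lifetime so updates show up quickly.
    return "public, max-age=60"


class StaticHandler(SimpleHTTPRequestHandler):
    """Serves the current directory, adding Cache-Control to each response."""

    def end_headers(self):
        self.send_header("Cache-Control", cache_control_for(self.path))
        super().end_headers()
```

Passing `StaticHandler` to `http.server.ThreadingHTTPServer(("127.0.0.1", 8000), StaticHandler)` and calling `serve_forever()` serves the current directory with these headers.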

cadr · on Oct 31, 2017

Yeah, it alternates between loading and telling me everything is fine, and just giving me an nginx 500 error. It seems the status page should be hosted separately so it can stay up even when other things are having issues.

seanp2k2 · on Nov 1, 2017

It loaded for me when I was looking, but took ~30 seconds to do so. They should host their status page on a separate domain and put something like CloudFlare in front of it to absorb sudden traffic spikes. Alternatively, they could use Twitter or Facebook as the status page and let those sites deal with the spikes, or just serve static HTML.
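The "just serve static HTML" option can be as small as a job that renders component states into one flat file and uploads it to hosting that shares nothing with the main app. A minimal Python sketch of the rendering step; the `render_status_page` name, the component names, and the upload step are hypothetical:

```python
from html import escape


def render_status_page(components: dict) -> str:
    """Render {component name: is_up} as a self-contained HTML page."""
    all_up = all(components.values())
    headline = "All systems operational" if all_up else "Some systems are degraded"
    rows = "\n".join(
        f"<li>{escape(name)}: {'up' if up else 'down'}</li>"
        for name, up in sorted(components.items())
    )
    # No external stylesheets or scripts, so nothing can half-load.
    return (
        "<!doctype html>\n"
        '<html><head><meta charset="utf-8"><title>Status</title></head>\n'
        f"<body><h1>{headline}</h1>\n<ul>\n{rows}\n</ul></body></html>\n"
    )
```

For example, `render_status_page({"Messaging": True, "Calls": False})` yields a page headed "Some systems are degraded", which could then be written to `status.html` and pushed to a CDN or a separate static host.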

Gibbon1 · on Oct 31, 2017

It's a sign that we need to appease the beer gods.