The logging problem is really a CloudWatch Logs problem: it just isn't a very good service. A good way to solve that is to push all logs from CloudWatch Logs into an Elasticsearch cluster using a Lambda function. AWS even provides the code for you if you click the "subscribe" button in CWL. With Kibana and Elasticsearch, inspecting and analysing logs is MUCH better.
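For anyone wondering what that subscription Lambda boils down to, here is a minimal Python sketch (not the code AWS generates). CloudWatch Logs hands the function a base64-encoded, gzip-compressed JSON payload under `awslogs.data`; the sketch decodes it and builds a newline-delimited body for the Elasticsearch `_bulk` API. The index name `cwl-logs` and the document field names are placeholders I made up, and actually POSTing the body to the cluster is left out.

```python
import base64
import gzip
import json


def decode_subscription_event(event):
    """Decode the payload CloudWatch Logs delivers to a subscribed Lambda."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def to_bulk_body(payload, index="cwl-logs"):
    """Build a newline-delimited body for the Elasticsearch _bulk API.

    The index name and document field names are assumptions for illustration.
    """
    lines = []
    for log_event in payload["logEvents"]:
        # Each document is preceded by an action line naming its index and id.
        action = {"index": {"_index": index, "_id": log_event["id"]}}
        document = {
            "@timestamp": log_event["timestamp"],  # epoch milliseconds
            "message": log_event["message"],
            "log_group": payload["logGroup"],
            "log_stream": payload["logStream"],
        }
        lines.append(json.dumps(action))
        lines.append(json.dumps(document))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def handler(event, context):
    payload = decode_subscription_event(event)
    # Only DATA_MESSAGE payloads carry log lines; skip anything else.
    if payload.get("messageType") != "DATA_MESSAGE":
        return ""
    return to_bulk_body(payload)
```

Older Elasticsearch versions also expect a `_type` in the action line, so adjust for whatever your cluster runs.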
This is what drives me nuts about AWS, but it's genius on their part: they've got you to spend more $$ on Elasticsearch and Lambda to overcome the fact that one of their other products just doesn't work very well.
I don't think this is intentional. My experience with AWS is that they aim to constantly improve products and create hosted services that are cheaper than if you frankensteined the same thing yourself. They don't always succeed, but I'm willing to give them the benefit of the doubt for how bad CloudWatch truly is, and just assume either they're blind to the pain because they know what not to do internally, or they're just really hamstrung in trying to modify that feature since so much relies on it.
Hopefully this isn't just Stockholm syndrome speaking though...
We built IOpipe[1] to address these issues by offering our own wrapper[2] that sends telemetry to our service. IOpipe aggregates metrics and errors, and allows the creation of alerts with multiple rules per alert.
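To illustrate the general wrapper pattern (this is not IOpipe's actual API, just a sketch of the idea): a decorator wraps the function handler, times each invocation, records any exception, and hands one record to a reporter. The names `with_telemetry` and `report` are hypothetical, and `report` stands in for whatever ships the record to a collector.

```python
import functools
import time


def with_telemetry(report):
    """Wrap a handler so each invocation reports its duration and any error.

    `report` is a hypothetical callable that receives one record per call.
    """
    def decorator(handler):
        @functools.wraps(handler)
        def wrapped(event, context):
            started = time.monotonic()
            record = {"function": handler.__name__, "error": None}
            try:
                return handler(event, context)
            except Exception as exc:
                record["error"] = f"{type(exc).__name__}: {exc}"
                raise  # re-raise so the platform still sees the failure
            finally:
                # Runs on success and on failure, so every call is reported.
                record["duration_ms"] = (time.monotonic() - started) * 1000
                report(record)
        return wrapped
    return decorator
```

In a test you can pass `records.append` as the reporter and inspect the list; a real reporter would send the records somewhere you can aggregate and alert on them.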
FYI: except on very wide screens, that landing page copy is barely readable over the background image, and in a narrow window it's also obscured by the header and navigation.
Have you tried out Azure Functions? It has pretty good features for continuous integration, a CLI that lets you run functions locally, and a really nice monitoring experience. It's probably not worth migrating an existing project, but it might be useful for future projects. Plus, we have a Serverless Framework plugin.
I used Azure Functions (when I was a MSFT employee, in fact), and found it to be unusable. A list of complaints:
* setup is 100% completely clicky-clicky UI driven, which was a huge pain to scale. instantiation of a Function on behalf of a developer for production use was a huge time sink
* it's clearly a thin veneer on Azure App Service, and the abstractions leak badly in the portal (deployment credentials, for example)
* the web UI breaks completely and mysteriously if you enable authentication
* management of service princi- uh, I mean, Azure AD Applications was weird, and the (internal to MSFT, I suspect) permissions model to the Graph API was a huge barrier to ease of use
* management of NPM packages required me to start a terminal session in the UI and run commands manually, which was a huge turnoff (and had to be repeated ad nauseam with every new Function created)
* the configuration files for the runtime are utterly undocumented, with the sole exception of the bits used to plug Azure inputs/outputs together. this makes automating things exceedingly difficult. I recall there even being a magic value in the topmost config file
* the edit-commit-push-test cycle was VERY slow, with new commits sometimes taking tens of minutes to "appear" in my function
* I never found a way to run it locally, making the previous point that much worse
* log output is very difficult to find, and can live in a few different places. I spent too much time hunting for errors, especially things like syntax errors that make the runtime itself go kaboom. This was the thing that really killed it for me; if I had an error that resulted in anything but a "clean" return, it was torture trying to figure out where I'd missed the paren.
- It’s true that Functions is built on App Service, but I see that as an advantage. You get all the great features of Continuous Integration, custom domains, automated deployment, etc.
- Indeed, the portal does not do well when auth is enabled and all routes are protected. The problem is that the portal calls admin APIs that are also protected, so it fails. We now have better error messages for this, and we’re tracking this bug: https://github.com/Azure/azure-functions-ux/issues/499
- The Graph API issue is probably not specific to Functions, but it is a bit easier with the Authentication/Authorization feature. Can you provide more detail?
- You can install npm packages at the "root" of your Function and not reinstall them for each Function, just like a normal Node.js app - it walks the directories.
- Our documentation is much better now, and we even have documentation for all bindings in the portal. We also have much better conceptual docs on bindings, see https://docs.microsoft.com/en-us/azure/azure-functions/funct.... We’d welcome any specific feedback on docs that are missing.
- CI should be faster now; it usually takes about 2-3 minutes for commits to show up. It’s fast enough that I’ve demo’d it.
- You can now run locally and debug using the Azure Functions Core Tools (npm i -g azure-functions-core-tools; func init; func host start). See docs: https://docs.microsoft.com/en-us/azure/azure-functions/funct.... This is something that our users always praise us for. We support C# debugging with Visual Studio and JavaScript debugging with VSCode.
- Logs definitely weren't great. Initially, they always went to table storage, but the ones you see streaming in the portal get written to disk to enable the realtime portal stream - they are only written to disk when you're in the portal, so they are "sometimes" there. The good news is that we've tightly integrated Application Insights, which means logs are easy to find. It's easy to alert on failed functions. You can see perf and metric data all in one place without log parsing. For a demo, go to the 6 minute mark of this video: https://www.youtube.com/watch?v=TgB-fs1hwlw&t=6m
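On the npm point above, "it walks the directories" refers to how Node.js resolves packages: it checks `node_modules` in the requiring file's directory, then in each parent directory up to the filesystem root. A rough Python sketch of that lookup order follows; the path `/app/MyFunction` is a made-up example, and global folders such as those from `NODE_PATH` are ignored here.

```python
from pathlib import PurePosixPath


def node_modules_candidates(start_dir):
    """List the node_modules directories Node.js checks, nearest first."""
    start = PurePosixPath(start_dir)
    candidates = []
    for directory in [start, *start.parents]:
        # Node does not append node_modules to a directory already named node_modules.
        if directory.name == "node_modules":
            continue
        candidates.append(str(directory / "node_modules"))
    return candidates


# Hypothetical layout: one function folder under a shared root.
for path in node_modules_candidates("/app/MyFunction"):
    print(path)
```

So a package installed once in the shared parent folder (`/app/node_modules` in this sketch) is visible to every function folder beneath it.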
Thanks for such a comprehensive follow-up. It's true that it has been a while, and it really does sound like you've addressed nearly all of the issues I had. (btw, the undocumented file was host.json, which it appears may be better documented now)