
This would be super timely for me if it supported Python, but even without that it's interesting. Coming from the world of servers (virtual or physical) I find the logging latency and the lack of "observability" in general quite maddening in Lambda.

Two things I wonder about this:

1) The "no code changes" part implies it's adding another layer of code somewhere, which means another thing to trust and probably not be able to audit, which is not automatically OK outside the "move fast break things" world. Am I misreading it?

2) How can you add metrics with "no overhead?" Surely there must be some overhead... and then it would be nice to be able to measure how much.

For me personally, #1 is a real issue and #2 is just hypothetical, but I would expect it to be the other way around for a fast-growing startup.



You are right about the lack of observability tools for AWS Lambda. Thundra was born out of the same frustration on our side. These are very solid questions that we are happy to answer in detail:

1) Automatic instrumentation is what makes "no code changes" possible. For Java, once you add Thundra to your environment variables, you can change monitoring settings with annotations or additional environment variables, without touching your code. For Node.js and Go, you simply wrap your functions with our agents (a sketch of that pattern is below). With manual instrumentation, on the other hand, you can also add code blocks to inspect your variables.
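
For concreteness, here is a minimal sketch of that wrapper pattern in TypeScript; the wrapHandler name and the emitted fields are illustrative assumptions, not Thundra's actual API:

    type LambdaHandler<E, R> = (
      event: E,
      context: { functionName: string }
    ) => Promise<R>;

    // wrapHandler is a hypothetical name; a real agent exports something similar.
    function wrapHandler<E, R>(handler: LambdaHandler<E, R>): LambdaHandler<E, R> {
      return async (event, context) => {
        const start = Date.now();
        try {
          return await handler(event, context); // delegate to user code unchanged
        } finally {
          // One structured line to stdout; in Lambda this lands in CloudWatch Logs.
          console.log(JSON.stringify({
            type: "invocation",
            functionName: context.functionName,
            durationMs: Date.now() - start,
          }));
        }
      };
    }

    // Usage: the only change to user code is wrapping the exported handler.
    export const handler = wrapHandler(async (_event: unknown, _context) => ({
      statusCode: 200,
      body: "ok",
    }));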

You can check our code here: https://github.com/thundra-io. We are happy to share more if you have any further questions.

2) Zero overhead is another of our strong points. You need to switch to async monitoring for this, which means adding "our" Lambda via your environment variables (please see: https://github.com/thundra-io/serverless-plugin-thundra-moni...). That Lambda then ships your function's logs to us, so the cost of sending, and of retrying failed sends, never lands inside your function. The only remaining overhead is that your code gathers slightly more log output for our Lambda to read, and that is truly negligible. A sketch of this forwarder pattern follows.
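
Roughly, the forwarder side could look like this: a monitor Lambda subscribed to the monitored function's CloudWatch log group decodes the subscription payload and ships the trace lines out-of-band. The collector URL is a placeholder, not a real Thundra endpoint:

    import * as zlib from "zlib";

    // Shape of a CloudWatch Logs subscription event: base64-encoded gzip payload.
    interface CloudWatchLogsEvent {
      awslogs: { data: string };
    }

    export const handler = async (event: CloudWatchLogsEvent): Promise<void> => {
      const payload = JSON.parse(
        zlib.gunzipSync(Buffer.from(event.awslogs.data, "base64")).toString("utf8")
      );
      // Keep only the structured trace lines the agent printed to stdout.
      const traces: string[] = payload.logEvents
        .map((e: { message: string }) => e.message)
        .filter((m: string) => m.startsWith('{"type":"invocation"'));

      if (traces.length === 0) return;
      // Placeholder endpoint: sending and retrying happen here, outside user code.
      await fetch("https://collector.example.com/traces", {
        method: "POST",
        headers: { "content-type": "application/json" },
        body: `[${traces.join(",")}]`,
      });
    };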


Hi,

1) The "no code changes" means that, no code change is required by you. Because Thundra makes it for you automatically. This is not the new approach for existing APM and instrumentation tools but new for AWS Lambda which makes Thundra unique (and first) here at the moment. Additionally, Thundra can make method level CPU profiling without instrumenting the user code. It is supported by Java at the moment but we will support it for other languages Node.js, Go, Python soon.

2) "No overhead" means that almost no overhead. As far as we measured it is just a few microseconds in average (note that it is not milliseconds) for executing our interceptors and printing trace data to console. In respect to publishing monitor data outside of container synchronously, publishing them over CloudWatch logs asynchronously can be considered as no overhead in practice


Re: Lambda

I thought AWS integrated Lambda with X-Ray. Is that enough for your use cases, along with CloudWatch?

https://docs.aws.amazon.com/xray/latest/devguide/xray-servic...
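
For reference, with active tracing enabled Lambda opens the parent X-Ray segment for you, and adding a custom subsegment from a Node.js handler looks roughly like this (using aws-xray-sdk-core; the subsegment name is arbitrary):

    import * as AWSXRay from "aws-xray-sdk-core";

    export const handler = async (): Promise<{ statusCode: number }> => {
      // With active tracing on, Lambda manages the parent segment itself.
      const segment = AWSXRay.getSegment();
      const sub = segment?.addNewSubsegment("business-logic");
      try {
        // ... actual work to attribute to the subsegment ...
        return { statusCode: 200 };
      } finally {
        sub?.close();
      }
    };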

Re: No overhead

I think it implies little overhead in general, and no added overhead when your server is busy serving actual requests. Think UDP-based communication and temporarily disabling or batching traces when the network or server is busy (sketched below). That said, I don't know how Thundra does it.
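
As a generic illustration of that idea (not necessarily how Thundra does it), here is a fire-and-forget UDP emitter that sheds traces instead of blocking when too many sends are in flight; the port and address are placeholders:

    import * as dgram from "dgram";

    const socket = dgram.createSocket("udp4");
    const MAX_IN_FLIGHT = 100; // beyond this, drop traces rather than block
    let inFlight = 0;

    export function emitTrace(trace: object): void {
      if (inFlight >= MAX_IN_FLIGHT) return; // shed load when the agent is busy
      inFlight++;
      const buf = Buffer.from(JSON.stringify(trace));
      // Fire-and-forget: the callback fires whether the datagram arrived or not.
      socket.send(buf, 8125, "127.0.0.1", () => {
        inFlight--;
      });
    }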

Also see:

opencensus.io

https://research.google.com/pubs/pub36356.html


If you're in a rush, IOpipe[1] already supports Python, offering profiling, tracing, and other debugging and observability tools for AWS Lambda. IOpipe co-founder here; feel free to ask anything.

[1] - https://www.iopipe.com



