
Looks really cool, but in production systems, won't the trace files proliferate extremely quickly? How would you correlate the files with a particular session, for example to identify the user?



We are also planning to develop a distributed tracing platform, similar to Jaeger and OpenTelemetry, that continuously records the execution of many distributed processes (e.g. micro-services).

Unlike the existing platforms, which capture only message flows and require you to make educated guesses when an anomaly is observed, our system will let you accurately replay the processing code for each message to quickly identify the root cause of the anomaly.

This would rely on our ability to jump to the specific moment in time when a certain incoming message starts being processed. This moment can be identified either by a log line with a specific format or by a call to some special tracking function (e.g. track_incoming_message(request_id)).
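As a rough illustration of the marker idea, here is a minimal Python sketch. Only the name track_incoming_message comes from the description above; the TraceEvent type, the in-memory list used as a trace, and find_replay_start are hypothetical stand-ins, not the actual trace format or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TraceEvent:
    # Hypothetical event record: position in the trace, event kind, payload.
    step: int
    kind: str
    payload: str


def track_incoming_message(trace: List[TraceEvent], request_id: str) -> None:
    """Append a marker event noting that processing of request_id starts here."""
    trace.append(TraceEvent(step=len(trace), kind="incoming_message", payload=request_id))


def find_replay_start(trace: List[TraceEvent], request_id: str) -> Optional[int]:
    """Return the step of the first marker for request_id, or None if absent."""
    for event in trace:
        if event.kind == "incoming_message" and event.payload == request_id:
            return event.step
    return None


# Usage: two unrelated events, then the marker for request "req-42".
trace: List[TraceEvent] = [
    TraceEvent(step=0, kind="call", payload="main"),
    TraceEvent(step=1, kind="call", payload="accept_connection"),
]
track_incoming_message(trace, "req-42")
start_step = find_replay_start(trace, "req-42")
```

A debugger front end could then seek to start_step instead of replaying from the beginning of the recording.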

For systems languages, the RR[1] recordings try to be practical by capturing only the non-deterministic events in the program execution. You can pair this with a ring buffer that discards the data after a certain retention period.

For scripting languages (or any implementation using the db-like traces), we might add some advanced record filtering options.
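A minimal Python sketch of such a retention buffer, purely as an illustration: the RetentionBuffer class and its methods are hypothetical, and timestamps are passed in explicitly so the behavior is easy to follow. It does not reflect how RR stores recordings.

```python
from collections import deque
from typing import Deque, List, Tuple


class RetentionBuffer:
    """Keeps recorded events and drops those older than the retention period."""

    def __init__(self, retention_seconds: float) -> None:
        self.retention_seconds = retention_seconds
        self._events: Deque[Tuple[float, str]] = deque()  # (timestamp, event)

    def record(self, timestamp: float, event: str) -> None:
        """Store an event, then evict everything that has aged out."""
        self._events.append((timestamp, event))
        self._evict(now=timestamp)

    def _evict(self, now: float) -> None:
        cutoff = now - self.retention_seconds
        # Events arrive in time order, so the oldest is always on the left.
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()

    def events(self) -> List[str]:
        return [event for _, event in self._events]


# Usage: with a 10 second retention, the event from t=0 is dropped at t=12.
buf = RetentionBuffer(retention_seconds=10)
buf.record(0, "a")
buf.record(5, "b")
buf.record(12, "c")
retained = buf.events()
```

Dropping events this way is only safe for replay if a state snapshot covering the discarded prefix is kept as well.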

(But maybe we are misunderstanding the question?)

1: https://rr-project.org/


You cannot just discard the oldest data of a long-running execution trace when doing replay-based time-travel debugging.

You cannot replay execution without a known state followed by all the non-determinism after that state, which is most easily done by starting from the initial state. To discard data, you need to materialize a state snapshot corresponding to that time to enable forward reconstruction from that state.
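To make this concrete, here is a toy Python sketch under heavy assumptions: the whole program state is a single integer, each step consumes one recorded non-deterministic input, and the Recording, step, replay and truncate names are all hypothetical. Real recorders deal with memory, registers, syscalls and so on.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Recording:
    snapshot_state: int   # known program state at snapshot_step
    snapshot_step: int    # step at which the snapshot was taken
    inputs: List[int]     # non-deterministic inputs recorded after the snapshot


def step(state: int, nondeterministic_input: int) -> int:
    """One execution step: deterministic once the input is known."""
    return state * 31 + nondeterministic_input


def replay(rec: Recording, target_step: int) -> int:
    """Reconstruct the state at target_step by replaying forward from the snapshot."""
    if target_step < rec.snapshot_step:
        raise ValueError("cannot replay to a point before the snapshot")
    state = rec.snapshot_state
    for value in rec.inputs[: target_step - rec.snapshot_step]:
        state = step(state, value)
    return state


def truncate(rec: Recording, new_start: int) -> Recording:
    """Discard inputs before new_start, but first materialize a snapshot there."""
    state = replay(rec, new_start)
    return Recording(state, new_start, rec.inputs[new_start - rec.snapshot_step:])


# Usage: the full recording starts from the initial state at step 0.
full = Recording(snapshot_state=0, snapshot_step=0, inputs=[3, 1, 4, 1, 5])
trimmed = truncate(full, new_start=2)
same_final_state = replay(full, 5) == replay(trimmed, 5)
```

Without the snapshot taken in truncate, the remaining inputs alone would not be enough to reconstruct any later state.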


You're right. In the RR case, an RR contributor is working on persistent checkpoints, which can act as snapshots, but this is not merged yet.


Especially since the trace files are in .json. [0]

[0] https://github.com/metacraft-labs/runtime_tracing#format


True! The next major version should use a more optimized format, as mentioned.

However, some of the important optimizations that we're preparing are related less to the format and more to recording more specific things and reconstructing more in postprocessing.



