It is as complicated as you want or need it to be. You can avoid any magic and stick to a subset that is easy to reason about and brings the most value in your context.
For our team, it is very simple:
* we use a library to send traces, and traces only[0]. They bring the most value for observing applications and can contain all the data the other signal types can. Basically hash-maps vs strings and floats.
* we use manual instrumentation as opposed to automatic - we are deliberate about what we observe and have a great understanding of what emits each span. We have naming conventions that match our code organization.
* we use two different backends - an affordable 3rd party service and, for local development, an all-in-one Jaeger install (just run 1 executable or docker container) that doesn't save the spans to disk. The second is mostly for team members' peace of mind that they are not going to flood the third party service.
[0] We have a previous setup to monitor infrastructure, and in our case we don't see a lot of value in ingesting all the infrastructure logs and metrics. I think it is early days for OTEL metrics and logs, but the vendors won't tell you this.
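The "hash-maps vs strings and floats" point can be shown without any SDK at all. A minimal sketch in plain Python (no OpenTelemetry dependency; every name and value here is invented for illustration):

```python
import time

# A metric is essentially a name plus a float; a log line is a string.
metric = ("http.request.duration_ms", 42.0)
log_line = "GET /invoices 200 in 42ms"

# A span is a hash-map: it can carry the same measurement and message
# as structured attributes, plus timing and causality (trace/parent ids).
span = {
    "name": "billing.invoices.create_invoice",  # naming convention mirrors module path
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
    "parent_id": None,
    "start_ns": time.time_ns(),
    "attributes": {
        "http.route": "/invoices",
        "http.status_code": 200,
        "duration_ms": 42.0,   # the metric's value fits here
        "message": log_line,   # and so does the log line
    },
}

# Everything the metric and the log line carried is recoverable from the span.
assert span["attributes"]["duration_ms"] == metric[1]
assert span["attributes"]["message"] == log_line
```

That is the whole argument for traces-only in one picture: the span subsumes the other two signal types, at the cost of being a bigger payload.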
"It might help to go over a non-exhaustive list of things the offical SDK handles that our little learning library doesn’t:
- Buffer and batch outgoing telemetry data in a more efficient format. Don’t send one-span-per-http request in production. Your vendor will want to have words."
- Gracefully handle errors, wrap this library around your core functionality at your own peril"
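Those two points fit in a few lines of code. A sketch of the idea, not the real SDK: the `export` callback stands in for an actual OTLP exporter, and the batch size and flush interval are made-up numbers:

```python
import time
from typing import Callable

class BatchSpanBuffer:
    """Buffer spans and flush them in batches instead of one-per-request."""

    def __init__(self, export: Callable[[list], None],
                 max_batch: int = 512, flush_interval_s: float = 5.0):
        self._export = export
        self._max_batch = max_batch
        self._flush_interval_s = flush_interval_s
        self._buffer: list = []
        self._last_flush = time.monotonic()

    def add(self, span: dict) -> None:
        self._buffer.append(span)
        due = time.monotonic() - self._last_flush >= self._flush_interval_s
        if len(self._buffer) >= self._max_batch or due:
            self.flush()

    def flush(self) -> None:
        batch, self._buffer = self._buffer, []
        self._last_flush = time.monotonic()
        try:
            self._export(batch)  # one request for the whole batch
        except Exception:
            # Swallow exporter errors: telemetry must never take down
            # the code it is observing.
            pass

# Usage: 7 spans with max_batch=3 go out as batches of 3, 3, and 1.
sent = []
buf = BatchSpanBuffer(sent.append, max_batch=3)
for i in range(7):
    buf.add({"name": f"span-{i}"})
buf.flush()
```

The `try/except` around export is the "wrap this library around your core functionality at your own peril" bullet: a real SDK absorbs its own failures instead of propagating them into your request path.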
Maybe the confusion here is in comparing different things.
The InfluxData docs you're linking to are similar to Observability vendor docs, which do indeed amount to "here's the endpoint, plug it in here, add this API key, tada".
But OpenTelemetry isn't an observability vendor. You can send to an OpenTelemetry Collector (and the act of sending is simple), but you also need to stand that thing up and run it yourself. There's a lot of good reasons to do that, but if you don't need to run infrastructure right now then it's a lot simpler to just send directly to a backend.
Would it be more helpful if the docs on OTel spelled this out more clearly?
The problem is ecosystem wide - the documentation starts at an 8/10 difficulty and is written for observability nerds, where easy things are hard, and hard things are slightly harder.
I understand the role that all the different parts of OTel play in the ecosystem vs InfluxDB, but if you pay attention to that documentation page, it starts off with the easiest thing (here's how you manually send one metric), and then ramps up the capabilities and functionality from there. OTel docs slam you straight into "here's a complete observability stack for logs, metrics, and traces for your whole k8s deployment".
However, since OTel is not a backend, there's no pluggable endpoint + API key you can just start sending to. Since you were comparing the relative difficulties of sending data to a backend, that's why I responded in kind.
I do agree that it's more complicated, there's no argument there. And the docs have a very long way to go to highlight easier ways to do things and ramp up in complexity. There's also a lot more to document since OTel is for a wider audience of people, many of whom have different priorities.

A group not talked about much in this thread is ops folks who are more concerned with getting a base level of instrumentation across a fleet of services, normalizing that data centrally, pulling in from external sources, and making sure all the right keys for common fields are named the right way. OTel has robust tools for (and must document) these use cases as well. And since most of us who work on it do so in spare time, or a part-time capacity at work, it's difficult to cover it all.
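That "normalize centrally, rename the keys" job is the kind of thing the Collector's processors handle. A toy version of the idea in plain Python, with an invented rename mapping (these key names are illustrative, not the real semantic conventions):

```python
# Ops teams often receive the same field under different names from
# different services; central normalization maps them to one canonical key.
RENAMES = {                # invented example mapping
    "httpMethod": "http.method",
    "status": "http.status_code",
    "svc": "service.name",
}

def normalize(span: dict) -> dict:
    """Return a copy of the span with attribute keys renamed canonically."""
    attrs = {RENAMES.get(k, k): v
             for k, v in span.get("attributes", {}).items()}
    return {**span, "attributes": attrs}

raw = {"name": "checkout", "attributes": {"httpMethod": "POST", "svc": "cart"}}
clean = normalize(raw)
```

Doing this in one central place, rather than in every service, is exactly the pitch for running a Collector: app teams emit whatever they emit, and ops owns the canonical schema.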
The first time it takes 5 minutes to set up locally; from then on you just run the command in a separate terminal tab (or a Docker container, they have an image too).
I did not find that manual instrumentation made things simpler. You’re trading a learning curve that now starts well before you can demonstrate any results for a clearer understanding of the performance penalties of using this Rube Goldberg machine.
Otel may be okay for a green field project but turning this thing on in a production service that already had telemetry felt like replacing a tire on a moving vehicle.
My whole career I’ve been watching people on greenfield projects looking down on devs on already successful products for not using some tool they’ve discovered, missing the fact that their tool only functions if you build your whole product around the exact mental model of the tool (green field).
Wisdom is learning to watch for people obviously working on brownfield projects espousing a tool. Like moving from VMs to Docker. Ansible to Kubernetes (maybe not the best example). They can have a faster adoption cycle and more staying power.
... if (and only if) all the libraries you use also stick to that subset, yea. That is overwhelmingly not true in my experience. And the article shows a nice concrete example of why.
For green-field projects which use nothing but otel and no non-otel frameworks, yea. I can believe it's nice. But I definitely do not live in that world yet.