Can I have a smaller Prometheus (wejick.wordpress.com)
142 points by wejick on Jan 28, 2022 | 147 comments



I maintain Faktory, a complex background job system written in Go. I keep the dependencies to a minimum and use `upx` to compress the built binary...

...for a grand total binary size of 5 MB.

So many modern systems are huge because of the complex dependency tree they pull in. My entire binary likely fits within the L3 cache of the CPU you are using.


The first thing a 5 MB upx-compressed binary does when executed is uncompress itself to 15~30 MB of memory, right? So what does the on-disk binary size have to do with my L3 cache?


Yup, upx is just a loading stub that decompresses the whole thing then gets out of the way.

If anything it'll use more memory: the remains of the decompression stub stick around, and the kernel can't be clever about demand-paging only the needed pages from the backing file.


Uncompressed is less than 10MB.


So instead of 10MB you're paying 15MB? Seems like UPX is an unnecessary dependency.


Does that improve the usage of it? Have you experimented with larger sizes?


Thank you for Sidekiq, Mike :-D


For comparison librrd is a 0.4MB .so; the entire rrdtool userspace is about 1MB. Yes, I realize librrd is much simpler and less featureful than Prometheus. But it's an interesting comparable for what a small, old utility that does something similar could be.


Yup, it's tiny and simple. But the goals are wildly different.

The same Prometheus binary you can run on a Pi scales to millions of series and millions of samples per second.

The Prometheus TSDB itself is only one part of the larger system. But, compared to librrd, it's vastly more functional.

Besides the scaling I mentioned:

* It's ACID compliant.

* It has WAL for reliability.

* It has CPU and memory efficient compression.

* It has an efficient mmap-based data loader.

RRDtool, while efficient from a '90s perspective, is a toy by comparison. And yes, I've used rrdtool. Back in old-school days when Cacti was the new hot shit compared to MRTG.


Yet RRD databases are awfully big but often compress to single-digit percentages. We use them to store one-time graph data, so they do not ever get updated after the report job finishes. I'm looking for an alternative, and it better not be a huge JSON array of arrays plus a header dictionary. It also should be an on-disk something, not Influx or Prometheus.


You can use the Prometheus TSDB package, it's easy to import and use. Plus it's designed around mmap, so the kernel deals with the caching.

You don't have to use a whole Prometheus server to make something like this work.
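For what it's worth, embedding it looks roughly like this. A minimal sketch assuming a 2.3x-era API - the import paths and exact signatures have shifted between releases, so treat the details as approximate and check the tsdb package docs for the version you vendor:

    package main

    import (
        "context"
        "time"

        "github.com/prometheus/prometheus/model/labels"
        "github.com/prometheus/prometheus/tsdb"
    )

    func main() {
        // Open (or create) a TSDB directory; nil logger/registerer/stats for brevity.
        // (Older releases take one fewer argument here.)
        db, err := tsdb.Open("./data", nil, nil, tsdb.DefaultOptions(), nil)
        if err != nil {
            panic(err)
        }
        defer db.Close()

        // Append one sample to a series identified by its label set, then commit.
        app := db.Appender(context.Background())
        lset := labels.FromStrings("__name__", "report_rows_total", "job", "nightly")
        if _, err := app.Append(0, lset, time.Now().UnixMilli(), 12345); err != nil {
            panic(err)
        }
        if err := app.Commit(); err != nil {
            panic(err)
        }
    }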


I only know RRD from Munin graphs, and now have a toy idea to use it directly to render an SVG.

If disk space is an issue, do you or have you considered decompressing on the fly ?


Or use a file system that supports transparent compression (e.g. btrfs).


I only just learned about this, good stuff. Many projects I use are using this one.


For which Prometheus use case does it matter if the binary is 103MB or 2MB?


In the case where some rarely used feature is exploitable via some misconfiguration.

See also: log4j


This is the thing. Do the devs actually know exactly what all of that 105MB vs 2MB of stuff is and why it's there? As others have stated, the dependency fluff is just a multitude of footguns cocked, locked, and ready to rock your world.


But how? I've got prometheus running in its own, non-root account. That makes exploiting vulnerabilities nearly useless, unless you can find a way to get it to spawn a process which can sudo. It runs behind nginx, which has a simple name/pwd protection on it. That makes it very hard to exploit. And prometheus only runs on one server; the others run node_exporter, or publish app data.

What am I missing?


You described multiple layers of security. Removing dependencies and stripping down code is the same: another layer of security.


> What am I missing?

If I knew, it would have a CVE number.

However, the less code you have, the fewer the places there are where you need to ask that question.


None, but it's an interesting case study of a widespread concern (not necessarily problem)


I don't think anyone cares. They could make the binary 15kb, no one would notice. Most of the time it runs inside a pod that will feature a 500+ MB operating system anyway...


Go applications can be built into static self-contained binaries, which don't need extra dependencies. These binaries can run inside a `scratch` Docker container, which doesn't add extra space. So it is easy to create small Docker images (less than 10MB) with such binaries. See, for example, the following article - https://valyala.medium.com/stripping-dependency-bloat-in-vic...


They would notice on their Kubernetes storage accounting.


Ah no :-P


> Prometheus alternative

Well... if the size of the executable is really a concern, perhaps Victoria Metrics is worth considering; my amd64 executable is about 17MiB in size.


Everyone should consider Victoria Metrics anyway. It scales better performance-wise, and they broke out components to improve scalability (vmagent, vmalert, etc.) when Prometheus was just one huge process that did all the things. The two work closely together, and even did a good talk together about the differences.


I love Prometheus because the (OpenMetrics) data protocol is so darn simple and easy to grok. You can do things like take an arbitrary data source, pipe it through awk and curl, and get it into prometheus metrics via remotewrite. You can also easily write your own /metrics endpoint in your favorite language.

VictoriaMetrics sweetens the deal by offering a solution to long-term storage and a more flexible service architecture without leaving the simple and highly interoperable Prometheus ecosystem.


Not only that: when we switched to VictoriaMetrics, we were able to cut the RAM of the virtual machine hosting our monitoring in half, and I think storage is more efficient too.


If only the service-discovery features of the entire SDK are used, then the real story here is that Go's DCE passes aren't strong enough - or the SDK is coupled in a way that makes DCE not work (overuse of reflection?).


Reflection in general isn't the problem, but if you specifically use reflection to look up a method in your program, then dead code elimination gives up. https://go.dev/src/cmd/link/internal/ld/deadcode.go

Taught to me on HN recently: https://news.ycombinator.com/item?id=30041763
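A toy illustration of the effect (hypothetical type and method names, nothing to do with the real SDKs): once reflect's MethodByName is reachable, the linker has to assume any exported method might be called at runtime, so it stops pruning them.

    package main

    import (
        "fmt"
        "reflect"
    )

    type Client struct{}

    // Never called directly anywhere, so it looks like dead code to the linker.
    func (Client) DescribeInstances() string { return "imagine a 52MB SDK behind this" }

    func main() {
        // Because MethodByName is reachable, the linker can no longer prove that
        // any exported method is unused; DescribeInstances (and everything it
        // would transitively pull in) stays in the binary.
        m := reflect.ValueOf(Client{}).MethodByName("DescribeInstances")
        fmt.Println(m.Call(nil)[0])
    }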


Interesting detail, didn't realize Go's dead code analysis was this easily made ineffective. 52MB for the EC2 SDK is a big chunk of code that must be mostly dead/unused.

Seems like eventually this should become a priority for library writers to support dead code analysis, otherwise it's going to get worse and worse.


These SDKs typically hang everything off of the same Client type, and since methods can be reached via reflect I'd guess none of the exported API is safe to drop. The SDKs would need to be written to be much more modular.

Those organizations also tend to spew a lot of code, most of it quite tangled together. They're definitely the largest dependencies I regularly see.


Not only are they spewing out a lot of manual code; they also include a lot of auto-generated and versioned code. Azure's Python SDK, for example, has a duplicate of every method for every version of the API.


IIRC it’s because prom imports protobufs which use either Method or MethodByName, both of which put the linker in conservative mode.


We were using Prometheus for metrics (for k8s services), but switched to influxdb because we were using it to store other non-metrics data, and it’s nice not to have two time series databases with two different query languages you need to learn.

We went from using Prometheus itself to scrape metrics from services and export them to influxdb (fine, but very heavy), to using kapacitor (an utter nightmare for this particular use case), to just writing our own Prometheus metrics harvester from scratch (took about 3 days with service discovery). The current solution has been in place for more than a year and it’s perfect.

I love influxdb’s on-disk compression, but its query planner and unpredictable ram usage leave a lot to be desired (at least the 1.x versions).


I just moved companies from a pure Prometheus stack to an Influx one. Kapacitor is needlessly complex and overkill for what most people need.

There are THREE different ways to query data: InfluxQL, Flux and TICKscripts…all inferior to PromQL imo. The worst is that Influx encourages you to mix them (e.g. PromQL in TICK), causing even more confusion. Documentation on advanced use cases is non-existent.

Been an absolute nightmare tbh. The new company had a similar evolution too: holding non-metrics data in influx while also holding metrics. Trying to at least move metrics onto a Prom/Thanos stack here soon.


Hi halfmatthalfcat, InfluxDB team member here. I wouldn't say we encourage mixing InfluxQL, Flux, and TICK scripts. We encourage new use cases to use Flux and existing use cases to start migrating to Flux when possible. Allowing the three to interact enables existing users to migrate gradually and take advantage of newer features before they're fully migrated.

Regarding docs on advanced use cases, if you haven't already, try posting questions in the community forum: https://community.influxdata.com. Or, if you prefer Slack, there's a link at the top of that page to join our community Slack. We do our best to help with specific issues in those spaces and we also look for common themes that are causing problems for multiple users so that we can focus our efforts there, whether that's bug fixes, performance, features, or docs.


Hi physicles, member of the InfluxDB team here. What version of 1.x are you running? Also curious to know if you've tried 2.x.


We're still on 1.7.x and using TSM (not TSI), because we've found that to work well enough for the time being. Haven't tried 2.x yet because the investment needed to properly benchmark and upgrade would probably be around a week at our scale, and there are just more pressing things to do.

To be fair, we still haven't paid you guys a dime so I can't complain. But if you're interested in hearing my thoughts I can email you after next week's holiday.


I haven't looked but the latest 1.7.x is probably close to 2 years old. There have been quite a few improvements since 1.7 so you might see some improvements by upgrading to the latest 1.x.

Minor nit on "TSM (not TSI)" - TSM (.tsm files) are the data files and that format hasn't changed. TSI is the newer, although mature at this point, indexing option that spills onto disk and can therefore be larger than the original in-mem index. You're probably using in-mem indexing.

We're always interested in feedback. My name is david. You can email me at <name> at influxdata dot com.


Yeah that’s only unused until you need it at which point it doesn’t involve futzing with anything.

One of the things that kills me is running fluentd because you have to fuck around with ruby gems in containers every two minutes to get it to do something reasonable.

This is pain. Prom is not.


Great work on the part of the author. The Pareto principle holds. Often all it takes is one person motivated enough to look for efficiency opportunities.

As for next steps, I can't imagine the Prometheus crew would object to a proposal + PR to make the Service Discovery an optional add-on in the next major version. It does open a can of worms around how such an add-on would be distributed if not built into the binary. (Caveat: I have no familiarity with this particular project or its unique constraints or goals.)


Easy: they can provide an option to remove the SD functionality at compile time, and if you really care about the executable size, you can compile the code with this option (and `-ldflags="-s -w"`). The standard build would still be the "all batteries included" one to avoid support issues (people downloading the smaller binary and then asking why SD isn't working).


Caddy for example does this on their website. You can pick whatever addon you want and you'll be able to download a binary with exactly what you want.


We've already talked about it at length, for many years. There's an approved development proposal to implement it. But, nobody has stepped up to contribute the code.

The main problem is, Go doesn't have any kind of reasonable loadable library system.

The current proposal is to make it easier to do compile-time plugins, similar to how Caddy and CoreDNS do things.

We don't even need a major version to add such a thing. Just the time to write the feature. The thing is, it's just not that important an improvement. When you are running Prometheus in a production environment, it will end up using gigabytes of memory and disk space to operate. The savings of a few megabytes of binary and runtime memory are just not that important.
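For anyone unfamiliar with that pattern, here's a self-contained toy sketch of how Caddy/CoreDNS-style compile-time plugins generally work (made-up names, not Prometheus code): each mechanism registers itself from init(), and the set you get is simply the set of packages your build imports.

    package main

    import "fmt"

    // discoverers is a simplified stand-in for a plugin registry.
    var discoverers = map[string]func(){}

    func registerSD(name string, run func()) { discoverers[name] = run }

    // In the real pattern each init lives in its own package and is pulled in via
    // a blank import (e.g. _ ".../discovery/ec2"), so deleting one import line
    // drops that SD mechanism and its SDK from the binary.
    func init() { registerSD("file", func() { fmt.Println("file-based SD") }) }
    func init() { registerSD("ec2", func() { fmt.Println("EC2 SD") }) }

    func main() {
        for name := range discoverers {
            fmt.Println("compiled-in service discovery:", name)
        }
    }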


On a side note, Prometheus seems to be built for bloat. AFAIK, it isn't even designed to consume metrics other than from apps linked to its client library. It's like a microservice, but with the footprint of an operating system.


It’s a database engine not a microservice and needs to be treated along the lines of postgresql etc.


  podman run postgres:14 du -sh /usr/lib/postgresql
  24M /usr/lib/postgresql
This includes the binary.


Try running big instances in containers.

Our smallest is 32 cores and 1TB of RAM.


> AFAIK, it isn't even designed to consume metrics other than from apps linked to its client library.

Could you elaborate? I use Prometheus to scrape from an HTTP endpoint in various Pods in Kubernetes, so the service discovery is pretty useful to me.

I could see the Kubernetes & the other SDs split out of the core binary if default size is really an issue. Or are you talking about something else?


I'm talking about the way I'm expected to provide metrics for my apps. Rather than exporting free-form JSON and then scripting Prometheus to understand it, I'm expected to use a custom client library to export the metrics. As for Kubernetes, you can only use it with Prometheus because of a not-insignificant amount of work on both sides. Basically, the latter is designed for vendor lock-in.


What a bizarre claim.

Prometheus scrapes the same text format as OpenMetrics 1.0 and over 700 public exporters use this format, and there are TONS of other non-Prometheus software that consume the exact same text format. Prometheus's biggest competitor, Datadog (which is not open source mind you), consumes it too. I think even Grafana consumes it directly. It's becoming an IETF standard[0].

Would I have preferred JSON over a custom text format like this? Yeah. But to claim an open source project like Prometheus with effectively no business at all is using a text format like this to have vendor lock-in? That's quite a stretch.

[0] https://github.com/OpenObservability/OpenMetrics/blob/main/s...


> Prometheus scrapes the same text format as OpenMetrics 1.0

I find the GP's claims weird - I've written a relative ton of collectors, exporters, and translators and the format is pretty OK, not worse than most that came before it and better than lots - but I think this relationship is backwards. Prometheus "scrapes OpenMetrics" because OpenMetrics was formal documentation of what Prometheus was already doing for years.

I would not have preferred JSON. That an exposed metric is also a query, and is also pretty close to a schematic definition, is nice.


I apologize for my mistake, then. My understanding was based on reading the Prometheus docs on making exporters alone - something I needed urgently for a job.


no apology needed, I am sorry the world has a culture of mistakes being a bad thing.


Include the client library if you want, but the wire format is ridiculously simple. I'll implement it from memory in an HN comment.

    http.HandleFunc("/metrics", func(w http.ResponseWriter, req *http.Request) {
        // Set headers before WriteHeader, or they are silently dropped.
        w.Header().Set("Content-Type", "text/plain")
        w.WriteHeader(http.StatusOK)
        w.Write([]byte("# HELP foo_bar The numbers of foos barred.\n# TYPE foo_bar counter\nfoo_bar 42\n"))
    })
The client library is largely to keep track of running counters (and gauges, histograms, etc.), with a small amount of code to actually report those metrics when scraped. It's a very simple format.


A good example of it looking simple but having annoying corner cases.

The content type MUST be: application/openmetrics-text; version=1.0.0; charset=utf-8


Prometheus follows the OpenMetrics standard; I'm not sure what you find proprietary about that or specific to Prometheus.

https://github.com/OpenObservability/OpenMetrics/blob/main/s...


To be precise it was the other way around. OpenMetrics is a standardization effort for the format Prometheus made up.

However Prometheus was designed before JSON was standardized itself, so I'm just glad they didn't choose XML!


And the Prometheus format is a copy of the "varz" format


Sorta, I would say it's an evolution of varz.

IIRC (it's been almost a decade since I used varz), having multiple label values would be a map of maps in varz. It got quite ugly if you wanted to have a number of dimensions.


Ah ok, I see what you mean.

The other commenters have pointed out that it _is_ based on another open standard, but admittedly one less common than say, JSON. So you'll generally have to implement your own metrics producer or use a client library, that's true.

However it's also a dead simple format and you can probably implement it with a for-loop or a shell script.
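For instance, the for-loop version in Go (made-up metric, but valid exposition format) is about this much code:

    package main

    import "fmt"

    func main() {
        temps := map[string]float64{"kitchen": 21.5, "garage": 12.0}
        fmt.Println("# TYPE room_temperature_celsius gauge")
        for room, c := range temps {
            // Label values are double-quoted; %q handles the quoting and escaping.
            fmt.Printf("room_temperature_celsius{room=%q} %g\n", room, c)
        }
    }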


Prometheus supported a JSON representation in the beginning. It was deprecated and removed before 1.0. The current exposition format was created because it cut CPU and memory for scraping metrics in half.

JSON, especially free-form JSON, is not a good format for efficient metrics monitoring.


The Prometheus format is literally just a text page. It's dead simple to implement.


The design consideration was not that it had to be simple to implement. It's that it had to be easy to parse by a human during an outage when nothing else works.


It's a little bit of both. Simple to implement, simple (and fast/cheap) to parse, and human readable.


There are a frustrating number of fundamental corner cases due to variance in floating point text formats, and slightly more in the descriptor if you also need that. It's simple to implement an expositor for a limited set of cases. As usual, it's much more difficult to parse what you actually find in the world.


Maybe I'm being downvoted for insufficient examples? OK, here's a big one:

OpenMetrics's production rule for the format says:

    labels = "{" [label *(COMMA label)] "}"
And yet, the Prometheus Java client library exports a trailing comma with no subsequent label.

As for fp, I've seen parsers break on `e` v. `E`, and `NaN`/`Inf` vs. `nan`/`inf`. The latest IETF draft even has a comment,

     ; Not 100% sure this captures all float corner cases.
Control characters are allowed in descriptors and label values and no normalization form is specified.


Yeah, there are still some corner cases and implementation bugs out there. We spent months deliberating how to deal with some of these. Because the base libraries in some languages just don't produce string output from IEEE 754 the same way.

IIRC, Java is different from Python is different from Go. So, really, this is a standardization in languages problem. We tried to work around these as best we could in the OM format.


> It's like a microservice, but with the footprint of an operating system.

What? It's 100MB vs 25MB, like 75MB of data more... Who cares about a binary being 75MB larger in 2022?


This is a frustrating position.

Everything is layers. We build things on top of other things. If every layer had that attitude, then the bloat would be enormous. It's already getting there.

We should praise judicious effort to optimize any of the resources used in the systems we build, at every layer.


I understand what you're getting at. IMO you're barking up the wrong tree: the problem with bloatware will not be solved by Prometheus shipping a lighter statically linked binary.


I suspect 70%+ of all features of all tools remain undiscovered by their users.


Your use-case is not completely clear to me based on the article, but you might be better off with Prometheus’ agent approach, introduced recently: https://prometheus.io/docs/prometheus/latest/feature_flags/#...



How does the author determine how "most" people use prometheus?


Simple: you notice it's capable of communicating with lots of mutually exclusive cloud services, and note that it could be smaller if you removed some of the relevant dependencies.

Now whether that's a particularly useful observation I'm still not sure.


These software development kits are to a large extent a strongly typed representation of the REST API graph.

Even though the application might only need two or three endpoints in Kubernetes - which would be trivial to implement in Go in just a couple of lines - they favor strong typing and include the SDK, which is several megabytes. The same goes for AWS, Azure, ...

I'm not passing any judgment here by the way.
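As a rough sketch of that "couple of lines" point: hitting one Kubernetes endpoint directly with net/http, using the standard in-cluster token path, might look like this (TLS verification is skipped here purely for brevity; a real client would load the cluster CA):

    package main

    import (
        "crypto/tls"
        "fmt"
        "io"
        "net/http"
        "os"
    )

    func main() {
        // Service account token mounted into every pod by default.
        token, err := os.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/token")
        if err != nil {
            panic(err)
        }

        req, _ := http.NewRequest("GET",
            "https://kubernetes.default.svc/api/v1/namespaces/default/pods", nil)
        req.Header.Set("Authorization", "Bearer "+string(token))

        client := &http.Client{Transport: &http.Transport{
            TLSClientConfig: &tls.Config{InsecureSkipVerify: true}, // demo only
        }}
        resp, err := client.Do(req)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        body, _ := io.ReadAll(resp.Body)
        fmt.Printf("got %d bytes of pod-list JSON\n", len(body))
    }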


I'd like to see memory usage differences, load time & runtime performance impacts. I expect most of these to be small, but I expect some impact.

Also just worth noting that the memory impact of statically compiling in general is probably massive. Most systems probably would have a good percent of these libraries in memory already if Prometheus were using dynamic linking.


I've done this test in the past. The typical footprint of default Prometheus is around 100MiB of RSS.

Removing everything but file and static configs reduces it to about 50MiB.

Interesting for more embedded use cases, but not really a big deal when you're using a few GiB of memory for TSDB ingestion buffering.


Now that Go has reflection in the upcoming 1.18 release, most HN comments about Go relate to binary size. Here we are again.


Do you mean generics? Go has had reflection for a long time.


Yup! Long day. And I just wrote a more substantive comment about reflection...


Why optimize a binary that's 109MB? That's too small to matter.


30 years later:

" Why optimize a binary that's 109GB? That's too small to matter."


I mean, for current computers (or even my 10 year old server), 128MB is so small that it's not worth optimizing. My $25 raspberry pis can run this without any problems while also running a bunch of other programs.

My first Linux computer had 4MB of RAM but that doesn't mean I try to fit anything into that (once I upgraded to 32MB, I could run g++, emacs, X11 and xterm at the same time!)


Why did you upgrade back then?

Because you wanted to be able to run more stuff, or because you wanted to be able to run the exact same executables, just bigger?


Less paging. I could swap but it was slow.

Basically nobody is swapping because of a 128MB executable. If you are, get more RAM or don't run Prometheus.


Less paging. What causes paging? Programs using lots of memory.


A binary that doesn't fit in cache isn't too small to matter.


What's the page size on Linux? Are executables (even statically linked ones) demand-paged? How much of the executable that you don't use is paged in when you don't use it?


That's not how running binaries works.

Only the parts of the code that are in use are paged into the page cache. So if you only use a couple of the features, it fits in cache just fine.


Outstanding!


Clutching pearls about binary size is and always will be hilarious to me.


Please keep snark, name-calling, shallow dismissal, and supercilious putdowns off this site. We're trying to avoid all of that here.

https://news.ycombinator.com/newsguidelines.html

Edit: we've had to ask you repeatedly to follow the site guidelines. Could you please review them and start following them now?


ACK :)


It's all fun and games until you're stuck for an hour downloading 600MB of updated packages over a metered LTE connection. The same goes for RAM usage: 512MB was enough for a phone back in 2014, now a smart TV with 2GB is barely capable of multitasking. Sure, binary sizes don't matter in most contexts. But when they do, it's a PITA.


Sure, but we're talking about an application written for a cloud/hosted environment in a datacenter somewhere. Picking at the size of a statically linked binary meant for production-grade environments with fast computers and fat pipes feels overly pedantic, no? Especially when we're talking about a mere 100MB.


All the more reason! On the cloud, you're often paying per kilobyte.


Not for ingress, and certainly not for packages that are mirrored from the cloud vendor's own repositories (which Prometheus absolutely is).


The Galaxy S5 from 2014 had 2GB, and that was for 1080p vs today's 4K texture sizes. Seems on par.


What you're kind of missing is that the S5 was a flagship phone. Generally, one has to save for more than a month to afford a purchase like that. The idea of working an extra month so that some FAANG prick meets their KPI by cutting corners on optimization doesn't even look like feudalism. It looks like idiocracy. Paying the lip service of fat shaming code bloat is the cost-effective option by comparison :)


What does FAANG have to do with this?

Don't FAANG people obsess over bloat because they're trying to reach billions of customers? It might not seem that way since their pages are bigger, but I'd be surprised if they were happy to leave tens of millions of customers on the table.


They're just poster children for the particular brand of disdain $100k+/year "tech workers" bear for their users: they make enough for the shiniest of toys, so they're too far above spending their valuable time to make their software run smooth on our $100 crap phones. Nevermind that each Fb client update likely produces hundreds of tons of toxic trash called gadgets. Sure, sometimes they do optimizations. Generally, though, both Fb and Google keep exploring the physical limits to code bloat. Remember that one time that Fb hit the JVM class count limit?


FAANG are the worst offenders. Didn't Facebook employ ungodly hacks to unload/load parts of the Android app to navigate around the 65k method limit of DEX? Have you looked at the JS monstrosity of the Google hardware shop website?


You misunderstand the point. You're comparing a 1080p phone to a 4k television when texture memory is what will take up the vast majority of ram. Code footprint is pretty irrelevant.

Still the TV does fine with 2GB. Doesn't seem fair to complain.


I wasn't speaking of a 4k TV, but still, this doesn't check out. A single 2160p framebuffer is 8MPix, or 32MiB. Not counting the original FB size, the extra 1.5GiB are enough for 48 whole framebuffers. You don't need that much image data all at once, the number is ridiculous. No, I believe it's just that the code became that much less efficient.


Think of each app and all the texture content that needs to be loaded. App textures get 4x as big, all things being equal. You see a 4x change in ram across those devices.


Would you pay an extra 20k for your TV so it could have 500MB of memory and have all its apps work?


I can guarantee 100% my Prometheus instance will never be running on metered LTE. If such a situation arises then my operational metrics are the least of my concern.


This one does not even make sense - 100 megs for a binary for centralized metrics? Who would even notice next to the OS and metrics storage.

By design you should not install Prometheus on every server you monitor - it's designed to scrape metrics.

It's a database and web UI with support for email, webhooks, Slack, PagerDuty, the AWS API and many others. 100 megs does not sound like a lot for all Prometheus provides.


It correlates to performance, speed to iterate, security, and design complexity, but ok


I'm unclear that it correlates to iteration speed or design complexity.

Actually, performance too.


More compact code fits in caches better.


Any hard data on any of that?


Unless a fat binary embeds pictures and some music, it's all CPU instructions.

Tens of megabytes of CPU instructions is complexity.

This kind of bloat is the number one enemy of security, as any security engineer could confirm.


Sure, lots, go look for studies on estimation of defects based on LOC and project size/complexity (they go back to the 1970s). But you don't need to look, the principles are simple.

Unless an application is filled with JPEGs or uncompressed arbitrary data files, its size reflects lines of code (machine code, interpreted code, etc). The bigger the app, the more lines of code.

Every line of code has a non-zero bug probability. Every new line of code increases probability. More lines of code, higher probability. Bugs include security bugs; higher probability of bugs, higher probability of security bugs.

CPU cache is finite. Only so many lines of code can be cached or optimized. Larger size takes up more room in memory, which when combined with lot of other gigantic apps, means less memory for heap space, disk cache, etc. Larger size also takes up more room on disk, which adds up when you don't delete old builds on disk and loop over a build process. Since larger size means more lines of code, that means longer compile times, which means longer wait every time you change a line and need to recompile, copy an artifact somewhere, retest.

More lines of code means more code executed. If you have 10 lines of code in a function, and you add 100 lines to it, the compiler doesn't just optimize away all 100 new lines, it's going to add more machine code and code paths. Unless you only ever add new code paths, some of that new code will extend existing code paths or add instructions, and that means more CPU cycles to complete execution. (Same concept for interpreted code)

More lines of code means more code paths. More code paths increases complexity. The more code paths, the longer and more difficult testing gets to the point you can't even develop enough tests to cover all the code paths, so it's impossible to even find all the bugs. More complexity leads to difficulty in humans understanding and working with the codebase, and difficulty in understanding leads to slower and more error-prone development.

Larger means more network bandwidth, meaning file transfers take longer, slowing iteration and producing worse UX. If people download your app every 10 minutes in their CI/CD pipeline, larger size means more network bandwidth used. "Free" CDNs have limits; the larger a project gets, the more file size affects network performance, reliability, and cost. If you pay for bandwidth, a 100MB file costs 100x more than a 1MB file.

The more apps you use that are big, the more every one of these effects increase. One big app you might not notice. 100 big apps lead to noticeable slowness, bugs, less memory, less disk space.


It's kind of ironic reading this comment given that at the time you posted it, I was screaming at gcc's stupid code generator for wasting bytes recreating constants that were already there in that very register! That code needs to fit in a couple hundred bytes..

And half an hour ago I was (once again) checking out hosting providers and lamenting the fact that most don't seem to offer support for loading custom ISOs, so I could install a 30 megabyte distro and make the most out of the cheap plans that only offer something like 10 gigabytes of storage. Half of it is wasted after you install one of these obese mainstream distros.


Hetzner Cloud can start instances from ISOs - here is an example for ipfire : https://wiki.ipfire.org/installation/hetzner-cloud


I got a VPS from Hetzner last year but they decided to block my home IP. After reading some anecdotes on the internet, I had to conclude they're exactly the kind of company I want to avoid (large, opaque, they employ weird algorithms/heuristics to flat out reject customers or suddenly take down their servers, no warning, you can't get an explanation, you're just fucked, just like when Google decides to arbitrarily block you; I've been there).

IMO the point of a hosting provider is supposed to be that you can have some peace of mind and not worry about your shit breaking (that's still a worry as I continue to host everything at home). Instead with providers like this, you worry about them breaking your shit.


Vultr lets you install custom ISOs, and can also PXE boot.


Author doesn't even say why they object to the size. Are they aware that file-backed executables are paged on demand and only the active parts of the program will be resident?


Granted, these days everyone is used to applications consuming massive amounts of drive space. But perhaps they're using legacy hardware for a home lab, or an IoT device with limited disk space.

From a security standpoint, reduced application code decreases risk. It was service discovery code he removed; what if it reached out to discover services on application startup? That's a potential attack vector.


Does it actually reduce the risk? Sure, if you audit, it's easier to identify the risks, but a Windows 98 program is going to be full of vulnerabilities while being small. Being small doesn't remove the vulnerabilities.


> From a security stand point, reduced application code decreases risk. It was service discovery code he removed, what if it reached out to discover services on application start up, that's a potential attack vector.

Agreed. I've seen a similar pattern with certain open source libraries.

The first example I think of is the spf13/viper [1] library, used to load configuration into Go applications. Viper is equipped with code for reading config from various file formats, environment variables, as well as remote config sources such as etcd and consul. If you introduce the viper library as a dependency of your application to merely read config from environment variables and YAML files in the local filesystem, then your Go application suddenly gains a bunch of transitive dependencies on modules related to remote config loading for various species of remote config provider. It's not uncommon for these kinds of remote config loading dependencies to have security vulnerabilities.
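For context, the "merely read a local YAML file" usage that still drags that module graph along is roughly this (a minimal sketch; config keys are made up):

    package main

    import (
        "fmt"

        "github.com/spf13/viper"
    )

    func main() {
        // Only local-file config is used here, yet the module graph still pulls in
        // viper's etcd/consul/etc. remote-provider dependencies.
        viper.SetConfigName("config") // finds config.yaml, config.json, ...
        viper.AddConfigPath(".")
        if err := viper.ReadInConfig(); err != nil {
            panic(err)
        }
        fmt.Println("listen address:", viper.GetString("server.listen_address"))
    }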

Beyond the potential increased attack surface if a bunch of unnecessary code to load application configuration from all manner of remote config providers ends up in your application binary [2], there's also the overhead: if you work in an environment that monitors for vulnerabilities in open source dependencies, depending on a library that drags in dozens of transitive dependencies you don't really need adds a fair bit of additional work re: detecting, investigating and patching the potential vulnerabilities.

I guess there's arguably a "Hickean" simple-vs-easy tradeoff in how such libraries are designed. The "easy" design, that makes it quick for developers to get started and achieve immediate success with a config loading library, is to include code to load config from all popular supported config sources into the default configuration of the library, reducing the amount of steps a new user has to do to get the library to work for their use case. A less easy but arguably "simpler" design might be to only include a common config-provider interface in the core module and push all config-provider-specific client/adaptor code into separate modules, and force the user to think about which config sources they want to read from and then manually add and integrate the dependencies for the corresponding modules that contain the additional code they want.

edit: there has indeed been some discussion about the proliferation of dependencies, and what to do about them, in viper's issue tracker [3] [4]

[1] https://github.com/spf13/viper

[2] this may or may not actually happen, depending on which function calls you actually use and what the compiler figures out. If your application doesn't call any remote-config-provider library functions then you shouldn't expect to find any in your resulting application binary, even if the dependency is there at the coarser-grain module dependency level

[3] https://github.com/spf13/viper/issues/887

[4] https://github.com/spf13/viper/issues/707


Image pull size for a container is likely the concern. It could shave a few seconds off a regularly-run integration test. If it's run via on-demand build agents, then there's no image cache.


If it takes multiple seconds to pull ~35MB of compressible text into your CI environment, there may be other, larger problems to solve.


I was estimating off the speed it takes to pull images to my local computer, where the limiting factor appears to be something other than my internet connection - so either the image extraction process or a Docker Hub throttle.


That only helps if the code is well segregated by usage. Looking at the ELF symbol table for prometheus-2.33.0-rc.1.linux-amd64, it's not clear to me this is the case. Not sure how it's ordered. Lexical import order? Anyhow, without profiling how could the compiler know how to order things optimally?

I think this is one of those cases where, in the absence of profiling or some other hack (e.g. ensuring all routines within a library are cleanly segregated across page boundaries within the static binary and the I/O scheduler doesn't foil your intent), dynamic linking would prove superior, at least for such large amounts of code.


Sorry not to make it obvious in the article: I'm planning to run it on a small IoT Pi-based device locally. So having something small and fast is preferable; however, runtime performance is a more important thing I haven't touched.


Prometheus is rather efficient, but its focus is a little different from yours. It's designed for large-scale collection of metrics, scraped from many remote endpoints.

You can run it locally, but the "Prometheus" way for an IoT environment would be a central Prometheus server that scrapes the IoT devices running a Prometheus exporter, which tends to be very lightweight.


Totally agree. Another part of it is just feeding curiosities.


Prometheus works fine as-is on Pi devices. You'll spend most of your memory on ingestion buffering. I did the same tests as you did a while back, it only saves like 25-50MiB of memory IIRC.

The only thing you really need to worry about on a Pi is that the default kernels are still 32-bit, and are set to 2GiB kernel boundary. So you'll be limited to how much TSDB storage can be mmap'd unless you switch to a 64-bit kernel.

You may want to consider agent mode on your IoT device, and stream the data to an external server/service.

https://prometheus.io/docs/prometheus/latest/feature_flags/#...


That's a good insight. Thanks


I'm curious -- is it the binary size that's a problem, or the resident size in memory? Demand paging should help, although you'd be stuck with carrying the enlarged binary.


My gut feeling tells me it will be both memory and CPU utilization. Can't be sure until I can find a good way to measure it.


ps RSS measurement is pretty good for a start (although note that in a forked process, shared copy-on-write pages are reported as RSS for both the parent and the child process).

top reports this as RES.

IIRC, debugging information is in a separate part of the process, so it's not loaded until it's used. Does that make it free? Probably not quite, but the kernel can ignore it until the process (presumably via its debugger) looks at it.


If only there was a monitoring system you could install to measure such a thing.


Auditd_2.8-amd64.deb is 194kb on debian, rsyslog_8.32-amd64 is 411kb, and they both support centralized auditing and log collection from multiple hosts.


Do they do the metrics like Prometheus does? And include the central collector and basic graph builder?


This doesn't seem a fair comparison. Prometheus is statically linked like all Go applications, and those packages are not. You can debate the merits of that, but if you compare a "only rsyslog" server vs a "only prometheus" server the 2 will be much closer in size.


Even journalctl is only 90kb, and the entire combined systemd package is 4mb (but that's including all the documentation and a dozen other different binaries)


And oh what a beautiful bike shed it will be...


I work in a lot of situations with hard or soft resource limits where I actually do need to count bytes and/or CPU cycles, so it's bizarre to see anyone shrugging about distributing tens of megabytes of fat for literally no reason.

One thing Microsoft got right a long time ago was separating out debug symbols into their own file by default. I think that's still awkward on Linux.


> I think that's still awkward on Linux.

At least for .deb and .rpm packages, the default build process automatically extracts debug symbols into separate packages. Eg the process of building package `foo` also produces `foo-dbgsym` and `foo-debuginfo` packages respectively that contain debug symbols for every involved binary and library, while `foo` contains the stripped files.

So anyone who wants to debug a coredump / live process just installs the corresponding -dbgsym / -debuginfo package and now gdb has all the debug info it needs.

Distros have also started incorporating debuginfod into their repos so that gdb can download symbols automatically. So you don't even have to hunt for the right debuginfo package.


I also find this comical.

I'd love to know why 100MB is that big of a deal. If network is slow, cache locally. Seems like nothing here to worry about.



