Can I have a smaller Prometheus (wejick.wordpress.com)
142 points by wejick on Jan 28, 2022 | 147 comments



I maintain Faktory, a complex background job system written in Go. I keep the dependencies to a minimum and use `upx` to compress the built binary...

...for a grand total binary size of 5 MB.

So many modern systems are huge because of the complex dependency tree they pull in. My entire binary likely fits within the L3 cache of the CPU you are using.


The first thing a 5 MB upx-compressed binary does when executed is uncompress itself to 15~30 MB of memory, right? So what does the on-disk binary size have to do with my L3 cache?


Yup, upx is just a loading stub that decompresses the whole thing then gets out of the way.

If anything it'll use more memory: the remains of the decompression stub stick around, and the kernel can't be clever about demand-paging only the needed pages from the backing file.


Uncompressed is less than 10MB.


So instead of 10MB you're paying 15MB? Seems like UPX is an unnecessary dependency.


Does that improve the usage of it? Have you experimented with larger sizes?


Thank you for Sidekiq, Mike :-D


For comparison librrd is a 0.4MB .so; the entire rrdtool userspace is about 1MB. Yes, I realize librrd is much simpler and less featureful than Prometheus. But it's an interesting comparable for what a small, old utility that does something similar could be.


Yup, it's tiny and simple. But the goals are wildly different.

The same Prometheus binary you can run on a Pi scales to millions of series and millions of samples per second.

The Prometheus TSDB itself is only one part of the larger system. But, compared to librrd, it's vastly more functional.

Besides the scaling I mentioned:

* It's ACID compliant.

* It has WAL for reliability.

* It has CPU and memory efficient compression.

* It has an efficient mmap-based data loader.

RRDtool, while efficient from a '90s perspective, is a toy by comparison. And yes, I've used rrdtool. Back in old-school days when Cacti was the new hot shit compared to MRTG.


Yet RRD databases are awfully big but often compress to single-digit percentages. We use them to store one-time graph data, so they do not ever get updated after the report job finishes. I'm looking for an alternative, and it better not be a huge JSON array of arrays plus a header dictionary. It also should be an on-disk something, not Influx or Prometheus.


You can use the Prometheus TSDB package, it's easy to import and use. Plus it's designed around mmap, so the kernel deals with the caching.

You don't have to use a whole Prometheus server to make something like this work.
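For what it's worth, embedding it looks roughly like this. A minimal sketch assuming a 2.3x-era API - the import paths and exact signatures have shifted between releases, so treat the details as approximate and check the tsdb package docs for the version you vendor:

    package main

    import (
        "context"
        "time"

        "github.com/prometheus/prometheus/model/labels"
        "github.com/prometheus/prometheus/tsdb"
    )

    func main() {
        // Open (or create) a TSDB directory; nil logger/registerer/stats for brevity.
        // (Older releases take one fewer argument here.)
        db, err := tsdb.Open("./data", nil, nil, tsdb.DefaultOptions(), nil)
        if err != nil {
            panic(err)
        }
        defer db.Close()

        // Append one sample to a series identified by its label set, then commit.
        app := db.Appender(context.Background())
        lset := labels.FromStrings("__name__", "report_rows_total", "job", "nightly")
        if _, err := app.Append(0, lset, time.Now().UnixMilli(), 12345); err != nil {
            panic(err)
        }
        if err := app.Commit(); err != nil {
            panic(err)
        }
    }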


I only know RRD from Munin graphs, and now have a toy idea to use it directly to render an SVG.

If disk space is an issue, do you or have you considered decompressing on the fly ?


Or use a file system that supports transparent compression (e.g. btrfs).


I only just learned about this, good stuff. Many projects I use are using this one.


For which Prometheus use case does it matter if the binary is 103MB or 2MB?


In the case where some rarely used feature is exploitable via some misconfiguration.

See also: log4j


This is the thing. Do the devs actually know exactly what all of that 105MB vs 2MB of stuff is and why it's there? As others have stated, the dependency fluff is just a multitude of footguns cocked, locked, and ready to rock your world.


But how? I've got prometheus running in its own, non-root account. That makes exploiting vulnerabilities nearly useless, unless you can find a way to get it to spawn a process which can sudo. It runs behind nginx, which has a simple name/pwd protection on it. That makes it very hard to exploit. And prometheus only runs on one server; the others run node_exporter, or publish app data.

What am I missing?


You described multiple layers of security. Removing dependencies and stripping down code is the same: another layer of security.


> What am I missing?

If I knew, it would have a CVE number.

However, the less code you have, the fewer the places there are where you need to ask that question.


None, but it's an interesting case study of a widespread concern (not necessarily problem)


I don't think anyone cares. They could make the binary 15kb, no one would notice. Most of the time it runs inside a pod that will feature a 500+ MB operating system anyway...


Go applications can be built into static self-contained binaries, which don't need extra dependencies. These binaries can run inside a `scratch` Docker container, which doesn't add extra space. So it is easy to create small Docker images (less than 10MB) with such binaries. See, for example, the following article - https://valyala.medium.com/stripping-dependency-bloat-in-vic...


They would notice on their Kubernetes storage accounting.


Ah no :-P


> Prometheus alternative

Well... if the size of the executable is really a concern, perhaps Victoria Metrics is worth considering; my amd64 executable is about 17MiB in size.


Everyone should consider Victoria Metrics anyway. It scales better performance-wise, and they broke out components to improve scalability (vmagent, vmalert, etc.) when Prometheus was just one huge process that did all the things. The two work closely together, and even did a good talk together about the differences.


I love Prometheus because the (OpenMetrics) data protocol is so darn simple and easy to grok. You can do things like take an arbitrary data source, pipe it through awk and curl, and get it into prometheus metrics via remotewrite. You can also easily write your own /metrics endpoint in your favorite language.

VictoriaMetrics sweetens the deal by offering a solution to long-term storage and a more flexible service architecture without leaving the simple and highly interoperable Prometheus ecosystem.


Not only that: when we switched to VictoriaMetrics, we were able to cut the RAM of the virtual machine hosting our monitoring in half, and I think storage is more efficient too.


If only the service-discovery features of the entire SDK are used, then the real story here is that Go's DCE passes aren't strong enough - or the SDK is coupled in a way that makes DCE not work (overuse of reflection?).


Reflection in general isn't the problem, but if you specifically use reflection to look up a method in your program, then dead code elimination gives up. https://go.dev/src/cmd/link/internal/ld/deadcode.go

Taught to me on HN recently: https://news.ycombinator.com/item?id=30041763
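A toy illustration of the effect (hypothetical type and method names, nothing to do with the real SDKs): once reflect's MethodByName is reachable, the linker has to assume any exported method might be called at runtime, so it stops pruning them.

    package main

    import (
        "fmt"
        "reflect"
    )

    type Client struct{}

    // Never called directly anywhere, so it looks like dead code to the linker.
    func (Client) DescribeInstances() string { return "imagine a 52MB SDK behind this" }

    func main() {
        // Because MethodByName is reachable, the linker can no longer prove that
        // any exported method is unused; DescribeInstances (and everything it
        // would transitively pull in) stays in the binary.
        m := reflect.ValueOf(Client{}).MethodByName("DescribeInstances")
        fmt.Println(m.Call(nil)[0])
    }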


Interesting detail, didn't realize Go's dead code analysis was this easily made ineffective. 52MB for the EC2 SDK is a big chunk of code that must be mostly dead/unused.

Seems like eventually this should become a priority for library writers to support dead code analysis, otherwise it's going to get worse and worse.


These SDKs typically hang everything off of the same Client type, and since methods can be reached via reflect I'd guess none of the exported API is safe to drop. The SDKs would need to be written to be much more modular.

Those organizations also tend to spew a lot of code, most of it quite tangled together. They're definitely the largest dependencies I regularly see.


Not only are they spewing out a lot of manual code; they also include a lot of auto-generated and versioned code. Azure's Python SDK, for example, has a duplicate of every method for every version of the API.


IIRC it’s because prom imports protobufs which use either Method or MethodByName, both of which put the linker in conservative mode.


We were using Prometheus for metrics (for k8s services), but switched to influxdb because we were using it to store other non-metrics data, and it’s nice not to have two time series databases with two different query languages you need to learn.

We went from using Prometheus itself to scrape metrics from services and export them to influxdb (fine, but very heavy), to using kapacitor (an utter nightmare for this particular use case), to just writing our own Prometheus metrics harvester from scratch (took about 3 days with service discovery). The current solution has been in place for more than a year and it’s perfect.

I love influxdb’s on-disk compression, but its query planner and unpredictable ram usage leave a lot to be desired (at least the 1.x versions).


I just moved companies from a pure Prometheus stack to an Influx one. Kapacitor is needlessly complex and overkill for what most people need.

There are THREE different ways to query data: InfluxQL, Flux and TICKscripts…all inferior to PromQL imo. The worst is that Influx encourages you to mix them (e.g. PromQL in TICK), causing even more confusion. Documentation on advanced use cases is non-existent.

Been an absolute nightmare tbh. The new company had a similar evolution too: holding non-metrics data in influx while also holding metrics. Trying to at least move metrics onto a Prom/Thanos stack here soon.


Hi halfmatthalfcat, InfluxDB team member here. I wouldn't say we encourage mixing InfluxQL, Flux, and TICK scripts. We encourage new use cases to use Flux and existing use cases to start migrating to Flux when possible. Allowing the three to interact enables existing users to migrate gradually and take advantage of newer features before they're fully migrated.

Regarding docs on advanced use cases, if you haven't already, try posting questions in the community forum: https://community.influxdata.com. Or, if you prefer Slack, there's a link at the top of that page to join our community Slack. We do our best to help with specific issues in those spaces and we also look for common themes that are causing problems for multiple users so that we can focus our efforts there, whether that's bug fixes, performance, features, or docs.


Hi physicles, member of the InfluxDB team here. What version of 1.x are you running? Also curious to know if you've tried 2.x.


We're still on 1.7.x and using TSM (not TSI), because we've found that to work well enough for the time being. Haven't tried 2.x yet because the investment needed to properly benchmark and upgrade would probably be around a week at our scale, and there are just more pressing things to do.

To be fair, we still haven't paid you guys a dime so I can't complain. But if you're interested in hearing my thoughts I can email you after next week's holiday.


I haven't looked but the latest 1.7.x is probably close to 2 years old. There have been quite a few improvements since 1.7 so you might see some improvements by upgrading to the latest 1.x.

Minor nit on "TSM (not TSI)" - TSM (.tsm files) are the data files and that format hasn't changed. TSI is the newer, although mature at this point, indexing option that spills onto disk and can therefore be larger than the original in-mem index. You're probably using in-mem indexing.

We're always interested in feedback. My name is david. You can email me at <name> at influxdata dot com.


Yeah that’s only unused until you need it at which point it doesn’t involve futzing with anything.

One of the things that kills me is running fluentd because you have to fuck around with ruby gems in containers every two minutes to get it to do something reasonable.

This is pain. Prom is not.


Great work on the part of the author. The Pareto principle holds. Often all it takes is one person motivated enough to look for efficiency opportunities.

As for next steps, I can't imagine the Prometheus crew would object to a proposal + PR to make the Service Discovery an optional add-on in the next major version. It does open a can of worms around how such an add-on would be distributed if not built into the binary. (Caveat: I have no familiarity with this particular project or its unique constraints or goals.)


Easy: they can provide an option to remove the SD functionality at compile time, and if you really care about the executable size, you can compile the code with this option (and `-ldflags="-s -w"`). The standard build would still be the "all batteries included" one to avoid support issues (people downloading the smaller binary and then asking why SD isn't working).


Caddy for example does this on their website. You can pick whatever addon you want and you'll be able to download a binary with exactly what you want.


We've already talked about it at length, for many years. There's an approved development proposal to implement it. But, nobody has stepped up to contribute the code.

The main problem is, Go doesn't have any kind of reasonable loadable library system.

The current proposal is to make it easier to do compile-time plugins, similar to how Caddy and CoreDNS do things.

We don't even need a major version to add such a thing. Just the time to write the feature. The thing is, it's just not that important an improvement. When you are running Prometheus in a production environment, it will end up using gigabytes of memory and disk space to operate. The savings of a few megabytes of binary and runtime memory are just not that important.
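For anyone unfamiliar with that pattern, here's a self-contained toy sketch of how Caddy/CoreDNS-style compile-time plugins generally work (made-up names, not Prometheus code): each mechanism registers itself from init(), and the set you get is simply the set of packages your build imports.

    package main

    import "fmt"

    // discoverers is a simplified stand-in for a plugin registry.
    var discoverers = map[string]func(){}

    func registerSD(name string, run func()) { discoverers[name] = run }

    // In the real pattern each init lives in its own package and is pulled in via
    // a blank import (e.g. _ ".../discovery/ec2"), so deleting one import line
    // drops that SD mechanism and its SDK from the binary.
    func init() { registerSD("file", func() { fmt.Println("file-based SD") }) }
    func init() { registerSD("ec2", func() { fmt.Println("EC2 SD") }) }

    func main() {
        for name := range discoverers {
            fmt.Println("compiled-in service discovery:", name)
        }
    }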


On a side note, Prometheus seems to be built for bloat. AFAIK, it isn't even designed to consume metrics other than from apps linked to its client library. It's like a microservice, but with the footprint of an operating system.


It’s a database engine not a microservice and needs to be treated along the lines of postgresql etc.


  podman run postgres:14 du -sh /usr/lib/postgresql
  24M /usr/lib/postgresql
This includes the binary.


Try running big instances in containers.

Our smallest is 32 cores and 1TB of RAM.


> AFAIK, it isn't even designed to consume metrics other than from apps linked to its client library.

Could you elaborate? I use Prometheus to scrape from an HTTP endpoint in various Pods in Kubernetes, so the service discovery is pretty useful to me.

I could see the Kubernetes & the other SDs split out of the core binary if default size is really an issue. Or are you talking about something else?


I'm talking about the way I'm expected to provide metrics for my apps. Rather than exporting free-form JSON and then scripting Prometheus to understand it, I'm expected to use a custom client library to export the metrics. As for Kubernetes, you can only use it with Prometheus because of a not-insignificant amount of work on both sides. Basically, the latter is designed for vendor lock-in.


What a bizarre claim.

Prometheus scrapes the same text format as OpenMetrics 1.0 and over 700 public exporters use this format, and there are TONS of other non-Prometheus software that consume the exact same text format. Prometheus's biggest competitor, Datadog (which is not open source mind you), consumes it too. I think even Grafana consumes it directly. It's becoming an IETF standard[0].

Would I have preferred JSON over a custom text format like this? Yeah. But to claim an open source project like Prometheus with effectively no business at all is using a text format like this to have vendor lock-in? That's quite a stretch.

[0] https://github.com/OpenObservability/OpenMetrics/blob/main/s...


> Prometheus scrapes the same text format as OpenMetrics 1.0

I find the GP's claims weird - I've written a relative ton of collectors, exporters, and translators and the format is pretty OK, not worse than most that came before it and better than lots - but I think this relationship is backwards. Prometheus "scrapes OpenMetrics" because OpenMetrics was formal documentation of what Prometheus was already doing for years.

I would not have preferred JSON. That an exposed metric is also a query, and is also pretty close to a schematic definition, is nice.


I apologize for my mistake, then. My understanding was based on reading the Prometheus docs on making exporters alone - something I needed urgently for a job.


no apology needed, I am sorry the world has a culture of mistakes being a bad thing.


Include the client library if you want, but the wire format is ridiculously simple. I'll implement it from memory in an HN comment.

    http.HandleFunc("/metrics", func(w http.ResponseWriter, req *http.Request) {
        // Set headers before WriteHeader, or they are silently dropped.
        w.Header().Set("Content-Type", "text/plain")
        w.WriteHeader(http.StatusOK)
        w.Write([]byte("# HELP foo_bar The numbers of foos barred.\n# TYPE foo_bar counter\nfoo_bar 42\n"))
    })
The client library is largely to keep track of running counters (and gauges, histograms, etc.), with a small amount of code to actually report those metrics when scraped. It's a very simple format.


A good example of it looking simple but having annoying corner cases.

The content type MUST be: application/openmetrics-text; version=1.0.0; charset=utf-8


Prometheus follows the OpenMetrics standard; I'm not sure what you find proprietary about that or specific to Prometheus.

https://github.com/OpenObservability/OpenMetrics/blob/main/s...


To be precise it was the other way around. OpenMetrics is a standardization effort for the format Prometheus made up.

However Prometheus was designed before JSON was standardized itself, so I'm just glad they didn't choose XML!


And the Prometheus format is a copy of the "varz" format


Sorta, I would say it's an evolution of varz.

IIRC (it's been almost a decade since I used varz), having multiple label values would be a map of maps in varz. It got quite ugly if you wanted to have a number of dimensions.


Ah ok, I see what you mean.

The other commenters have pointed out that it _is_ based on another open standard, but admittedly one less common than say, JSON. So you'll generally have to implement your own metrics producer or use a client library, that's true.

However it's also a dead simple format and you can probably implement it with a for-loop or a shell script.
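For instance, the for-loop version in Go (made-up metric, but valid exposition format) is about this much code:

    package main

    import "fmt"

    func main() {
        temps := map[string]float64{"kitchen": 21.5, "garage": 12.0}
        fmt.Println("# TYPE room_temperature_celsius gauge")
        for room, c := range temps {
            // Label values are double-quoted; %q handles the quoting and escaping.
            fmt.Printf("room_temperature_celsius{room=%q} %g\n", room, c)
        }
    }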


Prometheus supported a JSON representation in the beginning. It was deprecated and removed before 1.0. The current exposition format was created because it cut CPU and memory for scraping metrics in half.

JSON, especially free-form JSON, is not a good format for efficient metrics monitoring.


The Prometheus format is literally just a text page. It's dead simple to implement.


The design consideration was not that it had to be simple to implement. It's that it had to be easy to parse by a human during an outage when nothing else works.


It's a little bit of both. Simple to implement, simple (and fast/cheap) to parse, and human readable.


There are a frustrating number of fundamental corner cases due to variance in floating point text formats, and slightly more in the descriptor if you also need that. It's simple to implement an expositor for a limited set of cases. As usual, it's much more difficult to parse what you actually find in the world.


Maybe I'm being downvoted for insufficient examples? OK, here's a big one:

OpenMetrics's production rule for the format says:

    labels = "{" [label *(COMMA label)] "}"
And yet, the Prometheus Java client library exports a trailing comma with no subsequent label.

As for fp, I've seen parsers break on `e` v. `E`, and `NaN`/`Inf` vs. `nan`/`inf`. The latest IETF draft even has a comment,

     ; Not 100% sure this captures all float corner cases.
Control characters are allowed in descriptors and label values and no normalization form is specified.


Yeah, there are still some corner cases and implementation bugs out there. We spent months deliberating how to deal with some of these. Because the base libraries in some languages just don't produce string output from IEEE 754 the same way.

IIRC, Java is different from Python is different from Go. So, really, this is a standardization in languages problem. We tried to work around these as best we could in the OM format.


> It's like a microservice, but with the footprint of an operating system.

What? It's 100MB vs 25MB, like 75MB of data more... Who cares about a binary being 75MB larger in 2022?


This is a frustrating position.

Everything is layers. We build things on top of other things. If every layer had that attitude, then the bloat would be enormous. It's already getting there.

We should praise judicious effort to optimize any of the resources used in the systems we build, at every layer.


I understand what you're getting at. IMO you're barking up the wrong tree: the problem with bloatware will not be solved by Prometheus shipping a lighter statically linked binary.


I suspect 70%+ of all features of all tools remain undiscovered by their users.


Your use-case is not completely clear to me based on the article, but you might be better off with Prometheus’ agent approach, introduced recently: https://prometheus.io/docs/prometheus/latest/feature_flags/#...



How does the author determine how "most" people use prometheus?


Simple: you notice it's capable of communicating with lots of mutually exclusive cloud services, and note that it could be smaller if you removed some of the relevant dependencies.

Now whether that's a particularly useful observation I'm still not sure.


These software development kits are to a large extent a strongly typed representation of the REST API graph.

Even though the application might only need two or three endpoints in Kubernetes - which would be trivial to implement in Go in just a couple of lines - they favor strong typing and include the SDK, which is several megabytes. The same goes for AWS, Azure, ...

I'm not passing any judgment here by the way.
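As a rough sketch of that "couple of lines" point: hitting one Kubernetes endpoint directly with net/http, using the standard in-cluster token path, might look like this (TLS verification is skipped here purely for brevity; a real client would load the cluster CA):

    package main

    import (
        "crypto/tls"
        "fmt"
        "io"
        "net/http"
        "os"
    )

    func main() {
        // Service account token mounted into every pod by default.
        token, err := os.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/token")
        if err != nil {
            panic(err)
        }

        req, _ := http.NewRequest("GET",
            "https://kubernetes.default.svc/api/v1/namespaces/default/pods", nil)
        req.Header.Set("Authorization", "Bearer "+string(token))

        client := &http.Client{Transport: &http.Transport{
            TLSClientConfig: &tls.Config{InsecureSkipVerify: true}, // demo only
        }}
        resp, err := client.Do(req)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        body, _ := io.ReadAll(resp.Body)
        fmt.Printf("got %d bytes of pod-list JSON\n", len(body))
    }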


I'd like to see memory usage differences, load time & runtime performance impacts. I expect most of these to be small, but I expect some impact.

Also just worth noting that the memory impact of statically compiling in general is probably massive. Most systems probably would have a good percent of these libraries in memory already if Prometheus were using dynamic linking.


I've done this test in the past. The typical footprint of default Prometheus is around 100MiB of RSS.

Removing everything but file and static configs reduces it to about 50MiB.

Interesting for more embedded use cases, but not really a big deal when you're using a few GiB of memory for TSDB ingestion buffering.


Now that Go has reflection in the upcoming 1.18 release, most HN comments about Go relate to binary size. Here we are again.


Do you mean generics? Go has had reflection for a long time.


Yup! Long day. And I just wrote a more substantive comment about reflection...


Why optimize a binary that's 109MB? That's too small to matter.


30 years later:

" Why optimize a binary that's 109GB? That's too small to matter."


I mean, for current computers (or even my 10 year old server), 128MB is so small that it's not worth optimizing. My $25 raspberry pis can run this without any problems while also running a bunch of other programs.

My first Linux computer had 4MB of RAM but that doesn't mean I try to fit anything into that (once I upgraded to 32MB, I could run g++, emacs, X11 and xterm at the same time!)


Why did you upgrade back then?

Because you wanted to be able to run more stuff, or because you wanted to be able to run the exact same executables, just bigger?


Less paging. I could swap but it was slow.

Basically nobody is swapping because of a 128MB executable. If you are, get more RAM or don't run Prometheus.


Less paging. What causes paging? Programs using lots of memory.


A binary that doesn't fit in cache isn't too small to matter.


What's the page size on Linux? Are executables (even statically linked ones) demand-paged? How much of the executable that you don't use is paged in when you don't use it?


That's not how running binaries works.

Only the parts of the code that are in use are paged into the page cache. So if you only use a couple of the features, it fits in cache just fine.


Outstanding!


Clutching pearls about binary size is and always will be hilarious to me.


Please keep snark, name-calling, shallow dismissal, and supercilious putdowns off this site. We're trying to avoid all of that here.

https://news.ycombinator.com/newsguidelines.html

Edit: we've had to ask you repeatedly to follow the site guidelines. Could you please review them and start following them now?


ACK :)


It's all fun and games until you're stuck for an hour downloading 600MB of updated packages over a metered LTE connection. The same goes for RAM usage: 512MB was enough for a phone back in 2014, now a smart TV with 2GB is barely capable of multitasking. Sure, binary sizes don't matter in most contexts. But when they do, it's a PITA.


Sure, but we're talking about an application written for a cloud/hosted environment in a datacenter somewhere. Picking at the size of a statically linked binary meant for production-grade environments with fast computers and fat pipes feels overly pedantic, no? Especially when we're talking about a mere 100MB.


All the more reason! On the cloud, you're often paying per kilobyte.


Not for ingress, and certainly not for packages that are mirrored from the cloud vendor's own repositories (which Prometheus absolutely is).


The Galaxy S5 from 2014 had 2GB, and that was for 1080p vs today's 4K texture sizes. Seems on par.


What you're kind of missing is that the S5 was a flagship phone. Generally, one has to save for more than a month to afford a purchase like that. The idea of working an extra month so that some FAANG prick meets their KPI by cutting corners on optimization doesn't even look like feudalism. It looks like idiocracy. Paying the lip service of fat shaming code bloat is the cost-effective option by comparison :)


What does FAANG have to do with this?

Don't FAANG people obsess over bloat because they're trying to reach billions of customers? It might not seem that way since their pages are bigger, but I'd be surprised if they were happy to leave tens of millions of customers on the table.


They're just poster children for the particular brand of disdain $100k+/year "tech workers" bear for their users: they make enough for the shiniest of toys, so they're too far above spending their valuable time to make their software run smooth on our $100 crap phones. Nevermind that each Fb client update likely produces hundreds of tons of toxic trash called gadgets. Sure, sometimes they do optimizations. Generally, though, both Fb and Google keep exploring the physical limits to code bloat. Remember that one time that Fb hit the JVM class count limit?


FAANG are the worst offenders. Didn't Facebook employ ungodly hacks to unload/load parts of the Android app to navigate around the 65k method limit of DEX? Have you looked at the JS monstrosity of the Google hardware shop website?


You misunderstand the point. You're comparing a 1080p phone to a 4k television when texture memory is what will take up the vast majority of ram. Code footprint is pretty irrelevant.

Still the TV does fine with 2GB. Doesn't seem fair to complain.


I wasn't speaking of a 4k TV, but still, this doesn't check out. A single 2160p framebuffer is 8MPix, or 32MiB. Not counting the original FB size, the extra 1.5GiB are enough for 48 whole framebuffers. You don't need that much image data all at once, the number is ridiculous. No, I believe it's just that the code became that much less efficient.


Think of each app and all the texture content that needs to be loaded. App textures get 4x as big, all things being equal. You see a 4x change in ram across those devices.


Would you pay an extra 20k for your TV so it could have 500MB of memory and have all its apps work?


I can guarantee 100% my Prometheus instance will never be running on metered LTE. If such a situation arises then my operational metrics are the least of my concern.


This one does not even make sense - 100 megs for a binary for centralized metrics? Who would even notice next to the OS and metrics storage.

By design you should not install Prometheus on every server you monitor - it's designed to scrape metrics.

It's a database and web UI with support for email, webhooks, Slack, PagerDuty, the AWS API and many others. 100 megs does not sound like a lot for all Prometheus provides.


It correlates to performance, speed to iterate, security, and design complexity, but ok


I'm unclear that it correlates to iteration speed or design complexity.

Actually, performance too.


More compact code fits in caches better.


Any hard data on any of that?


Unless a fat binary embeds pictures and some music, it's all CPU instructions.

Tens of megabytes of CPU instructions is complexity.

This kind of bloat is the number one enemy of security, as any security engineer could confirm.


Sure, lots, go look for studies on estimation of defects based on LOC and project size/complexity (they go back to the 1970s). But you don't need to look, the principles are simple.

Unless an application is filled with JPEGs or uncompressed arbitrary data files, its size reflects lines of code (machine code, interpreted code, etc). The bigger the app, the more lines of code.

Every line of code has a non-zero bug probability. Every new line of code increases probability. More lines of code, higher probability. Bugs include security bugs; higher probability of bugs, higher probability of security bugs.

CPU cache is finite. Only so many lines of code can be cached or optimized. Larger size takes up more room in memory, which when combined with lot of other gigantic apps, means less memory for heap space, disk cache, etc. Larger size also takes up more room on disk, which adds up when you don't delete old builds on disk and loop over a build process. Since larger size means more lines of code, that means longer compile times, which means longer wait every time you change a line and need to recompile, copy an artifact somewhere, retest.

More lines of code means more code executed. If you have 10 lines of code in a function, and you add 100 lines to it, the compiler doesn't just optimize away all 100 new lines, it's going to add more machine code and code paths. Unless you only ever add new code paths, some of that new code will extend existing code paths or add instructions, and that means more CPU cycles to complete execution. (Same concept for interpreted code)

More lines of code means more code paths. More code paths increases complexity. The more code paths, the longer and more difficult testing gets to the point you can't even develop enough tests to cover all the code paths, so it's impossible to even find all the bugs. More complexity leads to difficulty in humans understanding and working with the codebase, and difficulty in understanding leads to slower and more error-prone development.

Larger means more network bandwidth, meaning file transfers take longer, slowing iteration and producing worse UX. If people download your app every 10 minutes in their CI/CD pipeline, larger size means more network bandwidth used. "Free" CDNs have limits; the larger a project gets, the more file size affects network performance, reliability, and cost. If you pay for bandwidth, a 100MB file costs 100x more than a 1MB file.

The more apps you use that are big, the more every one of these effects increase. One big app you might not notice. 100 big apps lead to noticeable slowness, bugs, less memory, less disk space.


It's kind of ironic reading this comment given that at the time you posted it, I was screaming at gcc's stupid code generator for wasting bytes recreating constants that were already there in that very register! That code needs to fit in a couple hundred bytes..

And half an hour ago I was (once again) checking out hosting providers and lamenting the fact that most don't seem to offer support for loading custom ISOs, so I could install a 30 megabyte distro and make the most out of the cheap plans that only offer something like 10 gigabytes of storage. Half of it is wasted after you install one of these obese mainstream distros.


Hetzner Cloud can start instances from ISOs - here is an example for ipfire : https://wiki.ipfire.org/installation/hetzner-cloud


I got a VPS from Hetzner last year but they decided to block my home IP. After reading some anecdotes on the internet, I had to conclude they're exactly the kind of company I want to avoid (large, opaque, they employ weird algorithms/heuristics to flat out reject customers or suddenly take down their servers, no warning, you can't get an explanation, you're just fucked, just like when Google decides to arbitrarily block you; I've been there).

IMO the point of a hosting provider is supposed to be that you can have some peace of mind and not worry about your shit breaking (that's still a worry as I continue to host everything at home). Instead with providers like this, you worry about them breaking your shit.


Vultr lets you install custom ISOs, and can also PXE boot.


Author doesn't even say why they object to the size. Are they aware that file-backed executables are paged on demand and only the active parts of the program will be resident?


Granted, these days everyone is used to applications consuming massive amounts of drive space. But perhaps they're using legacy hardware for a home lab, or an IoT device with limited disk space.

From a security standpoint, reduced application code decreases risk. It was service discovery code he removed; what if it reached out to discover services on application startup? That's a potential attack vector.


Does it actually reduce the risk? Sure, if you audit, it's easier to identify the risks, but a Windows 98 program is going to be full of vulnerabilities while being small. Being small doesn't remove the vulnerabilities.


> From a security stand point, reduced application code decreases risk. It was service discovery code he removed, what if it reached out to discover services on application start up, that's a potential attack vector.

Agreed. I've seen a similar pattern with certain open source libraries.

The first example I think of is the spf13/viper [1] library, used to load configuration into Go applications. Viper is equipped with code for reading config from various file formats, environment variables, as well as remote config sources such as etcd and consul. If you introduce the viper library as a dependency of your application to merely read config from environment variables and YAML files in the local filesystem, then your Go application suddenly gains a bunch of transitive dependencies on modules related to remote config loading for various species of remote config provider. It's not uncommon for these kinds of remote config loading dependencies to have security vulnerabilities.
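For context, the "merely read a local YAML file" usage that still drags that module graph along is roughly this (a minimal sketch; config keys are made up):

    package main

    import (
        "fmt"

        "github.com/spf13/viper"
    )

    func main() {
        // Only local-file config is used here, yet the module graph still pulls in
        // viper's etcd/consul/etc. remote-provider dependencies.
        viper.SetConfigName("config") // finds config.yaml, config.json, ...
        viper.AddConfigPath(".")
        if err := viper.ReadInConfig(); err != nil {
            panic(err)
        }
        fmt.Println("listen address:", viper.GetString("server.listen_address"))
    }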

Beyond the potential increased attack surface if a bunch of unnecessary code to load application configuration from all manner of remote config providers ends up in your application binary [2], there's also the overhead: if you work in an environment that monitors for vulnerabilities in open source dependencies, depending on a library that drags in dozens of transitive dependencies you don't really need adds a fair bit of additional work re: detecting, investigating and patching the potential vulnerabilities.

I guess there's arguably a "Hickean" simple-vs-easy tradeoff in how such libraries are designed. The "easy" design, that makes it quick for developers to get started and achieve immediate success with a config loading library, is to include code to load config from all popular supported config sources into the default configuration of the library, reducing the amount of steps a new user has to do to get the library to work for their use case. A less easy but arguably "simpler" design might be to only include a common config-provider interface in the core module and push all config-provider-specific client/adaptor code into separate modules, and force the user to think about which config sources they want to read from and then manually add and integrate the dependencies for the corresponding modules that contain the additional code they want.

edit: there has indeed been some discussion about the proliferation of dependencies, and what to do about them, in viper's issue tracker [3] [4]

[1] https://github.com/spf13/viper

[2] this may or may not actually happen, depending on which function calls you actually use and what the compiler figures out. If your application doesn't call any remote-config-provider library functions then you shouldn't expect to find any in your resulting application binary, even if the dependency is there at the coarser-grain module dependency level

[3] https://github.com/spf13/viper/issues/887

[4] https://github.com/spf13/viper/issues/707


Image pull size for a container is likely the concern. It could shave a few seconds off a regularly-run integration test. If it's run via on-demand build agents, then there's no image cache.


If it takes multiple seconds to pull ~35MB of compressible text into your CI environment, there may be other, larger problems to solve.


I was estimating off the speed it takes to pull images to my local computer, where the limiting factor appears to be something other than my internet connection - so either the image extraction process or a Docker Hub throttle.


That only helps if the code is well segregated by usage. Looking at the ELF symbol table for prometheus-2.33.0-rc.1.linux-amd64, it's not clear to me this is the case. Not sure how it's ordered. Lexical import order? Anyhow, without profiling how could the compiler know how to order things optimally?

I think this is one of those cases where, in the absence of profiling or some other hack (e.g. ensuring all routines within a library are cleanly segregated across page boundaries within the static binary and the I/O scheduler doesn't foil your intent), dynamic linking would prove superior, at least for such large amounts of code.


Sorry not to make it obvious in the article: I'm planning to run it on a small IoT Pi-based device locally. So having something small and fast is preferable; however, runtime performance is a more important thing I haven't touched.


Prometheus is rather efficient, but its focus is a little different from yours. It's designed for large-scale collection of metrics, scraped from many remote endpoints.

You can run it locally, but the "Prometheus" way for an IoT environment would be a central Prometheus server that scrapes the IoT devices running a Prometheus exporter, which tends to be very lightweight.


Totally agree. Another part of it is just feeding curiosities.


Prometheus works fine as-is on Pi devices. You'll spend most of your memory on ingestion buffering. I did the same tests as you did a while back, it only saves like 25-50MiB of memory IIRC.

The only thing you really need to worry about on a Pi is that the default kernels are still 32-bit, and are set to 2GiB kernel boundary. So you'll be limited to how much TSDB storage can be mmap'd unless you switch to a 64-bit kernel.

You may want to consider agent mode on your IoT device, and stream the data to an external server/service.

https://prometheus.io/docs/prometheus/latest/feature_flags/#...


That's a good insight. Thanks


I'm curious -- is it the binary size that's a problem, or the resident size in memory? Demand paging should help, although you'd be stuck with carrying the enlarged binary.


My gut feeling tells me it will be both memory and CPU utilization. Can't be sure until I can find a good way to measure it.


ps RSS measurement is pretty good for a start (although note that in a forked process, shared copy-on-write pages are reported as RSS for both the parent and the child process).

top reports this as RES.

IIRC, debugging information is in a separate part of the process, so it's not loaded until it's used. Does that make it free? Probably not quite, but the kernel can ignore it until the process (presumably via its debugger) looks at it.


If only there was a monitoring system you could install to measure such a thing.


Auditd_2.8-amd64.deb is 194kb on debian, rsyslog_8.32-amd64 is 411kb, and they both support centralized auditing and log collection from multiple hosts.


Do they do the metrics like Prometheus does? And include the central collector and basic graph builder?


This doesn't seem a fair comparison. Prometheus is statically linked like all Go applications, and those packages are not. You can debate the merits of that, but if you compare a "only rsyslog" server vs a "only prometheus" server the 2 will be much closer in size.


Even journalctl is only 90kb, and the entire combined systemd package is 4mb (but that's including all the documentation and a dozen other different binaries)


And oh what a beautiful bike shed it will be...


I work in a lot of situations with hard or soft resource limits where I actually do need to count bytes and/or CPU cycles, so it's bizarre to see anyone shrugging about distributing tens of megabytes of fat for literally no reason.

One thing Microsoft got right a long time ago was separating out debug symbols into their own file by default. I think that's still awkward on Linux.


> I think that's still awkward on Linux.

At least for .deb and .rpm packages, the default build process automatically extracts debug symbols into separate packages. Eg the process of building package `foo` also produces `foo-dbgsym` and `foo-debuginfo` packages respectively that contain debug symbols for every involved binary and library, while `foo` contains the stripped files.

So anyone who wants to debug a coredump / live process just installs the corresponding -dbgsym / -debuginfo package and now gdb has all the debug info it needs.

Distros have also started incorporating debuginfod into their repos so that gdb can download symbols automatically. So you don't even have to hunt for the right debuginfo package.


I also find this comical.

I'd love to know why 100MB is that big of a deal. If network is slow, cache locally. Seems like nothing here to worry about.



