Aaaaaaand torch is not a simple, easy target. You don't just want support, you want high-performance, optimized support for a pretty complex moving target... maybe better/easier than CUDA, but not by that much, it seems.
But what would they use before bringing in PostGIS? I'm curious about the alternatives. MongoDB, for example, doesn't seem to have a geospatial ecosystem apart from basic 2d features. ClickHouse?
Forking Firefox whenever the rug is pulled seems doable (with elbow grease), and in the meantime Europeans can invest in problems that don't already have a mature, fully open-source solution.
Every sensor in the array is sampling at some frequency, so - to first order - from that sampling frequency and the sample size you get an idea of the input bandwidth in bytes/second. There are of course bandwidth reduction steps (filtering, downsampling, beamforming)...
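To make that concrete, a back-of-the-envelope sketch; every figure in it is a made-up placeholder, not a real instrument spec:

    # Raw input bandwidth of a sensor array, to first order.
    # All numbers below are illustrative assumptions.
    n_sensors = 256            # elements in the array (assumed)
    sample_rate_hz = 200e6     # ADC sampling frequency per sensor (assumed)
    bytes_per_sample = 2       # e.g. 16-bit samples (assumed)

    bytes_per_second = n_sensors * sample_rate_hz * bytes_per_sample
    print(f"{bytes_per_second / 1e9:.1f} GB/s raw, "
          f"{bytes_per_second * 86400 / 1e15:.1f} PB per day")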
Sorry, not sure I follow how you get from what I said (explaining how much data the sensors produce) to 'increasing the sampling frequency'? You usually sample over a larger bandwidth, then apply a specifically tailored band-pass filter to remove aliasing effects, and then downsample. This is a classic signal acquisition pattern: https://dsp.stackexchange.com/questions/63359/obtain-i-q-com...
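Something like this sketch of that pattern, with toy parameters (scipy's decimate applies the anti-aliasing low-pass for you; the rates here are just examples):

    # Oversample, band-limit, then downsample (decimate).
    import numpy as np
    from scipy import signal

    fs = 200e6                 # original (over)sampling rate, assumed
    q = 8                      # decimation factor -> effective rate fs/q
    t = np.arange(1_000_000) / fs
    x = np.cos(2 * np.pi * 5e6 * t) + 0.1 * np.random.randn(t.size)  # toy signal

    # decimate() low-pass filters (anti-aliasing) then keeps every q-th sample.
    x_dec = signal.decimate(x, q, ftype='fir')
    print(x.size, "samples ->", x_dec.size, "samples at", fs / q / 1e6, "MHz")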
None of this changes the actual amount of data you have at the end of the day, though, after all is said and done; that's what I mean, as long as you don't botch it and capture too little. In computing terms, the amount of real information in a compressed archive and in the uncompressed original is the same, even if the file size is larger for the latter.
On the SKA, from what I understand, they sample broadband but quickly beamform and downsample, as the data rates would be unsustainable to store over the whole array.
Right, that makes sense, you'd be looking at an insane amount of data across the ranges that these sensors can look at. But they would still need to preserve phase information if they want to use the array for what it is best at and that alone is a massive amount of data.
I think they preserve timestamped I,Q data. I know some people looking at down-sampling and preselecting those signals for longer-term storage and deeper reprocessing, and they seem to have a 24h window to 'analyze and keep what you need'.
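For a sense of scale, a rough sketch of what keeping a 24h I/Q window costs in storage; all the parameters are assumptions, not the real installation's figures:

    # Storage cost of keeping timestamped I/Q for a 24h window.
    bandwidth_hz = 50e6        # recorded band per channel (assumed)
    bits_per_component = 8     # bit depth of I and of Q (assumed)
    n_channels = 64            # beams/polarizations/antennas kept (assumed)

    # Complex baseband: sample rate ~ bandwidth, 2 components (I and Q) per sample.
    bytes_per_second = bandwidth_hz * 2 * (bits_per_component / 8) * n_channels
    day = 24 * 3600
    print(f"{bytes_per_second / 1e9:.1f} GB/s -> "
          f"{bytes_per_second * day / 1e12:.0f} TB per 24h window")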
We're still in a technological phase where ADCs are far more advanced than storage and online processing systems, which means throwing away a lot. But I have high hopes for a system where you upgrade computing, network, storage (and maybe ADCs...) and you get an improved sensor. Throw man-hours at some GPU kernel developers and you get new science. The limit now seems to be having enough people and compute to fully exploit the data, rather than the technology...
Too late to edit: any idea of the resolution the I,Q data is sampled at (bandwidth, bit depth)? I visited one of these installations a while ago and the tour guide had really no clue about any of the details (I think he was the son of one of the scientists).
BTW if you're interested in the concept of upgrading a sensor without retooling the RF part, and the impact of 'just' adding new COTS racked server hardware and engineering man-hours to get a 'new' sensor with new capabilities, have a look at Julien Plante's work on NenuFAR (which isn't like the SKA at all :-) : https://cnrs.hal.science/USN/obspm-04273804v1 . Damien Gratadour, his PhD supervisor, is an amazing technologist, dedicated to improving astronomy instruments, and I was very lucky to work with him and his team... the things the French can string together with small teams and thin budgets...
A rust-vmm-based environment that verifies/authenticates an image before running it? An immutable VM (no FS, drop root after setting up the network, no devices or only a curated set), a 'micro'-VM based on systemd? A VMM that captures the running kernel code/memory mapping before handing off to userland and periodically checks it hasn't changed? Anything else on the state of the art of immutable/integrity-checked VMs?
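For the first idea, a minimal sketch of just the 'verify the image before booting it' step (the image and digest paths are hypothetical, and the cloud-hypervisor invocation at the end is only illustrative; check the flags against your version):

    # Verify a root image against a trusted digest, then boot it.
    import hashlib
    import subprocess
    import sys

    IMAGE = "rootfs.img"              # hypothetical immutable root image
    EXPECTED = "rootfs.img.sha256"    # hypothetical trusted digest, signed out of band

    def sha256_of(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    if sha256_of(IMAGE) != open(EXPECTED).read().split()[0]:
        sys.exit("image digest mismatch, refusing to boot")

    # Hand the verified image to a rust-vmm based VMM (cloud-hypervisor here).
    subprocess.run([
        "cloud-hypervisor",
        "--kernel", "vmlinux",
        "--disk", f"path={IMAGE}",
        "--cmdline", "console=ttyS0 root=/dev/vda ro",
    ], check=True)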
And NVIDIA supposedly has the exact know-how for reliability, as their Jetson 'industrial' parts are qualified for 10-15 years at maximal temp. Of course Jetson is at another point on the flops and watts curve.
Just wondering if reliability increases when you slow down your use of GPUs a bit. Like pausing more often and not chasing every bubble and NVLink all-reduce optimization.
Jetson uses LPDDR though. H100 failures seem driven by HBM heat sensitivity and the 700W+ envelope. That is a completely different thermal density I guess.
Reliability also depends strongly on current density and applied voltage, even more perhaps than on thermal density itself. So "slowing down" your average GPU use in a long-term sustainable way ought to improve those reliability figures via multiple mechanisms. Jetsons are great for very small-scale self-contained tasks (including on a performance-per-watt basis) but their limits are just as obvious, especially with the recently announced advances wrt. clustering the big server GPUs on a rack- and perhaps multi-rack level.
I don't have first-hand knowledge of HBM GPUs, but on the RTX Blackwell 6000 Pro Server, the perf difference between the free up-to-600W setting and the same GPU capped at 300W is less than 10% on any workload I could throw at it (including Tensor Core-heavy ones).
That's a very expensive 300W and I wonder what tradeoff made them go for this, and whether capping is here a way to increase reliability. ...
Wonder whether there's any writeup on those additional 300 Watts...
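For anyone who wants to try the same comparison, a rough sketch with the nvidia-ml-py (pynvml) bindings; setting the limit needs root, and the 300 W value is just the cap mentioned above, not a recommendation:

    # Query the supported power limit range and cap the board, then run the
    # workload externally and compare throughput against the uncapped run.
    import pynvml

    pynvml.nvmlInit()
    gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

    lo, hi = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(gpu)  # in mW
    print(f"supported power limit range: {lo/1000:.0f}-{hi/1000:.0f} W")

    target_mw = max(lo, min(hi, 300_000))       # 300 W, clamped to the range
    pynvml.nvmlDeviceSetPowerManagementLimit(gpu, target_mw)

    print("current draw:", pynvml.nvmlDeviceGetPowerUsage(gpu) / 1000, "W")
    pynvml.nvmlShutdown()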
> whether capping is here a way to increase reliability
Almost certainly so, and you wouldn't even need to halve the wattage; even a smaller drop ought to bring a very clear improvement. The performance profile you mention is something you see all the time on CPUs when pushed to their extremes; it's crazy to see that pro-level GPUs are seemingly being tuned the same way out of the box.
It sounds like those workloads are memory bandwidth bound. In my experience with generative models, the compute units end up waiting on VRAM throughput, so throwing more wattage at the cores hits diminishing returns very quickly.
If they were memory bandwidth bound wouldn't that in itself push the wattage and thermals down comparatively, even on a "pegged to 100%" workload? That's the very clear pattern on CPU at least.
That's my experience as well, after monitoring frequency and temp on lots of kernels across the whole spectrum from memory-bound to L2-bound to compute-bound. Hard to reach 600W with a memory-bound kernel. TensorRT manages it somehow with some small-to-mid networks, but the perf increase seems capped around 10% too, even with all the magic inside.
I thought so too, but no: iterative small matrix multiplication kernels on Tensor Cores, or pure (generative) compute with an ultra-late reduction and an ultra-small working set. nsight-compute says everything is in L1 or the small register file, no spilling, and that I am compute bound with good ILP. Can't find a way to get more than 10% out of the 300W difference. Hence asking if anyone did better, how, and how reliable the HW stays.
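For context, a roofline-style sanity check is roughly what that 'compute bound' call boils down to; the peak numbers below are placeholders (not the real spec sheet), and the FLOP and byte counts would come from a profiler run such as ncu:

    # Is a kernel compute-bound or memory-bound? Compare its arithmetic
    # intensity (FLOPs per byte of DRAM traffic) to the hardware ridge point.
    peak_tflops = 200.0      # assumed peak tensor throughput, TFLOP/s
    peak_bw_gbs = 1500.0     # assumed peak memory bandwidth, GB/s
    ridge = peak_tflops * 1e12 / (peak_bw_gbs * 1e9)   # FLOP per byte

    def arithmetic_intensity(flops, bytes_moved):
        return flops / bytes_moved   # both taken from the profiler

    ai = arithmetic_intensity(flops=4e12, bytes_moved=5e9)   # example numbers
    print(f"ridge ~{ridge:.0f} FLOP/B, kernel at {ai:.0f} FLOP/B ->",
          "compute-bound" if ai > ridge else "memory-bound")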
And even if you had a runtime solution with no runtime cost, you'd still need to run the code to find the memory safety bugs. Static analysis is supposed to tell you there is no path that violates memory safety.
Before we go down that path: datacenters full of oil tubs seem to be a trend in some spaces. Very good cooling power, low-tech heat extraction (small pumps to keep the oil moving and car radiators with big, simple plumbing do the trick well).