What's the best way to get all those points from a backend into the frontend webgpu compute shader?
There doesn't seem to be a communication mechanism with minimal memcopy or no serialization/deserialization; the security boundary makes this difficult.
I have a backend array of 10M i16 points that I want to get into the frontend (with scale & offset data provided to the compute shader via a side channel).
As it stands, I currently process on the backend and send the frontend a bitmap or simplified SVG. I'm curious to know about the opposite approach.
Not sure, but I solved a similar problem many years ago and concluded it was silly to send all the data to the client when the client didn't have the visual resolution to show it anyway. So I sampled it adaptively for the client by precomputing and storing multiple zoom levels. That way the client-side chart app would still get points and you could zoom in, but you'd only ever retrieve about 1000-2000 points at a time.
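Something like this on the client (a rough sketch; the /points endpoint, level parameter and pixel budget are made up, not from the original system):

    // Pick the coarsest precomputed zoom level that still fills the pixel budget,
    // then fetch only that slice as raw i16 data.
    function pickLevel(pointsInRange: number, pixelBudget = 2000): number {
      let level = 0;
      while ((pointsInRange >> level) > pixelBudget) { level++; } // each level halves the point count
      return level;
    }

    async function fetchVisible(level: number, from: number, to: number): Promise<Int16Array> {
      const resp = await fetch(`/points?level=${level}&from=${from}&to=${to}`);
      return new Int16Array(await resp.arrayBuffer()); // ~1000-2000 points per request
    }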
Yeah, I agree. I'd like to get an idea of the order-of-magnitude difference between the two approaches by trying it out, but realistically I don't think there's an easy way to get a raw i16 array into the browser runtime with minimal overhead (WebRTC maybe?)
Look up transferable objects, it's not new. The fetch API can get you ArrayBuffers that you can shuffle around zero-copy, not just into WebGL buffers but also to web workers.
But minimizing copying or avoiding format conversions doesn't necessarily get you the best performance, of course.
I had a look, and that certainly looks like part of the solution. Now I need to get that ArrayBuffer from my backend into the browser runtime as a transferable object.
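For what it's worth, the whole path can stay binary end to end. A minimal sketch, assuming a hypothetical /points endpoint that returns the raw little-endian i16 array as an octet-stream body:

    // Fetch the raw bytes once, then upload them straight into a storage buffer.
    // The only copy on the JS side is the writeBuffer upload to the GPU.
    async function loadPoints(device: GPUDevice): Promise<GPUBuffer> {
      const bytes = await (await fetch('/points')).arrayBuffer(); // one ArrayBuffer, no parsing
      const buffer = device.createBuffer({
        size: Math.ceil(bytes.byteLength / 4) * 4,
        usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
      });
      // writeBuffer needs a 4-byte-aligned size; 10M i16 = 20 MB, so that's fine here.
      device.queue.writeBuffer(buffer, 0, bytes);
      return buffer;
    }

If the download happens in a web worker instead, the same ArrayBuffer can be handed to the main thread with zero copy via postMessage(bytes, [bytes]).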
I'm not so good at English, but the points are:
- Use a WebSocket to send the raw point data batch by batch (rough sketch after this list)
- Strip the float values down to integers if possible, or multiply them before sending, as long as they won't exceed Number.MAX_SAFE_INTEGER or similar
- The frontend should build a wrapper around the received raw data for indexing, so the data itself never needs to be modified
- There should be drawing/chart libraries that handle the rendering quite well given a proper data format and batched data
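A rough sketch of the WebSocket side (the URL and framing are made up; here each message is just a chunk of raw little-endian i16 values):

    const ws = new WebSocket('wss://example.com/points');
    ws.binaryType = 'arraybuffer'; // receive ArrayBuffers instead of Blobs

    const batches: Int16Array[] = [];
    ws.onmessage = (ev: MessageEvent<ArrayBuffer>) => {
      // A typed-array view over the received buffer: no per-point parsing, no copy.
      batches.push(new Int16Array(ev.data));
    };

    // "Wrapper for indexing" without touching the data: resolve a global index
    // into (batch, offset) instead of concatenating everything into one big array.
    function pointAt(i: number): number {
      for (const b of batches) {
        if (i < b.length) return b[i];
        i -= b.length;
      }
      throw new RangeError('point index out of range');
    }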
Apache Arrow is great here; easier shuttling from cloud GPUs and cloud analytics pipelines to WebGL in the browser is basically the reason we wrote the initial JS tier.
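For reference, the browser side of that can be quite small (a sketch using the apache-arrow JS package; the column name and the assumption that the backend serves an Arrow IPC stream are mine):

    import { tableFromIPC } from 'apache-arrow';

    // Read an Arrow IPC payload and hand back the i16 column as a typed array.
    // For a single-chunk column this should be a view over the IPC buffer,
    // so there's no per-value deserialization.
    async function loadArrowPoints(url: string): Promise<Int16Array> {
      const table = tableFromIPC(await (await fetch(url)).arrayBuffer());
      return table.getChild('y')!.toArray() as Int16Array;
    }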
I did something similar for syncing 10M particles in a sim for a multiplayer test. The gist is that at a certain scale it is cheaper to send a frame buffer, but that scale needs to be massive.
For this, compress/quantize the numbers and then pass them directly to the GPU as they come off the network. Have a compute shader on the GPU decompress them before writing to a frame buffer. This is what high-performance lidar streaming renderers do, as lidar data is packed efficiently for transport.
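Roughly what that could look like for the OP's case (a sketch, not from the post: the i16 points go up as packed u32 words, and the binding layout and Params struct are assumptions):

    // WGSL compute shader that expands packed i16 values to f32,
    // applying the scale/offset that arrive via a uniform.
    const dequantizeWGSL = /* wgsl */ `
      struct Params { scale: f32, offset: f32, count: u32, pad: u32 }

      @group(0) @binding(0) var<storage, read>       quantized: array<u32>; // two i16 per word
      @group(0) @binding(1) var<storage, read_write> points: array<f32>;
      @group(0) @binding(2) var<uniform>             params: Params;

      fn unpack_i16(word: u32, high: bool) -> f32 {
        let half = select(word & 0xffffu, word >> 16u, high);
        let signed = bitcast<i32>(half << 16u) >> 16u; // sign-extend the 16-bit value
        return f32(signed);
      }

      @compute @workgroup_size(64)
      fn main(@builtin(global_invocation_id) id: vec3<u32>) {
        let i = id.x;
        if (i >= params.count) { return; }
        let raw = unpack_i16(quantized[i / 2u], (i & 1u) == 1u);
        points[i] = raw * params.scale + params.offset;
      }
    `;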
- All the work is done in my high performance backend, where I joyfully optimise my hot loops to the assembly level. The web view is a thin layer on top.
- HTML and CSS are a joy to work with compared to many UI toolkits. LLMs are better at supporting a web stack.
- The UI zooms/scales, and is accessible with screen readers (looking at you, imgui).
- Cross platform with low effort.
IMO you have to be extremely careful not to pull in a whole frontend stack. Stay as vanilla as possible (maybe Alpine.js or Tailwind), and I've got hot reload set up so the developer productivity loop is tight when editing the view.
- The mature frameworks (e.g. ASP.NET with razor pages) are great. Microsoft still have the same issue of pushing new and different ways of doing web things, but you do see a lot of that on the web platform in general.
- CLI workflow for compilation/build/deployment is now there and works smoothly. VS Code extensions for a bit of intellisense without requiring a full IDE (if that's the way you work).
The thing I enjoy most about modern C# is the depth/levels of progressive enhancement you can do.
Let's say in the first instance, you write a proof of concept algorithm using basic concepts like List<T>, foreach, stream writing. Accessible to a beginner, safe code, but it'll churn memory (which is GC'd) and run using scalar CPU instructions.
Depending on your requirements, you can then progressively tackle the memory churn or the processing speed.
Eventually you get to a point where it's nearly the same as the best C code you could write, with no memory churn (or stop-the-world GC), and SIMD over all CPU cores for blisteringly fast performance, whilst keeping all or most of the safety.
This performance aspect is interesting, so it's time to try C# again. I'm learning Zig for some of those reasons, but also because the language has a small scope and the feature set will stay smaller.
If you are leaning towards Zig, I don't think C# will be what you are looking for. It is a good option along with Java and Go, but not in Zig, C, Rust territory if that is what you want.
Yes! I'm glad to see this pointed out - when working on UIs, I regularly move them between 3 monitors with varying resolution and DPI: 4K at 200%, 2K at 125%, and 2K at 100%. This reveals not only design issues but application stack issues with DPI support.
I think these new language features have the same virtue - I can opt into them later, and intellisense/analysers will optionally make me aware that they exist.
I have occasionally, just for fun, written benchmarks for some algorithm in C++ and an equivalent C# implementation, then tried to bring the managed performance in line with native using the methods you mention and others. I'm always surprised by how often I can match the performance of the unmanaged code (even when I'm trying to optimize my C++ to the limit) while still ending up with readable and maintainable C#.
Does this include the GC at the end of it all? Because if that happens after the end timestamp it's not an exact comparison. I read something once about speeding up a C/C++ compiler by simply turning free into a no-op. Such a compiler basically allocates more and more data and only frees it all at the end of execution, so then doing all the free calls is just wasted CPU cycles.
That is the promise we could already have had in the 1990s with languages like Eiffel, Oberon and Modula-3, and it has taken us about 30 years for it to finally become mainstream.
C# is not the only one offering these kinds of capabilities, but still, big kudos to the team, and the .NET performance improvements blog posts are a pleasure to read.
Yes, totally agree.
The second thing I found it great for was explaining errors: it either finds the exact solution or sparks a thought that leads to the answer.
The positions of both evangelists and luddites seem mad to me; there's too much emotion involved in those positions for what amounts to another tool in the toolbox that should only be used in appropriate situations.
26^3 is only 17,576 potential TLAs. Some collisions are to be expected, especially when you take into account that some phrases are more likely to occur than others (such as CP for Control Panel, Certified Professional, Control Protocol, Comprehensive Plan, Child Process, ...)
I really like this.
Did you investigate the options regarding CO2 sensors?
I'm interested to know if you compared the SCD30 to the SCD41? The dual-channel design of the SCD30 is supposed to offer lower drift and better long-term stability compared to the SCD41 (which claims to need taking outside once a week). That's the deal on paper; I'm wondering if you got any real data on this.
Thanks! The SCD30 is a great sensor and obviously better than the SCD41. But we did not look at it in more detail, as we chose the SCD41 primarily for its small size. We believe that an accuracy of +/-50ppm is enough for a device like the Air Lab. Also, we'll actively look into reminding the user to take the device outside if automatic calibration is used. On top of that, we plan to either factory calibrate the devices and/or offer manual recalibration, which should extend the interval beyond the 1-week cadence of automatic calibration.
Maybe contact Sensirion about the STCC4; it has a better form factor and is due for release any moment... They have a contact-sales option and are usually helpful.
https://sensirion.com/products/catalog/STCC4
(Not sure, but I thought it might be used in the SEN66, which is available now and covers most things including PM/CO2/VOC/NOx)
This SDK update moving the RP2040 baseline to 200MHz is very welcome for projects where you can't run an egregious overclock beyond the manufacturer-approved spec.
For hobby projects, I've achieved significant overclocks on both the RP2040 and RP2350. I suspect this is mostly down to the use of TSMC's 40nm LP process, which is a smaller node than most microcontrollers use.