What's the best way to get all those points from a backend into the frontend webgpu compute shader?
There doesn't seem to be a communication mechanism with minimal memcopy or no serialization/deserialization; the security boundary makes this difficult.
I have a backend array of 10M i16 points that I want to get into the frontend (with scale & offset data provided to the compute shader via a side channel).
As it stands, I currently process on the backend and send the frontend a bitmap or simplified SVG. I'm curious to know about the opposite approach.
Not sure, but I solved a similar problem many years ago and concluded it was silly to send all the data to the client when the client didn't have the visual resolution to show it anyway. So I sampled it adaptively for the client by precomputing and storing multiple zoom levels. That way the client-side chart app would still get points and you could zoom in, but you'd only ever retrieve about 1000-2000 points at a time.
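Something like this on the client (a rough sketch; the /points endpoint, level parameter and pixel budget are made up, not from the original system):

    // Pick the coarsest precomputed zoom level that still fills the pixel budget,
    // then fetch only that slice as raw i16 data.
    function pickLevel(pointsInRange: number, pixelBudget = 2000): number {
      let level = 0;
      while ((pointsInRange >> level) > pixelBudget) { level++; } // each level halves the point count
      return level;
    }

    async function fetchVisible(level: number, from: number, to: number): Promise<Int16Array> {
      const resp = await fetch(`/points?level=${level}&from=${from}&to=${to}`);
      return new Int16Array(await resp.arrayBuffer()); // ~1000-2000 points per request
    }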
Yeah, I agree. I'd like to get an idea of the order-of-magnitude difference between the two approaches by trying it out, but realistically I don't think there's an easy way to get a raw i16 array into the browser runtime with minimal overhead (WebRTC maybe?)
Look up transferable objects, it's not new. The fetch API can get you ArrayBuffers that you can shuffle around zero-copy, not just into WebGL buffers but also to web workers.
But minimizing copying or avoiding format conversions doesn't necessarily get you the best performance, of course.
I had a look, and that certainly looks like part of the solution. Now I need to get that ArrayBuffer from my backend into the browser runtime as a transferable object.
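For what it's worth, the whole path can stay binary end to end. A minimal sketch, assuming a hypothetical /points endpoint that returns the raw little-endian i16 array as an octet-stream body:

    // Fetch the raw bytes once, then upload them straight into a storage buffer.
    // The only copy on the JS side is the writeBuffer upload to the GPU.
    async function loadPoints(device: GPUDevice): Promise<GPUBuffer> {
      const bytes = await (await fetch('/points')).arrayBuffer(); // one ArrayBuffer, no parsing
      const buffer = device.createBuffer({
        size: Math.ceil(bytes.byteLength / 4) * 4,
        usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
      });
      // writeBuffer needs a 4-byte-aligned size; 10M i16 = 20 MB, so that's fine here.
      device.queue.writeBuffer(buffer, 0, bytes);
      return buffer;
    }

If the download happens in a web worker instead, the same ArrayBuffer can be handed to the main thread with zero copy via postMessage(bytes, [bytes]).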
I'm not so good at English, but the points are:
- Use a WebSocket to send the raw point data batch by batch (rough sketch after this list)
- Strip the float values down to integers if possible, or multiply them before sending, as long as they won't exceed Number.MAX_SAFE_INTEGER or similar
- The frontend should build a wrapper around the received raw data for indexing, so the data itself never needs to be modified
- There should be drawing/chart libraries that handle the rendering quite well given a proper data format and batched data
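A rough sketch of the WebSocket side (the URL and framing are made up; here each message is just a chunk of raw little-endian i16 values):

    const ws = new WebSocket('wss://example.com/points');
    ws.binaryType = 'arraybuffer'; // receive ArrayBuffers instead of Blobs

    const batches: Int16Array[] = [];
    ws.onmessage = (ev: MessageEvent<ArrayBuffer>) => {
      // A typed-array view over the received buffer: no per-point parsing, no copy.
      batches.push(new Int16Array(ev.data));
    };

    // "Wrapper for indexing" without touching the data: resolve a global index
    // into (batch, offset) instead of concatenating everything into one big array.
    function pointAt(i: number): number {
      for (const b of batches) {
        if (i < b.length) return b[i];
        i -= b.length;
      }
      throw new RangeError('point index out of range');
    }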
Apache Arrow is great here; easier shuttling from cloud GPUs and cloud analytics pipelines to WebGL in the browser is basically the reason we wrote the initial JS tier.
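For reference, the browser side of that can be quite small (a sketch using the apache-arrow JS package; the column name and the assumption that the backend serves an Arrow IPC stream are mine):

    import { tableFromIPC } from 'apache-arrow';

    // Read an Arrow IPC payload and hand back the i16 column as a typed array.
    // For a single-chunk column this should be a view over the IPC buffer,
    // so there's no per-value deserialization.
    async function loadArrowPoints(url: string): Promise<Int16Array> {
      const table = tableFromIPC(await (await fetch(url)).arrayBuffer());
      return table.getChild('y')!.toArray() as Int16Array;
    }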
I did something similar for syncing 10M particles in a sim for a multiplayer test. The gist is that at a certain scale it is cheaper to send a frame buffer, but that scale needs to be massive.
For this, compress/quantize the numbers and then pass them directly to the GPU as they come off the network. Have a compute shader on the GPU decompress them before writing to a frame buffer. This is what high-performance lidar streaming renderers do, as lidar data is packed efficiently for transport.
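Roughly what that could look like for the OP's case (a sketch, not from the post: the i16 points go up as packed u32 words, and the binding layout and Params struct are assumptions):

    // WGSL compute shader that expands packed i16 values to f32,
    // applying the scale/offset that arrive via a uniform.
    const dequantizeWGSL = /* wgsl */ `
      struct Params { scale: f32, offset: f32, count: u32, pad: u32 }

      @group(0) @binding(0) var<storage, read>       quantized: array<u32>; // two i16 per word
      @group(0) @binding(1) var<storage, read_write> points: array<f32>;
      @group(0) @binding(2) var<uniform>             params: Params;

      fn unpack_i16(word: u32, high: bool) -> f32 {
        let half = select(word & 0xffffu, word >> 16u, high);
        let signed = bitcast<i32>(half << 16u) >> 16u; // sign-extend the 16-bit value
        return f32(signed);
      }

      @compute @workgroup_size(64)
      fn main(@builtin(global_invocation_id) id: vec3<u32>) {
        let i = id.x;
        if (i >= params.count) { return; }
        let raw = unpack_i16(quantized[i / 2u], (i & 1u) == 1u);
        points[i] = raw * params.scale + params.offset;
      }
    `;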
- All the work is done in my high performance backend, where I joyfully optimise my hot loops to the assembly level. The web view is a thin layer on top.
- HTML and CSS are a joy to work with compared to many UI toolkits. LLMs are better at supporting a web stack.
- The UI zooms/scales, and is accessible with screen readers (looking at you, imgui).
- Cross platform with low effort.
IMO you have to be extremely careful not to pull in a whole frontend stack. Stay as vanilla as possible (maybe Alpine.js or Tailwind), and I've got hot reload set up so the developer productivity loop is tight when editing the view.
- The mature frameworks (e.g. ASP.NET with razor pages) are great. Microsoft still have the same issue of pushing new and different ways of doing web things, but you do see a lot of that on the web platform in general.
- CLI workflow for compilation/build/deployment is now there and works smoothly. VS Code extensions for a bit of intellisense without requiring a full IDE (if that's the way you work).
The thing I enjoy most about modern C# is the depth/levels of progressive enhancement you can do.
Let's say in the first instance, you write a proof of concept algorithm using basic concepts like List<T>, foreach, stream writing. Accessible to a beginner, safe code, but it'll churn memory (which is GC'd) and run using scalar CPU instructions.
Depending on your requirements, you can then progressively tackle the memory churn or the processing speed.
Eventually you get to a point where it's nearly the same as the best C code you could write, with no memory churn (or stop-the-world GC), and SIMD over all CPU cores for blisteringly fast performance, whilst keeping all or most of the safety.
This performance aspect is interesting, so it's time to try C# again. I'm learning Zig for some of those reasons, but also because the language has a small scope and the feature set will stay smaller.
If you are leaning towards Zig, I don't think C# will be what you are looking for. It is a good option along with Java and Go, but not in Zig, C, Rust territory if that is what you want.
Yes! I'm glad to see this pointed out - when working on UIs, I regularly move them between 3 monitors with varying resolution and DPI: 4K at 200%, 2K at 125%, and 2K at 100%. This reveals not only design issues but application stack issues with DPI support.
I think these new language features have the same virtue - I can opt into them later, and intellisense/analysers will optionally make me aware that they exist.
I have occasionally, just for fun, written benchmarks for some algorithm in C++ and an equivalent C# implementation, then tried to bring the managed performance in line with native using the methods you mention and others. I'm always surprised by how often I can match the performance of the unmanaged code (even when I'm trying to optimize my C++ to the limit) while still ending up with readable and maintainable C#.
Does this include the GC at the end of it all? Because if that happens after the end timestamp it's not an exact comparison. I read something once about speeding up a C/C++ compiler by simply turning free into a no-op. Such a compiler basically allocates more and more data and only frees it all at the end of execution, so then doing all the free calls is just wasted CPU cycles.
That is the promise we could already have had in the 1990s with languages like Eiffel, Oberon and Modula-3, and it has taken us about 30 years for it to finally become mainstream.
C# is not the only one offering these kinds of capabilities, but still, big kudos to the team, and the .NET performance improvements blog posts are a pleasure to read.
Yes, totally agree.
The second thing I found it great for was explaining errors: it either finds the exact solution or sparks a thought that leads to the answer.
The positions of both evangelists and luddites seem mad to me; there's too much emotion involved in those positions for what amounts to another tool in the toolbox that should only be used in appropriate situations.
26^3 is only 17,576 potential TLAs. Some collisions are to be expected, especially when you take into account that some phrases are more likely to occur than others (such as CP for Control Panel, Certified Professional, Control Protocol, Comprehensive Plan, Child Process, ...)
I really like this.
Did you investigate the options regarding CO2 sensors?
I'm interested to know if you compared the SCD30 to the SCD41? The dual-channel design of the SCD30 is supposed to offer lower drift and better long-term stability compared to the SCD41 (which claims to need taking outside once a week). That's the deal on paper; I'm wondering if you got any real data on this.
Thanks! The SCD30 is a great sensor and obviously better than the SCD41. But we did not look at it in more detail, as we chose the SCD41 primarily for its small size. We believe that an accuracy of +/-50ppm is enough for a device like the Air Lab. Also, we'll actively look into reminding the user to take the device outside if automatic calibration is used. On top of that, we plan to either factory calibrate the devices and/or offer manual recalibration, which should extend the interval beyond the 1-week cadence of automatic calibration.
Maybe contact Sensirion about the STCC4; it has a better form factor and is due for release any moment... They have a contact-sales option and are usually helpful.
https://sensirion.com/products/catalog/STCC4
(Not sure, but I thought it might be used in the SEN66, which is available now and covers most things including PM/CO2/VOC/NOx)
This SDK update moving the RP2040 baseline to 200MHz is very welcome for projects where you can't run an egregious overclock beyond the manufacturer-approved spec.
For hobby projects, I've achieved significant overclocks on both the RP2040 and RP2350. I suspect this is mostly down to the use of TSMC's 40nm LP process, which is a smaller node than most microcontrollers use.