ack_complete's comments

Cold branch comes to mind -- something like an interrupt handler that runs often enough, but not in high enough bursts.


> I can't imagine that any policy against LLM code would allow this sort of thing, but I also imagine that if I don't say "this was made by a coding agent", that no one would ever know. So, should I just stop contributing, or start lying?

If a project has a stated policy that code written with an LLM-based aid is not accepted, then it shouldn't be submitted, same as with anything else that might be prohibited. If you attempt to circumvent this by hiding it and it is revealed that you knowingly did so in violation of the policy, then it would be unsurprising for you to receive a harsh reply and/or ban, as well as a revert if the PR was committed. This would be the same as any other prohibition, such as submitting code copied from another project with an incompatible license.

You could argue that such a blanket ban is unwarranted, and you might be right. But the project maintainers have the right to set the submission rules for their project, even if that rules out high-quality LLM-assisted submissions. The right way to deal with this is to ask the maintainers if they would be willing to adjust the policy, not to try to slip such code into the project anyway.


This was a late addition to the Windows 11 supported CPU list. The rumor is that this happened after it was pointed out that Microsoft was still selling brand new Surface Studio 2 devices that had 7th gen Intel CPUs.


The video overlays in question are not drawn by blending into a framebuffer in memory. They're two separate display planes that are read in parallel by the display path, scaled, and blended together at scan-out time. There are only reads, no writes. Modern GPUs support alpha-blended display planes using an alpha channel that often has to exist anyway as padding.

As OP noted, using hardware display planes can have efficiency advantages for cases like floating controls over a video or smoothly animating a plane over a static background, since it avoids an extra read+write for the composited image. However, it also has some quirks -- like hardware bandwidth limits on how small a display plane can be scaled.
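For a sense of what this looks like at the API level, here's a minimal sketch using Linux DRM/KMS via libdrm -- the plane/CRTC IDs and framebuffer handle are assumed to come from prior mode-setting code, and the geometry is made up:

  #include <stdint.h>
  #include <xf86drm.h>
  #include <xf86drmMode.h>

  /* Ask the hardware to scan out video_fb on an overlay plane, scaled to a
     1280x720 rect at (100,100) and blended over the primary plane at
     scan-out time -- no composited framebuffer is ever written. */
  int show_overlay(int drm_fd, uint32_t plane_id, uint32_t crtc_id,
                   uint32_t video_fb, uint32_t src_w, uint32_t src_h)
  {
      return drmModeSetPlane(drm_fd, plane_id, crtc_id, video_fb, 0,
                             100, 100, 1280, 720,            /* dest rect */
                             0, 0,
                             src_w << 16, src_h << 16);      /* src rect, 16.16 fixed point */
  }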


Yeah sorry, you're right of course: hardware planes are directly scanned out without going through the main framebuffer.


A green stripe on the right/bottom is usually due to a different issue: interpolation errors in the chroma planes when decoding YCbCr video. The chroma planes use a biased encoding where 0 (no color) is mapped to 128. A sloppy YCbCr-to-RGB conversion without proper clamping can interpolate against 0 at the edges, which is interpreted as maximum negative chroma red/blue -- which together produce green. This can happen either due to an incorrectly padded texture or from failing to handle the special final-sample case for 4:2:2 chroma.

This issue can happen with overlays, but also non-overlay GPU drawing or CPU conversion routines.
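To make the failure mode concrete, here's a minimal sketch of a full-range BT.601 YCbCr-to-RGB conversion for a single pixel (the fixed-point constants are approximate). Feeding it a chroma sample of 0 -- as read from an unpadded texture edge -- drives R and B hard negative and pushes G up, i.e. green:

  #include <stdint.h>

  static uint8_t clamp8(int v) { return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v; }

  void ycbcr_to_rgb(uint8_t y, uint8_t cb, uint8_t cr,
                    uint8_t *r, uint8_t *g, uint8_t *b)
  {
      int c = cb - 128, d = cr - 128;                       /* remove the 128 bias */
      *r = clamp8(y + ((91881 * d) >> 16));                 /* y + 1.402*Cr' */
      *g = clamp8(y - ((22554 * c + 46802 * d) >> 16));     /* y - 0.344*Cb' - 0.714*Cr' */
      *b = clamp8(y + ((116130 * c) >> 16));                /* y + 1.772*Cb' */
  }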


Some do; some ARM-based devices have tightly coupled memory (TCM). The RP2040 in the Raspberry Pi Pico also has a 4K SRAM bank for each core, intended for the stack and per-core variables, though each bank is not limited to access by its own core.

The main disadvantage of such dedicated memory is inefficient usage compared to using that same amount of fast local memory to cache _all_ of main memory.
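As a concrete (hedged) illustration, the Pico SDK exposes section macros for those two 4K banks, so hot per-core state can be pinned there by attribute -- macro names per pico/platform.h, assuming the SDK's default linker script:

  #include <stdint.h>
  #include "pico/platform.h"

  /* Each variable lands in the 4K scratch bank adjacent to one core. */
  static uint32_t __scratch_x("core0_hot") core0_counter;  /* near core 0 */
  static uint32_t __scratch_y("core1_hot") core1_counter;  /* near core 1 */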


While true in the general case, in this case the bilinear downscaling pass is being applied to a signal that was previously upsampled with a known, specific filter (a box filter). Aliasing is therefore more limited and controlled.


A filter is a weighted average, and a box filter weights all pixels equally. Upscaling with a box filter barely makes sense, because it will either end up sampling a single pixel (an impulse filter) or blur the image even more than normal upscaling.

"Bilinear downscaling" also doesn't make sense because scaling an image down means doing a weighted average of the multiple pixels going into a single pixel. Pixels being weighted linearly based on distance would be a triangle filter.

> Aliasing is therefore more limited and controlled.

Aliasing doesn't need to happen at all with a reasonable filter width. If someone is interpolating between four pixels, that's a triangle filter with four samples.
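For reference, here's what that looks like along one axis in C -- a sketch, not any particular library's resampler; this is the distance-weighted (tent) interpolation being described:

  /* Sample src at fractional position x with linear (tent) weights,
     clamping the right tap at the edge to avoid reading past the end. */
  float tent_sample(const float *src, int n, float x)
  {
      int   i0 = (int)x;
      int   i1 = i0 + 1 < n ? i0 + 1 : n - 1;
      float f  = x - (float)i0;              /* weight falls off linearly with distance */
      return src[i0] * (1.0f - f) + src[i1] * f;
  }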


This is partially due to the compromises of mapping vector intrinsics into C (with C++ only being marginally better). In a more vector-oriented language, such as a shader language, this:

  s1 = vaddq_u32(s1, vextq_u32(z, s1, 2));
  s1 = vaddq_u32(s1, vdupq_laneq_u32(s0, 3));
would be more like this:

  s1.xy += s1.zw;
  s1 += s0.w;


To be fair, even in standard C11 you can do a bit better than the CPU manufacturer's syntax:

  #include <arm_neon.h>

  /* Across-vector add: the vaddv intrinsics take a single operand, so the
     generic selects the function itself, which keeps every branch well-typed. */
  #define vaddv(A) _Generic((A),  \
      int8x8_t:    vaddv_s8,      \
      uint8x8_t:   vaddv_u8,      \
      int8x16_t:   vaddvq_s8,     \
      uint8x16_t:  vaddvq_u8,     \
      int16x4_t:   vaddv_s16,     \
      uint16x4_t:  vaddv_u16,     \
      int16x8_t:   vaddvq_s16,    \
      uint16x8_t:  vaddvq_u16,    \
      int32x2_t:   vaddv_s32,     \
      uint32x2_t:  vaddv_u32,     \
      float32x2_t: vaddv_f32,     \
      int32x4_t:   vaddvq_s32,    \
      uint32x4_t:  vaddvq_u32,    \
      float32x4_t: vaddvq_f32,    \
      int64x2_t:   vaddvq_s64,    \
      uint64x2_t:  vaddvq_u64,    \
      float64x2_t: vaddvq_f64)(A)
while in GNU C you can in fact use normal arithmetic and indexing (but not swizzles) on vector types.
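For example (a sketch of the GNU extension; works in both GCC and Clang):

  typedef unsigned int u32x4 __attribute__((vector_size(16)));

  u32x4 a = {1, 2, 3, 4};
  u32x4 b = {10, 20, 30, 40};
  u32x4 c = a + b;        /* element-wise add, no intrinsic needed */
  unsigned int w = c[3];  /* indexing works; .xyzw swizzles do not */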


The difference between interpreters and simple JITs has narrowed partly due to two factors: better indirect branch predictors with global history, and wider execution bandwidth to absorb the additional dispatch instructions. Intel CPUs starting with Haswell, for instance, show less branch-misprediction impact thanks to a better ability to predict the jump patterns through the interpreter. A basic jump table no longer suffers as much compared to tail-call dispatch or a simple splicing JIT.
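As a sketch of the dispatch styles being compared (a made-up two-opcode bytecode, using GNU C computed goto): a switch-in-a-loop funnels every opcode through one indirect branch, while replicating the dispatch at the end of each handler gives the predictor a separate branch per handler:

  #include <stdio.h>

  int run(const unsigned char *code)   /* opcodes: 0 = add imm, 1 = halt */
  {
      static const void *dispatch[] = { &&op_add, &&op_halt };
      const unsigned char *pc = code;
      int acc = 0;
      goto *dispatch[*pc++];
  op_add:
      acc += *pc++;
      goto *dispatch[*pc++];   /* per-handler indirect branch */
  op_halt:
      return acc;
  }

  int main(void)
  {
      const unsigned char prog[] = { 0, 5, 0, 7, 1 };  /* 5 + 7 */
      printf("%d\n", run(prog));                       /* prints 12 */
      return 0;
  }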


There was an unreal lack of awareness when the Windows Ink team engaged with Reddit on this issue -- talking about "legacy" apps when those apps included the latest release version of Photoshop.


Yeah, it was quite disheartening.

I really wish Apple would go ahead and make a replacement for my Newton MessagePad....

As it is, an rPi cyberdeck seems the best option -- I'm waiting on a Soulcircuit Pilet from Kickstarter, and considering swapping an rPi 5 into my Raspad v3 tablet shell for the nonce.

