Hacker News | pornel's comments

I wonder if we'll get some malicious workarounds for this, like re-releasing the same phone under different SKUs to pretend they were different models, each on sale only for a short time.


Maybe, but the regulation tries to prevent this by separating "models" from "batches" from "individual items", and it defaults to "model" when determining compatibility. It's also worth noting that each new model requires a separate filing for EcoDesign as well as for other certifications like CE, which should help discourage workarounds like model-number inflation.


We commonly use hardware like LCDs and printers that render a sharp transition between pixels without the Gibbs phenomenon. CRT scanlines were close to an actual 1D signal (though not directly controlled by the pixels, which video cards still tried to make square-ish), but AFAIK we've never had a display that behaves like the continuous 2D signal assumed in image processing.

In signal processing you have a finite number of samples of an infinitely precise continuous signal, but in image processing you have a discrete representation mapped to a discrete output. It's continuous only when you choose to model it that way. Discrete → continuous → discrete conversion is a useful tool in some cases, but it's not the whole story.
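
As a rough sketch of that discrete → continuous → discrete round trip (1D only, and the triangle kernel is an arbitrary choice for illustration, not the "right" one):

    // Toy 1D resampling: discrete samples -> continuous reconstruction -> discrete samples.
    fn reconstruct(samples: &[f32], x: f32) -> f32 {
        // Value of the "continuous" signal at position x, as a kernel-weighted sum.
        samples
            .iter()
            .enumerate()
            .map(|(i, &s)| {
                let d = (x - i as f32).abs();
                s * (1.0 - d).max(0.0) // triangle (linear) kernel, radius 1
            })
            .sum()
    }

    fn resample(samples: &[f32], new_len: usize) -> Vec<f32> {
        let scale = (samples.len() - 1) as f32 / (new_len - 1) as f32;
        (0..new_len).map(|i| reconstruct(samples, i as f32 * scale)).collect()
    }

    fn main() {
        let pixels = [0.0, 1.0, 0.0, 1.0];
        // Sample the reconstructed signal at 7 positions instead of 4.
        println!("{:?}", resample(&pixels, 7));
    }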

There are images designed for very specific hardware, like sprites for CRT monitors, or font glyphs rendered for LCD subpixels. More generally, nearly all bitmap graphics assume that pixel alignment is meaningful (and that was true even in the CRT era, before the pixel grid could be aligned with the display's subpixels). Boxes and line widths, especially in GUIs, tend to be designed for integer multiples of pixels. Fonts have/had hinting for aligning to the pixel grid.

Lack of grid alignment, an equivalent of a phase shift that wouldn't matter in pure signal processing, is visually quite noticeable at resolutions where the hardware pixels are little squares to the naked eye.


I think you are saying there are other kinds of displays which are not typical monitors and those displays show different kinds of images - and I don’t disagree.


I'm saying "digital images" are captured by and created for hardware that has the "little squares". This defines what their pixels really are. Pixels in these digital images actually represent discrete units, and not infinitesimal samples of waveforms.

Since the pixels never were a waveform, never were sampled from such a signal (even light in camera sensors isn't sampled along these axes), and don't get displayed as a 2D waveform, the pixels-as-points model from the article at the top of this thread is just an arbitrary abstraction, not an accurate representation of what pixels are.


If Apple is so bad at this that they have to charge 30%, they should have failed in the free market to a competitor that can do the same or better for 3%. However, Apple has prevented that, not by being better or cheaper, but by implementing DRM that locks users out from having a choice (and the market as a whole ended up being a duopoly with cartel-like pricing).

Whether Apple can be cheaper isn't really the point (they should be, digital services are a very high margin business). It's that they're anti-competitive to the point that the market for paid apps and in-app payments became inefficient (in a financial sense).


Rust has such open extensibility through traits. The prime example is Itertools, which already adds a bunch of extra pipelining helper methods.
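
A minimal sketch of how that works (the trait and method names here are made up for illustration, not Itertools' real API): declare a trait with default methods and blanket-implement it for every iterator, and the methods show up on all of them.

    // A minimal extension trait in the style of Itertools (hypothetical
    // names, not the real Itertools API).
    trait PipelineExt: Iterator + Sized {
        /// Yields only every `n`-th item of the underlying iterator.
        fn every_nth(self, n: usize) -> impl Iterator<Item = Self::Item> {
            self.enumerate()
                .filter(move |(i, _)| i % n == 0)
                .map(|(_, item)| item)
        }
    }

    // Blanket impl: every iterator gets the extra method "for free".
    impl<I: Iterator> PipelineExt for I {}

    fn main() {
        let picked: Vec<_> = (0..10).every_nth(3).collect();
        println!("{:?}", picked); // [0, 3, 6, 9]
    }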


It is due to the risk of a leak.

Laundering of data through training makes it a more complicated case than a simple data theft or copyright infringement.

Leaks could be accidental, e.g. due to an employee logging in to their free-as-in-labor personal account instead of a no-training Enterprise account. It's safer to have a complete ban on providers that may collect data for training.


Their entire business model is based on taking other people's stuff. I can't imagine someone willingly drowning with a sinking ship whose entire cargo hold is filled with lifeboats, just because they promised they would.


How can you be sure that AWS will not use your data to train their models? They hold an enormous amount of data, probably the most in the world.


Being caught doing that would be wildly harmful to their business - billions of dollars harmful, especially given the contracts they sign with their customers. The brand damage would be enormously expensive too.

There is no world in which training on customer data without permission would be worth it for AWS.

Your data really isn't that useful anyway.


> Your data really isn't that useful anyway

? A single random document, maybe, but in aggregate? I understood some parties were trying to scrape indiscriminately - the "big data" way. And if some of that input is sensitive and gets stored somewhere in the NN, it may come out in an output - in theory...

Admittedly I never researched the details of the potential phenomenon - that anything personal may be stored (not just George III but Random Randy) - but it seems possible.


There's a pretty common misconception that training LLMs is about loading in as much data as possible no matter the source.

That might have been true a few years ago but today the top AI labs are all focusing on quality: they're trying to find the best possible sources of high quality tokens, not randomly dumping in anything they can obtain.

Andrej Karpathy said this last year: https://twitter.com/karpathy/status/1797313173449764933

> Turns out that LLMs learn a lot better and faster from educational content as well. This is partly because the average Common Crawl article (internet pages) is not of very high value and distracts the training, packing in too much irrelevant information. The average webpage on the internet is so random and terrible it's not even clear how prior LLMs learn anything at all.


Obviously the training data should preferably be high quality - but there you run into the (pseudo-) problem of "copyright" (pseudo-, as I've insisted elsewhere, citing the right to have read whatever is in any public library).

If there is some advantage to quantity though, then achieving high quality raises questions about tradeoffs and workflows - sources where authors are "free participants" could let odd data seep in.

And whether such data may be reflected in outputs remains an open question (probably tackled by work I have not read... Ars longa, vita brevis).


There is a stark contrast in usability between self-contained/owning types and types that are temporary views bound by the lifetime of the place they borrow from. But this is an inherent problem for all non-GC languages that allow saving pointers to data on the stack (Rust doesn't need lifetimes for by-reference heap types). In languages without lifetimes you just don't get any compiler help in finding the places that may be affected by dangling pointers.

This is similar to creating a broadly-used data structure and realizing that some field has to be optional. Option<T> will require you to change everything touching it, and virally spread through all the code that wanted to use that field unconditionally. However, that's not the fault of the Option syntax, it's the fault of the semantics of optionality. In languages that don't make this "miserable" at compile time, this problem manifests as a whack-a-mole of NullPointerExceptions at run time.
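
A tiny illustration of that viral spread (the field and function names are hypothetical): once the field becomes Option, every place that read it unconditionally has to say what None means for it.

    // Hypothetical example: `email` used to be a plain String, and every caller
    // just read it. Making it Option<String> forces each call site to decide
    // what "no email" means for it.
    struct User {
        name: String,
        email: Option<String>,
    }

    fn send_newsletter(user: &User) {
        // The compiler won't let this read the field unconditionally anymore.
        match &user.email {
            Some(addr) => println!("sending to {addr}"),
            None => println!("{} has no email, skipping", user.name),
        }
    }

    fn main() {
        let u = User { name: "Randy".into(), email: None };
        send_newsletter(&u);
    }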

With experience, I don't get this "oh no, now there's a lifetime popping up everywhere" surprise in Rust any more. Whether something is going to be a temporary view or permanent storage can be known ahead of time, and if it can be both, it can be designed with Cow-like types.
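
For example (a minimal sketch), std's Cow lets the same field hold either a borrowed view or owned data, so code using the type doesn't have to commit to one or the other:

    use std::borrow::Cow;

    // A label that can either borrow its text or own it - the caller decides.
    struct Label<'a> {
        text: Cow<'a, str>,
    }

    fn print_label(label: &Label<'_>) {
        // Code using the type doesn't care which variant it got.
        println!("{}", label.text);
    }

    fn main() {
        let borrowed = Label { text: Cow::Borrowed("temporary view") };
        let owned = Label { text: Cow::Owned(String::from("owned, lives as long as needed")) };
        print_label(&borrowed);
        print_label(&owned);
    }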

I also got a sense for when using a temporary loan is a premature optimization. All data has to be stored somewhere (you can't have a reference to data that hasn't been stored). Designs that try to be ultra-efficient by allowing only temporary references often force data to be stored in a temporary location first and then borrowed, which doesn't avoid any allocations, it only adds dependencies on external storage. Instead, the design can support moving or collecting data into owned (non-temporary) storage directly. It can then keep it for an arbitrary lifetime without lifetime annotations, and hand out temporary references to it whenever needed. The run-time cost can be the same, but the semantics are much easier to work with.
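
A sketch of the contrast, with made-up names: the view-only design ties the struct to external storage via a lifetime, while the owning design takes the data by move, stores it without any lifetime annotations, and can still hand out temporary borrows.

    // View-only design: the index borrows from storage the caller must
    // keep alive elsewhere for as long as the index exists.
    struct BorrowedIndex<'a> {
        words: Vec<&'a str>,
    }

    // Owning design: the data is moved in once, stored without lifetime
    // annotations, and temporary references are handed out on demand.
    struct OwnedIndex {
        words: Vec<String>,
    }

    impl OwnedIndex {
        fn from_text(text: &str) -> Self {
            // Collects into owned storage up front; nothing external to depend on.
            OwnedIndex { words: text.split_whitespace().map(String::from).collect() }
        }
        fn get(&self, i: usize) -> Option<&str> {
            self.words.get(i).map(String::as_str)
        }
    }

    fn main() {
        let text = String::from("owned storage is easier to pass around");
        let view = BorrowedIndex { words: text.split_whitespace().collect() };
        let owned = OwnedIndex::from_text(&text);
        // `view` is tied to `text`; `owned` can outlive it and still lend out &str.
        println!("{:?} {:?}", view.words.first(), owned.get(1));
    }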


Browsers check the identity of the certificates every time. The host name is the identity.

There are lots of issues with trust and social and business identities in general, but for the purpose of encryption the problem can be simplified to checking the host name (it's effectively an out-of-band check that the destination you're talking to is the same destination that independent checks saw, so you know your connection hasn't been intercepted).
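
As a toy illustration of "the host name is the identity" (not a real verifier - clients delegate the full hostname-matching rules to their TLS library):

    // Toy illustration only. Real clients also handle IP addresses, name
    // constraints, internationalized names, etc.
    fn name_matches(cert_name: &str, host: &str) -> bool {
        if let Some(suffix) = cert_name.strip_prefix("*.") {
            // A wildcard covers exactly one extra label: "*.example.com"
            // matches "www.example.com" but not "example.com" or "a.b.example.com".
            match host.split_once('.') {
                Some((label, rest)) => !label.is_empty() && rest.eq_ignore_ascii_case(suffix),
                None => false,
            }
        } else {
            cert_name.eq_ignore_ascii_case(host)
        }
    }

    fn main() {
        let cert_names = ["example.com", "*.example.com"];
        let host = "www.example.com";
        let ok = cert_names.iter().any(|n| name_matches(n, host));
        println!("certificate valid for {host}: {ok}");
    }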

You can't have effective TLS encryption without verifying some identity, because you're encrypting data with a key that you negotiate with the recipient on the other end of the connection. If someone inserts themselves into the connection during key exchange, they will get the decryption key (key exchange is cleverly done so that a passive eavesdropper can't get the key, but it can't protect against an active eavesdropper — other than by verifying that the active participant is "trusted" in a cryptographic sense, not in a social sense).


I copy the same certbot account settings and private key to all servers and they obtain the certs themselves.

It is a bit funny that LetsEncrypt has non-expiring private keys for their accounts.


DANE is TLS with too-big-to-fail CAs that are tied to the top-level domains they own and can't be replaced.

Separation between CAs and domains allows browsers to get rid of incompetent and malicious CAs with minimal user impact.


DANE lets the domain owner manage the certificates issued for the domain.


This delegation doesn't play the same role as CAs in WebPKI.

Without DNSSEC's guarantees, the DANE TLSA records would be as insecure as self-signed certificates in WebPKI are.

It's not enough to have some certificate from some CA involved. It has to be a part of an unbroken chain of trust anchored to something that the client can verify. So you're dependent on the DNSSEC infrastructure and its authorities for security, and you can't ignore or replace that part in the DANE model.


Mixing colors in an "objective" way, like blur (lens focus), is a physical phenomenon and should be done in a linear color space.

Subjective things, like color similarity and perception of brightness, should be evaluated in perceptual color spaces. This includes sRGB (it's not very good at it, but it's trying).

Gradients are weirdly in the middle. Smoothness and matching of colors are very subjective, but color interpolation is mathematically dubious in most perceptual color spaces, because √((a+b)/2) ≠ (√a + √b)/2.
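
A small sketch of why the space matters, using the standard sRGB transfer functions: the midpoint of a black-to-white gradient comes out as a visibly different gray depending on whether you average the encoded values or average in linear light.

    // Midpoint of a gradient between two gray levels, computed two ways:
    // by interpolating the sRGB-encoded values directly, and by converting
    // to linear light, interpolating there, and encoding back.
    fn srgb_to_linear(c: f32) -> f32 {
        if c <= 0.04045 { c / 12.92 } else { ((c + 0.055) / 1.055).powf(2.4) }
    }

    fn linear_to_srgb(c: f32) -> f32 {
        if c <= 0.0031308 { c * 12.92 } else { 1.055 * c.powf(1.0 / 2.4) - 0.055 }
    }

    fn main() {
        let (a, b) = (0.0_f32, 1.0_f32); // black and white, as sRGB values

        // Naive: average the encoded values.
        let mid_encoded = (a + b) / 2.0;

        // Physical: average in linear light, then encode for display.
        let mid_linear = linear_to_srgb((srgb_to_linear(a) + srgb_to_linear(b)) / 2.0);

        // ~0.5 vs ~0.735: the two "midpoints" are clearly different shades of gray.
        println!("encoded avg = {mid_encoded:.3}, linear avg = {mid_linear:.3}");
    }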

