More

isusmelj · 2025-06-03T05:58:12 1748930292

Is there anything like this also supporting other GPUs? Thinking of Apple Silicon or embedded ones in phones etc.

isusmelj · 2025-04-22T17:48:43 1745344123

As someone in Europe, I sometimes wonder what’s worse: letting US companies use my data to target ads, or handing it to Chinese companies where I have no clue what’s being done with it. With one I at least get an open source model. The other is a big black box.

credit_guy · 2025-04-22T19:27:59 1745350079

Both are bad. If Europe does not develop local alternatives to ChatGpt or DeepSeek, it will (slowly) lose its sovereingty.

ryoshoe · 2025-04-22T19:38:41 1745350721

Europe is developing local alternative models such as Mistral

fragmede · 2025-04-22T17:59:33 1745344773

They're not open source. It's nice of Meta and Deepseek to offer up their models for download, but that doesn't make them open source.

chvid · 2025-04-22T18:29:04 1745346544

Hard to be fully open source if you train on copyrighted material.

Anyway. Deepseek is the most open of the sota models.

MoonGhost · 2025-04-22T20:44:39 1745354679

Did they open their datasets already? Would be nice to have 'thinking' part.

anonym29 · 2025-04-22T18:34:23 1745346863

Isn't this a bit of semantic lawyering? Open model weights are not the same as open source in a literal sense, but I'd go so far as to suggest that open model weights fulfill much of the intent / "soul" of the open source movement. Would you disagree with that notion?

lxgr · 2025-04-22T18:55:11 1745348111

> open model weights fulfill much of the intent / "soul" of the open source movement

Absolutely not. The intent of the open source movement is sharing methods, not just artifacts, and that would require training code and methodology.

A binary (and that's arguably what weights are) you can semi-freely download and distribute is just shareware – that's several steps away from actual open source.

There's nothing wrong with shareware, but calling it open source, or even just "source available" (i.e. open source with licensing/usage restrictions), when it isn't, is disingenuous.

MoonGhost · 2025-04-22T20:51:08 1745355068

> The intent of the open source movement is sharing methods, not just artifacts, and that would require training code and methodology.

That's not enough. The key point was trust. When executable can be verified by independent review and rebuild. It it cannot be rebuilt it can be virus, troyan, backdoor, etc. For LLMs there is no way to reproduce, thus no way to verify them. So, they cannot be trusted and we have to trust producers. It's not that important when models are just talking, but with tools use it can be a real damage.

lxgr · 2025-04-22T21:19:10 1745356750

Hm, I wouldn't say that that's the key point of open software. There are many open source projects that don't have reproducible builds (some don't even offer any binary builds), and conversely there is "source available" software with deterministic builds that's not freely licensed.

On top of that, I don't think it works quite that way for ML models. Even their creators, with access to all training data and training steps, are having a very hard time reasoning about what these things will do exactly for a given input without trying it out.

"Reproducible training runs" could at least show that there's not been any active adversarial RHLF, but seem prohibitively expensive in terms of resources.

MoonGhost · 2025-04-23T01:23:12 1745371392

Well, 'open source' is interpreted in different ways. I think the core idea is it can be trusted. You can get Linux distribution and recompile every component except for the proprietary drivers. With that being done by independent groups you can trust it enough to run bank's systems. The other options are like Windows where you have to trust Microsoft and their supply chain.

There are different variations, of course. Mostly related to the rights and permissions.

As for big models even their owners, having all the hardware and training data and code, cannot reproduce them. Model may have some undocumented functionality pretrained or added in post process, and it's almost impossible to detect without knowing the key phrase. It can be a harmless watermark or something else.

anonym29 · 2025-04-22T21:14:58 1745356498

But there is also no publicly known way to implant unwanted telemetry, backdoors, or malware into modern model formats either (which hasn't always been true of older LLM model formats), which mitigates at least one functional concern about trust in this case, no?

It's not quite like executing a binary in userland - you're not really granting code execution to anyone with the model, right? Perhaps there is some undisclosed vulnerability in one or more of the runtimes, like llama.cpp, but that's a separate discussion.

lxgr · 2025-04-22T21:21:37 1745356897

The biggest problem is arguably at a different layer: These models are often used to write code, and if they write code containing vulnerabilities, they don't need any special permissions to do a lot of damage.

It's "reflections on trusting trust" all the way down.

anonym29 · 2025-04-23T01:57:49 1745373469

If people who cannot read code well enough to evaluate whether or not it is secure are using LLM's to generate code, no amount of model transparency will solve the resulting problems. At least not while LLM's still suffer from the the major problems they have, like hallucinations, or being wrong (just like humans!).

Whether the model is open source, open weight, both, or neither has essentially zero impact on this.

Manabu-eo · 2025-04-22T20:14:54 1745352894

I saw the argument that the source code is the preferred base to make changes and modifications in software, but in the case of those large models, the weights themselves are the preferred way.

It's much easier and cheap to make a finetune or LoRA than to train from scratch to adapt it to your use case. So it's not quite like source vs binary in software.

_aavaa_ · 2025-04-22T19:02:19 1745348539

Meta models do not, they have use restrictions. At least deepseek does not.

fragmede · 2025-04-22T19:49:10 1745351350

It does not and I totally disagree with that. Unless we can see the code that goes into the model to stop of from telling me how to make cocaine, it's not the same sort of soul.

MoonGhost · 2025-04-22T20:58:49 1745355529

> With one I at least get an open source model. The other is a big black box.

It doesn't matter much as in both cases provider has access to you ins and outs. The only question is if you trust company operating the model. (yes, you can run local model, but it's not that capable)

mrkramer · 2025-04-22T19:16:10 1745349370

US is capitalistic liberal democracy and China is one party capitalistic dictatorship. Make your choice.

PartiallyTyped · 2025-04-22T21:12:44 1745356364

The US tends towards dictatorship; due process is an afterthought, people disappearing off the streets, citizens getting arrested at the border for nothing, tourists getting deported over minute issues such as an iffy hotel booking, and that's just off the top of my head from the last 2 days.

mr90210 · 2025-04-22T19:23:41 1745349821

You make it seem so binary. If you do enough research on the US you might change your mind. YES, I would still choose the US.

isusmelj · 2025-04-15T17:59:23 1744739963

Thanks for the kind words, joelio182! Glad you see the value in making SSL more practical for real-world domain shift issues.

As liopeer mentioned, we have results for medical (DeepLesion) and agriculture (DeepWeeds) in the blog post. We haven't published specific benchmarks on satellite or industrial inspection data yet, but those are definitely the kinds of niche domains where pretraining on specific unlabeled data should yield significant benefits. We're keen to explore more areas like these.

Our goal is exactly what you pointed out - bridging the gap between SSL research and practical application where labels are scarce. Appreciate the encouragement!

isusmelj · 2025-04-15T15:10:25 1744729825

Hi HN, I’m Igor, co-founder of Lightly AI (https://www.lightly.ai/).

We just released LightlyTrain, a new open-source Python package (AGPL-3.0, free for research and educational purpose) for self-supervised pretraining of computer vision models: https://github.com/lightly-ai/lightly-train

Standard vision models pretrained on generic datasets like ImageNet or COCO often underperform on specific domains (e.g., medical, agriculture, autonomous driving). Fine-tuning helps, but performance is limited, and getting enough labeled data is expensive and slow.

LightlyTrain uses self-supervised learning (SSL) to pretrain models directly on your own unlabeled images or videos. This adapts the model to your specific visual domain before fine-tuning, leading to significantly better performance with less labeled data.

Key Features:

- No Labels Needed: Pretrain using your existing unlabeled image data.

- Better Performance: Consistently outperforms training from scratch and ImageNet-pretrained weights, especially in low-data regimes and domain-specific tasks (benchmarks in README/blog). We see gains across detection, classification, and segmentation.

- Domain Adaptation: Tailor models to your specific industry (manufacturing, healthcare, retail, etc.).

- Supports Popular Models: Works out-of-the-box with YOLO (v5-v12), RT-DETR, ResNet, ViTs, etc., integrating with frameworks like Ultralytics, TIMM, Torchvision.

- Easy to Use & Scalable: Simple pip install, minimal code to start, scales to millions of images, runs fully on-premise (single/multi-GPU). We built this because while SSL research is mature, making it easily accessible and effective for industry computer vision teams was hard. LightlyTrain aims to bridge that gap.

We’ve benchmarked it on COCO, BDD100K (driving), DeepLesion (medical), and DeepWeeds (agriculture), showing strong improvements over baselines (details in the repo/blog post linked below). For example, on COCO with only 10% labels, LightlyTrain pretraining boosted YOLOv8-s mAP by +14% over ImageNet weights and +34% over no pretraining.

- GitHub Repo: https://github.com/lightly-ai/lightly-train

- Docs: https://docs.lightly.ai/train

- Detailed Blog Post/Benchmarks: https://www.lightly.ai/blog/introducing-lightly-train

- Quick Demo Video: https://youtu.be/5Lmry1k_cA8

We’re here to answer any questions! Happy to discuss the tech, benchmarks, or use cases. Commercial licenses are also available for businesses needing different terms.

Sonnigeszeug · 2025-04-15T17:01:20 1744736480

Cool! We use yolo and have good success after labeling 1k of images but i'm happy to try it out.

Does AGPL mean i can't use my model for my image detection or does it mean i can't use your software if i would want to provide finetuning service (which i don't want to).

isusmelj · 2025-04-15T18:00:19 1744740019

Hi Sonnigeszeug, great that you're looking into LightlyTrain!

We designed LightlyTrain specifically for production teams who need a robust, easy-to-use pretraining solution without getting lost in research papers. It builds on learnings from our MIT-licensed research framework, LightlySSL (github.com/lightly-ai/lightly), but is tailored for scalability and ease of integration.

For commercial use where the AGPL terms might not fit your needs, we offer straightforward commercial licenses for LightlyTrain. Happy to chat more if that's relevant for you!

Sonnigeszeug · 2025-04-16T08:06:03 1744790763

Hey!

we are a small startup and using our own model.

I would do a benchmark to see if its worth it but whats your pricing?

And do i now assume AGPL stands in your case for training internally?

UncleEntity · 2025-04-15T15:59:39 1744732779

> AGPL-3.0, free for research and educational purpose...

...or any other purpose allowable under the AGPL like, wait for it, commercial purposes.

isusmelj · 2025-04-15T18:01:00 1744740060

You're right, UncleEntity, thanks for highlighting that. My phrasing could have been clearer. AGPL does allow various uses, including commercial, provided its terms are met.

Our intention with LightlyTrain (AGPL/Commercial license option) is to offer a streamlined, production-ready pretraining engine. This contrasts with our other library, LightlySSL (github.com/lightly-ai/lightly), which is MIT-licensed and geared towards researchers needing flexible building blocks.

We found many companies wanted a simpler "it just works" solution for pretraining, which is why LightlyTrain exists with its specific licensing options tailored for commercial teams alongside the AGPL.

Thanks again for the clarification!

isusmelj · 2025-03-30T07:31:39 1743319899

I’ve been playing around with SIMD since uni lectures about 10 years ago. Back then I started with OpenMP, then moved to x86 intrinsics with AVX. Lately I’ve been exploring portable SIMD for a side project where I’m (re)writing a Numpy-like library in Rust, mostly sticking to the standard library. Portable SIMD has been super helpful so far.

I’m on an M-series MacBook now but still want to target x86 as well, and without portable SIMD that would’ve been a headache.

If anyone’s curious, the project is here: https://github.com/IgorSusmelj/rustynum. It's just a learning exercise for learning Rust, but I’m having a lot of fun with it.

isusmelj · 2025-01-31T19:52:14 1738353134

Does anyone know why GPT4 has knowledge cutoff December 2023 and all the other models (newer ones like 4o, O1, O3) seem to have knowledge cutoff October 2023? https://platform.openai.com/docs/models#o3-mini

I understand that keeping the same data and curating it might be beneficial. But it sounds odd to roll back in time with the knowledge cutoff. AFAIK, the only event that happened around that time was the start of the Gaza conflict.

kikki · 2025-01-31T19:57:34 1738353454

I think trained knowledge is less and less important - as these multi-modal models have the ability to search the web and have much larger context windows.

isusmelj · 2024-10-16T20:23:52 1729110232

I think the results show that just in general the compute is not used well. That the CPU took 8.4ms and GPU took 3.2ms shows a very small gap. I'd expect more like 10x - 20x difference here. I'd assume that the onnxruntime might be the issue. I think some hardware vendors just release the compute units without shipping proper support yet. Let's see how fast that will change.

Also, people often mistake the reason for an NPU is "speed". That's not correct. The whole point of the NPU is rather to focus on low power consumption. To focus on speed you'd need to get rid of the memory bottleneck. Then you end up designing your own ASIC with it's own memory. The NPUs we see in most devices are part of the SoC around the CPU to offload AI computations. It would be interesting to run this benchmark in a infinite loop for the three devices (CPU, NPU, GPU) and measure power consumption. I'd expect the NPU to be lowest and also best in terms of "ops/watt"

AlexandrB · 2024-10-16T20:33:14 1729110794

> Also, people often mistake the reason for an NPU is "speed". That's not correct. The whole point of the NPU is rather to focus on low power consumption.

I have a sneaking suspicion that the real real reason for an NPU is marketing. "Oh look, NVDA is worth $3.3T - let's make sure we stick some AI stuff in our products too."

pclmulqdq · 2024-10-17T04:28:30 1729139310

The correct way to make a true "NPU" is to 10x your memory bandwidth and feed a regular old multicore CPU with SIMD/vector instructions (and maybe a matrix multiply unit).

Most of these small NPUs are actually made for CNNs and other models where "stream data through weights" applies. They have a huge speedup there. When you stream weights across data (any LLM or other large model), you are almost certain to be bound by memory bandwidth.

sounds · 2024-10-17T16:46:53 1729183613

Apple Silicon is surprisingly a good approach here -

   * On CPU: SIMD NEON
   * On CPU: custom matrix multiply accelerator, separate from SIMD unit
   * On CPU package: NPU
   * GPU

Then they go and hide it all in proprietary undocumented features and force you to use their framework to access it :c

bee_rider · 2024-10-17T06:17:00 1729145820

I’m sure we’ll get GPNPU. Low precision matvecs could be fun to play with.

touisteur · 2024-10-17T10:47:28 1729162048

SHAVE from MOVIDIUS was fun, before Intel bought them out.

hedgehog · 2024-10-17T15:45:40 1729179940

Did they become un-fun? There are a bunch on the new Intel CPUs.

touisteur · 2024-10-17T19:33:05 1729193585

Most of the toolchain got hidden behind openvino and there was no hardware released for years. Keembay was 'next year' for years. I have some code for DSP using it that I can't use anymore. Has Intel actually released new shave cores, with an actual dev environment ? I'm curious.

hedgehog · 2024-10-17T20:08:13 1729195693

The politics behind the software issues are complex. At least from the public presentation the new SHAVE cores are not much changed besides bigger vector units. I don't know what it would take to make a lower level SDK available again but it sure seems like it would be useful.

Spooky23 · 2024-10-17T02:46:38 1729133198

Microsoft needs to throw something in the gap to slow down MacBook attrition.

The M processors changed the game. My teams support 250k users. I went from 50 MacBooks in 2020 to over 10,000 today. I added zero staff - we manage them like iPhones.

pjmlp · 2024-10-17T05:24:19 1729142659

Microsoft has indeed a problem, however only in countries whose people can afford Apple level prices, and not everyone is a G7 citizen.

jocaal · 2024-10-17T06:18:49 1729145929

Microsoft is slowly being squeezed from both sides of the market. Chromebooks have silently become wildly popular on the low end. The only advantage I see windows have is corporate and gaming. But valve is slowly chopping away at the gaming advantage as well.

pjmlp · 2024-10-17T06:20:31 1729146031

Chromebooks are no where to be seen outside US school market.

Coffe shops, trains and airports in Europe? Nope, rare animal on tables.

European schools? Most countries parents buy their kids a computer, and most often it is a desktop used by the whole family, or a laptop of some kind running Windows, unless we are talking about the countries where buying Apple isn't an issue on the monthly expenses.

Popular? In Germany, the few times they get displayed on shopping mall stores, they get rountinely discounted, or bundled with something else, until finally they get rid of them.

Valve is heavily dependent on game studios producing Windows games.

vrighter · 2024-10-21T06:05:57 1729490757

I have never seen, much less interacted with, a chromebook. I don't think they're as popular as you think, in a lot of not-usa

cj · 2024-10-17T03:20:45 1729135245

Rightly so.

The M processor really did completely eliminate all sense of “lag” for basic computing (web browsing, restarting your computer, etc). Everything happens nearly instantly, even on the first generation M1 processor. The experience of “waiting for something to load” went away.

Not to mention these machines easily last 5-10 years.

morsch · 2024-10-17T06:20:50 1729146050

It's fine. For basic computing, my M3 doesn't feel much faster than my Linux desktop that's like 8 years old. I think the standard for laptops was just really, really low.

thanksgiving · 2024-10-17T10:08:30 1729159710

> I think the standard for laptops was just really, really low.

As someone who used windows laptops, I was amazed when I saw someone sitting next to me on a public transit subway on her MacBook Pro editing images on photoshop with just her trackpad. The standard for windows laptops used to be that low (about ten or twelve years ago?) that seeing a MacBook trackpad just woke someone is a part of my permanent memory.

thesuitonym · 2024-10-17T13:35:24 1729172124

I don't understand the hype around Apple trackpads. 15 years ago, sure, there was a huge gulf of difference, but today? The only difference that I can see or fee, at least between lenovo or dell and apple, is that the mac trackpad is physically larger.

nxobject · 2024-10-17T03:29:34 1729135774

As a very happy M1 Max user (should've shelled out for 64GB of RAM, though, for local LLMs!), I don't look forward to seeing how the Google Workspace/Notions/etc. of the world somehow reintroduce lag back in.

bugbuddy · 2024-10-17T04:02:09 1729137729

The problem for Intel and AMD is they are stuck with an OS that ships with a lag-inducing Anti-malware suite. I just did a simple git log and it took 2000% longer than usual because the Antivirus was triggered to scan and run a simulation on each machine instruction and byte of data accessed. The commit log window stayed blank waiting to load long enough for me to complete another tiny project. It always ruin my day.

alisonatwork · 2024-10-17T05:17:11 1729142231

Pro tip: turn off malware scanning in your git repos[0]. There is also the new Dev Drive feature in Windows 11 that makes it even easier for developers (and IT admins) to set this kind of thing up via policies[1].

In companies where I worked where the IT team rolled out "security" software to the Mac-based developers, their computers were not noticeably faster than Windows PCs at all, especially given the majority of containers are still linux/amd64, reflecting the actual deployment environment. Meanwhile Windows also runs on ARM anyway, so it's not really something useful to generalize about.

[0] https://support.microsoft.com/en-us/topic/how-to-add-a-file-...

[1] https://learn.microsoft.com/en-us/windows/dev-drive/

bugbuddy · 2024-10-17T05:48:09 1729144089

Unfortunately, the IT department people think they are literal GODs for knowing how to configure Domain Policies and lock down everything. They even refuse to help or even answer requests for help when there are false positives on our own software builds that we cannot unmark as false positives. These people are proactively antagonistic to productivity. Management could not careless…

lynx23 · 2024-10-17T07:49:03 1729151343

Nobody wants to be resonsible for giving allowing exceptions in security-matters. Its far easier to ignore the problems at hand, then to risk being wrong just once.

thesuitonym · 2024-10-17T13:44:19 1729172659

They don't think they're gods, they just think you're an idiot. This is not to say that you are, or even that they believe YOU individually are an idiot, it's just that users are idiots.

There are also insurance, compliance, and other constraints that IT folks have that make them unwilling to turn off scanning for you.

cj · 2024-10-18T17:24:55 1729272295

> they just think you're an idiot.

To be fair, the average employee doesn’t have much more than idiot-level knowledge when it comes to security.

The majority of employees would rather turn off automatic OS updates simply because it’s a hassle to restart your computer because god forbid they you loose those 250 chrome tabs waiting for you to never get around to revisiting!

xxs · 2024-10-17T18:08:05 1729188485

they are allowed to do that for the folks that produce the goods of course, it just makes a lot harder to retain the said idiots.

xxs · 2024-10-17T06:07:35 1729145255

the short answer is that you can't without the necessary permissions, and even if you do - the next roll out will wipe out your changes.

So the pro-part of the tip does not apply.

On my own machines anti-virus is one the very first things to be removed. Most of the time I'd turn off all the swap file, yet Windows doesn't overcommit and certain applications are notorious for allocating memory w/o even using it.

zdw · 2024-10-17T05:04:22 1729141462

This is most likely due to corporate malware.

Even modern macs can be brought to their knees by something that rhymes with FrowdStrike Calcon and interrupts all IO.

djur · 2024-10-17T05:28:41 1729142921

Oh, just work for a company that uses Crowdstrike or similar. You'll get back all the lag you want.

n8cpdx · 2024-10-17T05:53:10 1729144390

Chrome managed it. Not sure how since Edge still works reasonably well and Safari is instant to start (even faster than system settings, which is really an indictment of SwiftUI).

ddingus · 2024-10-17T04:38:07 1729139887

I have a first gen M1 and it holds up very nicely even today. I/O is crazy fast and high compute loads get done efficiently.

One can bury the machine and lose very little basic interactivity. That part users really like.

Frankly the only downside of the MacBook Air is the tiny storage. The 8GB RAM is actually enough most of the time. But general system storage with only 1/4 TB is cramped consistently.

Been thinking about sending the machine out to one of those upgrade shops...

lynguist · 2024-10-17T05:15:28 1729142128

Why did you buy a 256GB device for personal use in the first place? Too good of a deal? Or saving these $400 for upgrades for something else?

112233 · 2024-10-17T09:30:44 1729157444

Not OP, but by booting M1 from external thunderbolt nvme you lose less than 50% of benchmark disk throughput (3GB/s is still ridiculously fast), can buy 8TB drive for less than 1k, plus can boot it on another M1 mac if something happens. If there was "max mem, min disk" model, would def get that.

ddingus · 2024-10-17T22:58:14 1729205894

Interesting. You know I bought one of those USB 3 port expanders from TEMU and it is excellent! (I know, TEMU right? But it was so cheap!)

I could 3d print a couple of brackets and probably lodge a bigger SSD or the smaller form factor eMMC I think and pack it all into a little package one just plugs in. The port extender is currently shaped such that it fits right under the Air tilting it nicely for general use.

The Air only has external USB... still, I don't need to boot from it. The internal one can continue to do that. Storage is storage for most tasks.

ddingus · 2024-10-17T22:51:18 1729205478

I got it for a song. Literally a coupla hundred bucks a few months after release.

So yeah, great deal. And I really wanted to run the new CPU.

Frankly, I can do more and generally faster than I would expect running on those limited resources. It has been a quite nice surprise.

For a lot of what I do, the RAM and storage are enough.

bzzzt · 2024-10-17T05:41:31 1729143691

Depends on the application as well. Just try to start up Microsoft Teams.

wkat4242 · 2024-10-18T01:12:42 1729213962

In our company we see the opposite. 5 years ago all the devs wanted Mac instead of Linux. Now they want to go back.

I think part of the reason is that we manage Mac pretty strictly now but we're getting there with Linux too.

We also tried to get them to use WSL 1 and 2 but they just laugh at it :) And point at its terrible disk performance and other dealbreakers. Can't blame them.

itishappy · 2024-10-16T20:42:39 1729111359

I assume you're both right. I'm sure NPUs exist to fill a very real niche, but I'm also sure they're being shoehorned in everywhere regardless of product fit because "AI big right now."

wtallis · 2024-10-16T22:25:14 1729117514

Looking at it slightly differently: putting low-power NPUs into laptop and phone SoCs is how to get on the AI bandwagon in a way that NVIDIA cannot easily disrupt. There are plenty of systems where a NVIDIA discrete GPU cannot fit into the budget (of $ or Watts). So even if NPUs are still somewhat of a solution in search of a problem (aka a killer app or two), they're not necessarily a sign that these manufacturers are acting entirely without strategy.

brookst · 2024-10-17T02:06:56 1729130816

The shoehorning only works if there is buyer demand.

As a company, if customers are willing to pay a premium for a NPU, or if they are unwilling to buy a product without one, it is not your place to say “hey we don’t really believe in the AI hype so we’re going to sell products people don’t want to prove a point”

MBCook · 2024-10-17T02:36:02 1729132562

Is there demand? Or do they just assume there is?

If they shove it in every single product and that’s all anyone advertises, whether consumers know it will help them or not, you don’t get a lot of choice.

If you want the latest chip, you’re getting AI stuff. That’s all there is to it.

Terr_ · 2024-10-17T03:31:30 1729135890

"The math is clear: 100% of our our car sales come from models with our company logo somewhere on the front, which shows incredible customer desire for logos. We should consider offering a new luxury trim level with more of them."

"How many models to we have without logos?"

"Huh? Why would we do that?"

MBCook · 2024-10-17T03:56:23 1729137383

Heh. Yeah more or less.

To some degree I understand it, because as we’ve all noticed computers have pretty much plateaued for the average person. They last much longer. You don’t need to replace them every two years anymore because the software isn’t out stripping them so fast.

AI is the first thing to come along in quite a while that not only needs significant power but it’s just something different. It’s something they can say your old computer doesn’t have that the new one does. Other than being 5% faster or whatever.

So even if people don’t need it, and even if they notice they don’t need it, it’s something to market on.

The stuff up thread about it being the hotness that Wall Street loves is absolutely a thing too.

ddingus · 2024-10-17T04:40:47 1729140047

That was all true nearly 10 years ago. And it has only improved. Almost any computer one finds these days is capable of the basics.

bdd8f1df777b · 2024-10-17T02:43:14 1729132994

There are two kinds of buyer demands: product, buyers, and the stock buyers. The AI hype can certainly convince some of the stock buyers.

Spooky23 · 2024-10-17T02:56:14 1729133774

Apple will have a completely AI capable product line in 18 months, with the major platforms basically done.

Microsoft is built around the broken Intel tick/tick model of incremental improvement — they are stuck with OEM shitware that will take years to flush out of the channel. That means for AI, they are stuck with cloud based OpenAI, where NVIDIA has them by the balls and the hyperscalers are all fighting for GPU.

Apple will deliver local AI features as software (the hardware is “free”) at a much higher margin - while Office 365 AI is like $400+ a year per user.

You’ll have people getting iPhones to get AI assisted emails or whatever Apple does that is useful.

hakfoo · 2024-10-17T04:41:08 1729140068

We're still looking for "that is useful".

The stuff they've been trying to sell AI to the public with is increasingly looking as absurd as every 1978 "you'll store your recipes on the home computer" argument.

AI text became a Human Centipede story: Start with a coherent 10-word sentence, let AI balloon it into five pages of flowery nonsense, send it to someone else, who has their AI smash it back down to 10 meaningful words.

Coding assistance, even as spicy autocorrect, is often a net negative as you have to plow through hallucinations and weird guesses as to what you want but lack the tools to explain to it.

Image generation is already heading rapidly into cringe territory, in part due to some very public social media operations. I can imagine your kids' kids in 2040 finding out they generated AI images in the 2020s and looking at them with the same embarrassment you'd see if they dug out your high-school emo fursona.

There might well be some more "closed-loop" AI applications that make sense. But are they going to be running on every desktop in the world? Or are they going to be mostly used in datacentres and purpose-built embedded devices?

I also wonder how well some of the models and techniques scale down. I know Microsoft pushed a minimum spec to promote a machine as Copilot-ready, but that seems like it's going to be "Vista Basic Ready" redux as people try to run tools designed for datacentres full of Quadro cards, or at least high-end GPUs, on their $299 HP laptop.

jjmarr · 2024-10-17T04:51:39 1729140699

Cringe emo girls are trendy now because the nostalgia cycle is hitting the early 2000s. Your kid would be impressed if you told them you were a goth gf. It's not hard to imagine the same will happen with primitive AIs in the 40s.

defrost · 2024-10-17T05:09:52 1729141792

Early 2000's ??

Bela Lugosi Died in 1979, and Peter Murphy was onto his next band by 1984.

By 2000 Goth was fully a distant dot in the rear view mirror for the OG's

    In 2002, Murphy released *Dust* with Turkish-Canadian composer and producer Mercan Dede, which utilizes traditional Turkish instrumentation and songwriting, abandoning Murphy's previous pop and rock incarnations, and juxtaposing elements from progressive rock, trance, classical music, and Middle Eastern music, coupled with Dede's trademark atmospheric electronics.

https://www.youtube.com/watch?v=Yy9h2q_dr9k

https://en.wikipedia.org/wiki/Bauhaus_(band)

djur · 2024-10-17T05:31:14 1729143074

I'm not sure what "gothic music existed in the 1980s" is meant to indicate as a response to "goths existed in the early 2000s as a cultural archetype".

defrost · 2024-10-17T05:39:03 1729143543

That Goths in 2000's were at best second wave nostalgia cycle of Goths from the 1980s.

That people recalling Goths in that period should beware of thinking that was a source and not an echo.

In 2006 Noel Fielding's Richmond Felicity Avenal was a basement dwelling leftover from many years past.

bee_rider · 2024-10-17T06:14:19 1729145659

True Goth died our way before any of that. They totally sold out when the sacked Rome, the gold went to their heads and everything since then has been nostalgia.

defrost · 2024-10-17T06:19:43 1729145983

That was just the faux life Westside Visigoths .. what'd you expect?

#Ostrogoth #TwueGoth

carlob · 2024-10-17T06:17:10 1729145830

There was a submission here a few months ago about the various incarnations of goth starting from the late Roman empire.

https://www.the-hinternet.com/p/the-goths

defrost · 2024-10-17T06:25:24 1729146324

Was there? This one: https://news.ycombinator.com/item?id=41232761 ?

Nice: https://www.youtube.com/watch?v=VZvSqgn_Zf4

HelloNurse · 2024-10-17T14:42:18 1729176138

I expect this sort of thing to go out of fashion and/or be regulated after "AI" causes some large life loss, e.g. starting a war or designing a collapsing building.

Spooky23 · 2024-10-17T12:54:40 1729169680

The product isn’t released, so I don’t think we know what is or isn’t good.

People are clearly finding LLM tech useful, and we’re barely scratching the surface.

nxobject · 2024-10-17T04:22:36 1729138956

I hope that once they get a baseline level of AI functionality in, they start working with larger LLMs to enable some form of RAG... that might be their next generational shift.

wkat4242 · 2024-10-18T01:15:47 1729214147

> while Office 365 AI is like $400+ a year per user

And I'm pretty sure this is only Introductory pricing. As people get used to it and use it more it won't cover the cost. I think they rely on the gym model currently; many people not using the ai features much. But eventually that will change. Also, many companies figured that out and pull the copilot license from users that don't use it enough.

im3w1l · 2024-10-17T05:47:44 1729144064

Until AI chips become abundant, and we are not there yet, cloud AI just makes too much sense. Using a chip constantly vs using it 0.1% of the time is just so many orders of magnitude better.

Local inference does have privacy benefits. I think at the moment it might make sense to send most of queries to a beefy cloud model, and send sensitive queries to a smaller local one.

justahuman74 · 2024-10-17T04:21:45 1729138905

Who is getting $400/y of value from that?

xp84 · 2024-10-19T02:51:03 1729306263

Apple hasn’t shipped any ai features besides betas. I trust the people responsible for the useless abomination that is Siri to deliver a useful ai tool as much as I would trust Joe Biden to win a breakdancing competition.

brookst · 2024-10-20T15:29:30 1729438170

Well fortunately for all of us the people delivering client side ML today are totally different from the people who implemented a server side rule base assistant 10 years ago.

xp84 · 2024-10-25T15:53:15 1729871595

The fact that they couldn’t deliver something even approaching kindergarten levels of understanding a year ago makes me worry that either zero of the people who know what they’re doing in present-day “AI” work at Apple, or, plenty of great minds do but they can’t get anything done because Apple’s management is too conservative to release something that would be vastly more powerful than Siri but might possibly under certain circumstances hurt someone’s feelings or otherwise embarrass Apple.

Nothing would make me happier than to finally be wrong betting against “Siri,” though.

conradev · 2024-10-17T03:26:02 1729135562

The real consumers of the NPUs are the operating systems themselves. Google’s TPU and Apple’s ANE are used to power OS features like Apple’s Face ID and Google’s image enhancements.

We’re seeing these things in traditional PCs now because Microsoft has demanded it so that Microsoft can use it in Windows 11.

Any use by third party software is a lower priority

shermantanktop · 2024-10-17T14:40:50 1729176050

That’s how we got an explosion of interesting hardware in the early 80s - hardware companies attempting to entice consumers by claiming “blazing 16 bit speeds” or other nonsense. It was a marketing circus but it drove real investments and innovation over time. I’d hope the same could happen here.

kmeisthax · 2024-10-16T20:41:43 1729111303

You forget "Because Apple is doing it", too.

rjsw · 2024-10-16T22:49:11 1729118951

I think other ARM SoC vendors like Rockchip added NPUs before Apple, or at least around the same time.

acchow · 2024-10-16T23:17:47 1729120667

I was curious so looked it up. Apple's first chip with an NPU was the A11 bionic in Sept 2017. Rockchip's was the RK1808 in Sept 2019.

j16sdiz · 2024-10-17T01:11:41 1729127501

Google TPU was introduced around same time as apple. Basically everybody knew it can be something around that time, just don't know exactly how

Someone · 2024-10-17T05:29:10 1729142950

https://en.wikipedia.org/wiki/Tensor_Processing_Unit#Product... shows the first one is from 2015 (publicly announced in 2016). It also shows they have a TDP of 75+W.

I can’t find TDP for Apple’s Neural Engine (https://en.wikipedia.org/wiki/Neural_Engine), but the first version shipped in the iPhone 8, which has a 7 Wh battery, so these are targeting different markets.

GeekyBear · 2024-10-17T00:48:32 1729126112

Face ID was the first tent pole feature that ran on the NPU.

bdd8f1df777b · 2024-10-17T02:43:55 1729133035

Even if it were true, they wouldn’t have the same influence as Apple has.

Dalewyn · 2024-10-17T01:18:58 1729127938

There are no nerves in a neural processing unit, so yes: It's 300% bullshit marketing.

brookst · 2024-10-17T02:11:51 1729131111

Neural is an adjective. Adjectives do not require their associated nouns to be present. See also: digital computers have mo fingers at all.

-mlv · 2024-10-17T02:38:47 1729132727

I always thought 'digital' referred to numbers, not fingers.

bdd8f1df777b · 2024-10-17T02:46:11 1729133171

The derivative meaning has been use so widely that it has surpassed its original one in usage. But it doesn’t change the fact that it originally refers to the fingers.

jcgrillo · 2024-10-17T01:26:52 1729128412

Maybe the N secretly stands for NFT.. Like the tesla self driving hardware only smaller and made of silicon.

WithinReason · 2024-10-17T06:48:39 1729147719

yeah I'm not sure being 1% utilised helps power consumption

theresistor · 2024-10-16T23:26:50 1729121210

> Also, people often mistake the reason for an NPU is "speed". That's not correct. The whole point of the NPU is rather to focus on low power consumption.

It's also often about offload. Depending on the use case, the CPU and GPU may be busy with other tasks, so the NPU is free bandwidth that can be used without stealing from the others. Consider AI-powered photo filters: the GPU is probably busy rendering the preview, and the CPU is busy drawing UI and handling user inputs.

cakoose · 2024-10-17T00:01:45 1729123305

Offload only makes sense if there are other advantages, e.g. speed, power.

Without those, wouldn't it be better to use the NPUs silicon budget on more CPU?

mapt · 2024-10-17T11:19:45 1729163985

For PC CPUs, there are already so many watts per square millimeter that many of the top tiers of the recent generations are running thermally throttled 24/7; More cooling improves performance rather than temperatures because it allows more of the cores to run at 'full' speed or at 'boost' speed. This kills their profitable market segmentation.

In this environment it makes some sense to use more efficient RISC cores, and to spread out cores a bit with dedicated bits that either aren't going to get used all the time, or that are going to be used at lower power draws, and combining cores with better on-die memory availability (extreme L2/L3 caches) and other features. Apple even has some silicon in the power section left as empty space for thermal reasons.

Emily (formerly Anthony) on LTT had a piece on the Apple CPUs that pointed out some of the inherent advantages of the big-chip ARM SOC versus the x86 motherboard-daughterboard arrangement as we start to hit Moore's Wall. https://www.youtube.com/watch?v=LFQ3LkVF5sM

theresistor · 2024-10-17T01:52:32 1729129952

If you know that you need to offload matmuls, then building matmul hardware is more area efficient than adding an entire extra CPU. Various intermediate points exist along that spectrum, e.g. Cell's SPUs.

heavyset_go · 2024-10-17T00:09:41 1729123781

More CPU means siphoning off more of the power budget on mobile devices. The theoretical value of NPUs is power efficiency on a limited budget.

avianlyric · 2024-10-17T01:58:57 1729130337

Not really. To get extra CPU performance that likely means more cores, or some other general compute silicon. That stuff tends to be quite big, simply because it’s so flexible.

NPUs focus on one specific type of computation, matrix multiplication, and usually with low precision integers, because that’s all a neural net needs. That vast reduction in flexibility means you can take lots of shortcuts in your design, allowing you cram more compute into a smaller footprint.

If you look at the M1 chip[1], you can see the entire 16-Neural engine has a foot print about the size of 4 performance cores (excluding their caches). It’s not perfect comparison, without numbers on what the performance core can achieve in terms of ops/second vs the Neural Engine. But it seems reasonable to be that the Neural Engine and handily outperform the performance core complex when doing matmul operations.

[1] https://www.anandtech.com/show/16226/apple-silicon-m1-a14-de...

kmeisthax · 2024-10-16T20:41:21 1729111281

> I think some hardware vendors just release the compute units without shipping proper support yet

This is Nvidia's moat. Everything has optimized kernels for CUDA, and maybe Apple Accelerate (which is the only way to touch the CPU matrix unit before M4, and the NPU at all). If you want to use anything else, either prepare to upstream patches in your ML framework of choice or prepare to write your own training and inference code.

noduerme · 2024-10-17T10:17:45 1729160265

I'm not sure why this is a moat. Isn't it just a matter of translation from CUDA to some other instruction set? If AMD or someone else makes cheaper hardware that does the same thing, it doesn't seem like a stretch for them to release a PyTorch patch or whatever.

david-gpu · 2024-10-17T11:56:09 1729166169

Most of the computations are done inside NVidia proprietary libraries, not open-source CUDA. And if you saw what goes inside those libraries, I think you would agree that it is a substantial moat.

theGnuMe · 2024-10-17T15:47:00 1729180020

There are clean room approaches like AMDs and Scale.

caeril · 2024-10-17T16:29:34 1729182574

Geohot has multiple (and ongoing) rants about the sheer instability of AMD RDNA3 drivers. Lisa Su engaged directly with him on this, and she didn't seem to give a shit about their problems.

AMD is not taking ML applications seriously, outside of their marketing hype.

fvv · 2024-10-17T21:44:19 1729201459

Rdna3 is not cdna

david-gpu · 2024-10-17T19:45:03 1729194303

Are you suggesting that Scale can take cuDNN kernels and run them at anything resembling peak performance on AMD GPUs?

Because functional compatibility is hardly useful if the performance is not up to par, and cuDNN will run specific kernels that are particularly tuned to not only a specific model of GPU, but also to the specific inputs that the user is submitting. NVidia is doing a ton of work behind the scenes to both develop high-performance kernels for their exact architecture, but also to know which ones are best for a particular application.

This is probably the main reason why I was hesitant to join AMD a few years ago and to this day it seems like it was a good decision.

blharr · 2024-10-17T15:34:21 1729179261

Sure you can probably translate rough code and get something that "works" but all the thousands of small optimizations that are baked in are not trivial to just translate.

noduerme · 2024-10-18T06:13:44 1729232024

I like the take that small optimizations, taken together, amount to a moat. I feel like this could be a profoundly understated paradigm.

spookie · 2024-10-16T23:28:00 1729121280

I've been building an app in pure C using onnxruntime, and it outperforms a comparable one done with python by a substancial amount. There are many other gains to be made.

(In the end python just calls C, but it's pretty interesting how much performance is lost)

dacryn · 2024-10-17T08:09:29 1729152569

agree there, but then again using ort in Rust is faster again.

You cannot compare python with a onxx executor.

I don't know what you used in Python, but if it's pytorch or similar, those are built with flexibility in mind, for optimal performance you want to export those to onxx and use whatever executor that is optimized for your env. onxxruntime is one of them, but definitely not the only one, and given it's from Microsoft, some prefer to avoid it and choose among the many free alternatives.

rerdavies · 2024-10-17T08:53:02 1729155182

Why would the two not be entirely comparable? PyTorch may be slower at building the models; but once the model is compiled and loaded on the NPU, there's just not a whole lot of Python involved anymore. A few hundred CPU cycles to push the input data using python; a few hundred CPU cycles to receive the results using python. And everything in-between gets executed on the NPU.

noduerme · 2024-10-17T10:21:09 1729160469

I really wish Python wasn't the language controlling all the C code. You need a controller, in a scripting language that's easy to modify, but it's a rather hideous choice. It would be like choosing to build the world's largest social network in PHP or something. lol.

robertlagrant · 2024-10-17T10:46:35 1729161995

> it's a rather hideous choice

Why?

rerdavies · 2024-10-23T02:27:35 1729650455

Be careful for what you wish for.

I've just spent a week writing Neural net code in C++, so i have direct insight into what a C++ implementation might look like.

Much as I dislike python, and having to deal with endless runtime errors when your various inputs and outputs are mismatched, the inconvenience pales in comparison with having to wade through three pages of error messages that a C++ compiler generates when you have a single mismatched templated Eigen matrix with the incorrect dimension. The "Required from here" message you are actually interested in is typically the 3rd or fourth "Required from here", amongst a massive stack of cascading errors, each of which wraps to about 4 lines when displayed. You know what I mean. Sometimes you don't get a "Required from here" at all, which is horrifying. And it's infuriating to find and parse the template arguments of classes involved.

Debugging Python runtime errors is kind of horrible, and rubs me the wrong way on principle. But it is sweetness and light compared to debugging C++ compile-time error messages, which are unimaginably horrible.

The project: converting a C++ Neural Amp Modeller LV2 plugin to use fixed-size matrices (Eigen::Matrix<float,N,M>) instead of dynamic matrixes (Eigen::MatrixXf) to see if doing so would improve performance. (It does. Signficantly). So a substantial and realistic experiment in doing ML work in C++. Not directly comparable to working in Pytorch, but directly analogous in that it involves hooking up high-level ML constructs, like Conv1D, LayerT, WaveNet ML chunks.

johnny22 · 2024-10-17T21:16:56 1729199816

isn't that the case? Which then became a dialect of php with a custom interpreter (and then compiler) as they scaled.

noduerme · 2024-10-18T06:11:19 1729231879

Yes, that was the case. I was being sarcastic. Zuck wrote facebook in PHP and spent millions of dollars then writing a custom interpreter to let his janky code run slightly faster than normal.

rerdavies · 2024-10-23T02:31:45 1729650705

Zuck's obvious mistake: he should have written the PHP compiler to precompile chunks of GPU code that would be the only code that actually runs when serving web pages. </sarcasm>

Facebook isn't really a comparable problems, because ALL of the performance-critical code in PyTorch does runs on a GPU.

godelski · 2024-10-16T23:21:13 1729120873

They definitely aren't doing the timing properly, but also what you might think is timing is not what is generally marketed. But I will say, those marketed versions are often easier to compare. One such example is that if you're using GPU then have you actually considered that there's an asynchronous operation as part of your timing?

If you're naively doing `time.time()` then what happens is this

  start = time.time() # cpu records time
  pred = model(input.cuda()).cuda() # push data and model (if not already there) to GPU memory and start computation. This is asynchronous
  end = time.time() # cpu records time, regardless of if pred stores data

You probably aren't expecting that if you don't know systems and hardware. But python (and really any language) is designed to be smart and compile into more optimized things than what you actually wrote. There's no lock, and so we're not going to block operations for cpu tasks. You might ask why do this? Well no one knows what you actually want to do. And do you want the timer library now checking for accelerators (i.e. GPU) every time it records a time? That's going to mess up your timer! (at best you'd have to do a constructor to say "enable locking for this accelerator") So you gotta do something a bit more nuanced.

If you want to actually time GPU tasks, you should look at cuda event timers (in pytorch this is `torch.cuda.Event(enable_timing=True)`. I have another comment with boilerplate)

Edit:

There's also complicated issues like memory size and shape. They definitely are not being nice to the NPU here on either of those. They (and GPUs!!!) want channels last. They did [1,6,1500,1500] but you'd want [1,1500,1500,6]. There's also the issue of how memory is allocated (and they noted IO being an issue). 1500 is a weird number (as is 6) so they aren't doing any favors to the NPU, and I wouldn't be surprised that this is a surprisingly big hit considering how new these things are

And here's my longer comment with more details: https://news.ycombinator.com/item?id=41864828

artemisart · 2024-10-17T00:49:15 1729126155

Important precision: the async part is absolutely not python specific, but comes from CUDA, indeed for performance, and you will have to use cuda events too in C++ to properly time it.

For ONNX the runtimes I know of are synchronous as we don't do each operation individually but whole models at once, there is no need for async, the timings should be correct.

godelski · 2024-10-17T01:13:39 1729127619

Yes, it isn't python, it is... hardware. Not even CUDA specific. It is about memory moving around and optimization (remember, even the CPUs do speculative execution). I say a little more in the larger comment.

I'm less concerned about the CPU baseline and more concerned about the NPU timing. Especially given the other issues

fennecfoxy · 2024-10-17T13:29:09 1729171749

I think it's definitely possibly now (or very soon) for an LLM to write native GPU/NPU code to get itself to run on different hardware.

hulitu · 2024-10-17T15:30:15 1729179015

> The whole point of the NPU is rather to focus on low power consumption

You know which chip has the lowest power consumption ? The one which is turned off. /s

isusmelj · 2024-08-23T07:43:18 1724398998

I think that is similar to what Yann LeCun outlined: https://bdtechtalks.com/2022/03/07/yann-lecun-ai-self-superv...

isusmelj · on April 19, 2024

Don't forget that this is 24k H100. They are getting 10x the compute: https://www.cnbc.com/2024/01/18/mark-zuckerberg-indicates-me...

So gpt-4 level 8B models running on phones and notebooks seems feasible within the next 5 years. I imaging having (voice) assistans running locally. Crazy how fast we progress.

isusmelj · on April 1, 2024

Is there somewhere an overview of the progress we made on the software side for training and inference of LLMs? It feels like we squeezed 10-100x more out of the hardware since llama appeared. This crazy progress will probably saturate though as we reach theoretical limits, no?