daniel-cussen · on July 8, 2023

Well so one issue w both GPUs n CPUs which makes them bad platforms for this algorithm is that, in both, FLOPS are such an important metric for sales that multiplication is highly subsidized in both those chip types. So huge amounts of area are dedicated to floating point multiplication, meaning the advantage of fgemm (the name of the algorithm is the same as the name of the company) is purely one of energy.

Which is great because if it were software it would be impossible to protect the IP. USPTO is very clear in that sense, i believe in both in re Bilski and in the Alice Corp. case which reached SCOTUS, that algorithms need to be implemented physically, typically meaning in a chip, to be patentable. So because it needs a chip to work, it is good business, if it did not it would be bad business. A chip provides every form of IP protection, all four forms, trade secret, copyright, patent, n even trademark. No other medium has that to my knowledge.

So if you have a CPU or a GPU n want it to do more work in the same amount of time, this paper promises nothing, n it keeps that promise. Nonetheless i'm advancing rapidly to the point of creating the hardware that can cut off 70% of the cost of GEMM. I considered 50% off, same thing at half the price, but it wouldn't be fair to the consumer w my economics. You see 50% discounts all the time, who cares? 70% off, you don't see that all the time. On something you actually want? Especially on a commodity, n it's still good business for me as the lowest-cost producer.

david-gpu · on July 8, 2023

> Well so one issue w both GPUs n CPUs which makes them bad platforms for this algorithm is that, in both, FLOPS are such an important metric for sales that multiplication is highly subsidized in both those chip types. So huge amounts of area are dedicated to floating point multiplication, meaning the advantage of fgemm (the name of the algorithm is the same as the name of the company) is purely one of energy.

I'm having trouble understanding this. Are you saying that GPUs invest area on floating-point multipliers because FLOPS are an important marketing metric? The only thing that mattered to us was: how can we make these operations faster within the area and power constraints we have? Reducing energy consumption was thus a major goal.

I wish you luck. If I were in your shoes, I would approach NVidia or Google -- and expect to be hammered with tough questions.

daniel-cussen · on July 8, 2023

> Are you saying that GPUs invest area on floating-point multipliers because FLOPS are an important marketing metric?

Yes. That is precisely what i'm saying. If i'm mistaken in saying that, that's one thing, but as far as it being what i'm saying, it very much is. It's been an important guiding principle for some time now in the project, that recent chips--including FPGAs--tend to have hard IP for floating point multiplication.

Now spending a lot of chip area on getting more FLOPS is not necessarily a bad decision if there is no alternative for achieving fast matrix multiplication. Almost any method is sensible if there was no better alternative available when the decision to use that method was made. In addition, fgemm only really makes sense when matrices contain over 1000 elements per row or column, not sure how much more than 1000 per vector but more than that. Small and in particular small and dense matrices are still best multiplied exactly the way GPUs multiply them, with many floating-point multiplier circuits in parallel. It's not stupid in the least.

Yeah so NVidia n Google have the same business model i'm going for, Google having TPUs in its datacenters that do work that cannot be reverse engineered. Google does not sell TPUs. You can use them by sending Google the work, and you'll benefit from much lower cost and faster speed. NVidia has a similar offering, just not as well-known. That's the correct business model in my analysis, and what fgemm will sell. Sell the work.

david-gpu · on July 8, 2023

> Google having TPUs in its datacenters that do work that cannot be reverse engineered

Help me understand: TPUs cannot be reverse engineered because the user doesn't have access to the physical device, but other devices like GPUs can?

Can you show some examples of reverse-engineering of GPUs that has been performed on the basis of having physical access to the dies? Are you aware of any reverse engineering done on them using other means? How much has this reverse engineering prevented e.g. NVidia from being financially successful? Finally, since patents are freely available to the public once they have been granted, does that nullify some concerns regarding reverse engineering?

I'm not an entrepreneur, so take this with a fistful of salt, but having worked at places like NVidia, I would never try to compete head to head with them as a startup. Very few semiconductor startups achieve any success, and the ones that do start by finding a very particular market niche where the established players aren't even trying to play.

Again, I wish you good luck.

imtringued · on July 8, 2023

What about us peasants who need multiplication to actually get work done instead of playing FLOPs status games? Not everyone is bottlenecked on something as specific as matrix multiplication.

Also the claims about huge amounts of area being dedicated to multiplication are false. ALU size is mostly irrelevant.

adgjlsfhk1 · on July 8, 2023

games are like 60% matrix multiplication (especially with ray tracing)

kragen · on July 8, 2023

this is an interesting idea; in some sense rotating a point in space is only multiplying a 3-item or 4-item vector (where this idea wouldn't be useful), but rotating n points is multiplying a 3×n or 4×n matrix by the transformation matrix, so if the algorithm pans out, you should be able to do that kind of stuff too; n can be pretty large
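
A minimal sketch of that observation, in plain Python with no external libraries: the n points are stored as the columns of a 3×n matrix, and a single product with a 3×3 rotation matrix rotates all of them at once. The helper names (`matmul`, `rotation_z`) and the sample points are made up for illustration, and the multiplication is the ordinary naive one, not the algorithm from the paper.

```python
import math

def matmul(a, b):
    """Naive product of an m×k matrix and a k×n matrix, both lists of rows."""
    k = len(b)
    assert all(len(row) == k for row in a), "inner dimensions must match"
    n = len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(k)) for j in range(n)]
            for row in a]

def rotation_z(theta):
    """3×3 matrix for a rotation about the z axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

# Four points stored as the columns of a 3×n matrix (n = 4 here).
points = [[1.0, 0.0, 1.0, 2.0],    # x coordinates
          [0.0, 1.0, 1.0, 0.0],    # y coordinates
          [0.0, 0.0, 5.0, -1.0]]   # z coordinates

# One (3×3)·(3×n) product rotates every point by 90 degrees about z.
rotated = matmul(rotation_z(math.pi / 2), points)
```

With homogeneous coordinates the same shape works as a (4×4)·(4×n) product. The small dimension stays at 3 or 4 while n grows.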

kopecs · on July 8, 2023

> A chip provides every form of IP protection, all four forms, trade secret, copyright, patent, n even trademark. No other medium has that to my knowledge.

IANAL, but I do not believe semiconductor masks are copyrightable under US law (my limited understanding is that this is essentially due to the fact that the mask is inherently functional and/or aspects of the merger doctrine). There is a separate sui generis mask work protection via 17 U.S.C. §§ 901-914.

Edit: Moreover, I'm unsure how you figure a chip itself is protected by trade secret, since reverse engineering an IC is not terribly difficult.

daniel-cussen · on July 8, 2023

I don't know why trade secret applies, but i remember reading it does. Perhaps in the rationale, or the preimage. It doesn't make all that much sense, come to think. I think Intel tried it? Intel for sure used copyright to protect chips. Hey thanks, i did not know about 17 U.S.C. §§ 901-914.

kragen · on July 8, 2023

mask works

daniel-cussen · on July 8, 2023

Hi HN, this paper is my first proper academic publication, it's on arxiv only for now--this is a pre-print--but is being considered for publication by peer-reviewed journals concurrently. Open-access journals, of course.

I'm totally disinterested in tenure or academic recognition. For my goals being a Stanford dropout is better than any other amount of academic recognition. So i don't care about journals' prestige numbers (the impact factors, i know that term), but anything paywalled is bad for what i do care about, which is my business, fgemm. Means Fast/Faster/Fastest GEneral Matrix-Matrix multiplication. gemm is an acronym already used in BLAS libraries, Basic Linear Algebra Subprograms, which is what most of the time n money spent on ML goes to.
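
For readers who haven't met the term: in BLAS, GEMM is the operation C ← αAB + βC. The sketch below is a plain-Python reference for that definition only, assuming matrices stored as lists of rows. It is the naive triple loop, not fgemm, and it leaves out the transpose options and in-place update that real BLAS routines have. The function name `gemm` and the sample values are illustrative.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for matrices given as lists of rows."""
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    assert len(c) == m and all(len(row) == n for row in c), "c must be m×n"
    # Start from the scaled copy of c, then accumulate the scaled product.
    out = [[beta * c[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        for t in range(k):
            a_it = alpha * a[i][t]
            for j in range(n):
                out[i][j] += a_it * b[t][j]   # m*k*n multiplications in total
    return out

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
result = gemm(2.0, a, b, 0.5, c)
```

The inner loop performs m·k·n scalar multiplications, which is the cost that dominates for large matrices.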

I'm going to be available to answer questions insofar as i can.

blast · on July 8, 2023

This is cool! How did you end up working with Ullman? I guess "being a Stanford dropout" explains how you met him, but there must be an interesting story here. Can you share how that happened and what the process was?

quickthrower2 · on July 8, 2023

Hi Daniel. Thanks for the inspiration. Something I have thought about too is sticking some papers out there without needing to go through expensive gates (PhD etc.).

daniel-cussen · on July 8, 2023

It's brutally hard. I had an easier time buying skylinesort.com n posting the skylinesort algorithm there, than publishing through professors n academia. Typically not feasible for undergraduates, least of all anybody not paying tuition. Same way professors are expected to have an undergraduate degree at the very least (4 profs at Stanford have just an undergraduate degree), a Master's degree (a handful have that and no more), but typically a PhD is required (literally all the other professors have PhD's). Is required. Who requires it? Who says, "I require a PhD."? Is expected? Who expects it? Who says, "I expect a PhD."? Passive voice is typical in academia. Very rare to get around the gatekeeping, frankly. I couldn't publish on arxiv for years because of lack of academic affiliation alone.

Took years to get to this point in terms of the effort I dedicate to getting recognition for my work.

Vasniktel · on July 8, 2023

Thanks Daniel. Could you expand on this comment? What did you have to do to be able to publish a paper on arxiv?

daniel-cussen · on July 8, 2023

At the time, i needed academic affiliation, meaning be in college or more likely have a professor vouch for me. What i ended up doing was return to Stanford undergrad n take classes related to algorithms, show my algorithm portfolio in office hours, then get referred to other profs, one of them being Jeffrey Ullman, in 2019. N then after emails back n forth we met in person in the Gates building, it went from there.

Very happy to have met Professor Jeffrey Ullman.

jmhimara · on July 8, 2023

Not sure if this is what you're talking about, but you don't typically pay to get a PhD (in fact you get paid in the US).

daniel-cussen · on July 9, 2023

Yeah n how much do you get paid, n for what? You get paid to take on professorial duties, TA'ing, lab assistant, that sort of thing usually. Pretty rough in many ways.

bishop77 · on July 11, 2023

How does this compare to Strassen's algorithm? Could you please provide a reference implementation?

unlikelymordant · on July 8, 2023

Could you adapt this to finding fast inverses?

daniel-cussen · on July 8, 2023

I looked at that, i concluded yes because the bottleneck of inverting a matrix is matrix multiplication. Spesh since fgemm targets 32-bit floating-point format, n has high accuracy (not saying how high but much better than Strassen, at least as good as naive matrix multiplication).
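
One textbook way to see how inversion can lean on multiplication is the Newton-Schulz iteration, X ← X(2I − AX), which uses nothing but matrix products. The sketch below is plain Python under stated assumptions: it is not taken from the paper, the helper names are made up, and the naive `matmul` is a stand-in for whatever fast GEMM is available. The starting guess Aᵀ/(‖A‖₁‖A‖∞) is a standard choice that makes the iteration converge for any nonsingular A.

```python
def matmul(a, b):
    """Naive product of two matrices given as lists of rows."""
    k = len(b)
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [[sum(row[t] * b[t][j] for t in range(k)) for j in range(len(b[0]))]
            for row in a]

def newton_schulz_inverse(a, iterations=30):
    """Approximate the inverse of a square matrix using only matrix products."""
    n = len(a)
    norm_1 = max(sum(abs(a[i][j]) for i in range(n)) for j in range(n))
    norm_inf = max(sum(abs(v) for v in row) for row in a)
    scale = 1.0 / (norm_1 * norm_inf)
    # Initial guess: scaled transpose of a.
    x = [[scale * a[j][i] for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        ax = matmul(a, x)
        # 2I - AX; the residual I - AX is squared on every step.
        corr = [[(2.0 if i == j else 0.0) - ax[i][j] for j in range(n)]
                for i in range(n)]
        x = matmul(x, corr)
    return x

a = [[4.0, 7.0], [2.0, 6.0]]
inv = newton_schulz_inverse(a)   # exact inverse is [[0.6, -0.7], [-0.2, 0.4]]
```

Each step costs two matrix multiplications, so any speedup in GEMM carries over directly. The fixed count of 30 iterations is an arbitrary choice for this small example.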

henistein · on July 9, 2023

Is there any python implementation of it? I would really like to try it out

daniel-cussen · on July 5, 2023

Yeah that's my business, http://fgemm.com , coming soon. Paper is coming out v soon however.

daniel-cussen · on July 5, 2023

That's the least of it. In Lisp the distinction between code n data is blurred all the time. In F18 assembly i frequently have "double entendres" which are used as code or as literals depending on the entry point. I think at least once there was code and data in the same entry point. Assembly n Lisp are both homoiconic, after all. N verb at the end of the sentence, are you transliterating German, or a two-foot green Jedi master full of wisdom?

daniel-cussen · on July 2, 2023

Worst is when they insult you by changing a legitimate name like "Polo" to the glaringly knock-off name "Pölo" after the fact, to manipulate users (this happened to me) into blaming themselves for falling for such a blatant typo. Like i had to stop using them entirely, n if they make huge piles of money, like at least they don't make that n a tiny additional amount from me. Like in many ways not crappy but very hit n miss, like too problematic.

daniel-cussen · on June 11, 2023

Yeah that was saying if you crushed n injected it would be addictive. No. It was addictive in literally any form. They knew that too, n it wasn't on the packaging.

You kind of have to be a monster to say something that callous.

fweimer · on June 11, 2023

As far as I understand it, crushing/injecting it makes it more likely to induce pronounced euphoria and other side effects that encourage abuse. But it was known from the start that it was addictive (causing a physical dependency) even if used as intended, simply because it is an opioid. Here's an older package insert:

https://www.accessdata.fda.gov/drugsatfda_docs/label/2009/02...

Among other things, it says “Physical dependence and tolerance are not unusual during chronic opioid therapy.” I'm no expert, but I think that explains why warnings about dependence weren't more pronounced. The drug was supposed to be distributed in a tightly controlled fashion due to these risks, after all.

In retrospect, the warnings at the start, and the information for patients read like recipes for abuse, though.

KnobbleMcKnees · on June 12, 2023

Both sides benefit from the average Joe not knowing the facts. This is, in a nutshell, why so much FUD exists about drug addiction.

For dealers, it gets people through the gateway. For the anti drugs crowd, it's the equivalent of preventing teenage sex by not telling them how it's done

daniel-cussen · on June 9, 2023

I think on the contrary Russians alone don't get hacked by Russians.

Russians get hacked by the Terman lab. NSA...CIA...NRLO i think...yeah those guys.

daniel-cussen · on May 8, 2023

They're fucking up on purpose. This is a request for consent that they steal from you. Microsoft intentionally, n w a criminal mind (in light of all the what in light of their stealing from me was harm they were doing in court) fucked up like oops oops whoopsie this button oh why doesn't this do it right, oh it's your job to do this this n this. No. They stole it from me in the hopes i would buy it again.

Or maybe they're shit at writing code, right? It's not either or. It's intentional. N then they whine about piracy n intellectual property, yeah.

doodlesdev · on May 8, 2023

Honestly, at this point I think it's both incompetence AND malice.

CatWChainsaw · on May 8, 2023

Incompetence IS malice.

daniel-cussen · on April 27, 2023

I consider the launch a pure unadulterated success. They ran simulations, took the precautions, this was NOT the Apollo I. N it's the nature of the business, fuck, same as the best chemists have lost fingers n accepted the nobel prize with the remaining fingers.

It's j the media hates Elon Musk now even more than ever.

penjelly · on April 27, 2023

no. it's perfectly reasonable to criticize this instance. They saw it coming and ignored it, other people saw it and called it out too.

"pure unadulterated success" is just willful ignorance, there is a world where the launch goes well AND the surrounding area and equipment doesn't get wrecked

wkat4242 · on April 28, 2023

One of those that called it out before: https://news.ycombinator.com/item?id=35590279

Kapura · on April 28, 2023

Wow, this is incredible. The bit near the end that there was no deluge/flame trench turned out to be spot on.

kmbfjr · on April 28, 2023

A success until someone dies?

They broke their rocket with this plan, it is not unreasonable to assume they could have damaged it in other ways to turn a mostly controlled situation into one that sends a (very) low altitude ballistic missile into a populated area.

They need their permits pulled until they can act responsibly.

daniel-cussen · on April 26, 2023

There were fires then too.