As I understand it, the sanctioned way of sharing code added to UE is to fork it on GitHub and publish your changes to your fork.
Being a fork, it will only be visible to other people in the Epic Games GitHub org, i.e. only people who have agreed to Epic's licensing terms, and your modified engine remains under that same license.
It starts with building micrograd to give you an understanding of how PyTorch calculates gradients, then proceeds all the way to building a GPT-2 clone.
Looks like this is an effort to reorganize and build on that existing work.
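For anyone curious what "building micrograd" means, here's a hedged sketch of the core idea: every value remembers how it was computed, and backward() walks that graph applying the chain rule. This is illustrative only, not the course's actual code, and the names are made up.

    # Minimal scalar autograd in the spirit of micrograd (illustrative sketch).
    class Value:
        def __init__(self, data, parents=(), backward_fn=lambda: None):
            self.data = data          # the scalar value
            self.grad = 0.0           # d(output)/d(this value), filled in by backward()
            self._parents = parents   # values this one was computed from
            self._backward = backward_fn

        def __mul__(self, other):
            out = Value(self.data * other.data, (self, other))
            def backward_fn():
                # chain rule: d(out)/d(self) = other.data, d(out)/d(other) = self.data
                self.grad += other.data * out.grad
                other.grad += self.data * out.grad
            out._backward = backward_fn
            return out

        def backward(self):
            # topologically order the graph, then propagate gradients backwards
            order, seen = [], set()
            def visit(v):
                if v not in seen:
                    seen.add(v)
                    for p in v._parents:
                        visit(p)
                    order.append(v)
            visit(self)
            self.grad = 1.0
            for v in reversed(order):
                v._backward()

    x, w = Value(2.0), Value(3.0)
    y = x * w
    y.backward()
    print(x.grad, w.grad)  # 3.0 2.0

PyTorch does the same thing at tensor granularity with far more operations and a compiled backend, but the bookkeeping is conceptually this.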
The extension needs to be signed by Mozilla for normal production builds of Firefox to load it on startup. If it isn't signed, you have to load it manually through about:debugging each time you restart Firefox.
Mozilla is not preventing anyone from signing anything here (and the "security checks" on who can sign are so weak they might as well not exist in the first place).
By that logic the same applies to Chrome; it also lets you sideload unverified extensions, at the cost of annoyingly making you set it up again at every startup.
That you're pedantically language-lawyering my post while not engaging with the far greater falsehood the previous poster was perpetuating is not a good look.
And the reality is Mozilla can always block any extension they want. They can just change the Firefox source code. It doesn't matter what functionality does or doesn't exist now or what policies they do or don't have – everything can always be changed. That's true for almost anything.
So what they "could do" is a complete distraction in the first place, because they "could do" anything. What they ARE doing matters.
No, pointing out that your claims are conceptually false is a fine look.
It's not about things Mozilla could theoretically do to block you; it's that they require you to proactively get their permission to run an extension (in a production version of the browser on an ongoing basis, which I think is reasonable table stakes). Here are their official docs for self-distribution, i.e. not using AMO at all: https://extensionworkshop.com/documentation/publish/submitti... Notice that step 1 starts with giving Mozilla your extension to approve, and step 4 goes so far as to say that if your extension doesn't pass their checks then
> The message informs you of what failed. You cannot continue. Address these issues and return to step 1.
then step 7 is making sure Mozilla reviewers can read your source code, step 9 is waiting for them to get back to you, and step 13 is downloading the XPI that Mozilla has approved to run in their browser.
So yes, you absolutely need Mozilla's approval to publish an extension, even if you self-publish the XPI after they've blessed it. If they don't perform the action of signing it, it won't install; they don't need to change any source code. It may be true that in this case they have given that approval, but that doesn't invalidate the general point, and this is a fundamental restriction, not "language-lawyering".
I have to disagree that I'm perpetuating any falsehood here. Mozilla literally needs to approve an addon for it to behave normally. That you are satisfied with their approval process doesn't change that.
To me it seems absurd for a company that claims to be so pro-privacy not to allow any genuinely private extensions to exist. Anyone who wants to make a "real" addon has to share their code with Mozilla.
I actually mostly had the top poster in mind, not you, sorry for the confusion.
What you're saying is technically true, but also not relevant, as explained. They can have the best system in place today, and just change Firefox tomorrow. So it doesn't really matter how the system works now. This is true for anything from Mozilla to XFree86 to Redis to left-pad.
The de facto reality is that right now anyone can create an account, generate a signing key, and distribute their extensions $anywhere. Approval is little more than a rubber stamp; Mozilla is not going around granting "approval" or anything like that.
And they certainly didn't revoke the very weak "approval" here; people can distribute and install it. It's just not listed on the Russian add-on store. So that makes it doubly irrelevant.
I'm not aware of any other openly-licensed model of comparable size to 57B. That seems like a worthwhile addition to what is already available, imo.
The closest is Mixtral 8x7B, but that one only uses a fraction of its parameters for each pass. This one should produce better but slower results at roughly the same memory requirement.
Mixtral 8x7B has 13B activations (2 experts/pass) on 47B weights, so not so different from the Qwen 2 MoE (14B activations on 57B weights). I'd agree that the new model is probably the new strongest option in this "middle-sized" weight class, although Yi 1.5 34B isn't bad (a dense model, so 2.4X slower inference, but also almost half the weights).
One nice thing is that all three of these models are Apache 2.0 licensed.
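To make those comparisons concrete, here's the rough arithmetic behind them, using only the figures quoted above (a back-of-the-envelope sketch; real throughput depends on implementation and hardware):

    # Per-token decode cost scales roughly with the number of *active* parameters,
    # while memory requirement scales with total weights.
    models = {
        "Mixtral 8x7B":  {"total_b": 47, "active_b": 13},  # 2 of 8 experts per pass
        "Qwen2 57B MoE": {"total_b": 57, "active_b": 14},
        "Yi 1.5 34B":    {"total_b": 34, "active_b": 34},  # dense: every weight is active
    }
    base = models["Qwen2 57B MoE"]["active_b"]
    for name, m in models.items():
        ratio = m["active_b"] / base
        print(f"{name}: {m['total_b']}B weights to hold in memory, "
              f"~{m['active_b']}B used per token (~{ratio:.1f}x the MoE's per-token compute)")

That's where the "2.4X slower" figure for Yi 1.5 34B comes from: 34B active vs 14B active, with almost half the weights to keep in memory.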
It is ~260GB with presumably fp16 weights. It should fit into 64GB at 3-bit quantization (~49GB).
Edit: To add to this, I've had good luck getting solid output out of mixtral 8x7b at 3-bit, so that isn't small enough to completely kill the model's quality.
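For anyone checking the math, the estimate is just bits-per-weight scaling (it ignores quantization overhead like per-group scales and the KV cache, so treat it as a rough lower bound):

    # Back-of-the-envelope: checkpoint size scales with bits per weight.
    fp16_gb = 260  # size of the fp16 weights, per the comment above
    for bits in (8, 4, 3):
        print(f"{bits}-bit: ~{fp16_gb * bits / 16:.0f} GB")
    # 8-bit: ~130 GB, 4-bit: ~65 GB, 3-bit: ~49 GB -> the 3-bit figure fits in 64 GB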
Most of the distribution for this is via torrents/magnet links and in person hard drive exchanges. I'd go look at some public trackers if you want a copy and don't know someone that already has it.
Do be aware that it does include copyrighted content so distribution is piracy.
Almost all LLM training datasets include copyrighted content, so almost all open source LLM distribution is piracy and almost all API based LLMs, including ChatGPT, are also piracy and copyright laundering.
Also, most image-text datasets contain far worse than that. You might want to check out LAION-5B and what Stanford researchers have found in there. Technically, anyone who even touched that could in theory be in some serious, serious trouble. I find it quite remarkable that nothing has happened yet.
> almost all open source LLM distribution is piracy and almost all API based LLMs, including ChatGPT, are also piracy and copyright laundering
That's an overextension of copyright: original expression is protected, but not the ideas themselves; those are free. And don't forget that when we actually get to use these models we feed them questions and data, we give corrections - so they are not simply replicating the training set, they learn and do new things with new inputs.
In fact, if you think deeply about it, it is silly to accuse AI of copyright violation. Copying the actual book or article is much, much faster and cheaper, and exact. Why would I pay an LLM provider to generate it for me from the title and a starting phrase? If I already have part of the article, do I still need to generate it with AI? It's silly. LLM regurgitations are basically attacks with a special key, entrapments. They don't happen in normal use.
Models are not information archives. The size of the final model is orders of magnitude smaller than the size of the training data.
Somehow people are just not able to get this through their heads. Stable Diffusion is like 12GB or something, and you have people convinced it's a tool that is cutting and pasting copyrighted works from an enormous image archive.
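Back-of-the-envelope, using the rough numbers already in this thread (the ~12GB figure above and LAION-5B's ~5 billion image-text pairs):

    # Bytes of model weights per training image, using the thread's own rough figures.
    model_bytes = 12e9   # ~12 GB of weights (rough figure from the comment above)
    num_images = 5e9     # LAION-5B: ~5 billion image-text pairs
    print(model_bytes / num_images)  # ~2.4 bytes of weights per image seen in training

A typical training image is tens of kilobytes, so there simply isn't room in the weights to store verbatim copies of the training set.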
The courts (in the US) have not found LLM model weights to be piracy, nor the outputs, but it's really surprising that LAION was used for so long considering the content you allude to.
There exist databases of hashes of problematic photos (CSAM), so it seems trivial to check your billions of photos against them before training an AI. You can't catch everything, but this seems like an obvious miss considering they explicitly tried to scrape pornography.
These hashes are exactly how researchers later discovered this content, so it's clearly not hard.
The Stanford researchers also found a substantial number of CSAM images in the LAION-5B dataset which were not recognized by PhotoDNA, probably because the images in question were not in wide distribution prior to their inclusion in LAION.
You are uploading 5 billion examples of <something>. You cannot filter it manually, of course, because there are five billion of them. Given that it is the year 2024, how hard is it to be positive that a well-resourced team at Stanford in 2029 will not have better methods of identifying and filtering your data, or a better reference dataset to filter it against, than you do presently?
You don’t have to do it manually. There is a database of file hashes.
And this isn't just "one engineer". Companies like StabilityAI, Google, etc. have used LAION datasets. If you build a dataset you should expend some resources on automated filtering. Don't include explicit imagery as an intentional choice if you can't do basic filtering.
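A hedged sketch of what hash-blocklist filtering looks like in principle. Note that real systems (PhotoDNA, PDQ) use perceptual hashes that tolerate resizing and re-encoding and are only available to vetted organizations; the exact SHA-256 matching below is just a stand-in to show the shape of the pipeline, and every name here is made up.

    import hashlib

    # Illustrative only: real filtering uses perceptual hashes (e.g. PhotoDNA) matched
    # against databases maintained by clearinghouses, not plain SHA-256 of file bytes.
    def sha256_of_file(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def filter_dataset(image_paths, blocklist_hashes):
        """Yield only paths whose hash is not in the blocklist."""
        for path in image_paths:
            if sha256_of_file(path) not in blocklist_hashes:
                yield path

    # blocklist_hashes would come from a hash database you have access to (hypothetical):
    # kept = list(filter_dataset(candidate_paths, blocklist_hashes))

The limitation the other comment points out still applies: hash lookups only catch images already known to the database, which is why some of the LAION material wasn't flagged.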
It'll be some epic lawsuit like google-v-samsung that will get drawn out for a decade, awarded, reduced, appealed, etc., where the only winners will be both parties' lawyers.
- OpenAI and others will just settle with the MPAA, RIAA, and the like for a revenue stream (single-digit billions a year, likely) + some kind of control over what people can and cannot do with the AI + access to the technology to produce their own content.
- artists will see peanuts from the deal, and the big names are going to be able to stop doing any kind of business with artists, who are just expenses in their eyes. They will have been replaced by machines that were trained on their art with no compensation whatsoever.
IP is already predatory capitalism; AI will definitely be weaponized against the workers by the owners of the means of "production".
Also good to note that the Pile contains lots of curated sources, and the recent trend has been to take curated data sources and combine them with filtered webcrawls (i.e. Common Crawl with heavy processing). See Dolma or The Stack v2 (for code models), as others have mentioned.
I don't see it mentioned in the post: can any of these problems exist for models in the safetensors format? I can't say I know enough about model serialization to understand exactly how much safer it is.
Everything you type in the Start menu gets sent to Microsoft by default, and this can only be changed with Group Policy, so anyone on the Home edition is stuck with that.
Thanks for giving me an answer; all the other posts in this chain just seem to be people naming Microsoft products (I get it, people love bashing Microsoft...)