Hacker News | beeboobaa's comments

Depending on the license, how is this legal?


In the US: fair use disregards licenses. Fair use can be found to apply, or found not to apply, by a court of law. Archival of information is generally felt to be fair use, as is search indexing.


The real argument is over when AI re-use of copyrighted material is a violation of copyright. That is a large grey area that will probably be decided in favor of large corporations and not in favor of individuals. (As in, Disney can use AI writers to copy you, but you won't be allowed to copy Disney.)


Software Heritage aggressively insists on French law... which does not have fair use.


Which bit? The archiving on SoftwareHeritage, the gathering of that data into the Stack or the subsequent training of models?


That’s a good question… There seem to be two problems.

The definition of open source depends on a license existing in a repo. Without a license it’s not legal to copy and distribute.

Public vs. private repo is a platform issue, not the code maintainer's.

If a public repo does not have a license, it does not mean it is free to copy and distribute.

If a private repo has an open source license like MIT, then the crawler has a right to copy and distribute that repo. Regardless if it has authorization to access the repo or not.


> Without a license it’s not legal to copy and distribute.

Yes it is. Due to both the terms you agree to when you use GitHub and the general Implied License that covers everything public on the internet.

https://en.wikipedia.org/wiki/Field_v._Google,_Inc.


Looking at that ruling, it seems the case you linked to hinged on a fact not applicable to the Stack:

>Field had actual knowledge of the Googlebot. He also was aware of the ways to prevent Google from either listing his site at all or listing it but not providing a link to the cached version. Instead of opting out, however, he chose to allow Google to both index and provide a link to the cached version.

For the AI dataset, (A) did the person know their work was being collected by this group and for this purpose, and (B) did they know of a way to prevent that collection?


It is not clear to me if they are _only_ using GitHub as a source. The Stack explicitly mentions they are using Software Heritage as a source, and Software Heritage definitely sources from repositories that are NOT stored on GitHub (and never have been).


I don’t think that “implied license” you’re referring to holds up in the courts.


Hopefully the crawler is smart enough to properly handle edge cases...

E.g. if the repo has some sort of /used-licenses/ folder where the licenses for packages and the like are included, it could make a bad decision.
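To make the edge case above concrete, here is a minimal sketch of one way a crawler could avoid it. This is an assumption for illustration, not how the Stack or Software Heritage actually detect licenses: the function name `findRootLicenses` and the filename pattern are hypothetical, and the rule is simply "only top-level license files count".

```typescript
// Hypothetical filename pattern for files that usually hold a repo's own license.
const LICENSE_NAME = /^(licen[sc]e|copying|unlicense)(\.[a-z0-9]+)?$/i;

// Return only license files at the repository root. Anything nested
// (for example used-licenses/MIT.txt or vendor/lib/LICENSE) is treated as
// third-party material and ignored.
function findRootLicenses(paths: string[]): string[] {
  return paths.filter((p) => {
    const name = p.replace(/^\/+/, ""); // tolerate a leading slash
    return !name.includes("/") && LICENSE_NAME.test(name);
  });
}
```

A real crawler would still have to read the file contents; a root-level file named LICENSE says nothing about which license it contains.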


> Without a license it’s not legal to copy and distribute.

Is this true? When you post anything publicly, from sticking a poster on the street to making artwork like Banksy, isn’t the default set to “it’s legal to copy, unless explicitly stated otherwise”?


The default in the majority of the world is that most creative works (including software code) are copyrighted by the author, and the author must explicitly license away those rights. Some jurisdictions (e.g. France) put limits on what rights the author is allowed to give up. I.e., the default is that it is illegal to copy (subject to exemptions like “fair use”).


Note that this archive project is French.


Banksy apparently runs a licensing program. Their artwork is most definitely under copyright, and they rely on trademark protection as well.

There is also the practical issue that a lot of content is posted publicly without consent of the copyright owner. It's simply not true that just because someone else committed a copyright violation first, you can commit further violations with impunity based on that first violation.


> If a public repo does not have a license, it does not mean it is free to copy and distribute.

Whether or not it is free to copy and distribute, it should be free to copy and distribute. (My opinion is that copyright is no good; if the file is public then you should be allowed to copy and distribute it.)

> If a private repo has an open source license like MIT, then the crawler has a right to copy and distribute that repo. Regardless if it has authorization to access the repo or not.

I should not think so. The license would only apply if you have a copy of it anyways. If you are not authorized to access it because it is private, then you would have to get a copy from somewhere else, and if nobody else is providing a copy, that shouldn't give you the right to unauthorized access. However, if it has been done, then it is done, so now there is a copy, and the license (if it is a license that allows copying it in this way) would authorize you to continue to use and distribute the copy that you have.


I’m not saying I agree or care about any of it. A sane company would never allow the use of source code from a third party without a license.

If a repo is forked and the license is deleted, the source code would need to be hashed to verify it’s the exact version of an open source repo. Mainly they don’t want a copyleft or “malicious” license infecting their IP.

If the hashes don’t match then it’s not technically the same code, so a company can’t safely use it without a license.
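A rough sketch of the hash comparison described above, assuming Node.js and SHA-256 (the comment doesn't name a hash function or a tool). The function names and the in-memory `path -> content` shape are hypothetical; a real check would read files from two checkouts.

```typescript
import { createHash } from "node:crypto";

// SHA-256 of one file's content, as lowercase hex.
function sha256Hex(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// True only if every file present in the fork also exists upstream with an
// identical hash. Files missing from the fork (such as a deleted LICENSE)
// are allowed, since that is the scenario being checked.
function forkMatchesUpstream(
  fork: Record<string, string>,
  upstream: Record<string, string>
): boolean {
  return Object.keys(fork).every((path) => {
    const original: string | undefined = upstream[path];
    return original !== undefined && sha256Hex(original) === sha256Hex(fork[path]);
  });
}
```

If this returns false for even one file, the fork is not byte-for-byte the licensed release, which is the "can't safely use it" case.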


If something works on OS release version 1 then it should still work on OS release version 2.

Or in apple vernacular, it should just work.


First day on the internet?


How could it be done better?


Could they be made from steel that is under spec for other applications?

There's not yet an awesome-coral restoration markdown README.md; or any mentions of both "MARRS" and "Reef Cubes".

Do coral prefer steel to other materials like concrete, sargassumcrete, hempcrete, sugarcrete, formed CO2, or IDK cellulose; and can you just add iron to the mix or what do coral prefer?

Can RUVs and robots deploy coral scaffolding safely at scale underwater?

What other shapes would solve?


"Researchers create green steel from toxic red mud in 10 minutes" (2024) https://newatlas.com/materials/toxic-baulxite-residue-alumin... :

> Researchers have turned the red mud waste from aluminum production into green steel

Perhaps green steel for steel coral reef scaffolding


PMTiles is great for a static background. You can easily do a layer on top with GeoJSON like you say, or even just by storing your own data in Postgres and using PostGIS ST_AsMVT to turn it into a vector tile layer. Stick a cachebuster in the URL and an HTTP cache in front and you can call it a day.
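For anyone who hasn't done the ST_AsMVT part, a minimal sketch of what that can look like. The table `places`, its columns (`id`, `name`, `geom` in EPSG:4326), the layer name, the `.mvt` URL layout and the `v` query parameter are all assumptions for illustration. `ST_TileEnvelope` needs PostGIS 3.0 or newer.

```typescript
// Parameterized query: $1 = zoom, $2 = x, $3 = y. Table and columns are hypothetical.
const TILE_SQL = `
  SELECT ST_AsMVT(tile, 'places') AS mvt
  FROM (
    SELECT id, name,
           ST_AsMVTGeom(
             ST_Transform(geom, 3857),
             ST_TileEnvelope($1::integer, $2::integer, $3::integer)
           ) AS geom
    FROM places
    WHERE geom && ST_Transform(
      ST_TileEnvelope($1::integer, $2::integer, $3::integer), 4326)
  ) AS tile`;

// Validate tile coordinates before they reach the database.
function tileParams(z: number, x: number, y: number): [number, number, number] {
  const size = Math.pow(2, z); // tiles per axis at this zoom
  const ok =
    [z, x, y].every((n) => Number.isInteger(n)) &&
    z >= 0 && x >= 0 && y >= 0 && x < size && y < size;
  if (!ok) throw new RangeError(`invalid tile ${z}/${x}/${y}`);
  return [z, x, y];
}

// The cachebuster: bump dataVersion whenever the data changes, so the HTTP
// cache in front can keep old tiles indefinitely.
function tileUrl(base: string, z: number, x: number, y: number, dataVersion: string): string {
  return `${base}/${z}/${x}/${y}.mvt?v=${encodeURIComponent(dataVersion)}`;
}
```

With a Postgres client you would pass `TILE_SQL` and `tileParams(z, x, y)` as the query text and values, then return the `mvt` bytes with a long cache lifetime.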


AI has always meant Artificial Intelligence. Intelligent and capable of learning, like a person.

LLMs are not AI.


> LLMs are not AI.

Neither are neural networks, by that definition. Or 'machine learning' in general. They all have been called "AI" at different points in time. Even expert systems – that are glorified IF statements – they were supposed to replace doctors.


People thought those techniques would ultimately become something intelligent, so AI, but they fizzled out. That isn't the doubters moving the goalposts, that is the optimists moving the goalposts, always thinking what we have now is the golden ticket to truly intelligent systems.


Correct, we don't have AI yet.


Some people are incapable of learning. Therefore, LLMs are AI?

As far as I recall, the Turing test was developed long ago to give a practical answer to what was and was not practically artificial intelligence, because the debate over the definition is much older than we are.


Everyone is capable of learning, or else they'd have died as a toddler, or any time since when they tried to cross the road.


I think the Turing test is subjective, because the result depends on who was giving the test and for how long.


> Ideally developers could let the user know their caps lock key is activated.

That would be up to the User Agent (the browser), not the website.


I dream of a parallel universe where browsers took the lead in crafting innovative UIs for standard web forms, with things like password prompts behaving intelligently, dropdowns supporting advanced autocomplete, excellent date pickers, caps lock reminders on password dialogs, etc.

Websites could have been simple to make with basic markup, leaving UX niceties to the browser vendors.

The world we live in is about as far from that as you can get, with the stock UI for <input> elements being about par for 1992 UI toolkits, if even that.


The mobile platforms were a chance to reboot that part, and have the browser do a lot more UI-wise with custom handling of the different data types (dates, passwords, phone numbers, ranges, etc.).

It just didn't pan out to tablets and desktop computers. But it might not be too late ?


> It just didn't pan out to tablets and desktop computers. But it might not be too late ?

Safari has a caps lock indicator on password fields on every platform, and has had it for several years - at least since Safari 12 on macOS 10.12 (circa 2018), possibly longer, but I don't have an older VM to test on.

Gecko has an open feature request for this, from 2 years ago: https://bugzilla.mozilla.org/show_bug.cgi?id=1757348

Chromium has an open feature request for this, from 3.5 years ago: https://issues.chromium.org/issues/40722752


In particular, the input select multiple is atrocious with default styling.


Well, they're not, which means web developers are picking up the slack.


Stay in your lane, and if you really feel the itch, go contribute to Firefox or something.


If you think I'm a front-end developer, you are mistaken. I just empathize with them.



Ah, so you were merely replying at me, rather than to me.


This is a public forum, we're not DMing


How would the browser know when/how to display the capslock status? It doesn't know what any given web site is doing wrt keyboard input. Firefox adds a capslock indicator on the text cursor but not all pages use standard input fields. They might use custom UI elements, or no visual elements at all. Some sites may not even care about capslock (e.g. an arcade game).


`<input type='password'>`, just like TFA was talking about?


> That would be up to the User Agent (the browser), not the website.

Relevant: Safari has done this for ages.
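For browsers that don't show an indicator themselves, the website-side fallback discussed in this thread is short. A sketch, assuming the standard `KeyboardEvent.getModifierState("CapsLock")` call; the `ModifierAware` type, the function name, the message text and the element ids in the comment are all hypothetical.

```typescript
// Minimal structural type so this sketch runs without DOM typings.
// Browser KeyboardEvent objects satisfy it.
interface ModifierAware {
  getModifierState(key: string): boolean;
}

// Returns the hint to show next to a password field, or "" for no hint.
function capsLockHint(event: ModifierAware): string {
  return event.getModifierState("CapsLock") ? "Caps Lock is on" : "";
}

// Browser wiring, left as a comment so the block also runs outside a browser:
//   const field = document.querySelector<HTMLInputElement>("#password")!;
//   const hint = document.querySelector<HTMLElement>("#caps-hint")!;
//   field.addEventListener("keyup", (e) => { hint.textContent = capsLockHint(e); });
```

The state is only readable from an event, so the hint can't appear until the user presses a key in the field. That limitation is part of why doing it in the browser is the nicer option.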


Maybe I don't want to have to worry about if a PWA is good enough, and will remain good enough?


It shouldn't be a problem if you only train on legally acquired data. You will know the author's name and can contact them if you so wish.


There aren't any laws that require "acquiring" something in a way that "knows the author's name".


I don't think any of the major players could do that for all their data and they are acquiring it legally.


What? How do you know the data you're buying isn't AI generated by the sellers?

If they are scamming and you contact them, of course they will lie.

So how does this work?


> What's actually missing that's stopping this from working?

Proper support on all platforms. No point working on PWAs that have janky tooling (reason: see previous sentence) when they're only going to work decently on Android devices anyway.


So why would you build a native Android app if PWAs work better? There are way more web developers than Android developers, and you would avoid the Play Store fees. Sounds cheaper to me. What part of iOS is invalidating the value proposition for Android here?

You also didn’t answer what is missing. What is missing? What’s this insurmountable problem that’s solved everywhere else? Why is janky tooling attributable to Apple?


Try reading my post again, maybe? The tooling is pretty janky because no one does this yet. No point to torture yourself with janky tooling when you only get to target android anyway...


Again, not answering a thing but making up a claim you aren't willing or able to support. How is this supposed web development tooling jankiness attributable to Apple today? Feature detection is a solved problem. What tooling are you even referring to? You aren't even trying to support this with a concrete claim. This is nonsense and you know it.

PWAs are the perfect scapegoat of infinite nebulous whining. The definition of a progressive web app might as well be "whatever Chrome has but Safari doesn't, no matter what year it is or how those features change, and no matter how terrible of an idea they might be even on Chrome".


> Proper support on all platforms. No point working on PWAs that have janky tooling (reason: see previous sentence) when they're only going to work decently on Android devices anyway.

If you need it spelled out for you:

* WebUSB

* WebBLE

* WebSerial

* WebGL

* Many more standards Apple refuses to implement because it would let developers break free of their walled garden

Without being able to target Apple devices, why would I, or anyone, bother using these technologies and invest in their tooling? Just make a native Android app with quality tooling that's been around for a decade and be done with it.


Right, I wanted you to spell it out because I was expecting you to write that exact sort of nonsense. Your first three examples aren't even web standards; they're experimental features in Chrome that not even Firefox supports. The fourth actually is supported by Safari/iOS. What missing standards are stopping you from writing Progressive WEB Apps? Be exact please.

And... even if you wanted to build a serial-port enabled "Works only in Chrome" PWA today (lol, we both know you're not), there's no tooling jankiness stopping you from doing so. Checking for `if ("serial" in navigator) { ... }` requires no tooling at all; it's just plain JavaScript. You'd just choose to show an error message for browsers like Safari and Firefox that don't support it.

I'm not convinced you're even arguing in good faith here. Well, I never was because PWA whiners never are, but you've proven you're not.
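For reference, the `"serial" in navigator` check mentioned above extends to the other device APIs in the same way. A sketch, assuming the usual property names `navigator.serial`, `navigator.usb` and `navigator.bluetooth`; the helper name is hypothetical, and it takes any object so it can be tried outside a browser.

```typescript
// Property names the device APIs are exposed under on `navigator`.
const DEVICE_APIS = ["serial", "usb", "bluetooth"] as const;
type DeviceApi = (typeof DEVICE_APIS)[number];

// List which device APIs the given navigator-like object lacks.
function missingDeviceApis(nav: object): DeviceApi[] {
  return DEVICE_APIS.filter((name) => !(name in nav));
}

// In a page this would be called as missingDeviceApis(navigator), and a
// non-empty result would trigger the "unsupported browser" message.
```

Detecting the gap is the easy part. It doesn't settle the disagreement in this thread, which is about whether the app is worth building when the check fails on a whole platform.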

