I tried my standard "trick" question I use for LLMs:
"Give me five papers with code demonstrating the state of the art of machine learning which uses geospatial data (e.g. GeoJSON) as both input and output."
There is no such state of the art. My hand-wavey understanding is that GIS data is non-continuous, which makes it useless for transformers, and also contextual, which makes it useless for anything else. Will defer to actual ML people for better explanations.
Point is, LLMs invariably give five papers with code that don't actually exist - it's a guaranteed hallucination.
Phind was able to give me five links that do in fact exist, as well as contextual information as to why these five links were not papers with code doing ML with GIS data. This is by far the best answer to this question from an LLM I've received yet.
I don't see how this would be relevant for a code model?
The code model isn't trained to retrieve papers/articles, it's meant to complete code. Whether or not you find hallucination in an unrelated task isn't particularly interesting.
Damn, this is how I learn that HN doesn't have a block function. What a shame.
My friend, can you do me a favour and actually click the link and have a play with the app? If you do, you will discover that what you're dealing with there is an LLM. That's literally why it's being compared to other LLMs.
No idea what you were trying to achieve with this comment. "The code model isn't trained to retrieve articles." a) neither is any other LLM, what's your point? and b) the app on the other end of that URL retrieves articles - it's not even tangential to the app, it's key functionality.
Yeah, all of OpenAI's stuff gets better much quicker than I'm used to. So does the general performance ceiling of all open source models, even if individual models don't improve as much.
> the state of the art of machine learning which uses geospatial data (e.g. GeoJSON) as both input and output
> There is no such state of the art
Some GIS work uses vector data: points/lines/polygons representing features (e.g., the location of roads or the outlines of buildings), which can be stored in formats like GeoJSON or WKT.
But other work uses remote sensing data/satellite imagery that can be stored in raster formats like GeoTIFF - essentially TIFF image files with additional information stored to georeference them.
You can totally do machine learning on satellite imagery where both the input and output are geospatial data (e.g. to categorise land use - the inputs are multispectral images and the outputs can be images where the value of each pixel represents the identified land use).
You can also use machine learning for tasks like building footprint detection/delineation (e.g., [1]) based on satellite imagery. The output from such a pipeline can be a set of polygons, which could be saved as GeoJSON.
I'd consider either of these to be examples of "machine learning which uses geospatial data (e.g. GeoJSON) as both input and output".
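To make the footprint-delineation case concrete, here's a minimal, dependency-light sketch of the last step of such a pipeline: turning a model's per-pixel "building" mask into a GeoJSON polygon. Everything here is illustrative - the mask, the affine georeferencing values, and the bounding-box "vectorisation" are toy stand-ins (a real pipeline would read the transform from the GeoTIFF metadata and vectorise properly, e.g. with rasterio.features.shapes):

```python
import json
import numpy as np

# Hypothetical output of a building-footprint model: a binary mask
# where 1 = "building" pixel. Values are made up for illustration.
mask = np.array([
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
])

# Toy affine georeferencing: pixel (row, col) -> (lon, lat).
# Real GeoTIFFs carry this transform in their metadata.
origin_lon, origin_lat, pixel_size = -0.1, 51.5, 0.001

def pixel_to_lonlat(row, col):
    return (origin_lon + col * pixel_size, origin_lat - row * pixel_size)

# Crude vectorisation: one bounding-box polygon around all building
# pixels. Stands in for proper raster-to-polygon extraction.
rows, cols = np.nonzero(mask)
r0, r1, c0, c1 = rows.min(), rows.max() + 1, cols.min(), cols.max() + 1
ring = [pixel_to_lonlat(r, c) for r, c in
        [(r0, c0), (r0, c1), (r1, c1), (r1, c0), (r0, c0)]]

feature_collection = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"class": "building"},
    }],
}
print(json.dumps(feature_collection))
```

The point being: the input (georeferenced imagery) and the output (a GeoJSON FeatureCollection) are both geospatial data, which is exactly the shape of pipeline the original question asked about.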
I use English according to defined grammar rules. It marks me as upper middle class and therefore trustworthy in administrative professional positions, greatly improving my life outcomes compared to someone who is not able to do so.
I think this is a brilliant idea and is no doubt useful for discovering variations in pronunciation as well. The first thing I tested it with was "mischievous" which notoriously has a correct - "MIS-chu-vus" - as well as an "incorrect" but very common pronunciation - "mis-CHEE-vee-us" - both of which are represented.
+1 for linguee.fr which I used extensively while I was doing my DELF B2. Unimaginably useful way to learn how words are actually used, not just grammatically but in what kinds of sentences, what's the connotative value of the word, what kind of mood does it imply. I imagine Youglish is useful for this too.
Amazing to see a "What is a roundabout?" article living in the UK. The way they're put in over here, you get the idea traffic engineers not only love them, they're in love with them. I largely detest them, if I'm very honest. They can be done really well, but most aren't.
Am on free version and the reported cut off is "2022" (no month is given - if I ask explicitly, it says December 2022) - I believe this in fact means December 2021 as it's not aware of any events that happened in 2022 (e.g. death of QEII) that I've tried so far.
It sounds like you want the semantic average, or, in other words, a centroid in semantic vector space. The approach I've used to do this in the past is Word2Vec, which excels at handling individual words. Word2Vec isn't going to be able to give you what you want per se but should give you somewhere to start in your search.
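A minimal sketch of the centroid idea: average the words' vectors, then find the nearest other word by cosine similarity. The embedding table here is a tiny made-up stand-in - in practice you'd load pretrained vectors (e.g. gensim's KeyedVectors.load_word2vec_format) instead:

```python
import numpy as np

# Toy embedding table standing in for a real Word2Vec model.
# These 3-d vectors are illustrative, not trained.
embeddings = {
    "happy":  np.array([0.90, 0.10, 0.00]),
    "glad":   np.array([0.80, 0.20, 0.10]),
    "joyful": np.array([0.85, 0.15, 0.05]),
    "sad":    np.array([-0.90, 0.10, 0.00]),
}

def centroid(words):
    """Semantic average: the mean of the words' vectors."""
    return np.mean([embeddings[w] for w in words], axis=0)

def nearest(vec, exclude=()):
    """Word whose vector is most cosine-similar to vec."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    candidates = {w: v for w, v in embeddings.items() if w not in exclude}
    return max(candidates, key=lambda w: cos(candidates[w], vec))

center = centroid(["happy", "glad"])
print(nearest(center, exclude={"happy", "glad"}))  # "joyful" in this toy table
```

With real pretrained vectors the same two functions give you a usable "semantic average" out of the box; gensim's most_similar(positive=[...]) does essentially this internally.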
Incidentally this is the kind of thing LLMs are very good at. Have you tried just plugging them into ChatGPT/Bard?
Honestly this just makes me want to give GeoGuessr a go again. Back when, I jumped in and the first shot I saw was like "What? No-one could guess where this is." Now I know someone can, I realise the game is playable.
I've started writing fiction (again for the first time in many years) as a side project cos I had an idea which I think is kind of cool. Reading this has filled me with joy and I will absolutely be using all of the techniques here, especially the big "SHITE" in red pen.
ETA: the funny thing is I'm a software engineer by trade, have been for 18 years, and I gotta say, there is nothing more satisfying than deleting code. Give me a PR filled with seas of red. I love it. Makes a tonne of sense writing would be the same.
"There is however a way to cheat our way into getting it right the first time: instead of designing a (piece of) program once, we can design it two or three times over, compare, and keep the best approach. Nobody has to know about our embarrassing failures. This takes time and effort, but I believe all significant projects deserve it. That said, I understand why in practice most of the code I see is just a rushed first draft: stopping as soon as it “works” is just too damn tempting." [1]