Context-dependent, or "reified", assertions are a pain point for sure. I come from the perspective of cultural heritage data, where context is king. Which expert made this attribution for this painting? Who owned it _when_? According to which archival document? etc.
Almost all the engineering problems cited in the original post are still there, but graph-based models remain the least painful way of doing this, particularly when trying to share data between institutions. Example: https://linked.art/model/assertion/
The OP mentions property graphs as a way around this problem. They can be seen as natural extensions of RDF quads, which in turn build on plain RDF triples (subject / predicate / object) by adding a fourth element naming the graph - i.e. the context - in which a statement holds.
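To make that concrete, here's a minimal sketch in TriG (the quad serialization of RDF), with entirely made-up URIs - the real linked.art model is JSON-LD and considerably richer:

```trig
@prefix ex: <http://example.org/> .

# Bare triple: the claim with no context attached
ex:painting42  ex:attributedTo  ex:rembrandt .

# Quad: the same claim, scoped to a named graph that stands for the assertion
ex:attribution7 {
  ex:painting42  ex:attributedTo  ex:rembrandt .
}

# Statements *about* that assertion (who made it, on what evidence)
ex:attribution7  ex:assertedBy   ex:someExpert ;
                 ex:citesSource  ex:archivalDoc1915 .
```

Roughly speaking, a property graph gets at the same thing by hanging key/value properties directly off the edge instead of introducing a named graph.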
I find that a fascinating reaction given how rapidly %>% has been taken up across a large segment of the R universe, to great excitement! Personally, I find it far MORE legible than endlessly-nested function calls.
It results in code that more closely resembles the executed order of operations (e.g. filter -> mutate -> group -> summarize). Context is also key: it's most often used for data processing pipelines in specific analytical scripts or literate-code documents, less so when defining generalizable/testable functions in packages (again, just a personal perspective - YMMV of course).
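A rough sketch of what I mean, with a toy `sales` data frame invented on the spot:

```r
library(dplyr)

# invented example data
sales <- data.frame(
  year    = c(2022, 2023, 2023, 2023),
  region  = c("NL", "NL", "BE", "BE"),
  revenue = c(100, 120, 80, 90),
  cost    = c(60, 70, 50, 55)
)

# Nested calls: read inside-out, last step written first
summarize(
  group_by(
    mutate(
      filter(sales, year == 2023),
      margin = revenue - cost
    ),
    region
  ),
  avg_margin = mean(margin)
)

# Piped: reads top-to-bottom, in the order the steps actually run
sales %>%
  filter(year == 2023) %>%
  mutate(margin = revenue - cost) %>%
  group_by(region) %>%
  summarize(avg_margin = mean(margin))
```

Both produce the same grouped summary; the difference is purely in how the reader traces the steps.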
You nailed it. dplyr is better the further you are from doing heavy-duty data analysis or creating production code. If you're writing some simple transforms to put data into a report, fine - someone is probably going to want to look at that at some point, and it's much, much easier to understand. But for anything else I stick with data.table.
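For comparison, the same toy aggregation from the sketch above in data.table - terser and very fast on large tables, though the `DT[i, j, by]` form takes some getting used to:

```r
library(data.table)

# same invented data, as a data.table
sales <- data.table(
  year    = c(2022, 2023, 2023, 2023),
  region  = c("NL", "NL", "BE", "BE"),
  revenue = c(100, 120, 80, 90),
  cost    = c(60, 70, 50, 55)
)

# filter (i), compute and aggregate (j), grouped (by) in one expression
sales[year == 2023,
      .(avg_margin = mean(revenue - cost)),
      by = region]
```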
A related aside: while forgeries - deliberate imitations made to mislead and deceive - are exciting, they represent only a tiny fraction of art attribution questions. In practice, attribution is more about distinguishing between artists working in the same period than about someone trying to fool the eye at several centuries' remove.
For example, the Rembrandt Research Project infamously set out to separate genuine from fake Rembrandt paintings in his corpus of known works, under the false assumption that there would be a lot of 18th-, 19th-, and 20th-century forgeries. In fact, most of the "non-Rembrandt" cases they found were not later imitations but works by his own students or contemporaries - or works co-produced by Rembrandt and another hand. The result - dismantling the project's original assumption - proved revolutionary for our understanding of studio practice in the period, even though the project turned up very few "forgeries" as such.
I'm working on pulling the images now, like I did for the Rijksmuseum CC0 dump. FWIW a good place to host that torrent is the Internet Archive - it's great for discoverability.