mdlincoln's comments

Context-dependent, or "reified," assertions are a pain point for sure. I come from the perspective of cultural heritage data, where context is king. Which expert made this attribution for this painting? Who owned it _when_? According to which archival document? And so on.

Almost all the engineering problems cited in the original post are still basically there, but graph-based models remain the least painful way of doing this, particularly when trying to share data between institutions. Example: https://linked.art/model/assertion/


The OP mentions property graphs as a way around this problem. They can be seen as natural extensions of "RDF quads," which in turn build on common RDF triples (Subject / Predicate / Object).
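A concrete sketch of the quad idea, with invented identifiers (this shows the shape of the data only, not linked.art's actual vocabulary): the fourth element names the assertion context, so the same painting can carry competing attributions, each traceable to its source:

    # A quad is a triple plus a "graph" column naming the assertion;
    # the who/when/according-to-what context hangs off that assertion node
    quads <- data.frame(
      subject   = c("painting:123", "painting:123"),
      predicate = c("attributedTo", "attributedTo"),
      object    = c("Rembrandt", "Studio of Rembrandt"),
      graph     = c("assertion:expertA", "assertion:expertB")
    )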


You may be interested in what Birkbeck has been developing: https://github.com/BirkbeckCTP/janeway


Very interesting - thanks! For completeness' sake: GitHub-based publishing is possible too: http://www.theoj.org


Just commenting to add this wonderfully succinct summary of the post by John Overholt:

>It takes a tremendous amount of work to make the work that goes into photographing this goblet invisible.

https://twitter.com/john_overholt/status/991110369082068992


Several hundred publications from the Getty Museum (and the other research arms of the Getty) are available for free download as well: https://www.getty.edu/publications/virtuallibrary/

(I work at the Getty Research Institute)


Just since I've got your eye - and you'll get a chuckle out of this bug - it appears that the title listing pages don't handle Unicode correctly:

"Fernand Khnopff: Portrait of Jeanne K\xE9fer"

http://www.getty.edu/Search/VirtualLibrary?title=&author=Mic...

Click on the title, and the book details page renders correctly.


I find that a fascinating reaction, given how rapidly %>% has been taken up across a large segment of the R universe, to great excitement! Personally, I find it far MORE legible than endlessly nested function calls.

It results in code that more closely resembles the executed order of operations (e.g. filter -> mutate -> group -> summarize). Context is also key: it's most often used for data-processing pipelines in specific analytical scripts or literate-code documents, and less often when defining generalizable/testable functions in packages (again, just a personal perspective - YMMV of course).
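A minimal sketch of the difference, using mtcars as a stand-in dataset:

    library(dplyr)

    # Nested calls read inside-out...
    summarize(group_by(filter(mtcars, cyl > 4), gear), mean_mpg = mean(mpg))

    # ...while the piped version reads in execution order
    mtcars %>%
      filter(cyl > 4) %>%
      group_by(gear) %>%
      summarize(mean_mpg = mean(mpg))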


You nailed it. dplyr is better the further you are from heavy-duty data analysis or production code. If you're writing some simple transforms to put data into a report, fine - someone is probably going to want to look at that at some point, and it's much, much easier to understand. But for anything else I stick with data.table.
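For comparison, the same transform in data.table's terser bracket syntax (again with mtcars as a stand-in):

    library(data.table)

    # filter (i), aggregate (j), and group (by) in a single bracket call
    dt <- as.data.table(mtcars)
    dt[cyl > 4, .(mean_mpg = mean(mpg)), by = gear]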


A related aside: while forgeries - deliberate imitations meant to mislead and deceive - are exciting, they represent only a tiny portion of art attribution questions. In reality, those questions tend to involve discerning between artists working in the same period, rather than catching imitators attempting to fool the eye at several centuries' remove.

For example, the Rembrandt Research Project infamously set out to separate genuine from fake paintings in Rembrandt's corpus of known works, under the false assumption that there would be many 18th-, 19th-, and 20th-century forgeries. In fact, most of the "non-Rembrandt" cases they found were not later imitations, but works done by his own students or contemporaries - or works co-produced by Rembrandt and another hand. The result - deconstructing the project's original false assumption - proved revolutionary for our understanding of artistic studio practice in the period, but it turned up few "forgeries" as such.

A review (paywalled, sorry!): http://www.sciencedirect.com/science/article/pii/02604779899...

And a Met exhibition: http://www.metmuseum.org/art/metpublications/Rembrandt_Not_R...


I'm working on pulling the images now, like I did for the Rijksmuseum CC0 dump. FWIW a good place to host that torrent is the Internet Archive - it's great for discoverability.


How did it go?


The NYPL has posted metadata about these collections on GitHub as well:

https://github.com/NYPL-publicdomain/data-and-utilities


<3 this! Are there any plans for some kind of export utility, e.g. a JSON serialization of a finished model?


That's definitely on the agenda. Curious, what are you interested in a JSON serialization for?


Oh, it wouldn't have to be that format in particular; I was just guessing at how you might represent the worksheet in some useful, repurposable manner.
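For illustration only, a hypothetical sketch in R (via jsonlite) of what such an export might look like - the field names here are invented, not the tool's actual schema:

    library(jsonlite)

    # An invented worksheet structure: a title plus nodes and edges
    worksheet <- list(
      title = "Example model",
      nodes = list(list(id = 1, label = "Input"),
                   list(id = 2, label = "Output")),
      edges = list(list(from = 1, to = 2))
    )
    cat(toJSON(worksheet, auto_unbox = TRUE, pretty = TRUE))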

