Many thanks for the reply. Internet Archive Scholar—like everything else the Internet Archive does—is fantastic, and I am very grateful for all the efforts you and your colleagues make.
Just for reference for anyone else reading this, here is an excerpt from an e-mail I sent you in March 2021, after IA Scholar was first mentioned on HN:
“I contacted the people at [a large Japanese academic library]. ... I showed them your HN post [1] about the data you've already collected through J-STAGE, and, contrary to my own impression, they said you have probably already captured most of the metadata for Japanese academic journals that would be easily available. They also pointed out that J-STAGE includes a fair amount of publications from the humanities side of things, also contrary to my own impression.
“The main sticking point, they said, is journals that are published by universities or academic societies and have not been listed on J-STAGE. Many of those journals have never been digitized, they said, and those that are available in digital form are likely to be available only on those universities’ or societies’ individual websites. The library people didn’t know of any aggregators or indexes for such sites. The only way to find them, they suggested, would be for someone to hunt for the sites by hand.
“Over the years, I myself have been involved with the publication of several such journals and have set up websites for a couple, too. The ones published by departments at [a particular Japanese university] are included in [the university’s online repository] but not yet, it seems, on J-STAGE. A couple published by small academic societies are available only on those societies' websites. [Addendum: The Japanese academic societies I have been involved with, mostly in the humanities, would have difficulty getting DOIs or other persistent identifiers for the papers they publish; it would take some effort even to convince them of the necessity. They are volunteer-run organizations, and just maintaining their websites is often a challenge for them.]
“Yet another impression of mine (also perhaps wrong) is that a higher percentage of academic research in Japan is published through such journals than in the U.S. It would be very valuable to have that research findable through IA Scholar, but the barriers to collecting it seem high.”
It is true that some of this collection might need to be done manually, but Google Scholar shows that it can be done, with some level of accuracy, via HTML and PDF scraping. PIDs (persistent identifiers) and more formalized metadata make things much easier. But Google Scholar did put pressure on platforms, publishers, and repositories to include at least minimal metadata in HTML meta tags, and this can be machine-extracted. There is also a ton of content and metadata available via OAI-PMH. Neither of these technologies costs publishers anything on the margin once implemented, and many publishers have implemented them to reap the discovery benefits of large search indices.
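To make the meta-tag point concrete, here is a minimal sketch of that kind of extraction, using only the Python standard library. It assumes the page follows the `citation_*` naming convention that Google Scholar asks publishers to use (`citation_title`, `citation_author`, `citation_pdf_url`, and so on); the class and function names are my own and purely illustrative, and real-world pages would need more tolerance for missing or malformed tags.

```python
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collect <meta name="citation_*" content="..."> tags from one page."""

    def __init__(self):
        super().__init__()
        # Maps tag name -> list of values; authors usually repeat.
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = attr.get("content")
        # Keep only the citation_* tags that carry a non-empty value.
        if name.startswith("citation_") and content:
            self.fields.setdefault(name, []).append(content.strip())


def extract_citation_meta(html):
    """Return a dict of citation_* meta tag values found in an HTML string."""
    parser = CitationMetaParser()
    parser.feed(html)
    return parser.fields


if __name__ == "__main__":
    # Hypothetical page for a small society journal; every value is made up.
    sample = """
    <html><head>
      <meta name="citation_title" content="An Example Paper">
      <meta name="citation_author" content="Yamada, Taro">
      <meta name="citation_author" content="Suzuki, Hanako">
      <meta name="citation_pdf_url" content="https://example.org/paper.pdf">
      <meta name="viewport" content="width=device-width">
    </head><body></body></html>
    """
    for key, values in extract_citation_meta(sample).items():
        print(key, values)
```

A small society website would only need to emit a handful of such tags per article page for a crawler to pick up the title, authors, and PDF location without any persistent identifier.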
[1] https://news.ycombinator.com/item?id=26408897