The Medium post is amazingly written! I basically did the same thing, and you beat me on the data augmentation piece. I tried using nlpaug [0], but it didn't improve model performance. I'll definitely try swapping book titles around.
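A minimal sketch of what swapping book titles around could look like, assuming each training example stores the title as character offsets into the sentence. The `swap_title` helper, the example sentence, and the title list are all hypothetical and not taken from the post:

```python
import random


def swap_title(text, start, end, titles, rng):
    """Replace the annotated title at text[start:end] with a different random title.

    Returns the new text and the updated (start, end) offsets of the title,
    so the label stays aligned with the augmented sentence.
    """
    original = text[start:end]
    # Exclude the original title so every augmented example actually differs.
    candidates = [t for t in titles if t != original]
    new_title = rng.choice(candidates)
    new_text = text[:start] + new_title + text[end:]
    return new_text, start, start + len(new_title)


# Hypothetical annotated example: one sentence plus the span of the title.
rng = random.Random(0)
sentence = "As argued in The Selfish Gene, genes are the unit of selection."
start = sentence.index("The Selfish Gene")
end = start + len("The Selfish Gene")
titles = ["The Selfish Gene", "Walden", "Moby-Dick"]

new_text, new_start, new_end = swap_title(sentence, start, end, titles, rng)
```

The context around the title stays fixed while the title itself varies, which is presumably why this helps a tagger more than generic word-level augmentation.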
I did something similar with RoBERTa and my own Kindle library to graph (with D3.js) all mentions/citations between my books (which of my books cite other books I own). I sorted the final graph by publication date to see some cool historical patterns of books citing other, older books [1].
I also manually annotated ~1000 book mentions, but I combine RoBERTa with string search (I list all the titles I want to search for a priori) to reduce the number of false positives. I also augmented the dataset with thousands of book titles and metadata from Goodreads.
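One way to read the RoBERTa-plus-string-search combination is sketched below, assuming the model emits candidate title strings and a candidate is kept only if it matches a known title after light normalization. The `normalize` and `filter_mentions` helpers and the sample data are hypothetical, and the real pipeline may combine the two differently:

```python
import re


def normalize(text):
    """Lowercase and collapse punctuation/whitespace so near-identical titles compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def filter_mentions(predicted_spans, known_titles):
    """Keep only predicted spans that match a title listed up front.

    predicted_spans: strings the tagger flagged as book titles (assumed model output).
    known_titles: the a priori list of titles to search for.
    Returns (span, canonical_title) pairs; everything else is treated as a false positive.
    """
    lookup = {normalize(t): t for t in known_titles}
    kept = []
    for span in predicted_spans:
        title = lookup.get(normalize(span))
        if title is not None:
            kept.append((span, title))
    return kept


# Hypothetical data: two real matches and one spurious prediction.
known = ["The Selfish Gene", "Thinking, Fast and Slow"]
predicted = ["the selfish gene", "Thinking Fast and Slow", "The Next Chapter"]
matches = filter_mentions(predicted, known)
```

Here the spurious "The Next Chapter" prediction is dropped because it is not in the known list, while case and punctuation differences in the other two are tolerated.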
I explain the whole process in a blog post [2].
[1] https://thiagolira.blot.im/_projects/book_graph/main.html
[2] https://medium.com/mlearning-ai/graphing-citations-between-b...