Comparison of image moderation APIs

speeq · on Aug 10, 2018

It would be nice if someone could compare these commercially available APIs with Yahoo's open_nsfw model in terms of accuracy: https://github.com/yahoo/open_nsfw

I'm currently building an API wrapper around it and running it on a Hetzner server with a GTX 1080. Prediction takes about 0.25 seconds, and while I haven't optimised it for parallel execution, I think it should be able to handle at least 10 images/sec comfortably. I'm also testing video moderation by using ffmpeg to slice the video into screenshots and computing the min/avg/max scores across the frames.
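A rough sketch of the video idea, assuming every extracted frame has already been scored by the model with a value between 0 and 1. The function names, the one-frame-per-second sampling rate and the two thresholds are placeholders for illustration, not values from an actual setup:

```python
from statistics import mean


def ffmpeg_frame_command(video_path, out_pattern="frame_%04d.jpg", fps=1):
    """Build an ffmpeg command that writes `fps` screenshots per second of video."""
    return ["ffmpeg", "-i", video_path, "-vf", f"fps={fps}", out_pattern]


def summarize_scores(frame_scores):
    """Collapse per-frame NSFW scores into min/avg/max for the whole video."""
    scores = list(frame_scores)
    if not scores:
        raise ValueError("no frame scores to summarize")
    return {"min": min(scores), "avg": mean(scores), "max": max(scores)}


def flag_video(summary, max_threshold=0.8, avg_threshold=0.5):
    """Flag if any single frame is very likely explicit, or the video is on average.

    Both thresholds are hypothetical and would need tuning on real data.
    """
    return summary["max"] >= max_threshold or summary["avg"] >= avg_threshold
```

The command list can be passed to `subprocess.run`; the max score catches a short explicit scene that the average would dilute.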

Moderating 25 million images for explicit content using Google Cloud Vision would cost around $19,500/mo vs €99/mo on Hetzner.
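For anyone who wants to check the arithmetic: the sketch below assumes a two-tier price of $1.50 per 1,000 images for the first 5 million and $0.60 per 1,000 after that. Those tiers are an assumption picked because they reproduce the $19,500 figure, not a quote from a price list. It also works out the sustained rate a single server would need for the same volume.

```python
def tiered_cost(images, tiers):
    """Total cost for `images` units.

    `tiers` is a list of (tier_size, price_per_1000); a tier_size of None
    means "everything that is left".
    """
    remaining = images
    total = 0.0
    for size, price_per_1000 in tiers:
        if remaining <= 0:
            break
        billed = remaining if size is None else min(remaining, size)
        total += billed / 1000 * price_per_1000
        remaining -= billed
    return total


# Assumed tiers (USD per 1,000 images); chosen to match the figure in the comment.
ASSUMED_TIERS = [(5_000_000, 1.50), (None, 0.60)]


def required_rate(images_per_month, days=30):
    """Average images per second needed to process a month's volume."""
    return images_per_month / (days * 24 * 3600)
```

Under these assumptions `tiered_cost(25_000_000, ASSUMED_TIERS)` comes to 19,500, and `required_rate(25_000_000)` is a little under 10 images/sec, so the volume only fits on one box if the load is spread evenly with no peaks.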

mohi13 · on Aug 10, 2018

Makes a lot of sense. Actually, it's really difficult to get a large enough dataset for moderation tasks to build a decent in-house model for a fair comparison.

Sure, we could try scraping that from Pornhub etc., but then the negative classes would be very domain-specific, and using stock images may not provide a good measure.

Also, it's really weird to assign such a task to any of your employees, feels kinda strange :)

speeq · on Aug 10, 2018

Yahoo's model could be fine-tuned: http://caffe.berkeleyvision.org/gathered/examples/finetune_f...

Yeah, it's definitely not a nice task, but what's stopping someone (well, besides potential legal issues) from using these commercial APIs to create datasets programmatically and training a cloned model from that?

I'm curious what the profit margins are on these APIs because I think they are way overpriced.

konradzikusek · on Aug 10, 2018

I really hope you haven’t picked a dataset that features only female nudity, but sample images suggest otherwise: https://dataturks.com/projects/Mohan/NSFW(Nudity%20Detection...

konradzikusek · on Aug 10, 2018

Here are all 90 nude images: http://jsbin.com/hufulewaji/2/edit?html,js,output . Truly professional take on the topic.

mohi13 · on Aug 10, 2018

:) There are a few examples of males as well.

konradzikusek · on Aug 10, 2018

No, there are not. http://jsbin.com/bewinalezu/1/edit?html,js,output

yaseen-rob · on Aug 10, 2018

I tried out various image labeling APIs, including Google Vision (Safe Search), for exactly this use case (moderation). I was honestly astonished at the pricing of these APIs. Google is somewhere around 1.50€ per 1000 images, which is, imo, very expensive. I tried out the default models that come with TensorFlow, but they are trained on scientific datasets which typically involve species and flowers, so no luck there either. Any good tips for pre-trained models that solve this (for TensorFlow)?

dahernan · on Aug 10, 2018

You can use nudebox from https://machinebox.io; it's an API in a Docker container (disclaimer: I built it).

milesokeefe · on Aug 10, 2018

I've found promising results trying this one out:

https://github.com/yahoo/open_nsfw

dannyw · on Aug 10, 2018

Download 1000 NSFW images, 1000 racy images and 100 completely SFW images. Train a model and publish it on GitHub?

mohi13 · on Aug 10, 2018

Would be really interested to see the results of this.

Also, consider using Dataturks to create and host the dataset.

symisc_devel · on Aug 10, 2018

For those interested, PixLab lets you analyze 50K images or video frames via its /nsfw endpoint for $25. They charge $0.9 per 1000 requests after you reach this quota.
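To make those numbers comparable with a flat per-1000 price, here is a small sketch of the pricing as described above. It assumes the $25 is a flat fee covering the first 50K requests and that overage is billed pro rata; the function name and that billing detail are assumptions, not taken from the pricing page.

```python
def quota_plan_cost(requests, base_fee=25.0, included=50_000, overage_per_1000=0.9):
    """Cost in USD for a plan with an included quota plus per-1000 overage."""
    extra = max(0, requests - included)
    return base_fee + extra / 1000 * overage_per_1000


def effective_price_per_1000(requests, **plan):
    """Average price per 1000 requests at a given volume."""
    return quota_plan_cost(requests, **plan) / requests * 1000
```

At exactly 50K requests the effective price is $0.50 per 1000, and it drifts toward $0.90 per 1000 as volume grows.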

https://pixlab.io/cmd?id=nsfw

https://pixlab.io/pricing

jaequery · on Aug 10, 2018

What is the verdict? I've just been scrolling down to see which is best and can't find any.

edent · on Aug 10, 2018

The sample images shown only feature white women. Is that a limitation of the dataset?

specializeded · on Aug 10, 2018

Tila Tequila is Vietnamese, and features in the majority of the sample images shown.

drb91 · on Aug 10, 2018

At the same time, the data set is atrocious.

ummonk · on Aug 10, 2018

She's also a white nationalist and neo-Nazi...

invalidusernam3 · on Aug 10, 2018

Not relevant to the conversation

mohi13 · on Aug 10, 2018

BTW, we have a general question for anyone who can help: does hosting such a dataset cause issues with SEO etc.? Anything else we should be aware of?

judge2020 · on Aug 10, 2018

Probably wouldn't be wise to host a dataset like this with "example" images embedded or linked where a search engine could find them.

mohi13 · on Aug 10, 2018

Thanks, will see how I can block crawling of this dataset. BTW, how does it hurt exactly? I couldn't find much, except in the case of safe-search mode.
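One common way to block crawling is a robots.txt rule. The sketch below uses Python's standard `urllib.robotparser` to check that a rule covers the dataset pages before deploying it. The `/datasets/nsfw/` path and the example.com URLs are made up for illustration, and robots.txt is only advisory: well-behaved crawlers honour it, others may not.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep every crawler out of the dataset pages only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /datasets/nsfw/
"""


def is_crawlable(robots_txt, url, user_agent="*"):
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

For example, `is_crawlable(ROBOTS_TXT, "https://example.com/datasets/nsfw/img1.jpg", "Googlebot")` should come back `False`, while a page outside that path stays crawlable.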