A NSFW detector with CoreML

@pexavc@lemmy.world · edit-2 10 months ago

@OhNoMoreLemmy@lemmy.ml · 10 months ago

Because any detector has to be based on machine learning, you can open source all the code, provided you keep the model weights and training data private.

But there’s a fundamental question here that comes from Lemmy being federated: how can you give CSAM-detecting code/binaries to every instance owner without trolls getting access to them?

Some instances will be run by trolls, and black-box access is enough to create adversarial examples that will bypass the model. You don’t need source code.

Scrubbles · 10 months ago

That discussion is happening. Right now the prevailing idea is that it’s an opt-in feature for instance admins, where you can host the detector yourself or use a tool hosted elsewhere. On top of that, instance admins should be allowed to block federating images, so that things uploaded on other instances are not copied to ours and are instead requested directly from the home instance. That would help cut down on the spread of bad material, and if something was purged on the home instance, it would effectively be purged everywhere.
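To make the "don't federate images" idea concrete, here is a minimal sketch of the decision an instance might make when rendering a post. This is not how Lemmy actually does it; `resolve_image_url`, the `federate_images` setting, and the `/image_proxy` endpoint are all hypothetical names made up for illustration.

```python
from urllib.parse import urlparse, quote


def resolve_image_url(image_url: str, local_host: str, federate_images: bool) -> str:
    """Return the URL a client should load for an image (hypothetical helper)."""
    host = urlparse(image_url).hostname

    # Images uploaded to this instance are always served from here.
    if host == local_host:
        return image_url

    if federate_images:
        # Assumed local cache/proxy endpoint: the image ends up stored on our server.
        return f"https://{local_host}/image_proxy?url={quote(image_url, safe='')}"

    # Federation of images is off: never copy the file, point at the home instance.
    # If the home instance purges the image, this link simply stops working.
    return image_url
```

With `federate_images=False`, a remote image keeps its original URL, so nothing from another instance is ever stored locally.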

@toothbrush@lemmy.blahaj.zone · edit-2 10 months ago

Just chiming in here to say that this is very much like security through obscurity. In this context the “secure” part is being sure that the images you host are ok.

It is much easier for bad actors to get hold of a banlist through social engineering than to beat an open-source AI whose bugs are fixed collectively whenever trolls manage to evade it. It’s not that easy to get around image filters like this, and having to do weird things to the pictures to get past the filter could be enough work to keep most trolls from posting.

Using a central instance that filters all images is also not good, because the person operating the service is then responsible for a large chunk of your images, creating a single point of failure in the fediverse (and something that could be monetised to our detriment). Closed source cannot be the answer either, because if someone breaks the filter, the community can’t fix it; only the developer can. So either the dev team is online 24/7 or it’s paid, making hosting a fediverse server dependent on someone’s closed-source product.

I do think, however, that disabling image federation should be an option. Turning image federation off for a particular server for a limited time could be a very effective tool against these kinds of attacks!
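A time-limited block like that could be sketched roughly as follows. This is only an illustration of the idea, assuming a simple in-memory table; the class `ImageFederationBlocks` and its methods are hypothetical and not part of Lemmy.

```python
from datetime import datetime, timedelta, timezone


class ImageFederationBlocks:
    """Tracks instances whose images are temporarily not federated (hypothetical)."""

    def __init__(self):
        self._blocked_until = {}  # instance hostname -> expiry time (UTC)

    def block(self, instance, duration, now=None):
        """Stop accepting images from `instance` for `duration`."""
        now = now or datetime.now(timezone.utc)
        self._blocked_until[instance] = now + duration

    def is_blocked(self, instance, now=None):
        """True while the block is active; expired blocks are dropped."""
        now = now or datetime.now(timezone.utc)
        expiry = self._blocked_until.get(instance)
        if expiry is None:
            return False
        if now >= expiry:
            del self._blocked_until[instance]
            return False
        return True


# Example: block images from one server for 24 hours during an attack.
blocks = ImageFederationBlocks()
blocks.block("spammy.example", timedelta(hours=24))
```

The block lifts itself once the time is up, so an admin does not have to remember to re-enable federation after the attack dies down.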

GitHub - lovoo/NSFWDetector: A NSFW (aka porn) detector with CoreML