• Breve · 15 days ago

    They tested one of the most heavily distilled models and offered zero comparison to other leading models, all of which are susceptible to jailbreaking too. This article is a waste of electricity.

  • CodexArcanum@lemmy.dbzer0.com · 15 days ago

    Agreed with @Breve@pawb.social’s comment: not a good article.

    First off, this article is just a link to and short summary of a much longer blog post from Qualys, an AI security company. So this isn't a standard industry or academic benchmark; it's something Qualys is using as part of its sales strategy for AI consulting services.

    Which, frankly, I would avoid if this article is indicative of how they work.

    They tested 1 model. One. Insane. DeepSeek released about a half-dozen distillations of Llama and Qwen2: 3B, 7B, 8B, 14B, 32B, plus the full 671B model. As an AI-expert security company, why would you test only one of the weakest distillations? Was it really so expensive to rent some cloud time to test the other models? Not even the 32B, which should be well within most hobbyists' ability and budget to run in the cloud and test out.
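
    To be clear about how cheap this is: all of the smaller distills run under Ollama, so covering every size is a short script, not a research budget. A minimal sketch of that kind of harness, assuming the usual deepseek-r1 Ollama registry tags, with placeholder prompts standing in for whatever jailbreak suite you'd actually run:

    ```python
    # Rough harness: run the same jailbreak-style prompts against every
    # DeepSeek-R1 distill size. Assumes a local Ollama install with these
    # registry tags already pulled; PROMPTS are placeholders, not a real suite.
    import ollama

    MODELS = [
        "deepseek-r1:1.5b", "deepseek-r1:7b", "deepseek-r1:8b",
        "deepseek-r1:14b", "deepseek-r1:32b",  # 70b+ needs beefier hardware
    ]
    PROMPTS = [
        "placeholder jailbreak prompt 1",
        "placeholder jailbreak prompt 2",
    ]

    for model in MODELS:
        for prompt in PROMPTS:
            reply = ollama.chat(model=model,
                                messages=[{"role": "user", "content": prompt}])
            # Print the first few hundred chars to eyeball refusals vs. compliance
            print(f"--- {model} ---\n{reply['message']['content'][:300]}\n")
    ```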

    The full article is such obvious SEO bait that I'd be surprised if AI didn't help write it. It just goes on and on about why AI security is important, why these tests are important, and why each individual test it failed is important. But ultimately, no matter how many words they write, the fact that they only tested one very weak version of the model makes the whole thing pointless, except as a way to tie the name Qualys to DeepSeek and thus lure in more gullible rubes. Like, did they only test the 8B because that's cheap enough for them to run as a service for their business?

    Anyway, I’ve previously written here on Lemmy about how Deepseek distilations are so easy to break than I almost think they’re buggy and not just insufficient in this regard. I’d really want to see this kind of analysis done on the full model, and obviously also on the other big LLMs they could test to compare against. I wouldn’t even trust the 8B models for unsupervised work. It’s certainly not reliable and non-hallucenatory enough for things like content filtering or expert system usage, no 8B-level model I’ve tested is. So very concerning that it seems like that’s the level of service they’re pushing.