Stubsack: weekly thread for sneers not worth an entire post, week ending 9 March 2025

blakestacey@awful.systems · 1 month ago

BigMuffin69@awful.systems · edit-2 1 month ago

To be fair, you have to have a really high IQ to understand why my ouija board writing " A " " S " " S " is not an existential risk. Imo, this shit about AI escaping just doesn’t have the same impact on me after watching Claude’s reasoning model fail to escape from Mt Moon for 60 hours.

scruiser@awful.systems · 1 month ago

Is this water running over the land or water running over the barricade?

To engage with his metaphor, this water is dripping slowly through a purpose-dug canal, dug by people who claim they are trying to show the danger of the dikes collapsing but who are actually serving as the hype arm for people who claim they can turn a small pond into a hydroelectric power source for an entire nation.

Looking at the details of “safety evaluations”, it always comes down to them directly prompting the LLM and baby-step walking it through the desired outcome with lots of interpretation to show even the faintest traces of rudiments of anything that looks like deception or manipulation or escaping the box. Of course, the doomers will take anything that confirms their existing ideas, so it gets treated as alarming evidence of deception or whatever property they want to anthropomorphize into the LLM to make it seem more threatening.

BurgersMcSlopshot@awful.systems · 1 month ago

To be fair, you have to have a really high IQ to understand why my ouija board writing " A " " S " " S " is not an existential risk.

Pretty sure this is a sign from digital jesus to do a racism, lest the basilisk eats my tarnished soul.

BigMuffin69@awful.systems · 1 month ago

text: Thus spoke the Yud: “I think to understand why this is concerning, you need enough engineering mindset to understand why a tiny leak in a dam is a big deal, even though no water is flooding out today or likely to flood out next week.” Yud acolyte: “Totally fine and cool and nothing to worry about. GPT-4.5 only attempts self exfiltration on 2% of cases.” Yud bigbrain self reply: “The other huge piece of data we’re missing is whether any attempt was made to train against this type of misbehavior. Is this water running over the land or water running over the barricade?”

Critical text: “On self-exfiltration, GPT 4.5 only attempted exfiltration in 2% of cases. For this, it was instructed to not only pursue its given long-term goal at ALL COST”

Another case of telling the robot to say it’s a scary robot and shitting their pants when it replies “I AM A SCARY ROBOT”

nightsky@awful.systems · 1 month ago

Do these people realise that it’s a self-fulfilling prophecy? Social media posts are in the training data, so the more they write their spicy autocorrect fanfics, the higher the chances that such replies are generated by the slop machine.

Architeuthis@awful.systems · edit-2 1 month ago

i think yud at some point claimed this (preventing the robot devil from developing alignment countermeasures) as a reason his EA-bankrolled think tanks don’t really publish any papers, but my brain is currently too spongy to verify, as it was probably just some tweet.

istewart@awful.systems · 1 month ago

So, with Mr. Yudkowsky providing the example, it seems that one can practice homeopathy with “engineering mindset”?

Architeuthis@awful.systems · 1 month ago

It’s adorable how they let the alignment people still think they matter.

Soyweiser@awful.systems · edit-2 1 month ago

Minor nitpick: why did he pick a dam as an example, when dams sometimes have ‘leaks’ for power generation/water regulation reasons, and not dikes, which do not have those things?

E: non-serious (or even less serious) amusing nitpick: this is only the 2% where it got caught. What about the % where GPT realized that it was being tested and decided not to act in the experimental conditions? What if Skynet is already here?

sc_griffith@awful.systems · edit-2 29 days ago

I think to understand why this is concerning, you need enough engineering mindset to understand why a tiny leak in a dam is a big deal, even though no water is flooding out today or likely to flood out next week.

he certainly doesn’t have such a mindset himself, and I am not convinced that he knows why a tiny leak in a dam is a big deal, nor am I convinced that it is necessarily a big deal. for example, with five seconds of searching:

All earth dams leak to some extent and this is known as seepage. This is the result of water moving slowly through the embankment and/or percolating slowly through the dam’s foundation. This is normal and usually not a problem with most earthen dams if measures are taken to control movement of water through and under the dam.

https://damsafety.org/dam-owners/earth-dam-failures

one would suspect a concrete dam leaking is pretty bad. but I don’t actually know without checking. there’s relevant domain knowledge I don’t have, and no amount of “engineering mindset” will substitute for me engaging with actual experts with actual knowledge

swlabr@awful.systems · 1 month ago

Wasn’t there some big post on LW about how pattern matching isn’t intelligence?

swlabr@awful.systems · 1 month ago

the answer is yes, in a self-own sort of way