Why is no one talking about how unproductive it is to have to verify every "hallucination" ChatGPT gives you?

phoneymouse@lemmy.world · 1 month ago

1stTime4MeInMCU@mander.xyz · 1 month ago

I’m convinced people who can’t tell when a chat bot is hallucinating are also bad at telling whether something else they’re reading is true or not. What online are you reading that you’re not fact checking anyway? If you’re writing a report you don’t pull the first fact you find and call it good, you need to find a couple citations for it. If you’re writing code, you don’t just write the program and assume it’s correct, you test it. It’s just a tool and I think most people are coping because they’re bad at using it

BluesF@lemmy.world · 1 month ago

Yeah. GPT models are in a good place for coding tbh, I use it every day to support my usual practice, it definitely speeds things up. It’s particularly good for things like identifying niche python packages & providing example use cases so I don’t have to learn shit loads of syntax that I’ll never use again.

Aceticon@lemmy.world · 1 month ago

In other words, it’s the new version of copying code from Stack Overflow without going to the trouble of properly understanding what it does.

Rekorse@sh.itjust.works · 1 month ago

Pft, you must have read that wrong, it’s clearly turning them into master programmers one query at a time.

BluesF@lemmy.world · 1 month ago

I know how to write a tree traversal, but I don’t need to because there’s a python module that does it. This was already the case before LLMs. Now, I hardly ever need to do a tree traversal, honestly, and I don’t particularly want to go to the trouble of learning how this particular python module needs me to format the input or whatever for the one time this year I’ve needed to do one. I’d rather just have something made for me so I can move on to my primary focus, which is not tree traversals. It’s not about avoiding understanding, it’s about avoiding unnecessary extra work. And I’m not talking about saving the years of work it takes to learn how to code, I’m talking about the 30 minutes of work it would take for me to learn how to use a module I might never use again. If I do, or if there’s a problem I’ll probably do it properly the second time, but why do it now if there’s a tool that can do it for me with minimum fuss?

sp3ctr4l@lemmy.zip · edit-2 1 month ago

I just tried out Gemini.

I asked it several questions of the form ‘are there any things in category x which are also in category y?’

It would often confidently reply ‘No, here’s a summary of things that meet all your conditions to fall into category x, but sadly none also fall into category y’.

Then I would reply, ‘wait, you don’t know about thing gamma, which does fall into both x and y?’

To which it would reply ‘Wow, you’re right! It turns out gamma does fall into x and y’ and then give a bit of a description of how/why that is the case.

After that, I would say ‘… so you… lied to me. ok. well anyway, please further describe thing gamma that you previously said you did not know about, but now say that you do know about.’

And that is where it gets … fun?

It always starts with an apology template.

Then, if it’s some kind of topic that it has almost certainly been manually dissuaded from talking about, it then lies again and says ‘actually, I do not know about thing gamma, even though I just told you I did’.

If it is not a topic that it has been manually dissuaded from talking about, it does the apology template and then also further summarizes thing gamma.

…

I asked it ‘do you write code?’ and it gave a moderately lengthy explanation of how it is comprised of code, but does not write its own code.

Cool, not really what I asked. Then command ‘write an implementation of bogo sort in python 3.’

… and then it does that.
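
For reference, the kind of thing that request produces looks roughly like this. It is a minimal sketch, not Gemini's actual output; the names `is_sorted` and `bogo_sort` are illustrative.

```python
import random


def is_sorted(items):
    """Return True if items are in non-decreasing order."""
    return all(a <= b for a, b in zip(items, items[1:]))


def bogo_sort(items, rng=None):
    """Shuffle a copy of items until it happens to be sorted."""
    if rng is None:
        rng = random.Random()
    result = list(items)
    while not is_sorted(result):
        rng.shuffle(result)
    return result


print(bogo_sort([3, 1, 2]))  # [1, 2, 3]
```

Only try this on tiny lists, since the expected number of shuffles grows factorially with the length.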

…

Awesome. Hooray. Billions and billions of dollars for a shitty way to reform web search results into a conversational form, which is very often confidently wrong and misleading.

taladar@sh.itjust.works · 1 month ago

And then more money spent on adding that additional garbage filter to the beginning and the end of the process which certainly won’t improve the results.

pyre@lemmy.world · 1 month ago

copilot did the same with basic math. just to test it I said “let’s say I have a 10x6 rectangle. what number would I have to divide width and height by, in order to end up with a rectangle that’s half the area?”

it said “in order to make it half, you should divide them by 2. so [pointlessly lengthy steps explaining the divisions]”

I said “but that would make the area 5x3 = 15 units which is not half the area of 60”

it said “you’re right! in order to … [fixing the answer to √2 using approximation]”
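
The arithmetic in that exchange can be checked in a few lines. This is only a sketch of the reasoning, with illustrative variable names: dividing both sides by k scales the area by 1/k², so halving the area needs k = √2, not 2.

```python
import math

width, height = 10, 6
target_area = (width * height) / 2  # half of 60 is 30

# Dividing both sides by 2 quarters the area instead of halving it.
wrong_area = (width / 2) * (height / 2)  # 15.0

# Dividing both sides by sqrt(2) halves it (up to floating-point rounding).
k = math.sqrt(2)
new_area = (width / k) * (height / k)

print(wrong_area == target_area)              # False
print(math.isclose(new_area, target_area))    # True
```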

I don’t know if I said it then, or after some other fucking nonsense but when I said “you’re useless” it had the fucking audacity to take offense and end the conversation!

like fuck off, you don’t get to have fake pride if you don’t have basic fake intelligence but use it in your description.

sp3ctr4l@lemmy.zip · edit-2 1 month ago

It’s a perfect encapsulation of the corpo mindset:

Whatever I do is profound, meaningful, with endless possibilities for future greatness…

… even though I’m just talking out of my ass 99% of the time…

… and if you have the audacity, the nerve, to have a completely normal reaction when you determine that that is what I am doing, pshaw, how uncouth, I won’t stand for your abuse!

…

They’ve done it. They’ve made a talking (not thinking) machine in their own image.

And it was not good.

You start a conversation you can’t even finish it
You’re talkin’ a lot, but you’re not sayin’ anything
When I have nothing to say, my lips are sealed
Say something once, why say it again?

Psycho Killer
Qu’est-ce que c’est

Knock_Knock_Lemmy_In@lemmy.world · 1 month ago

please further describe thing gamma that you previously said you did not know about, but now say that you do know about.’

It’s quite amusing to ask it about conspiracy theories. There’s a huge amount in its training set (not because the theories are true, just that they are often written about) that it has been dissuaded from discussing.

UnderpantsWeevil@lemmy.world · edit-2 1 month ago

Cool, not really what I asked. Then command ‘write an implementation of bogo sort in python 3.’

… and then it does that.

Alright, but… it did the thing. That’s a feature older search engines couldn’t reliably perform. The output is wonky and the conversational style is misleading. But it’s not materially worse than sifting through wrong answers on StackExchange or digging through a stack of physical textbooks looking for Python 3 Bogo Sort IRL.

I agree AI has annoying flaws and flubs. And it does appear we’re spending vast resources doing what a marginal improvement to Google five years ago could have done better. But this is better than previous implementations of search, because it gives you discrete applicable answers rather than a collection of dubiously associated web links.

sp3ctr4l@lemmy.zip · edit-2 1 month ago

But this is better than previous implementations of search, because it gives you discrete applicable answers rather than a collection of dubiously associated web links.

Except for when you ask it to determine if a thing exists by describing its properties, and then it says no such thing exists while providing a discrete response explaining in detail how there are things that have some, but not all of those properties…

… And then when you ask it specifically about a thing you already know about that has all those properties, it tells you about how it does exist and describes it in detail.

What is the point of a ‘conversational search engine’ if it cannot help you find information unless you already know about said information?!

The whole, entire point of formatting it into a conversational format is to trick people into thinking they are talking to an expert, an archivist with encyclopedic knowledge, who will give them accurate answers.

Yet it gatekeeps information that it does have access to but omits.

The format of providing a bunch of likely related links to a query is much more reminiscent of doing actual research, with no impression that you will find what you want right away; it makes clear that this is a tool to aid you in your research process.

This is only an improvement if you want to further unteach people how to do actual research and critical thinking.

UnderpantsWeevil@lemmy.world · 1 month ago

Except for when you ask it to determine if a thing exists by describing its properties

Basic search can’t answer that either. You’re describing a task neither system is well equipped to accomplish.

sp3ctr4l@lemmy.zip · edit-2 1 month ago

With basic search, it is extremely obvious that that feature does not exist.

With conversational search, the search itself gaslights you into believing it has this feature, as it understands how to syntactically parse the question, and then answers it confidently with a wrong answer.

I would much rather buy a car that cannot fly, knowing it cannot fly, than a car that literally talks to you and tells you it can fly, and sometimes manages to glide a bit, but also randomly nose dives into the ground whilst airborne.

Zeppo@sh.itjust.works · 1 month ago

I don’t feel like off-the-cuff summaries by AI can replace web sites and detailed articles written by knowledgeable humans. Maybe if you’re looking for a basic summary of a topic.

UnderpantsWeevil@lemmy.world · 1 month ago

I don’t feel like off-the-cuff summaries by AI can replace web sites and detailed articles written by knowledgeable humans

No. But that’s not what a typical search result returns.

There’s also no guarantee the “detailed articles” you get back are well-informed or correct. Lots of top search results are just ad copy or similar propaganda. YouTube, in particular, is rife with long winded bullshitters.

What you’re looking for is a well-edited trustworthy encyclopedia, not a search engine.

Zeppo@sh.itjust.works · 1 month ago

Sure. It depends on the topic. It also depends how good you are at searching. Personally, I don’t have any difficulty finding quality websites via Google or DuckDuckGo. Sometimes it requires refining the search terms, and many people don’t know how to do that properly. For certain topics, I might not. However, I stand by my assertion that some statistically generated text which you can’t trust at all is less useful than a good article on a topic. Is Wikipedia useful also? Yes, it is.

tired_n_bored@lemmy.world · 1 month ago

I beg someone to help me. There is this new guy at my workplace, officially a developer, who can’t write code at all. He pasted an entire project I did into ChatGPT with “optimize this” and pull requested it. I swear.

wizardbeard@lemmy.dbzer0.com · 1 month ago

Report up the chain, if it’s safe to do so and they are likely to understand.

Also, check what your company’s rules regarding data security and LLM use are. My understanding is that at many places putting private company or customer data into an outside LLM is seen as shouting company secrets out to the open internet. At least that’s the policy where I’m at. Pasting an entire project in would definitely violate things for my workplace.

In general that’s rude as hell. New guy comes in, grabs an entire project they have no background with, and just chucks it at an LLM? No actual review of it themselves, just an assumption that your code is so shit that a general use text generator will do better? Doesn’t sound like a “team player” to me (management eats that kind of talk up).

Maybe couch it as “I want to make sure that as a team, we’re utilizing the tools available to us in the best way possible to multiply our strengths. That said, I’m concerned the approach that [LLM idiot] is using will only result in more work for the team. Using ChatGPT as he has is an explosive approach, when I feel that a more scalpel-like approach to address specific areas for improvement would be the best method moving forward. We should be using these tools to address specific concerns, not chucking everything at the wall in some never-ending chase of an undefined idea of ‘more optimized’.”

Perhaps frame it in terms of man hours? The immediacy of 5 minutes in ChatGPT can cost the team multiple workdays in reviewing the output, whereas more focused code review up front can reduce the man hour cost significantly.

There’s also a bunch of articles out there online about how overuse of LLMs is leading to a measurable decrease in code quality and increase in security issues in code bases.

tired_n_bored@lemmy.world · 1 month ago

Such a great answer, thank you lots!

JackbyDev@programming.dev · 1 month ago

Because if I haven’t found anyone asking the same question on a search index, ChatGPT won’t tell me to just use Google or close my question as a duplicate when it’s not a duplicate.

WalnutLum@lemmy.ml · edit-2 1 month ago

Reminder that all these chat-formatted LLMs are just text-completion engines trained on text formatted like a chat. You’re not having a conversation with it; it’s “completing” the chat history you’re providing it, by randomly(!) choosing the next text tokens that seem like they best fit the text provided.

If you don’t directly provide, in the chat history and/or the text completion prompt, the information you’re trying to retrieve, you’re essentially fishing for text in a sea of random text tokens that seems like it fits the question.

It will always complete the text, even if the tokens it chooses minimally fit the context, it chooses the best text it can but it will always complete the text.

This is how they work, and anything else is usually the company putting in a bunch of guide bumpers to reformat prompts into coaxing the models to respond in a “smarter” way (see GPT-4o and “chain of reasoning”)
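
To make the “randomly choosing the next token” part concrete, here is a toy sketch. The token list and probabilities are invented for illustration (a real model computes them from the whole context), and `sample_next_token` is a hypothetical helper, not any vendor's API.

```python
import random

# Hypothetical next-token probabilities for the prompt
# "The capital of France is". Invented numbers, for illustration only.
next_token_probs = {"Paris": 0.90, "Lyon": 0.06, "Berlin": 0.04}


def sample_next_token(probs, rng):
    """Pick one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
samples = [sample_next_token(next_token_probs, rng) for _ in range(1000)]

# Count how often each candidate was chosen across 1000 draws.
for token in next_token_probs:
    print(token, samples.count(token))
```

Note that the sampler always returns some token. Nothing in it can answer “I don't know”; at best a low-probability, badly fitting token gets picked less often.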

HackerJoe@sh.itjust.works · 1 month ago

They were trained on reddit. How much would you trust a chatbot whose brain consists of the entirety of reddit put in a blender?

I am amazed it works as well as it does. Gemini only occasionally tells people to kill themselves.

hoshikarakitaridia@lemmy.world · 1 month ago

Because in a lot of applications you can bypass hallucinations.

getting sources for something
as a jump off point for a topic
to get a second opinion
to help argue for or against your position on a topic
get information in a specific format

In all these applications you can bypass hallucinations because either its task is non-factual, or it’s verifiable while prompting, or because you will be able to verify in any of the superseding tasks.

Just because it makes shit up sometimes doesn’t mean it’s useless. Like an idiot friend, you can still ask it for opinions or something and it will definitely start you off somewhere helpful.

ms.lane@lemmy.world · 1 month ago

Also just searching the web in general.

Google is useless for searching the web today.

fibojoly@sh.itjust.works · 1 month ago

Not if you want that thing that everyone is on about. Don’t you want to be in with the crowd?! /s

WalnutLum@lemmy.ml · 1 month ago

All LLMs are text completion engines, no matter what fancy bells they tack on.

If your task is some kind of text completion or repetition of text provided in the prompt context LLMs perform wonderfully.

For everything else you are wading through territory you could probably do easier using other methods.

burgersc12@mander.xyz · 1 month ago

I love the people who are like “I tried to replace Wolfram Alpha with ChatGPT why is none of the math right?” And blame ChatGPT when the problem is all they really needed was a fucking calculator

leftzero@lemmynsfw.com · 1 month ago

The fucking problem is they stole my damn calculator and now they’re trying to sell me an LLM as a replacement.

LLMs are an interesting if mostly useless toy (an excessively costly one, though; Eliza achieved mostly the same results at a fraction of the cost).
The massive scam bubble that’s been built around them, however, and its absurd contribution to enshittification and global warming, is downright monstrous, and makes anyone defending commercial LLMs worthy of the utmost contempt, just like those who defended cryptocurrencies before LLMs became the latest fad.

ohwhatfollyisman@lemmy.world · 1 month ago

so, basically, even a broken clock is right twice a day?

onionsinmypores@sh.itjust.works · edit-2 1 month ago

No, maybe more like, even a functional clock is wrong every 0.8 days.
https://superuser.com/questions/759730/how-much-clock-drift-is-considered-normal-for-a-non-networked-windows-7-pc

The frequency is probably way higher for most LLMs though lol

dev_null@lemmy.ml · 1 month ago

Yes, but for some tasks mistakes don’t really matter, like “come up with names for my project that does X”. No wrong answers here really, so an LLM is useful.

ohwhatfollyisman@lemmy.world · 1 month ago

great value for all that energy it expends, indeed!

dev_null@lemmy.ml · 1 month ago

deleted by creator

Rekorse@sh.itjust.works · 1 month ago

How is that faster than just picking a random name? No one picks software based on name.

dev_null@lemmy.ml · edit-2 1 month ago

And yet virtually all software has a name that took some thought or creativity, or that has some interesting history. Like the domain name of your Lemmy instance. Or Lemmy.

And people working on something generally want to be proud of their project and not name it the first thing that comes to mind, but take some time to decide on a name.

Rekorse@sh.itjust.works · 1 month ago

Wouldn’t they also not want to take a random name off an AI generated list? How is that something to be proud of? The thought, creativity, and history behind it is just that you put a query into ChatGPT and picked one out of 500 names?

Maybe it’s just a difference of perspective, but that’s not only not a special origin story for a name, it’s taking from others in a way you won’t be able to properly credit them, which is essential to me.

I would rather avoid the trouble and spend the time with a coworker or friend throwing ideas back and forth and building an identity intentionally.

I suppose AI could be nice if I was alone nearly all the time.

dev_null@lemmy.ml · edit-2 1 month ago

The process of throwing ideas back and forth usually doesn’t include just choosing one, but generating ideas as jumping off points, usually with some existing concept in mind. Talking with friends, looking at other projects, searching for inspiration online and in the real world, and now also generating some more ideas with an LLM to add to the mix. Using one source and just picking a suggestion probably won’t get you a good result.

Nurse_Robot@lemmy.world · 1 month ago

sigh people do talk about this, they complain about it non-stop. These same people probably aren’t using it as intended, or are deliberately trying to farm a “gotcha” response. AI is a very neat tool which can do a lot of things well, but it’s important to recognize its limitations. I don’t use it for things I don’t understand because I won’t recognize if it’s spitting out nonsense, but for topics I do understand it’s hard to overstate how efficient and time saving it is.

ByteOnBikes@slrpnk.net · 1 month ago

The FuckAI people are valid for their concerns.

Unfortunately, their anger seems to constantly be misdirected at the weirdest things, instead of root issues.

taladar@sh.itjust.works · 1 month ago

Oh, there is plenty of hate for the hype cycle in general which is about as close to the root of the issue as you can get.

brucethemoose@lemmy.world · edit-2 1 month ago

My take is they should be fighting the corporate API vs open source models war, instead of just “screw all AI” which really means “screw open source AI and let Sam Altman enshittify everything”

Especially on Lemmy.

It’d be like blanket railing against social media and ultimately getting the Fediverse banned, while Facebook and X walk away.

zarkanian@sh.itjust.works · 1 month ago

“Give me a vegan recipe using <ingredient>” has been flawless. The recipes are decent, although they tend to use the same spices over and over.

Paradigm_shift@sh.itjust.works · 1 month ago

I sometimes use it to “convert” preexisting bulletpoints or informal notes into a professional sounding business email. I already know all the information so proofreading the final product doesn’t take a lot of time.

I think a lot of people who shit on AI forget that some people struggle with putting their thoughts into words. Especially if they aren’t writing in their native language.

Rekorse@sh.itjust.works · 1 month ago

Efficiency depends on the cost, doesn’t it?

Nurse_Robot@lemmy.world · 1 month ago

The cost to me, the user, is nothing

taladar@sh.itjust.works · 1 month ago

Sorry to hear that you consider your time worthless. Have you tried therapy for that?

Nurse_Robot@lemmy.world · 1 month ago

There’s something so uniquely funny about being too stupid to insult someone properly. Thanks for the chuckle

TrickDacy@lemmy.world · 1 month ago

Probably because they’re not checking them

snooggums@lemmy.world · 1 month ago

Because most people are too lazy to bother with making sure the results are accurate when they sound plausible. They want to believe the hype, and lack critical thinking.

Chip_Rat@lemmy.world · 1 month ago

I don’t want to believe any hype! I just want to be able to ask “hey ChatGPT, I’m looking for a YouTube video by Technology Connections where he discusses dryer heat pumps.” And not have it spit out “it’s called ‘the neat ways your dryer heat pumps save energy!’”

And it is not; that video doesn’t exist. And it’s even harder to disprove it at first glance because the LLM is mimicking what Alec would have called the video. So you look and look with your sister’s very inefficient PS4 controller-to-YouTube interface… And finally ask it again and it shy flowers you…

But I swear he talked about it ?!?! Anyone?!?

gamermanh@lemmy.dbzer0.com · edit-2 1 month ago

He hasn’t

I think in a recent video he mentioned he will soon, but he hasn’t done a video with even a segment on heat pumps in dryers yet

Fairly confident in this, recently finished a rewatch of basically all his content

Chip_Rat@lemmy.world · 1 month ago

Damn it… I was sure he mentioned them briefly in one of his heat pump videos but I trust you over ChatGPT…

He should do a video! I am constantly enchanted by his heat pump explainers… I don’t know why but it’s one of those concepts that’s just a bit out of my wheelhouse. So I always “knew” how it worked. But the lightbulb moment. The aha! Pure crack.

ms.lane@lemmy.world · 1 month ago

This sounds awfully familiar, like almost exactly what people were saying about Wikipedia 20 years ago…

julietOscarEcho@sh.itjust.works · 1 month ago

Pretty weak analogy. Wikipedia was technologically trivial and did a really good job of avoiding vested interests. Also the hype is orders of magnitude different; no one ever claimed Wikipedia was going to lead to superhuman intelligences or to replacement of swathes of human creative/service workers.

Actually, since you mention it, my hot take is that Wikipedia might have been a more significant step forward in AI than OpenAI/latest generation LLMs. The creation of that corpus is hugely valuable in training and benchmarking models of natural language. Also it actually disrupted an industry (conventional encyclopedias), whereas I’m struggling to think of anything that LLMs have replaced in the same way thus far.

snooggums@lemmy.world · 1 month ago

Those people were wrong because Wikipedia requires actual citations from credible sources, not comedic subreddits and Infowars. Wikipedia is also completely open about the information being summarized, both in who is presenting it and where someone can confirm it is accurate.

AI is presented to the user as a black box and is portrayed as equivalent to a human with terms like ‘hallucinations’, which really mean ‘is wrong a bunch, lol’.

bl_r@lemmy.dbzer0.com · 1 month ago

My job uses a data science platform that has a special ai assistant trained on its own docs.

The first time I tried using it, it used the wrong language. The second time I used it, it was hallucinating its own functions, but after looking up the docs I told it what function to use and it gave me code that worked

I have not used it a third time. I don’t think I will.

surph_ninja@lemmy.world · 1 month ago

Depending on the task, it’s quicker to verify the AI response than work through the blank page phase.

Snowclone@lemmy.world · edit-2 1 month ago

I only use it for complex searches with results I can usually parse myself, like ‘‘list 30 typical household items without descriptions or explanations with no repeating items’’ kind of thing.

ohwhatfollyisman@lemmy.world · 1 month ago

great value for all that energy it expends, indeed!

Varyk@sh.itjust.works · 1 month ago

it’s because everyone stopped using it, right?

at least months ago?

Kushan@lemmy.world · 1 month ago

They don’t give you the answer, they give you a rough idea of where to look for the answer.

I’ve used them to generate chunks of boilerplate code that was 80% of what I needed, because I knew what I needed and wanted to save time.

BakerBagel@midwest.social · 1 month ago

There are ways of doing that which don’t require burning an acre of rainforest

wizardbeard@lemmy.dbzer0.com · 1 month ago

Yep. The overwhelming majority of IDEs have support for making templates/snippets.

VS Code/VSCodium has a very robust snippet system where you can set parts as “fill in the blank” that you can tab between, with optional drop down menus for choices. You can even link different “fill in” sections so you can do stuff like type in an argument name and have it propagate that same name through multiple places in your snippet.

If that’s too much, how the fuck can any dev (or even someone hacking together scripts) survive without at least one file of common shit they made before that they can copy paste from? I really feel like that’s bare minimum.

Either it’s boilerplate you can already copy from somewhere else (documentation or previous work), or it’s something you should probably review (at least briefly) and make into a template or snippet you can copy and paste later. That’s part of the magic of programming: you get to build your own toolbox over time.

callcc@lemmy.world · 1 month ago

It’s usually good for ecosystems with good and loads of docs. Whenever docs are scarce the results become shitty. To me it’s mostly a more targeted search engine without the crap (for now)

ugjka@lemmy.world · 1 month ago

The only reason I use ChatGPT for some quick stuff is just that search engines suck so bad.

brucethemoose@lemmy.world · 1 month ago

Perplexity (or open source equivalents) are much better for this.