Even if they plateaued in place where they are right now it would lead to major shakeups in humanity’s current workflow
Like which one? Because it’s now 2 years we have chatGPT and already quite a lot of (good?) models.
Which shakeup do you think is happening or going to happen?
I quit my previous job in part because I couldn’t deal with the influx of terrible, unreliable, dangerous, bloated, nonsensical, not even working code that was suddenly pushed into one of the projects I was working on. That project is now completely dead, they froze it on some arbitrary version.
When junior dev makes a mistake, you can explain it to them and they will not make it again. When they use llm to make a mistake, there is nothing to explain to anyone.
I compare this shake more to an earthquake than to anything positive you can associate with shaking.
And so, the problem wasn’t the ai/llm, it was the person who said “looks good” without even looking at the generated code, and then the person who read that pull request and said, again without reading the code, “lgtm”.
If you have good policies then it doesn’t matter how many bad practice’s are used, it still won’t be merged.
The only overhead is that you have to read all the requests but if it’s an internal project then telling everyone to read and understand their code shouldn’t be the issue.
I hardly see it changed to be honest. I work in the field too and I can imagine LLMs being good at producing decent boilerplate straight out of documentation, but nothing more complex than that.
I often use LLMs to work on my personal projects and - for example - often Claude or ChatGPT 4o spit out programs that don’t compile, use inexistent functions, are bloated etc.
Possibly for languages with more training (like Python) they do better, but I can’t see it as a “radical change” and more like a well configured snippet plugin and auto complete feature.
LLMs can’t count, can’t analyze novel problems (by definition) and provide innovative solutions…why would they radically change programming?
That is my experience, it’s generally quite decent for small and simple stuff (as I said, distillation of documentation). I use it for rust, where I am sure the training material was much smaller than other languages. It’s not a matter a prompting though, it’s not my prompt that makes it hallucinate functions that don’t exist in libraries or make it write code that doesn’t compile, it’s a feature of the technology itself.
GPTs are statistical text generators after all, they don’t “understand” the problem.
It’s also pretty young, human toddlers hallucinate and make things up. Adults too. Even experts are known to fall prey to bias and misconception.
I don’t think we know nearly enough about the actual architecture of human intelligence to start asserting an understanding of “understanding”. I think it’s a bit foolish to claim with certainty that LLMs in a MoE framework with self-review fundamentally can’t get there. Unless you can show me, materially, how human “understanding” functions, we’re just speculating on an immature technology.
As much as I agree with you, humans can learn a bunch of stuff without first learning the content of the whole internet and without the computing power of a datacenter or consuming the energy of Belgium. Humans learn to count at an early age too, for example.
I would say that the burden of proof is therefore reversed. Unless you demonstrate that this technology doesn’t have the natural and inherent limits that statistical text generators (or pixel) have, we can assume that our mind works differently.
Also you say immature technology but this technology is not fundamentally (I.e. in terms of principle) different from what Weizenabum’s ELIZA in the '60s. We might have refined model and thrown a ton of data and computing power at it, but we are still talking of programs that use similar principles.
So yeah, we don’t understand human intelligence but we can appreciate certain features that absolutely lack on GPTs, like a concept of truth that for humans is natural.
Exactly this. Things have already changed and are changing as more and more people learn how and where to use these technologies. I have seen even teachers use this stuff who have limited grasp of technology in general.
No, not that either. Unless you consider “use LLM to summarize the changes/errors/inaccuracies, then have a human read the whole thing again” an improvement over “just have a human read the whole thing”.
Because LLM will do all these things:
point you toward issues
point you toward non-issues
not point you toward issues
change stuff even when “instructed” not to
If there is one thing you don’t want to throw an LLM at without full, unbiased review, it’s documents where the wording is legally binding. And if you have to do a full, unbiased review to begin with, where you can’t even trust your tool to have highlighted all the important parts, you may as well not bother with the tool.
I really can’t see this being done by any sane person. Why would you have a generator of text reviewing stuff (besides grammar)?
Do you have any reference of some companies doing this, perhaps?
Its complex pattern matching and looking up existing case law online. This work has been outsourced to contracting companies for at least 7 years that I’m aware of. If it is something that can be documented in a run book for non professionals to do for twenty cents on the dollar then there is no reason it can’t be done by a script for .002.
Like which one? Because it’s now 2 years we have chatGPT and already quite a lot of (good?) models. Which shakeup do you think is happening or going to happen?
Computer programming has radically changed. Huge help having llm auto complete and chat built in. IDEs like Cursor and Windsurf.
I’ve been a developer for 35 years. This is shaking it up as much as the internet did.
I quit my previous job in part because I couldn’t deal with the influx of terrible, unreliable, dangerous, bloated, nonsensical, not even working code that was suddenly pushed into one of the projects I was working on. That project is now completely dead, they froze it on some arbitrary version.
When junior dev makes a mistake, you can explain it to them and they will not make it again. When they use llm to make a mistake, there is nothing to explain to anyone.
I compare this shake more to an earthquake than to anything positive you can associate with shaking.
And so, the problem wasn’t the ai/llm, it was the person who said “looks good” without even looking at the generated code, and then the person who read that pull request and said, again without reading the code, “lgtm”.
If you have good policies then it doesn’t matter how many bad practice’s are used, it still won’t be merged.
The only overhead is that you have to read all the requests but if it’s an internal project then telling everyone to read and understand their code shouldn’t be the issue.
This is a problem with your team/project. It’s not a problem with the technology.
I hardly see it changed to be honest. I work in the field too and I can imagine LLMs being good at producing decent boilerplate straight out of documentation, but nothing more complex than that.
I often use LLMs to work on my personal projects and - for example - often Claude or ChatGPT 4o spit out programs that don’t compile, use inexistent functions, are bloated etc. Possibly for languages with more training (like Python) they do better, but I can’t see it as a “radical change” and more like a well configured snippet plugin and auto complete feature.
LLMs can’t count, can’t analyze novel problems (by definition) and provide innovative solutions…why would they radically change programming?
You’re missing it. Use Cursor or Windsurf. The autocomplete will help in so many tedious situations. It’s game changing.
ChatGPT 4o isn’t even the most advanced model, yet I have seen it do things you say it can’t. Maybe work on your prompting.
That is my experience, it’s generally quite decent for small and simple stuff (as I said, distillation of documentation). I use it for rust, where I am sure the training material was much smaller than other languages. It’s not a matter a prompting though, it’s not my prompt that makes it hallucinate functions that don’t exist in libraries or make it write code that doesn’t compile, it’s a feature of the technology itself.
GPTs are statistical text generators after all, they don’t “understand” the problem.
It’s also pretty young, human toddlers hallucinate and make things up. Adults too. Even experts are known to fall prey to bias and misconception.
I don’t think we know nearly enough about the actual architecture of human intelligence to start asserting an understanding of “understanding”. I think it’s a bit foolish to claim with certainty that LLMs in a MoE framework with self-review fundamentally can’t get there. Unless you can show me, materially, how human “understanding” functions, we’re just speculating on an immature technology.
As much as I agree with you, humans can learn a bunch of stuff without first learning the content of the whole internet and without the computing power of a datacenter or consuming the energy of Belgium. Humans learn to count at an early age too, for example.
I would say that the burden of proof is therefore reversed. Unless you demonstrate that this technology doesn’t have the natural and inherent limits that statistical text generators (or pixel) have, we can assume that our mind works differently.
Also you say immature technology but this technology is not fundamentally (I.e. in terms of principle) different from what Weizenabum’s ELIZA in the '60s. We might have refined model and thrown a ton of data and computing power at it, but we are still talking of programs that use similar principles.
So yeah, we don’t understand human intelligence but we can appreciate certain features that absolutely lack on GPTs, like a concept of truth that for humans is natural.
Exactly this. Things have already changed and are changing as more and more people learn how and where to use these technologies. I have seen even teachers use this stuff who have limited grasp of technology in general.
Review of legal documents.
Oh boy…what can possibly go wrong for documents where small minutiae like wording can make a huge difference.
Creating legal documents, no. Reviewing legal documents for errors and inaccuracies totally.
No, not that either. Unless you consider “use LLM to summarize the changes/errors/inaccuracies, then have a human read the whole thing again” an improvement over “just have a human read the whole thing”.
Because LLM will do all these things:
If there is one thing you don’t want to throw an LLM at without full, unbiased review, it’s documents where the wording is legally binding. And if you have to do a full, unbiased review to begin with, where you can’t even trust your tool to have highlighted all the important parts, you may as well not bother with the tool.
I really can’t see this being done by any sane person. Why would you have a generator of text reviewing stuff (besides grammar)? Do you have any reference of some companies doing this, perhaps?
Its complex pattern matching and looking up existing case law online. This work has been outsourced to contracting companies for at least 7 years that I’m aware of. If it is something that can be documented in a run book for non professionals to do for twenty cents on the dollar then there is no reason it can’t be done by a script for .002.
Aside from a handful of business that tried to do that and failed miserably, some of them failing in actual court, you mean?