The only reason I can think of is more on-device AI. LLMs like the ones behind ChatGPT are extremely RAM-hungry. There are optimizations, such as quantization, that squeeze them into a smaller memory footprint at the expense of accuracy and capability. Even some of the best phones out there today are barely capable of running a stripped-down generative AI model. When they do, the output is nowhere near as good as when the model runs uncompressed on a server.
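To put rough numbers on the RAM point, here is a back-of-the-envelope sketch. It assumes a hypothetical 7-billion-parameter model (not a figure for ChatGPT or any specific product) and counts only the weights, so real usage would be higher once activations and the runtime are included. The function name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical model size, chosen only for illustration.
PARAMS = 7e9

# 16-bit is a common uncompressed format; 8-bit and 4-bit are typical
# compressed (quantized) formats that trade some accuracy for size.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(PARAMS, bits):.1f} GB")
```

Under these assumptions the weights alone come to about 14 GB at 16-bit, 7 GB at 8-bit, and 3.5 GB at 4-bit, which is why only a heavily compressed version has a chance of fitting in a phone's RAM alongside everything else.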