These kinds of things. For a simple example, take captioning pictures… It used to be basically impossible, and now you can convince a home computer to do it in a weekend, all running locally with a few hundred lines of code
Imagine what that would do for someone using a screen reader
The massive advances in machine learning-based image recognition (which have been fueled, among other things, by underpaid labor in the Global South) have been a wonder for AT users, and they predate the current generative AI craze by years.
So, yes - this has been slowly improving for like the last decade. It’s a great example of something impossible becoming possible
Generative AI is polarizing among AT users, with image recognition joining auto-generated audio craptions as a love/hate tool.
Sounds about right… I’ve seen pretty impressive demos on Hugging Face, but most of them are pretty basic.
But it opens the door - so now multimodal models are starting to spread. They turn the image into tokens, so you can use this intermediate output with unstructured language. For example, a meme and a diagram are very different - for a meme you’d probably want the text and a description of the meme layout, for a chart you’d probably want a description of the axes and the highlights.
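To make that concrete, here’s a rough sketch in Python of picking different questions depending on the kind of image. The kind labels, the prompt wording and the `prompts_for` helper are all made up for illustration; none of it comes from a particular library or model.

```python
# Hypothetical mapping from image kind to the questions worth asking about it.
PROMPTS_BY_KIND = {
    "meme": [
        "Transcribe all text in the image.",
        "Describe the layout of the meme (panels, who says what).",
    ],
    "chart": [
        "Describe the axes, including labels and units.",
        "Describe the highlights: trends, peaks, outliers.",
    ],
}

# Fallback for any kind of image without special handling.
DEFAULT_PROMPTS = ["Describe the image in one or two sentences."]


def prompts_for(kind: str) -> list[str]:
    """Return the questions to ask a multimodal model for this kind of image."""
    return PROMPTS_BY_KIND.get(kind.strip().lower(), DEFAULT_PROMPTS)
```

The point is just that the branching lives in ordinary code, and the model only ever sees one narrow question at a time.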
I use local AI, for a lot of reasons. Even the small models can do a lot if you combine and structure them with conventional code. It requires custom code for each thing you want it to do, but it’s a lot more reliable
Here’s what’s magic to me. Instead of just spitting out an answer, you can have a back and forth. First, you might classify it as a chart, then you might ask it to describe the type of chart, then ask it to read the axes (or feed in OCR if the models aren’t great readers, and let them interpret it). You can ask it to describe/interpret the contents of the graph. You can ask it to note any missing data, or whatever else. Then you can take all of that, and have it summarize everything into something more helpful.
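Roughly, that back and forth could look like the sketch below. It assumes you already have some function `ask(prompt, image_bytes)` that sends one question plus the image to whatever local model you run and returns text. That `ask` signature, the prompts and the `describe_chart` name are my own placeholders, not a real API.

```python
from typing import Callable, Optional

# Hypothetical signature: ask(prompt, image_bytes) -> answer text.
Ask = Callable[[str, bytes], str]


def describe_chart(ask: Ask, image: bytes, ocr_text: Optional[str] = None) -> str:
    """Walk a chart through several narrow questions, then summarize the notes."""
    kind = ask("Is this a chart, a meme, a photo or something else? One word.", image)
    if kind.strip().lower() != "chart":
        # Not a chart: fall back to a single generic question.
        return ask("Describe this image briefly.", image)

    # Optionally hand the model OCR output instead of trusting it to read text.
    hint = f" OCR text from the image: {ocr_text}" if ocr_text else ""
    questions = [
        "What type of chart is this?",
        "Read the axes: labels, units and ranges." + hint,
        "Describe and interpret the contents of the graph.",
        "Note any missing data or anything else unusual.",
    ]
    notes = [f"{q} -> {ask(q, image)}" for q in questions]

    # Final pass: condense the collected notes into one description.
    return ask("Summarize these notes as a short description:\n" + "\n".join(notes), image)
```

Because `ask` is passed in, you can swap models, or plug in a fake one, without touching the steps.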
Better yet, you can make the AI drive itself. Code the first step of classification, then ask it what relevant details should be included. Then run through the list, feed it back through for a summary, and you get something more useful
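A minimal sketch of that self-driving version, under the same assumption of a hypothetical `ask(prompt, image_bytes)` function wrapping your model. The prompts, the `max_details` cap and the list parsing are illustrative guesses; real model output is messier and would need more defensive handling.

```python
import re
from typing import Callable

Ask = Callable[[str, bytes], str]  # hypothetical: ask(prompt, image_bytes) -> text


def self_driven_description(ask: Ask, image: bytes, max_details: int = 5) -> str:
    """Let the model choose what to look at, then work through its own list."""
    # Step 1 is still coded by hand: classify the image.
    kind = ask("What kind of image is this? Answer in one or two words.", image).strip()

    # Step 2: ask the model which details matter for this kind of image.
    plan = ask(
        f"This image is a {kind}. List the details a description should "
        "include, one per line, no extra commentary.",
        image,
    )
    # Strip leading bullets or numbering, drop blanks, and cap the list length.
    details = [re.sub(r"^\s*(?:[-*]|\d+[.)])\s*", "", line).strip() for line in plan.splitlines()]
    details = [d for d in details if d][:max_details]

    # Step 3: run through the list, one narrow question per detail.
    notes = []
    for detail in details:
        answer = ask(f"Describe this detail of the image: {detail}", image)
        notes.append(f"{detail}: {answer}")

    # Step 4: feed everything back through for a summary.
    return ask("Summarize these notes as a short description:\n" + "\n".join(notes), image)
```

The cap on `max_details` is there so one rambling reply can’t turn into dozens of model calls.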
That’s why I care so much about AI outreach. Because without learning anything about how neural networks work, a single individual could build something like this. Microsoft/OpenAI, Google and the rest of the tech giants are trying to brute force their way to making an LLM system that replaces workers. I don’t trust them (for hopefully obvious reasons) and I’d cheer if we broke them up, but they’re not all there is to AI
There’s so many building blocks out there free for the taking - you can download models and build things with them, you can just treat them as a black box and
As a species, we don’t understand how to use LLMs. They’re not useless, they’re misused. The only way that will change is if people start using these tools - and it’s much easier than it sounds if a technical person is motivated enough to learn. The initial configuration is painful… From there, it’s just passing in text/images/audio, and there are examples and libraries everywhere
I just don’t understand. A few minutes ago I saved myself a good hour of reading on something that would’ve brought me nothing but frustration, and now I’m back to what I love. Just now I had a conversation with my AI about the reactions people have to AI here, to help process my thoughts. It does nothing but let me be more, to be closer to who I want to be.
I’ve been following everything neural networks for a decade, I’m clearly biased. Even so, beneath all the hype is something that has opened so many doors
I don’t think I’m going to change your mind, but can you help me understand? Do you worry about what effects it’ll have? Are there any uses for it you find worthwhile? Are you just so sick of hearing people say how it’ll change everything that you don’t want to hear anything about it?
Ok, great…I guess I picked a good example