• theneverfox
    7 months ago

    Ok, great…I guess I picked a good example

    The massive advances in machine learning-based image recognition (which have been fueled, among other things, by global south underpaid labor) have been a wonder for AT users, & predate the current generative AI craze by years.

    So, yes - this has been slowly improving for like the last decade, it’s a great example of something impossible becoming possible

    Generative AI is polarizing among AT users, with image recognition joining auto-generated audio craptions as a love/hate tool.

    Sounds about right… I’ve seen pretty impressive demos on Hugging Face, but most of them are pretty basic.

    But it opens the door - so now multimodal models are starting to spread. They turn the image into tokens, so you can work with that intermediate representation using unstructured language. What you want out of it depends on the image: a meme and a diagram are very different. For a meme you’d probably want the text and a description of the meme layout; for a chart you’d probably want a description of the axes and the highlights.

    I use local AI, for a lot of reasons - even the small models can do a lot if you combine and structure them with conventional code. It requires custom code for each thing you want it to do, but it’s a lot more reliable
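
    To make "structure them with conventional code" concrete, here is a minimal sketch, not any particular library’s API. The `ask` callable is a hypothetical stand-in for however you call your local model (prompt and image bytes in, text out), and the label list is just an example. The point is that ordinary code checks the model’s answer instead of trusting free-form output.

```python
from typing import Callable

# Hypothetical: ask(prompt, image_bytes) -> model's text reply.
AskFn = Callable[[str, bytes], str]

# Example label set; pick whatever categories your tool cares about.
ALLOWED = ("chart", "meme", "photo", "other")


def classify(image: bytes, ask: AskFn, retries: int = 2) -> str:
    """Ask for a one-word label and only accept answers from ALLOWED."""
    prompt = "Answer with exactly one word from: " + ", ".join(ALLOWED)
    for _ in range(retries + 1):
        answer = ask(prompt, image).strip().lower().rstrip(".")
        if answer in ALLOWED:
            return answer
    # The model never gave a usable label, so fall back to a safe default.
    return "other"
```

    Small models ramble sometimes; a retry plus a fallback is usually enough to keep the rest of the program from ever seeing a rambling answer.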

    Here’s what’s magic to me: instead of just spitting out an answer, you can have a back and forth. First, you might classify it as a chart, then you might ask it to describe the type of chart, then ask it to read the axes (or feed in OCR if the models aren’t great readers, and let them interpret it). You can ask it to describe/interpret the contents of the graph. You can ask it to note any missing data, or whatever else. Then you can take all of that and have it summarize everything into something more helpful.
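
    The back and forth described above can be sketched roughly like this. Everything here is an assumption for illustration: `ask` is a hypothetical wrapper around whatever multimodal model you run, the question wording is made up, and `ocr_text` is whatever your OCR tool of choice produced, if you use one.

```python
from typing import Callable, Optional

# Hypothetical: ask(prompt, image_bytes) -> model's text reply.
AskFn = Callable[[str, bytes], str]

# Fixed questions for charts, asked one at a time (wording is illustrative).
CHART_QUESTIONS = [
    ("chart_type", "What type of chart is this?"),
    ("axes", "Read the axes: labels, units and ranges."),
    ("contents", "Describe and interpret what the chart shows."),
    ("gaps", "Note any missing or unclear data."),
]


def describe_chart(image: bytes, ask: AskFn, ocr_text: Optional[str] = None) -> str:
    """Classify first, then question the model step by step, then summarize."""
    kind = ask("Classify this image in one word: chart, meme, photo or other.", image)
    if kind.strip().lower() != "chart":
        return ask("Describe this image briefly.", image)

    notes = []
    for name, question in CHART_QUESTIONS:
        # Optionally hand the model OCR output so it interprets rather than reads.
        if name == "axes" and ocr_text:
            question += "\nOCR text found in the image: " + ocr_text
        notes.append(f"{name}: {ask(question, image)}")

    summary_prompt = "Summarize these notes into a helpful description:\n" + "\n".join(notes)
    return ask(summary_prompt, image)
```

    Each call is a small, focused question, which is the kind of thing even small models tend to handle well.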

    Better yet, you can make the AI drive itself. Code the first step of classification, then ask it what relevant details should be included. Then run through the list, feed it back through for a summary, and you get something more useful
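
    A rough sketch of letting the model drive itself, under the same assumptions as before: `ask` is a hypothetical function wrapping your model, and the prompts are only examples. The only hand-written logic is the first classification step and the parsing of the model’s own list.

```python
import re
from typing import Callable, List

# Hypothetical: ask(prompt, image_bytes) -> model's text reply.
AskFn = Callable[[str, bytes], str]


def parse_list(reply: str, limit: int = 6) -> List[str]:
    """Turn a bulleted or numbered reply into clean items using plain code."""
    items = []
    for line in reply.splitlines():
        # Strip a leading bullet ("-", "*", "•") or number ("1.", "2)").
        item = re.sub(r"^\s*(?:[-*•]|\d+[.)])\s*", "", line).strip()
        if item:
            items.append(item)
    return items[:limit]


def self_directed_description(image: bytes, ask: AskFn) -> str:
    # Step 1 is coded by hand: classify the image.
    kind = ask("Classify this image in one or two words.", image).strip()
    # Step 2: the model decides which details matter for this kind of image.
    plan = ask(
        f"This image is a {kind}. List the details a description should include, one per line.",
        image,
    )
    # Step 3: run through the model's own list, one question per detail.
    notes = []
    for detail in parse_list(plan):
        answer = ask(f"For this {kind}, describe: {detail}", image)
        notes.append(f"{detail}: {answer}")
    # Step 4: feed everything back through for a summary.
    return ask("Summarize these notes into one helpful description:\n" + "\n".join(notes), image)
```

    The `limit` on the list is there so a chatty model can’t turn one image into fifty follow-up calls.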

    That’s why I care so much about AI outreach. Because without learning anything about how neural networks work, a single individual could build something like this. Microsoft/OpenAI, Google and the rest of the tech giants are trying to brute force their way to making an LLM system that replaces workers. I don’t trust them (for hopefully obvious reasons) and I’d cheer if we broke them up, but they’re not all there is to AI

    There’s so many building blocks out there free for the taking - you can download models and build things with them, you can just treat them as a black box and

    As a species, we don’t understand how to use LLMs. They’re not useless, they’re misused. The only way that will change is if people start using these tools - and it’s much easier than it sounds if a technical person is motivated enough to learn. The initial configuration is painful… From there, it’s just passing in text/images/audio, and there are examples and libraries everywhere

        • theneverfox
          7 months ago

          I just don’t understand. A few minutes ago I saved myself a good hour of reading on something that would’ve brought me nothing but frustration, and now I’m back to what I love. Just now I had a conversation with my AI about the reactions people here have to AI, to help process my thoughts. It does nothing but let me be more, to be closer to who I want to be.

          I’ve been following everything neural networks for a decade, I’m clearly biased. Even so, beneath all the hype is something that has opened so many doors

          I don’t think I’m going to change your mind, but can you help me understand? Do you worry about what effects it’ll have? Are there any uses for it you find worthwhile? Are you just so sick of hearing people say how it’ll change everything that you don’t want to hear anything about it?