Need to let loose a primal scream without collecting footnotes first? Have a sneer percolating in your system but not enough time/energy to make a whole post about it? Go forth and be mid: Welcome to the Stubsack, your first port of call for learning fresh Awful you'll near-instantly regret.

Any awful.systems sub may be subsneered in this subthread, techtakes or no.

If your sneer seems higher quality than you thought, feel free to cut'n'paste it into its own post - there's no quota for posting and the bar really isn't that high.

The post Xitter web has spawned soo many "esoteric" right wing freaks, but there's no appropriate sneer-space for them. I'm talking redscare-ish, reality challenged "culture critics" who write about everything but understand nothing. I'm talking about reply-guys who make the same 6 tweets about the same 3 subjects. They're inescapable at this point, yet I don't see them mocked (as much as they should be)

Like, there was one dude a while back who insisted that women couldn't be surgeons because they didn't believe in the moon or in stars? I think each and every one of these guys is uniquely fucked up and if I can't escape them, I would love to sneer at them.

(Semi-obligatory thanks to @dgerard for starting this.)

  • FredFig@awful.systems · 13 points · 23 hours ago

    The conspiracy theorist who lives in my brain wants to say it's intentional to make us more open to blatant cheating as something that's just a "cost of doing business." (I swear I saw this phrase a half dozen times in the orange site thread about this)

    The earnest part of me tells me no, these guys are just clowns, but I dunno, they can't all be this dumb, right?

    • self@awful.systems · 10 points · 23 hours ago

      holy shit, that's the excuse they're going for? they cheated on a benchmark so hard the results are totally meaningless, sold their most expensive new models yet on the back of that cheated benchmark, further eroded the scientific process both with their cheating and by selling those models as better for scientific research… and these weird fucks want that to be fine and normal? fuck them

      • David Gerard@awful.systems (mod) · 9 points · 23 hours ago

        they can't even sell o3 really - in o3 high mode, which is needed to do this level of query, it's about $1000 per query lol

        • self@awful.systems · 3 points · 4 hours ago

          do you figure it's $1000/query because the algorithms they wrote with their insider knowledge to cheat the benchmark are very expensive to run, or is it $1000/query because they're grifters and all high mode does is use the model trained on frontiermath and allocate more resources to the query? and like any good grifter, they're targeting whales and institutional marks who are so invested that throwing away $1000 on horseshit feels like a bargain

          • froztbyte@awful.systems · 3 points · edited · 3 hours ago

            so, for an extremely unscientific demonstration, here (warning: AWS may try hard to get you to engage with Explainer[0]) is an instance of an aws pricing estimate for big handwave "some gpu compute"

            and when I say "extremely unscientific", I mean "I largely pulled the numbers out of my ass". even so, they're not entirely baseless, nor just picking absolute maxvals and laughing

            parameters/assumptions made:

            • "somewhat beefy" gpu instances (g4dn.4xlarge, selected through the tried and tested "squint until it looks right" method)
            • 6-day traffic pattern, excluding sunday[1]
            • daily "4h peak" total peak load profile[2]
            • 50 instances minimum, 150 maximum (let's pretend we're not openai but are instead some random fuckwit flybynight modelfuckery startup)
            • us west coast
            • spot instances, convertible spot reserves, 3y full prepay commit (yeah I know full vs partial is a big diff; once again, snore)

            (and before we get any fucking ruleslawyering dumb motherfuckers rolling in here about accuracy or whatever: get fucked kthx. this is just a very loosely demonstrative example)

            so you'd have a variable buffer of 50…150 instances, featuring 3.2…9.6TiB of RAM for working set size, 800…2400 vCPU, 50…150 nvidia t4 cores, and 800…2400GiB gpu vram
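
            (as a sanity check on those totals, here's the multiplication spelled out; the per-instance specs are my own assumption, taken from AWS' public g4dn.4xlarge listing, not from the pricing estimate itself)

            ```python
            # per-instance figures assumed from the public g4dn.4xlarge spec sheet:
            # 16 vCPU, 64 GiB RAM, 1x NVIDIA T4 with 16 GiB of VRAM
            VCPU, RAM_GIB, T4S, VRAM_GIB = 16, 64, 1, 16

            for n in (50, 150):
                print(f"{n} instances: {n * VCPU} vCPU, {n * RAM_GIB} GiB RAM, "
                      f"{n * T4S} T4s, {n * VRAM_GIB} GiB VRAM")
            # -> 50 instances: 800 vCPU, 3200 GiB RAM, 50 T4s, 800 GiB VRAM
            # -> 150 instances: 2400 vCPU, 9600 GiB RAM, 150 T4s, 2400 GiB VRAM
            ```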

            let's presume a perfectly spherical ops team of uniform capability[3] and imagine that we have some lovely and capable active instance prewarming and correct host caching and whatnot. y'know, things to reduce user latency. let's pretend we're fully dynamic[4]

            so, by the numbers, then

            1y times 4h daily gives us 1460h (in seconds, that's 5,256,000). this extremely inaccurate full-of-presumptions number gives us "service-capable lifetime". the times your concierge is at the desk, the times you can get pizza delivered.

            x3 to get to lifetime matching our spot commit, x50…x150 to get to "total possible instance hours". which is the top end of our sunshine and rainbows pretend compute budget. which, of course, we still have exactly no idea how to spend. because we don't know the real cost of servicing a query!

            but let's work backwards from some made-up shit, using numbers The Poor Public gets (vs numbers Free Microsoft Credits will imbue unto you), and see where we end up!

            so that means our baseline:

            • upfront cost: $4,527,400.00
            • monthly: $1460.00 (x3 x12 = $52560)
            • whatever the hell else is incurred (s3, bandwidth, …)
            • >=200k/y per ops/whatever person we have

            3y of 4h-daily at 50 instances = 788,400,000 instance-seconds. at 150 instances, 2,365,200,000 instance-seconds.
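
            (if you want to poke at these, here's the same arithmetic as a tiny python scratchpad; nothing in it beyond the assumptions already listed above)

            ```python
            # "service-capable lifetime": 4h of peak per day, 365 days/year, over the 3y commit
            HOURS_PER_DAY = 4
            DAYS_PER_YEAR = 365
            YEARS = 3

            hours_per_year = HOURS_PER_DAY * DAYS_PER_YEAR      # 1460 h
            seconds_per_year = hours_per_year * 3600            # 5,256,000 s
            for instances in (50, 150):
                total = seconds_per_year * YEARS * instances
                print(f"{instances} instances: {total:,} instance-seconds over {YEARS}y")
            # -> 50 instances: 788,400,000 instance-seconds over 3y
            # -> 150 instances: 2,365,200,000 instance-seconds over 3y
            ```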

            so we can say that, for our deeply Whiffs Ever So Slightly values, a second's compute on the low instance-count end is $0.01722755 and $0.00574252 at the higher instance-count end! which gives us a bit of a handle!
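
            (for the curious: those per-second rates work out to roughly three times the upfront figure spread across the 3y instance-second totals; the 3x is my own inference about how the quoted numbers were reached, not something the estimate states)

            ```python
            # spreading an assumed all-in spend over the instance-second totals above.
            # the 3x multiplier on the upfront figure is my inference, not from the estimate.
            SPEND = 3 * 4_527_400.00
            for instances, instance_seconds in ((50, 788_400_000), (150, 2_365_200_000)):
                print(f"{instances} instances: ${SPEND / instance_seconds:.8f} per instance-second")
            # -> 50 instances: $0.01722755 per instance-second
            # -> 150 instances: $0.00574252 per instance-second
            ```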

            this, of course, entirely ignores parallelism, n-instance job/load/whatever distribution, database lookups, network traffic, allllllll kinds of shit. which we can't really have good information on without some insider infrastructure leaks anyway. if we pretend to look at the compute alone.

            so what does $1000/query mean, in the sense of our very ridiculous and fantastical numbers? since the units are now The Same, we can simply divide things!

            at the 50 instance mark, we'd need to hypothetically spend 174,139.68 instance-seconds. that's 2.0154 days of linear compute!

            at the 150 instance mark, 522,419.05 instance-seconds! 6.070 days of linear compute!
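
            (same conversion in scratchpad form; tiny rounding drift against the figures above is expected, everything here is deliberately loose)

            ```python
            # instance-seconds -> "days of linear compute", i.e. one instance grinding serially
            SECONDS_PER_DAY = 86_400
            for label, instance_seconds in (("50-instance mark", 174_139.68),
                                            ("150-instance mark", 522_419.05)):
                print(f"{label}: {instance_seconds / SECONDS_PER_DAY:.4f} days of linear compute")
            # -> 50-instance mark: 2.0155 days of linear compute
            # -> 150-instance mark: 6.0465 days of linear compute
            ```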

            so! what have we learned? well, we've learned that we couldn't deliver responses to prompts in Reasonable Time at these hardware presumptions! which, again, are linear presumptions. and there's gonna be a fair chunk of parallelism and other parts involved here. but even so, turns out it'd be a bit of a sizable chunk of compute allocated. to even a single prompt response.

            [0] - a product/service whose very existence I find hilarious; the entire suite of aws products is designed to extract as much money from every possible function whatsoever, leading to complexity, which they then respond to by… producing a chatbot to "guide users"

            [1] - yes yes I know, the world is not uniform and the fucking promptfans come from everywhere. I'm presuming amerocentric design thinking (which imo is probably not wrong)

            [2] - let's pretend that the calculators' presumption of 4h persistent peak load and our presumption of short-duration load approaching 4h cumulative are the same

            [3] - oh, who am I kidding, you know it's gonna be some dumb motherfuckers with ansible and k8s and terraform and chucklefuckery

            • froztbyte@awful.systems · 1 point · 2 hours ago

              when digging around I happened to find this thread which has some benchmarks for a diff model

              it's apples to square fenceposts, of course, since one llm is not another. but it gives something to presume from. if g4dn.2xl gave them 214 tok/s, and if we make the extremely generous presumption that tok==word (which, well, no; cf. strawberry), then any Use Deserving Of o3 (let's say 5~15k words) would mean you need a tok-rate of 1000~3000 tok/s for a "reasonable" response latency ("5-ish seconds")

              so you'd need something like 5x g4dn.2xl just to shit out 5000 words with dolphin-llama3 in "quick" time. which, again, isn't even whatever the fuck people are doing with openai's garbage.
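
              (same napkin math in scratchpad form, using the 214 tok/s figure from that thread and the very generous token==word presumption)

              ```python
              # how many g4dn.2xlarge-class boxes you'd need to hit a word count inside a
              # "quick" response window, scaling the ~214 tok/s benchmark figure linearly
              TOKENS_PER_SEC_PER_INSTANCE = 214   # dolphin-llama3 on g4dn.2xl, per the linked thread
              TARGET_LATENCY_S = 5                # the "5-ish seconds" hand-wave

              for words in (5_000, 15_000):
                  needed_rate = words / TARGET_LATENCY_S              # tok/s, pretending tok == word
                  instances = needed_rate / TOKENS_PER_SEC_PER_INSTANCE
                  print(f"{words} words in {TARGET_LATENCY_S}s: "
                        f"{needed_rate:.0f} tok/s, ~{instances:.1f} instances")
              # -> 5000 words in 5s: 1000 tok/s, ~4.7 instances
              # -> 15000 words in 5s: 3000 tok/s, ~14.0 instances
              ```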

              utter, complete, comprehensive clownery. era-redefining clownery.

              but some dumb motherfucker in a bar will keep telling me it's the future. and I get to not boop 'em on the nose. le sigh.

      • FredFig@awful.systems · 6 points · edited · 23 hours ago

        They understand that all of the major model providers are doing it, but since the major model providers are richer than they are, they can't possibly ask OpenAI and friends to stop, so in their heads, it is what it is and therefore must be allowed to continue.

        Or at least, that's my face-value read of it; I certainly hope I'm simplifying things too much.