Agreed, and the chance of it backfiring on them is indeed pleasingly high. If the compute moat for initial training gets lower (e.g. ternary/binary models), or distributed training (Hivemind, etc.) takes off, or both, or something new, all bets are off.
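(For context on the ternary-model point: the idea, as in BitNet b1.58, is to constrain weights to {-1, 0, +1} so matmuls reduce to additions/subtractions. A minimal sketch of absmean ternarization, assuming NumPy; the function name is mine and this omits the actual quantization-aware training recipe.)

```python
import numpy as np

def ternarize(W, eps=1e-8):
    # Absmean quantization, roughly as in BitNet b1.58:
    # scale by the mean absolute weight, then round each entry
    # to the nearest value in {-1, 0, +1}.
    scale = np.abs(W).mean() + eps
    return np.clip(np.round(W / scale), -1, 1), scale

W = np.random.randn(4, 4).astype(np.float32)
W_t, scale = ternarize(W)
# Multiplying by W_t needs only adds/subtracts (and skips for zeros),
# which is where the hoped-for compute savings come from.
print(W_t)
```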
The compute moat for initial training will never get lower. But as foundation models get better, from-scratch training will be needed less often.