Category: Models

  • Choosing the right model

    A ton of new Claude talks landed on YouTube after the big conference. I thought the talk below had informative takeaways for model selection. If you're on a Pro subscription you're likely stuck mostly on Sonnet, but if you're on Max, you have a lot more flexibility in which model you run full-time.

    The video had some nice summary slides, like the one below. Tell me you knew all this already and you get a shiny reward.

    It was also interesting (and slightly frustrating) to learn that a newer model is sometimes the more efficient choice. Running Sonnet doesn't guarantee the lowest token usage or the best result. You could run Opus 4.7, finish faster, and still save tokens. The explanation made sense: the newer models are smarter, so they reason better and spend tokens more efficiently, because they may skip unnecessary steps or death spiral less often. But… not always. So you have to evaluate.

    While this makes sense for recurring tasks or prompts, what is still unclear is how to know which model to use… before doing an evaluation. This is partly why it seems easier to just always use Opus 4.7 xhigh. How do I know what result I will get with high vs. xhigh, and how that setting will affect my outcome? This part of model selection is user-unfriendly. I've noticed that thinking is now "adaptive" in Opus 4.7 rather than a yes/no option. I imagine having the LLM decide based on the prompt is likely the best path.
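
    For a recurring prompt, the "you have to evaluate" step can be sketched roughly as below. This is only a minimal harness under stated assumptions: the two `fake_*` functions stand in for real model calls, the model names and token counts are made up for illustration, and the `(answer, tokens_used)` return shape is hypothetical rather than any real API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical shape for "run this prompt on one model config":
# takes a prompt, returns (answer, total tokens used).
RunFn = Callable[[str], Tuple[str, int]]
# A test case is a prompt plus a check that says whether the answer is acceptable.
Case = Tuple[str, Callable[[str], bool]]


@dataclass
class Result:
    passed: int = 0
    tokens: int = 0
    seconds: float = 0.0


def evaluate(configs: Dict[str, RunFn], cases: List[Case]) -> Dict[str, Result]:
    """Run every case on every config and total up passes, tokens, and time."""
    results = {name: Result() for name in configs}
    for name, run in configs.items():
        for prompt, check in cases:
            start = time.perf_counter()
            answer, tokens_used = run(prompt)
            results[name].seconds += time.perf_counter() - start
            results[name].tokens += tokens_used
            results[name].passed += int(check(answer))
    return results


def pick_best(results: Dict[str, Result]) -> str:
    """Prefer the most passes; break ties by fewest tokens."""
    return min(results, key=lambda n: (-results[n].passed, results[n].tokens))


# Stand-ins for real model calls. The numbers are invented for the demo:
# the "smaller" config burns more tokens (say, on retries) to get the same answers.
ANSWERS = {"2+2": "4", "capital of France": "Paris"}


def fake_smaller(prompt: str) -> Tuple[str, int]:
    return ANSWERS[prompt], 900


def fake_larger(prompt: str) -> Tuple[str, int]:
    return ANSWERS[prompt], 400


cases: List[Case] = [
    ("2+2", lambda a: a.strip() == "4"),
    ("capital of France", lambda a: "Paris" in a),
]
results = evaluate({"smaller-model": fake_smaller, "larger-model": fake_larger}, cases)
for name, r in results.items():
    print(f"{name}: passed={r.passed}/{len(cases)} tokens={r.tokens}")
print("pick:", pick_best(results))
```

    Swapping the fakes for real calls (one entry per model and effort setting) would turn this into a small comparison table for a given task, which is about the only way I can see to answer the high vs. xhigh question with data.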

  • Gemma4 MTP

    Google released Gemma4 MTP, which incorporates a new feature: speculative decoding. A second, lightweight model predicts tokens ahead of the larger model, which speeds up the larger model's work and makes token generation up to 2-3x faster.

    I saw a cute ELI5:

    Imagine two bears, a big slow bear and a little nimble bear looking for berries. The little bear runs off first and finds a bunch of berry trees and yells for the big bear. Big bear comes and decides which berry tree is most delicious and makes the final call to grab it.

    Unfortunately for me, my system still can't run it.
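
    The draft-then-verify idea behind speculative decoding can be sketched with toy "models" that are just functions from a token list to the next token. This is a generic greedy version under my own assumptions, not Gemma's actual implementation: `toy_target`, `toy_draft`, and the `k` parameter are hypothetical, and the verify step here calls the big model once per position, whereas a real system would check all drafted tokens in a single parallel forward pass (that is where the speedup comes from).

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # toy "model": context -> next token


def greedy_decode(model: NextToken, prompt: List[Token], n: int) -> List[Token]:
    """Plain decoding: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(n):
        tokens.append(model(tokens))
    return tokens[len(prompt):]


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[Token], n: int, k: int = 3
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding. Returns (new tokens, number of verify rounds)."""
    tokens = list(prompt)
    verify_rounds = 0
    while len(tokens) - len(prompt) < n:
        # 1. The small model runs ahead and drafts k tokens.
        ctx = list(tokens)
        proposal = []
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)

        # 2. The big model checks the draft. Simulated position by position here;
        #    a real implementation would score all k positions in one pass.
        verify_rounds += 1
        for t in proposal:
            expected = target(tokens)
            if expected == t:
                tokens.append(t)         # draft was right, accept it
            else:
                tokens.append(expected)  # draft was wrong, take the big model's token
                break                    # and throw away the rest of the draft
        else:
            # Every drafted token was accepted, so the same verification
            # also yields one extra token from the big model.
            tokens.append(target(tokens))
    return tokens[len(prompt):len(prompt) + n], verify_rounds


def toy_target(ctx: List[Token]) -> Token:
    # Big bear: counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    # Little bear: usually agrees with the big bear, but guesses wrong after a 4 or a 9.
    return 7 if ctx[-1] % 5 == 4 else (ctx[-1] + 1) % 10


out, rounds = speculative_decode(toy_target, toy_draft, prompt=[0], n=12, k=3)
print("same as plain decoding:", out == greedy_decode(toy_target, [0], 12))
print("verify rounds:", rounds, "for", len(out), "tokens")
```

    The output always matches what the big model would have produced on its own; the little model only changes how many rounds it takes to get there.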