Gemma4 MTP

Google released Gemma4 MTP, which adds a new feature: speculative decoding. A second, lightweight model predicts tokens ahead of the larger model, so the larger model only has to check those guesses rather than produce every token itself. This makes token generation up to 2-3x faster.
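To make the idea concrete, here is a minimal toy sketch of greedy speculative decoding. It is not Gemma's actual implementation: `target_model`, `draft_model`, and `speculative_decode` are hypothetical names, and both "models" are simple arithmetic functions standing in for real networks. Real systems typically accept or reject drafted tokens probabilistically; this sketch assumes the simpler exact-match rule.

```python
from typing import List, Tuple

VOCAB = 50  # toy vocabulary size (assumption for illustration)


def target_model(prefix: List[int]) -> int:
    """Hypothetical stand-in for the large, slow model (greedy next token)."""
    return (sum(prefix) * 31 + len(prefix)) % VOCAB


def draft_model(prefix: List[int]) -> int:
    """Hypothetical stand-in for the small, fast model.

    It agrees with the target except when the prefix length is a multiple of 4.
    """
    guess = target_model(prefix)
    return (guess + 1) % VOCAB if len(prefix) % 4 == 0 else guess


def plain_decode(prompt: List[int], n_new: int) -> List[int]:
    """Baseline: the large model produces every token, one pass per token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target_model(tokens))
    return tokens


def speculative_decode(prompt: List[int], n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Draft k tokens with the small model, then verify them with the large one.

    Returns the generated tokens and the number of verification rounds.
    In a real system each round is a single batched pass of the large model.
    """
    tokens = list(prompt)
    target_passes = 0
    while len(tokens) - len(prompt) < n_new:
        # 1. The small model runs ahead and proposes k tokens.
        draft: List[int] = []
        for _ in range(k):
            draft.append(draft_model(tokens + draft))

        # 2. The large model checks the proposals in order.
        target_passes += 1
        accepted: List[int] = []
        for tok in draft:
            expected = target_model(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the large model's token, drop the rest.
                accepted.append(expected)
                break
        else:
            # Every draft token matched, so the same pass yields a bonus token.
            accepted.append(target_model(tokens + accepted))
        tokens.extend(accepted)
    return tokens[: len(prompt) + n_new], target_passes


if __name__ == "__main__":
    out, passes = speculative_decode([1, 2, 3], n_new=12)
    print(out == plain_decode([1, 2, 3], n_new=12), passes)
```

The output matches what the large model would have produced on its own, but it takes fewer rounds of the large model to get there. That is where the speedup comes from, and it grows with how often the small model guesses correctly.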

I saw a cute ELI5:

Imagine two bears, a big slow bear and a little nimble bear, looking for berries. The little bear runs off first, finds a bunch of berry trees, and yells for the big bear. The big bear comes over, decides which berry tree is most delicious, and makes the final call to grab it.

Unfortunately for me, my system still can't run it.