mail@jonahv.com
running unsloth llms locally
After running local LLMs in Emacs on Strix Halo, I realised Unsloth provides documentation on running their quantised models, e.g.:
With the docs, I can set the recommended values for model temperature, context size, and sampling.
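As a sketch of what that looks like in practice, here is a llama.cpp `llama-server` invocation with those settings spelled out. The flags are real llama.cpp options, but the model path and the numbers are illustrative — the right values come from Unsloth's page for the specific model.

```shell
# Launch llama.cpp's server with explicit context and sampling settings.
# The model path is hypothetical, and the sampling values are only
# examples -- take the recommended ones from Unsloth's docs per model.
llama-server \
  --model ./Qwen3-30B-A3B-UD-Q4_K_XL.gguf \
  --ctx-size 16384 \
  --temp 0.7 \
  --top-p 0.8 \
  --top-k 20 \
  --min-p 0.0
```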
There’s also great guidance on offloading parts of a model to the CPU, particularly useful in the case of MoE models like Qwen3.
With only 64 GB of memory, this hasn't applied to me so much. But Qwen3-Next-80B-A3B-Instruct seems likely to change that.
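The offloading trick itself can be sketched with llama.cpp's `--override-tensor` (`-ot`) flag, which maps tensors matching a regex to a device. For MoE models, the expert tensors make up the bulk of the weights, so pushing just those to the CPU keeps the attention layers fast on the GPU. The flag is llama.cpp's own; the model path below is hypothetical, and the regex assumes Qwen3-style GGUF tensor names.

```shell
# Keep attention and shared layers on the GPU, but route the MoE
# expert tensors (matched by the regex) to the CPU.
# Model path is hypothetical; regex assumes Qwen3-style tensor names.
llama-server \
  --model ./Qwen3-235B-A22B-UD-Q2_K_XL.gguf \
  --n-gpu-layers 99 \
  -ot ".ffn_.*_exps.=CPU"
```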