mail@jonahv.com
running unsloth llms locally
After running local LLMs in Emacs on Strix Halo, I realised Unsloth provides documentation on running their quantised models, e.g.:
With the docs, I can set the recommended values for model temperature, context size, and sampling.
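As a sketch of what that looks like in practice, here is a llama.cpp `llama-server` invocation with those settings spelled out. The flags are real llama.cpp options, but the model path and the numbers are illustrative — the right values come from Unsloth's page for the specific model.

```shell
# Launch llama.cpp's server with explicit context and sampling settings.
# The model path is hypothetical, and the sampling values are only
# examples -- take the recommended ones from Unsloth's docs per model.
llama-server \
  --model ./Qwen3-30B-A3B-UD-Q4_K_XL.gguf \
  --ctx-size 16384 \
  --temp 0.7 \
  --top-p 0.8 \
  --top-k 20 \
  --min-p 0.0
```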
There’s also great guidance on offloading parts of a model to the CPU, particularly useful in the case of MoE models like Qwen3.
With only 64 GB of memory, this hasn't applied to me so much. But Qwen3-Next-80B-A3B-Instruct seems likely to change that.
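The offloading trick itself can be sketched with llama.cpp's `--override-tensor` (`-ot`) flag, which maps tensors matching a regex to a device. For MoE models, the expert tensors make up the bulk of the weights, so pushing just those to the CPU keeps the attention layers fast on the GPU. The flag is llama.cpp's own; the model path below is hypothetical, and the regex assumes Qwen3-style GGUF tensor names.

```shell
# Keep attention and shared layers on the GPU, but route the MoE
# expert tensors (matched by the regex) to the CPU.
# Model path is hypothetical; regex assumes Qwen3-style tensor names.
llama-server \
  --model ./Qwen3-235B-A22B-UD-Q2_K_XL.gguf \
  --n-gpu-layers 99 \
  -ot ".ffn_.*_exps.=CPU"
```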