minipasila@lemmy.fmhy.ml to LocalLLaMA@sh.itjust.works • Any way to prune LLMs? • 1 year ago
I don’t know about that, but you could try GGML (llama.cpp). It supports quantization down to 2 bits, so that might be small enough.