Researchers from MIT, KAUST, ISTA, and Yandex have developed HIGGS (Hadamard Incoherence with Gaussian MSE-optimal GridS), a new method for compressing large language models (LLMs) that eliminates the need for specialized hardware.
The technique lets developers compress massive AI models directly on consumer devices like laptops and smartphones in minutes, a process that previously required industrial-grade servers and could take weeks.
"This breakthrough democratizes access to advanced AI models," says the research team. "Now, startups and independent developers can leverage compressed models without investing in expensive infrastructure."
HIGGS has been successfully tested on popular model families including Llama, DeepSeek, and Qwen. The method outperforms existing data-free quantization approaches such as NF4 (4-bit NormalFloat) and HQQ (Half-Quadratic Quantization) while maintaining model quality.
A key advantage is that HIGGS requires no calibration data and no additional parameter optimization, which makes it particularly valuable when suitable calibration data isn't available.
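To give a flavor of the core idea, here is a minimal, illustrative sketch (not the authors' implementation) of the two ingredients the method's name refers to: a Hadamard rotation, which makes the weights behave like i.i.d. Gaussians regardless of their original distribution, followed by nearest-point quantization against a small grid. The uniform 4-bit grid below is a simple stand-in for the paper's MSE-optimal Gaussian grids:

```python
import numpy as np

def hadamard(n):
    # Sylvester construction of an n x n Hadamard matrix (n must be a power of two)
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)  # normalize so rows are orthonormal

def quantize_to_grid(w, grid):
    # Snap each value to its nearest grid point (MSE-optimal for a fixed grid)
    idx = np.argmin(np.abs(w[:, None] - grid[None, :]), axis=1)
    return grid[idx]

rng = np.random.default_rng(0)
n = 256
weights = rng.normal(size=n)

# 1) Randomized Hadamard rotation: random sign flips plus the orthonormal
#    transform decorrelate the weights ("incoherence processing").
signs = rng.choice([-1.0, 1.0], size=n)
H = hadamard(n)
rotated = H @ (signs * weights)

# 2) Quantize against a small grid sized for a Gaussian. A naive uniform
#    4-bit grid over +/- 3 standard deviations stands in for the paper's
#    MSE-optimal grids.
scale = rotated.std()
grid = scale * np.linspace(-3.0, 3.0, 16)
quantized = quantize_to_grid(rotated, grid)

# 3) Reconstruct by inverting the rotation (H is orthonormal, so H.T = H^-1).
reconstructed = signs * (H.T @ quantized)
err = np.mean((weights - reconstructed) ** 2) / np.mean(weights ** 2)
```

Because the rotation is orthonormal, the quantization error introduced in the rotated space carries over unchanged after inversion, which is what allows a single grid tuned for a Gaussian to work for arbitrary weight matrices without any calibration data.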
The practical implications are substantial. For example, DeepSeek R1 with 671B parameters and Llama 4 Maverick with 400B parameters, models that were previously impractical to compress effectively, can now be optimized for consumer devices while preserving performance.
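A back-of-envelope calculation shows why bit-width matters at this scale. The figures below count weight storage only (ignoring activations and KV cache) and use the 671B parameter count quoted above:

```python
params = 671e9  # DeepSeek R1 parameter count cited in the article

# GiB needed just to store the weights at a given bit-width
footprint_gib = {bits: params * bits / 8 / 2**30 for bits in (16, 8, 4)}
for bits, gib in footprint_gib.items():
    print(f"{bits}-bit weights: ~{gib:,.0f} GiB")
```

At 16-bit precision the weights alone need roughly 1.25 TiB of memory; dropping to 4 bits cuts that by 4x, which is the difference between needing a server cluster and fitting on a few commodity GPUs.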
The research team will present their findings at the upcoming NAACL conference, one of the premier venues in natural language processing research. The method is already available on Hugging Face for developers to use.
This development builds on previous compression innovations from these institutions, including AQLM and PV-Tuning, which together can reduce model size by up to 8x while maintaining 95% response quality.
For the AI community, HIGGS represents a major step toward making advanced language models accessible to everyone - from major corporations to individual developers working on standard laptops.
The technology is already being put to practical use, with Yandex implementing it to accelerate product development and testing. The broader impact could reshape how AI models are deployed across industries, particularly in resource-constrained environments.