ByteDance researchers up image generation efficiency through compression findings

Researchers at ByteDance have published 1.58-bit FLUX which is a new approach to AI model image generation compression as they aim to address current challenges of text-to-image models.

In the published paper, the people involved state that while many of the popular text-to-image models have “demonstrated remarkable generative capabilities,” they have immense parameter counts and high memory requirements which pose challenges for deployment.

This is highlighted as a potential barrier or difficulty on resource-constrained devices like mobile platforms.

To overcome this, the team compressed the FLUX system to three values which reduced storage by 8x.

“This work introduced 1.58-bit FLUX, in which 99.5% of the transformer parameters are quantized to 1.58 bits. With our custom computation kernels, 1.58-bit FLUX achieves a 7.7× reduction in model storage and more than a 5.1× reduction in inference memory usage,” stated the researchers in the paper.

Image generation in compression format is found to be comparable to full model

Although new compression has been reached, industry benchmarks suggest the images are of comparable quality to the full model as they maintain high visual quality.

The hope is that these findings “inspires the community to develop more robust models for mobile devices.”

While the team has made substantial headway on current issues which could prevent wide-spread bottlenecks for real-world use cases, they have highlighted some limitations which they aim to address in future work.

This includes limitations on speed improvements. “Although 1.58-bit FLUX reduces model size and memory consumption, its latency is not significantly improved due to the absence of activation quantization and lack of further optimized kernel implementations.

“Given our promising results, we hope to inspire the community to develop custom kernel implementation for 1.58-bit models.”

Another highlighted issue is the limitations on visual qualities. The 1.58-bit FLUX is able to generate vivid images that are closely aligned with the text prompt, but the researchers suggest it “still lags behind the original FLUX model in rendering fine details at very high resolution. We aim to address this gap in future research.”

Featured image credit: Linked research paper