2026-04-24
Quantization-aware training for the RGB image enhancement task
I would like to sincerely thank all of the members in my team: TA Tinh Anh, TA Tien Huy, Nguyen Trong Nghia, Vo Hoang and Bui Minh Hieu.
Disclaimer: This blog is not an official post for the paper or the project. It is based on my own experience and reflects what I learned.
Image Enhancement is a general task consisting of many subtasks, such as improving the quality of a low-resolution image (Image Super-resolution), de-blurring an image (Image Deblurring), or improving lighting conditions (Low-light Image Enhancement). This task is a subset of Image Signal Processing (ISP): ISP as a whole aims to use signal processing to enhance the raw image coming from the image sensor, while Image Enhancement aims to enhance RGB images using algorithms or Deep Learning models.
In 2017, a research paper from ETH Zurich proposed a task related to image enhancement: (s)RGB Image Enhancement. The task is to enhance RGB images taken with old phones such as the iPhone 3GS, BlackBerry Passport and Sony Xperia Z so that they match the quality of a Canon DSLR camera. Models trained on this task can be applied to images of any resolution, and the method can be adapted to match any type of digital camera. The application is clear: "developing a method to take high quality photos using only an old smartphone". Imagine using only an iPhone 6 and still capturing photos as good as those from an expensive camera.
That paper proposed the original task together with the DPED dataset and a simple model.
Related papers on image enhancement are abundant. A notable one is MobileIE, which introduces a tiny model with only 4K parameters at inference time and achieves SOTA (State-Of-The-Art, i.e. the best reported) results on the LOL datasets. An important idea from the paper is its use of a "Reparameterization method" to condense the weights of multiple convolutional layers into a single layer, which reduces the model's size significantly while keeping the performance unchanged. LL-UNet++ tackles the low-light image enhancement task and inspired us to design Multi-Scale Refinement blocks (see next section).
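To make the reparameterization idea more concrete, here is a minimal sketch in plain Python. It is not MobileIE's actual scheme (the exact branches that paper merges are not covered in this post). It only shows the general principle, under the assumption of a single-channel image with one 3x3 branch and one parallel 1x1 branch whose outputs are summed. Because convolution is linear, the 1x1 weight can be folded into the centre tap of the 3x3 kernel, so one layer reproduces the output of two. The function names are made up for this illustration.

```python
def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (same output size)."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:  # outside pixels count as zero
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def merge_3x3_and_1x1(k3, k1):
    """Fold a parallel 1x1 branch into the centre tap of a 3x3 kernel."""
    merged = [row[:] for row in k3]
    merged[1][1] += k1[0][0]
    return merged


# Toy data, chosen only for illustration.
image = [[1.0, 2.0, 0.0], [0.5, -0.5, 3.0], [2.0, 1.0, 0.0]]
k3 = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.2, 0.1, -0.3]]
k1 = [[0.7]]

# Training-time view: two branches, outputs summed.
branch_sum = [
    [a + b for a, b in zip(r3, r1)]
    for r3, r1 in zip(conv2d_same(image, k3), conv2d_same(image, k1))
]
# Inference-time view: one merged kernel.
single = conv2d_same(image, merge_3x3_and_1x1(k3, k1))

max_diff = max(abs(a - b) for ra, rb in zip(branch_sum, single) for a, b in zip(ra, rb))
print("max difference between two-branch and merged output:", max_diff)
```

The two outputs agree up to floating-point rounding, which is why the extra branches cost nothing once the model is deployed.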
Our model looks similar to UNet (like many image enhancement models do) and differs in the design of the Down and Up blocks. Inspired by the DaHua-IIG team, we split the downsampling in the Encoder block into 3 branches: 2 feature maps from 2 parallel convolutional branches and 1 ensemble feature map built from them. The ensemble captures the interaction between the feature maps, which are then refined by a Refinement block. Instance normalization layers are used (inspired by LL-UNet++) to handle each image's own structure and noise.
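As a small illustration of why instance normalization suits per-image structure, here is a sketch in plain Python. It normalizes one channel of one image using only that channel's own mean and variance, so no statistics are shared across a batch. This is the textbook formula without the learnable scale and shift. The function name and the eps value are assumptions for this example, not details of our model.

```python
import math


def instance_norm(channel, eps=1e-5):
    """Normalize one channel of one image with its own mean and variance."""
    flat = [v for row in channel for v in row]
    mean = sum(flat) / len(flat)
    var = sum((v - mean) ** 2 for v in flat) / len(flat)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [[(v - mean) * inv_std for v in row] for row in channel]


# Two toy patches with the same structure but different brightness.
dark = [[0.1, 0.2], [0.3, 0.4]]
bright = [[0.6, 0.7], [0.8, 0.9]]

print(instance_norm(dark))
print(instance_norm(bright))
```

Both patches normalize to (almost) the same values, because the brightness offset is removed per image. That is the property we wanted when dealing with photos taken under very different conditions.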

Another contribution is that we successfully integrated Quantization-Aware Training (QAT) and kept the model's performance high even at 8-bit precision. During training, we insert FakeQuant modules into every block to simulate quantization error. This lets the model learn to compensate for the quantization error it will encounter when deployed in 8-bit format. The resulting QAT model achieves 22.194 PSNR and 0.796 SSIM when evaluated in the FP32 configuration, and 21.050 PSNR and 0.725 SSIM in the INT8 configuration. A normal PTQ model (Post-Training Quantization, i.e. converting a normally trained model to INT8 without training it to correct quantization error) only achieves 20.576 PSNR and 0.6139 SSIM. In other words, at INT8 the QAT model recovers 0.474 PSNR (21.050 - 20.576) and 0.1111 SSIM (0.725 - 0.6139) compared to the PTQ model. Moreover, the qualitative results also show a significant improvement of QAT over PTQ (rightmost columns).
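For readers unfamiliar with fake quantization, here is a minimal sketch in plain Python of what such a step computes in the forward pass. The values are quantized to 8-bit integers and immediately dequantized, so the rest of the network sees floats that already carry the rounding error. The asymmetric per-tensor scheme with a min/max range is an assumption for illustration, and the function name is made up. This post does not specify the exact configuration we used. Real QAT frameworks also pair the rounding with a straight-through estimator so that gradients can pass through it, which is omitted here.

```python
def fake_quantize(values, num_bits=8):
    """Quantize to unsigned integers and dequantize back to floats."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Make sure 0.0 lies inside the range so it is represented exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # guard against an all-zero tensor
    zero_point = max(qmin, min(qmax, round(qmin - lo / scale)))
    out = []
    for v in values:
        q = round(v / scale) + zero_point   # float -> integer grid
        q = max(qmin, min(qmax, q))         # clamp to the 8-bit range
        out.append((q - zero_point) * scale)  # integer grid -> float
    return out


# Toy activations, chosen only for illustration.
activations = [-1.0, -0.3, 0.0, 0.42, 1.0]
simulated = fake_quantize(activations)
errors = [abs(a - s) for a, s in zip(activations, simulated)]
print("simulated INT8 values:", simulated)
print("largest quantization error:", max(errors))
```

With PTQ the model meets this error for the first time after training. With QAT it sees the error in every forward pass, so the weights can adapt to it.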

At first, we aimed to distill a large model into a small one and combine that with QAT. However, we could not find a good reference for such a method. We therefore think that distillation + QAT for RGB image enhancement is a good direction for future work.