Undergraduate Student @ University of Science, VNU-HCM
My general interests are Deep Learning, Computer Vision, and Multimodal Models, along with their applications to real-world problems. Currently, my research focuses on Vision-Language Models (VLMs) and Multimodal Large Language Models (MLLMs).
Research
Bridging the Training-Deployment Gap: Gated Encoding and Multi-Scale Refinement for Efficient Quantization-Aware Image Enhancement
Designing an efficient image enhancement model for RGB photos. The model improves the visual quality of images to match photos taken with a Canon 70D DSLR while remaining computationally efficient, making it suitable for real-time applications on mobile devices. The 8-bit quantized model achieved a PSNR of 21.050 dB and an SSIM of 0.725 on the DPED dataset with only 915K parameters.
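For readers unfamiliar with the PSNR figure reported above, here is a minimal sketch of how the metric is commonly computed. It assumes 8-bit images flattened into plain sequences of pixel values; the function name `psnr` and the `max_value` parameter are illustrative and are not taken from the project's code.

```python
import math


def psnr(reference, enhanced, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images.

    Both images are given as flat sequences of pixel values of equal length.
    `max_value` is the largest possible pixel value (255 for 8-bit images).
    """
    if len(reference) != len(enhanced):
        raise ValueError("images must contain the same number of pixels")
    if len(reference) == 0:
        raise ValueError("images must not be empty")

    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, enhanced)) / len(reference)

    # Identical images have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values mean the enhanced image is closer to the reference. In practice a library implementation would normally be used instead of this hand-written version.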
Performing 3D image segmentation to detect surfaces in ancient scrolls. Experimenting with different techniques, including a 2.5D approach using the MONAI library, 3D segmentation using the nnUNetv2 library, and post-processing methods to improve segmentation quality.
Enhancing traffic video understanding and captioning by developing a pipeline that integrates spatial and temporal information to boost the performance of existing Vision-Language Models. Designing a novel caption decomposition strategy that covers the spatial and temporal aspects of traffic videos. Extensive experiments on the AI City Challenge datasets demonstrate the effectiveness of the proposed method. This pipeline achieved 7th place in the ICCV 2025 AI City Challenge Track 2.
Implementing and comparing Mask R-CNN and DeepLabV3 for semantic segmentation on the Cityscapes dataset. Training both models from scratch and evaluating their performance using metrics such as mIoU and pixel accuracy.
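As an illustration of the two metrics named above, the sketch below computes pixel accuracy and mIoU from a confusion matrix. It assumes labels are flat lists of integer class IDs in `0..num_classes-1` and does not handle an ignore label such as the one used in Cityscapes; all function names are hypothetical and not taken from the project's code.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels per (true class, predicted class) pair."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of pixels whose predicted class equals the true class."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def mean_iou(cm):
    """Average per-class intersection over union.

    Classes that appear in neither the labels nor the predictions are skipped.
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class otherwise
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with four pixels and two classes.
cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
accuracy = pixel_accuracy(cm)
miou = mean_iou(cm)
```

Pixel accuracy can look high when one class dominates the image, which is why mIoU, which weights every class equally, is usually reported alongside it.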
Implementing research papers in deep learning, computer vision, and natural language processing. This project serves as a personal repository for practicing and understanding various papers by implementing them from scratch.