To Thanh Dat

Undergraduate Student @ University of Science, VNU-HCM

My general interests are Deep Learning, Computer Vision, and Multimodal Models, and their applications to real-world problems.
Currently, my research focuses on Vision-Language Models (VLMs) and Multimodal Large Language Models (MLLMs).

Publications

Bridging the Training-Deployment Gap: Gated Encoding and Multi-Scale Refinement for Efficient Quantization-Aware Image Enhancement

arXiv CVPRW 2026

Dat To-Thanh, Nghia Nguyen-Trong, Hoang Vo, Hieu Bui-Minh, Tinh-Anh Nguyen-Nhu

Published at Mobile AI Workshop @ CVPR 2026

STER-VLM: Spatio-Temporal With Enhanced Reference Vision-Language Models

arXiv ICCVW 2025

Tinh-Anh Nguyen-Nhu*, Triet Dao Hoang Minh*, Dat To-Thanh*, Phuc Le-Gia, Tuan Vo-Lan, Tien-Huy Nguyen

Published at 9th AI City Challenge Workshop @ ICCV 2025

Projects

(CVPRW 2026) Bridging the Training-Deployment Gap: Gated Encoding and Multi-Scale Refinement for Efficient Quantization-Aware Image Enhancement

Designing an efficient image enhancement model for RGB photos. The model improves the visual quality of images to match photos taken with a Canon 70D DSLR while remaining computationally efficient enough for mobile deployment. The 8-bit quantized model achieved a PSNR of 21.050 and an SSIM of 0.725 on the DPED dataset with only 915K parameters.

CV, PyTorch, Image Enhancement

Vesuvius Surface Detection

Performing 3D image segmentation to detect surfaces in ancient scrolls. Experimenting with 2.5D approaches using MONAI, 3D segmentation using nnUNetv2, and post-processing methods to improve segmentation quality.

CV, PyTorch, 3D Segmentation

(ICCVW 2025) STER-VLM: Spatio-Temporal With Enhanced Reference Vision-Language Models

Enhancing traffic video understanding and captioning by developing a rigorous pipeline that integrates spatial and temporal information to boost the performance of existing vision-language models.

Multimodal Models, PyTorch

Segmentation on Cityscapes Dataset using Mask R-CNN and DeepLabV3

Implementing and comparing Mask R-CNN and DeepLabV3 for semantic segmentation on the Cityscapes dataset. Training both models from scratch and evaluating them with metrics such as mIoU and pixel accuracy.

2D Segmentation, PyTorch, CV

Evaluation test for ArtExtract - Human AI - Google Summer of Code 2025

Building multiple models, such as ResNet50 and ViT (base_patch16_224), to classify the artist, genre, and style of paintings, and a ResNet50 + LSTM model to classify the combination of all styles in the ArtGAN dataset. Building a framework to find similar paintings in the National Gallery of Art dataset from a query image. Experimenting with multiple metrics and comparing their performance.

Image Classification, Python, CV

Implementing models from research papers

Implementing research papers in deep learning, computer vision, and natural language processing as a personal repository to practice and understand different methods from scratch.

Python, NLP, CV, Multimodal Models

Futoshiki Puzzles

Converting the game's first-order logic (FOL) rules to CNF and performing forward chaining and backward chaining to solve the puzzle. Evaluating the performance and runtime of forward and backward chaining and comparing them with A*, backtracking, and brute-force solvers.

Python

Activities

Event Collaborator at GStar Summit 2026

May 2026

Support speakers, MCs, and guests, coordinate stage access, and collaborate with the technical team to ensure smooth event operations at GStar Summit 2026.

Mentor of Computer Science and Engineering Technology (CSET) Club, Ben Tre High School for Gifted Students

March 2026 - Present

Mentoring students in the Computer Science and Engineering Technology (CSET) program at Ben Tre High School for Gifted Students.
Providing guidance and support to help students develop their skills in computer science and engineering.

Head of AI&DS Team, Google Developer Group on Campus, University of Science, VNU-HCM

October 2025 - Present

Proposed ideas for activities, events, and workshops in the tech domain.
Organized and hosted internal training sessions for members to improve their technical skills.