To Thanh Dat

Undergraduate Student @ University of Science, VNU-HCM

My general interests are Deep Learning, Computer Vision, and Multimodal Models, along with their applications to real-world problems.
Currently, my research focuses on Vision-Language Models (VLMs) and Multimodal Large Language Models (MLLMs).


Publications

Bridging the Training-Deployment Gap: Gated Encoding and Multi-Scale Refinement for Efficient Quantization-Aware Image Enhancement
Dat To-Thanh, Nghia Nguyen-Trong, Hoang Vo, Hieu Bui-Minh, Tinh-Anh Nguyen-Nhu
Published at Mobile AI Workshop @ CVPR 2026

STER-VLM: Spatio-Temporal With Enhanced Reference Vision-Language Models
Tinh-Anh Nguyen-Nhu*, Triet Dao Hoang Minh*, Dat To-Thanh*, Phuc Le-Gia, Tuan Vo-Lan, Tien-Huy Nguyen
Published at 9th AI City Challenge Workshop @ ICCV 2025

Projects


(CVPRW 2026) Bridging the Training-Deployment Gap: Gated Encoding and Multi-Scale Refinement for Efficient Quantization-Aware Image Enhancement
Designing an efficient image enhancement model for RGB photos. The model improves the visual quality of images to match photos taken with a Canon 70D DSLR while remaining computationally efficient enough for mobile deployment. The 8-bit quantized model achieved a PSNR of 21.050 dB and an SSIM of 0.725 on the DPED dataset with only 915K parameters.
CV, PyTorch, Image Enhancement

Vesuvius Surface Detection
Performing 3D image segmentation to detect surfaces in ancient scrolls. Experimenting with 2.5D approaches using MONAI, 3D segmentation using nnUNetv2, and post-processing methods to improve segmentation quality.
CV, PyTorch, 3D Segmentation

(ICCVW 2025) STER-VLM: Spatio-Temporal With Enhanced Reference Vision-Language Models
Enhancing traffic video understanding and captioning by developing a rigorous pipeline that integrates spatial and temporal information to boost the performance of existing vision-language models.
Multimodal Models, PyTorch

Segmentation on Cityscapes Dataset using Mask R-CNN and DeepLabV3
Implementing and comparing Mask R-CNN and DeepLabV3 for semantic segmentation on the Cityscapes dataset. Training both models from scratch and evaluating them with metrics such as mIoU and pixel accuracy.
2D Segmentation, PyTorch, CV

Evaluation Test for ArtExtract - Human AI - Google Summer of Code 2025
Building multiple models, such as ResNet50 and ViT (base_patch16_224), to classify the artist, genre, and style of paintings, and a ResNet50 + LSTM model to classify the combination of all styles in the ArtGAN dataset. Building a framework to find similar paintings in the National Gallery of Art dataset using a query image. Experimenting with multiple metrics and comparing their performance.
Image Classification, Python, CV

Implementing Models from Research Papers
Implementing research papers in deep learning, computer vision, and natural language processing as a personal repository to practice and understand different methods from scratch.
Python, NLP, CV, Multimodal Models

Converting the game's first-order logic (FOL) rules to conjunctive normal form (CNF) and performing forward chaining and backward chaining to solve the game. Evaluating the performance and runtime of forward and backward chaining and comparing them with A*, backtracking, and brute-force solvers.
Python

Activities


May 2026
  • Supported speakers, MCs, and guests, coordinated stage access, and collaborated with the technical team to ensure smooth event operations at GStar Summit 2026.
March 2026 - Present
  • Mentoring students in the Computer Science and Engineering Technology (CSET) program at Ben Tre High School for Gifted Students.
  • Providing guidance and support to help students develop their skills in computer science and engineering.
October 2025 - Present
  • Proposed ideas for activities, events, and workshops in the tech domain.
  • Organized and hosted internal training sessions for members to improve their technical skills.

© 2026 Tô Thành Đạt. All rights reserved.