Document Type
Article
Keywords
ResNet50-GeM, Image Processing of Gray-Level Videos, ELU Image Features, Reference Image, Trilinear Upsampling Encoder and Decoder
Abstract
Automatic video colorization remains a challenging computer vision task, particularly when ensuring semantic accuracy and temporal coherence across dynamic, multi-scene content. Existing methods often rely on a single fixed reference image, which fails to adapt to abrupt scene changes or variations in lighting and texture. This study presents a hybrid deep learning framework that dynamically selects multiple reference images per scene using adaptive thresholds derived from the Structural Similarity Index Measure (SSIM) and deep features extracted via a ResNet50 backbone with Generalized Mean Pooling (GeM). The framework integrates three specialized modules: pre-processing, reference image processing, and attention-based colorization, all operating in the Lab color space before conversion to RGB. Experimental evaluations on the YouTube-8M dataset demonstrate a PSNR of 37.89 dB, an SSIM of 0.998, and an inference speed of 2.6 FPS with a compact 81 MB model (3.2M parameters). Compared to state-of-the-art methods, the proposed approach achieves superior color fidelity and temporal stability while maintaining efficiency, making it suitable for deployment in resource-constrained environments such as embedded vision and IoT systems.
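The abstract's reference-selection step relies on Generalized Mean (GeM) pooling to collapse a ResNet50 feature map into a compact descriptor for frame-to-reference matching. The paper itself does not publish code, so the following is only a minimal NumPy sketch of the standard GeM operation; the feature shape (2048 channels from a ResNet50 final stage) and the exponent p = 3 are illustrative assumptions, not values taken from the article.

```python
import numpy as np

def gem_pool(x: np.ndarray, p: float = 3.0, eps: float = 1e-6) -> np.ndarray:
    """Generalized Mean (GeM) pooling over the spatial dims of a (C, H, W) map.

    Computes ((1/HW) * sum_{h,w} x^p)^(1/p) per channel.
    p = 1 recovers average pooling; large p approaches max pooling.
    """
    # Clamp to eps so the fractional power stays well-defined.
    x = np.clip(x, eps, None) ** p
    return x.mean(axis=(-2, -1)) ** (1.0 / p)

# Illustrative usage: pool a dummy ResNet50-style feature map (2048 x 7 x 7)
# into a 2048-dim descriptor that could feed cosine-similarity matching
# against candidate reference frames.
feats = np.random.rand(2048, 7, 7)
descriptor = gem_pool(feats)        # shape: (2048,)
unit = descriptor / np.linalg.norm(descriptor)  # normalized for similarity
```

By the power-mean inequality, raising p shifts the pooled value from the spatial mean toward the spatial maximum, which is why GeM descriptors emphasize the most salient activations in each channel.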
How to Cite This Article
Aydam, Zahoor M. and El Abbadi, Nidhal K. (2025) "Automated Video Colorization Techniques for Enhanced Visual Realism and Computational Efficiency," Mesopotamian Journal of Computer Science: Vol. 5: Iss. 1, Article 17.
DOI: https://doi.org/10.58496/MJCSC/2025/017
Available at: https://map.researchcommons.org/mjcsc/vol5/iss1/17