Export Publication
The publication can be exported in the following formats: APA (American Psychological Association) reference format, IEEE (Institute of Electrical and Electronics Engineers) reference format, BibTeX, and RIS. Each format is shown below.
APA
Maqsood, R., Nunes, P., Soares, L. D., & Conti, C. (2025). Efficient Frequency-Aware Multiscale Vision Transformer for Event-to-Video Reconstruction. In 2025 33rd European Signal Processing Conference (EUSIPCO) (pp. 606-610). Palermo, Italy: IEEE.
IEEE
R. Maqsood, P. Nunes, L. D. Soares, and C. Conti, "Efficient Frequency-Aware Multiscale Vision Transformer for Event-to-Video Reconstruction," in 2025 33rd European Signal Processing Conference (EUSIPCO), Palermo, Italy, 2025, pp. 606-610, doi: 10.23919/EUSIPCO63237.2025.11226686.
BibTeX
@inproceedings{maqsood2025_1765115071479,
  author = "Maqsood, Ramna and Nunes, P. and Soares, L. D. and Conti, C.",
  title = "Efficient Frequency-Aware Multiscale Vision Transformer for Event-to-Video Reconstruction",
  booktitle = "2025 33rd European Signal Processing Conference (EUSIPCO)",
  year = "2025",
  doi = "10.23919/EUSIPCO63237.2025.11226686",
  pages = "606-610",
  publisher = "IEEE",
  address = "Palermo, Italy",
  url = "https://ieeexplore.ieee.org/document/11226686"
}
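The BibTeX record uses the standard key = "value" field syntax, so any BibTeX-aware tool can read it; in a LaTeX document it is enough to save the entry to a .bib file and cite \cite{maqsood2025_1765115071479}. As a minimal sketch of reading it programmatically, the following Python (standard library only; the file name refs.bib is an assumed example, not part of the export) extracts the cite key and the double-quoted fields of a single entry:

import re

# A minimal sketch: read one BibTeX entry whose values are double-quoted,
# as in the export above; refs.bib is an assumed example file name.
with open("refs.bib", encoding="utf-8") as f:
    entry = f.read()

# Cite key sits between "{" and the first comma of the entry header.
citekey = re.search(r"@\w+\{([^,]+),", entry).group(1)
# Collect every key = "value" pair into a dictionary.
fields = dict(re.findall(r'(\w+)\s*=\s*"([^"]*)"', entry))

print(citekey)        # maqsood2025_1765115071479
print(fields["doi"])  # 10.23919/EUSIPCO63237.2025.11226686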
RIS
TY  - CPAPER
TI  - Efficient Frequency-Aware Multiscale Vision Transformer for Event-to-Video Reconstruction
T2  - 2025 33rd European Signal Processing Conference (EUSIPCO)
AU  - Maqsood, Ramna
AU  - Nunes, P.
AU  - Soares, L. D.
AU  - Conti, C.
PY  - 2025
SP  - 606
EP  - 610
DO  - 10.23919/EUSIPCO63237.2025.11226686
CY  - Palermo, Italy
UR  - https://ieeexplore.ieee.org/document/11226686
AB  - Event-to-video (E2V) reconstruction is a critical task in event-based vision, benefiting from the advantages of event cameras, such as high dynamic range and low latency. However, existing deep learning reconstruction methods often prioritize temporal consistency and over-emphasize low-frequency features, leading to blur artifacts and loss of fine details. To overcome these limitations, we propose a novel frequency-aware multiscale vision transformer model for E2V reconstruction (MSViT-E2V). Our model employs wavelet-based decomposition to extract features at multiple scales, preserving fine-grained details through multilevel wavelet-based downsampling blocks, followed by transformer blocks for multiscale feature aggregation and long-range dependency modeling. Extensive experiments on various event datasets demonstrate that our model not only minimizes artifacts and preserves fine details but also reduces computational costs by up to 50% compared to the transformer-based model ET-Net.
ER  -
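RIS is a line-oriented tag format: each line carries a two-letter tag and a value separated by "  - ", repeated tags such as AU accumulate, and ER closes the record. A minimal Python sketch under those assumptions (the file name record.ris is an example, not part of the export):

# A minimal sketch: parse one RIS record into {tag: [values]}.
# Assumes the standard "TAG  - value" layout and an example file
# named record.ris.
tags = {}
with open("record.ris", encoding="utf-8") as f:
    for line in f:
        line = line.rstrip()
        if line.startswith("ER"):
            break                          # ER closes the record
        tag, sep, value = line.partition("  - ")
        if sep:                            # keep only tagged lines
            tags.setdefault(tag, []).append(value)

print(tags["AU"])  # ['Maqsood, Ramna', 'Nunes, P.', 'Soares, L. D.', 'Conti, C.']
print(tags["DO"])  # ['10.23919/EUSIPCO63237.2025.11226686']

Reference managers such as EndNote and Zotero import RIS records like this one directly.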