Export Publication

The publication can be exported in the following formats: APA (American Psychological Association) reference format, IEEE (Institute of Electrical and Electronics Engineers) reference format, BibTeX, and RIS (Research Information Systems).

Export Reference (APA)
Maqsood, R., Nunes, P., Soares, L. D., & Conti, C. (2025). Efficient Frequency-Aware Multiscale Vision Transformer for Event-to-Video Reconstruction. In 2025 33rd European Signal Processing Conference (EUSIPCO) (pp. 606-610). Palermo, Italy: IEEE.
Export Reference (IEEE)
R. Maqsood, P. Nunes, L. D. Soares, and C. Conti, "Efficient Frequency-Aware Multiscale Vision Transformer for Event-to-Video Reconstruction," in 2025 33rd European Signal Processing Conf. (EUSIPCO), Palermo, Italy, 2025, pp. 606-610, doi: 10.23919/EUSIPCO63237.2025.11226686.
Export BibTeX
@inproceedings{maqsood2025_1765115071479,
	author = "Maqsood, Ramna and Nunes, P. and Soares, L. D. and Conti, C.",
	title = "Efficient Frequency-Aware Multiscale Vision Transformer for Event-to-Video Reconstruction",
	booktitle = "2025 33rd European Signal Processing Conference (EUSIPCO)",
	year = "2025",
	doi = "10.23919/EUSIPCO63237.2025.11226686",
	pages = "606--610",
	publisher = "IEEE",
	address = "Palermo, Italy",
	url = "https://ieeexplore.ieee.org/document/11226686"
}
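
If the exported BibTeX record needs to be processed programmatically (for example, to feed a reference manager or a publication list generator), it can be parsed into a plain dictionary. Below is a minimal sketch, assuming the third-party Python package bibtexparser (v1 API); the field names follow the record above.

import bibtexparser  # assumption: pip install bibtexparser (v1 API)

bibtex_str = r"""
@inproceedings{maqsood2025_1765115071479,
	author = "Maqsood, Ramna and Nunes, P. and Soares, L. D. and Conti, C.",
	title = "Efficient Frequency-Aware Multiscale Vision Transformer for Event-to-Video Reconstruction",
	year = "2025",
	doi = "10.23919/EUSIPCO63237.2025.11226686",
	pages = "606--610"
}
"""

# loads() returns a BibDatabase; .entries is a list of dicts keyed by lowercase field name
db = bibtexparser.loads(bibtex_str)
entry = db.entries[0]
print(entry["title"])  # Efficient Frequency-Aware Multiscale Vision Transformer ...
print(entry["doi"])    # 10.23919/EUSIPCO63237.2025.11226686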
Export RIS
TY  - CPAPER
TI  - Efficient Frequency-Aware Multiscale Vision Transformer for Event-to-Video Reconstruction
T2  - 2025 33rd European Signal Processing Conference (EUSIPCO)
AU  - Maqsood, Ramna
AU  - Nunes, P.
AU  - Soares, L. D.
AU  - Conti, C.
PY  - 2025
SP  - 606
EP  - 610
PB  - IEEE
DO  - 10.23919/EUSIPCO63237.2025.11226686
CY  - Palermo, Italy
UR  - https://ieeexplore.ieee.org/document/11226686
AB  - Event-to-video (E2V) reconstruction is a critical task in event-based vision, benefiting from the advantages of event cameras, such as high dynamic range and low latency. However, existing deep learning reconstruction methods often prioritize temporal consistency and over-emphasize low-frequency features, leading to blur artifacts and loss of fine details. To overcome these limitations, we propose a novel frequency-aware multiscale vision transformer model for E2V reconstruction (MSViT-E2V). Our model employs wavelet-based decomposition to extract features at multiple scales, preserving fine-grained details through multilevel wavelet-based downsampling blocks, followed by transformer blocks for multiscale feature aggregation and long-range dependency modeling. Extensive experiments on various event datasets demonstrate that our model not only minimizes artifacts and preserves fine details but also reduces computational costs by up to 50% compared to the transformer-based model ET-Net.
ER  -
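
The RIS record can be imported by most reference managers directly, or parsed in code. Below is a minimal sketch, assuming the third-party Python package rispy; the dictionary key names ("title", "authors", "doi") follow rispy's default tag mapping and are an assumption here, so .get() is used defensively.

import rispy  # assumption: pip install rispy (third-party RIS parser)

ris_str = """TY  - CPAPER
TI  - Efficient Frequency-Aware Multiscale Vision Transformer for Event-to-Video Reconstruction
AU  - Maqsood, Ramna
AU  - Nunes, P.
PY  - 2025
DO  - 10.23919/EUSIPCO63237.2025.11226686
ER  -
"""

# loads() returns a list of dicts, one per ER-terminated record
entries = rispy.loads(ris_str)
ref = entries[0]
print(ref.get("title"))    # from the TI tag
print(ref.get("authors"))  # AU lines collected into a list
print(ref.get("doi"))      # from the DO tag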