Export Publication
The publication can be exported in the following formats: APA (American Psychological Association) reference, IEEE (Institute of Electrical and Electronics Engineers) reference, BibTeX, and RIS.
Maqsood, R., Nunes, P., Conti, C. & Soares, L. D. (2026). Deep Spatio-Temporal and Frequency Guided Fusion Network for Event-to-Video Reconstruction. IEEE Open Journal of Signal Processing, 7, 541-550.
R. Maqsood et al., "Deep Spatio-Temporal and Frequency Guided Fusion Network for Event-to-Video Reconstruction", in IEEE Open Journal of Signal Processing, vol. 7, pp. 541-550, 2026
@article{maqsood2026_1780724008241,
author = "Maqsood, R. and Nunes, P. and Conti, C. and Soares, L. D.",
title = "Deep Spatio-Temporal and Frequency Guided Fusion Network for Event-to-Video Reconstruction",
journal = "IEEE Open Journal of Signal Processing",
year = "2026",
volume = "7",
number = "",
doi = "10.1109/OJSP.2026.3693230",
pages = "541-550",
url = "https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11520274"
}
TY  - JOUR
TI  - Deep Spatio-Temporal and Frequency Guided Fusion Network for Event-to-Video Reconstruction
T2  - IEEE Open Journal of Signal Processing
VL  - 7
AU  - Maqsood, R.
AU  - Nunes, P.
AU  - Conti, C.
AU  - Soares, L. D.
PY  - 2026
SP  - 541
EP  - 550
SN  - 2644-1322
DO  - 10.1109/OJSP.2026.3693230
UR  - https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11520274
AB  - Event-to-video (E2V) reconstruction has gained significant attention recently for its advantages in enabling high dynamic range and fast motion capture capabilities. However, event data encodes only relative brightness changes, lacking the absolute intensity information necessary for accurate reconstruction. Recent methods incorporate previously reconstructed images to provide intensity references but process them in the spatial domain where low- and high-frequency components are highly coupled. This spatial processing typically leads to the degradation of fine details and introduces artifacts such as over-smoothing, blurring and low contrast reconstruction. To address this, we propose a deep spatio-temporal and frequency guided fusion network for E2V reconstruction (DSTFN-E2V), featuring a dual-path architecture with two key components: i) a prior frequency decomposition module (PFDM), and ii) a spatio-temporal event-driven feature extraction module (STEM). The PFDM decouples low- and high-frequency information from previously reconstructed images and current event voxel grid via a 2D discrete wavelet transform, processing the low-frequency subband through residual blocks to preserve structural coherence and intensity references, while an edge-detail refinement module (ERM) enhances edge and texture details from high-frequency subbands. The frequency-specific features from PFDM and the spatio-temporal features from STEM are then integrated through the proposed event-image fusion blocks (EIFBs) that apply cross-attention across three encoder stages, enabling simultaneous structural preservation and detail recovery. Experiments on four real-world datasets demonstrate that DSTFN-E2V achieves state-of-the-art results with 12% SSIM improvements while being 50% faster than recent attention-based methods, with superior edge fidelity and reduced artifacts.
ER  - 