Scientific journal paper Q2
Deep Spatio-Temporal and Frequency Guided Fusion Network for Event-to-Video Reconstruction
Ramna Maqsood (Maqsood, R.); Paulo Nunes (Nunes, P.); Caroline Conti (Conti, C.); Luís Ducla Soares (Soares, L. D.)
Journal Title
IEEE Open Journal of Signal Processing
Year (definitive publication)
2026
Language
English
Country
United States of America
More Information
Web of Science®

This publication is not indexed in Web of Science®

Scopus

Times Cited: 0

(Last checked: 2026-06-04 10:48)

Google Scholar

This publication is not indexed in Google Scholar

This publication is not indexed in Overton

Abstract
Event-to-video (E2V) reconstruction has gained significant attention recently for its advantages in enabling high dynamic range and fast motion capture capabilities. However, event data encodes only relative brightness changes, lacking the absolute intensity information necessary for accurate reconstruction. Recent methods incorporate previously reconstructed images to provide intensity references but process them in the spatial domain, where low- and high-frequency components are highly coupled. This spatial processing typically degrades fine details and introduces artifacts such as over-smoothing, blurring and low-contrast reconstruction. To address this, we propose a deep spatio-temporal and frequency guided fusion network for E2V reconstruction (DSTFN-E2V), featuring a dual-path architecture with two key components: i) a prior frequency decomposition module (PFDM), and ii) a spatio-temporal event-driven feature extraction module (STEM). The PFDM decouples low- and high-frequency information from previously reconstructed images and the current event voxel grid via a 2D discrete wavelet transform, processing the low-frequency subband through residual blocks to preserve structural coherence and intensity references, while an edge-detail refinement module (ERM) enhances edge and texture details from high-frequency subbands. The frequency-specific features from PFDM and the spatio-temporal features from STEM are then integrated through the proposed event-image fusion blocks (EIFBs) that apply cross-attention across three encoder stages, enabling simultaneous structural preservation and detail recovery. Experiments on four real-world datasets demonstrate that DSTFN-E2V achieves state-of-the-art results with 12% SSIM improvements while being 50% faster than recent attention-based methods, with superior edge fidelity and reduced artifacts.
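As an illustration of the frequency decoupling the abstract describes, the following is a minimal sketch of a single-level 2D discrete wavelet transform in NumPy. It is not the authors' implementation: the abstract does not state which wavelet the PFDM uses, so the Haar wavelet is an assumption here, and the function names `haar_dwt2` and `haar_idwt2` are hypothetical. The sketch only shows how one frame splits into a low-frequency subband (coarse structure and intensity) and three high-frequency subbands (edges and texture), and that the split is lossless.

```python
import numpy as np


def haar_dwt2(image):
    """Single-level 2D Haar DWT (assumed wavelet; hypothetical helper).

    Returns (ll, lh, hl, hh), each half the height and width of the input.
    """
    x = np.asarray(image, dtype=np.float64)
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")

    # The four pixels of every non-overlapping 2x2 block.
    a = x[0::2, 0::2]  # top-left
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right

    ll = (a + b + c + d) / 2.0  # low-frequency subband: local average
    lh = (a + b - c - d) / 2.0  # top row minus bottom row: horizontal edges
    hl = (a - b + c - d) / 2.0  # left column minus right column: vertical edges
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuilds the full-resolution array."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=np.float64)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.random((8, 8))  # stand-in for a previously reconstructed frame
    ll, lh, hl, hh = haar_dwt2(frame)
    print(ll.shape, np.allclose(haar_idwt2(ll, lh, hl, hh), frame))
```

In this sketch a constant image produces all-zero high-frequency subbands, so absolute intensity lives entirely in the low-frequency subband. That is the property the abstract relies on when it routes the low-frequency subband toward intensity references and the high-frequency subbands toward edge and texture refinement.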
Acknowledgements
This work was supported in part by national funds through FCT – Fundação para a Ciência e a Tecnologia, I.P., and in part by EU funds through Project/support UID/50008/2025 – Instituto de Telecomunicações, with DOI identifier 10.54499/UID/50008/2025.
Keywords
Event-to-video (E2V) reconstruction, discrete wavelet transforms, cross-attention networks
  • Computer and Information Sciences - Natural Sciences
  • Electrical Engineering, Electronic Engineering, Information Engineering - Engineering and Technology
Funding Records
Funding Reference
UID/50008/2025 – Instituto de Telecomunicações
Funding Entity
FCT/MECI
