Deep Spatio-Temporal and Frequency Guided Fusion Network for Event-to-Video Reconstruction
Journal Title
IEEE Open Journal of Signal Processing
Year (definitive publication)
2026
Language
English
Country
United States of America
More Information
Web of Science®
This publication is not indexed in Web of Science®
Scopus
Google Scholar
This publication is not indexed in Google Scholar
This publication is not indexed in Overton
Abstract
Event-to-video (E2V) reconstruction has gained significant attention recently for its advantages in enabling high dynamic range and fast motion capture capabilities. However, event data encodes only relative brightness changes, lacking the absolute intensity information necessary for accurate reconstruction. Recent methods incorporate previously reconstructed images to provide intensity references but process them in the spatial domain, where low- and high-frequency components are highly coupled. This spatial processing typically degrades fine details and introduces artifacts such as over-smoothing, blurring, and low-contrast reconstruction. To address this, we propose a deep spatio-temporal and frequency guided fusion network for E2V reconstruction (DSTFN-E2V), featuring a dual-path architecture with two key components: i) a prior frequency decomposition module (PFDM), and ii) a spatio-temporal event-driven feature extraction module (STEM). The PFDM decouples low- and high-frequency information from previously reconstructed images and the current event voxel grid via a 2D discrete wavelet transform, processing the low-frequency subband through residual blocks to preserve structural coherence and intensity references, while an edge-detail refinement module (ERM) enhances edge and texture details from the high-frequency subbands. The frequency-specific features from PFDM and the spatio-temporal features from STEM are then integrated through the proposed event-image fusion blocks (EIFBs), which apply cross-attention across three encoder stages, enabling simultaneous structural preservation and detail recovery. Experiments on four real-world datasets demonstrate that DSTFN-E2V achieves state-of-the-art results with 12% SSIM improvements while being 50% faster than recent attention-based methods, with superior edge fidelity and reduced artifacts.
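As a rough illustration of the frequency decoupling the abstract describes, the sketch below applies a single-level 2D discrete wavelet transform to a small frame and splits it into one low-frequency subband and three high-frequency subbands. It is not the authors' PFDM: the Haar wavelet is an assumption (the abstract does not name the wavelet used), and the function names `haar_dwt2` and `haar_idwt2` are hypothetical helpers written for this example. Subband naming conventions also vary between libraries.

```python
def haar_dwt2(image):
    """Single-level 2D Haar DWT (orthonormal) of a list-of-lists image.

    Assumption: Haar is used only because it is the simplest wavelet;
    the abstract does not say which wavelet the PFDM applies.
    Returns (ll, lh, hl, hh), each half the height and width of the input.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must both be even")
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, rows, 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for c in range(0, cols, 2):
            # One 2x2 block: top row (a, b), bottom row (d, e).
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            ll_row.append((a + b + d + e) / 2)  # local average: structure and intensity
            lh_row.append((a + b - d - e) / 2)  # top minus bottom: horizontal edges
            hl_row.append((a - b + d - e) / 2)  # left minus right: vertical edges
            hh_row.append((a - b - d + e) / 2)  # diagonal detail
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild the full-resolution image."""
    rows, cols = len(ll), len(ll[0])
    image = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for r in range(rows):
        for c in range(cols):
            s, v, h, x = ll[r][c], lh[r][c], hl[r][c], hh[r][c]
            image[2 * r][2 * c] = (s + v + h + x) / 2
            image[2 * r][2 * c + 1] = (s + v - h - x) / 2
            image[2 * r + 1][2 * c] = (s - v + h - x) / 2
            image[2 * r + 1][2 * c + 1] = (s - v - h + x) / 2
    return image


# A flat 4x4 frame with a bright right-hand column (a vertical edge).
frame = [[10, 10, 10, 50] for _ in range(4)]
ll, lh, hl, hh = haar_dwt2(frame)
print(ll)  # coarse intensity reference
print(hl)  # non-zero only where the vertical edge sits
```

In this toy frame the intensity level ends up entirely in `ll`, while the edge appears only in `hl`. That separation is what lets a network treat structure and detail along different paths, and the inverse transform confirms that no information is lost by the split.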
Acknowledgements
This work was supported in part by national funds through FCT – Fundação para a Ciência e a Tecnologia, I.P., and in part by EU funds through Project/support UID/50008/2025 – Instituto de Telecomunicações, with DOI identifier 10.54499/UID/50008/2025.
Keywords
Event-to-video (E2V) reconstruction, discrete wavelet transforms, cross-attention networks
Fields of Science and Technology Classification
- Computer and Information Sciences - Natural Sciences
- Electrical Engineering, Electronic Engineering, Information Engineering - Engineering and Technology
Funding Records
| Funding Reference | Funding Entity |
|---|---|
| UID/50008/2025 – Instituto de Telecomunicações | FCT/MECI |