Human activity recognition (HAR) has become one of the most active research topics in image processing and pattern recognition (Aggarwal, J. K. and Ryoo, M. S., 2011). Detecting specific activities in a live feed or searching video archives still relies almost entirely on human analysts. Detecting multiple activities in real-time video feeds is currently done by assigning several analysts to watch the same video stream simultaneously. Manual video analysis is labor intensive, fatiguing, and error prone. Solving the problem of recognizing human activities from video can lead to improvements in several application fields, such as surveillance systems, human-computer interfaces, sports video analysis, digital shopping assistants, video retrieval, gaming, and health care (Popa et al., n.d.; Niu, W. et al., n.d.; Intille, S. S., 1999; Keller, C. G., 2011).

This area has grown dramatically in the past 10 years, and throughout our research we identified a potentially underexplored sub-area: action prediction. What if we could infer the future actions of people from visual input? We propose to extend current vision-based activity analysis to a level where it is possible to predict the future actions of a subject. We are interested in interactions that involve a single actor, two humans, and/or simple objects. For example, we aim to predict whether “a person will cross the street” or “a person will try to steal a handbag from another”, or where a tennis player will target the next volley.

Using a hierarchical approach, we intend to represent high-level human activities as compositions of simpler activities, usually called sub-events, which may themselves be decomposable. We expect to develop a system capable of predicting the next action in a sequence, initially using offline learning to bootstrap the system and then using online learning for self-improvement and task specialization.