Research Area

Human beings are good evidence of a system that combines multiple information sources. Examples of information combined by humans include the use of both eyes, seeing and touching the same object, and watching and listening to a person talking (which greatly increases intelligibility). Following the example of human beings, GTAV is investigating and developing new approaches for scene and content classification that combine different modalities (audio and video).

Early research in scene content analysis and multimedia document indexing focused only on video features for segmentation, classification, and summarization. GTAV believes that audio characteristics are equally, if not more, important when it comes to understanding the semantic content of a video. For instance, most people can differentiate between program categories just by listening to a small number of audio segments from the video. Moreover, the computational cost associated with calculating audio features is much lower than that of video features. So, when audio alone is sufficient to categorize the scene content, more sophisticated and complex video processing can be avoided. For this reason, GTAV is investigating relevance procedures to establish a ranking of audio features for scene content analysis, taking into account two parameters: computational burden and classification performance.
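As a minimal sketch of this kind of ranking procedure (the feature set, the synthetic data, and the Fisher-style score below are illustrative assumptions, not GTAV's actual method), two very cheap audio features can be scored for class discriminability and ranked, so that classification performance can later be weighed against computational cost:

```python
# Sketch: rank cheap audio features for scene classification by a
# Fisher-style discriminability score (illustrative, not GTAV's method).
import numpy as np

def zero_crossing_rate(frame):
    # Fraction of sign changes in the frame -- a very cheap audio feature.
    return np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0

def short_time_energy(frame):
    # Mean squared amplitude -- another low-cost feature.
    return np.mean(frame ** 2)

def fisher_score(values, labels):
    # Between-class variance over within-class variance for one feature.
    classes = np.unique(labels)
    overall = np.mean(values)
    between = sum(np.sum(labels == c) * (np.mean(values[labels == c]) - overall) ** 2
                  for c in classes)
    within = sum(np.sum((values[labels == c] - np.mean(values[labels == c])) ** 2)
                 for c in classes)
    return between / within if within > 0 else 0.0

rng = np.random.default_rng(0)
# Synthetic frames from two audio classes that differ mainly in amplitude.
frames = np.vstack([0.1 * rng.standard_normal((50, 512)),
                    0.5 * rng.standard_normal((50, 512))])
labels = np.array([0] * 50 + [1] * 50)

features = {"zcr": np.array([zero_crossing_rate(f) for f in frames]),
            "energy": np.array([short_time_energy(f) for f in frames])}
ranking = sorted(features, key=lambda k: fisher_score(features[k], labels),
                 reverse=True)
print(ranking)  # energy separates these synthetic classes; zcr does not
```

On this toy data the energy feature ranks first because zero-crossing rate is scale-invariant; a real ranking would combine such scores with measured per-feature computation time.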

The main current objective within face analysis is the introduction of 3D model approaches into face recognition. GTAV is extending the traditional approaches (PCA and LDA) to a 2D space and additionally introducing depth information (a third dimension). Recently, a face database has been created and is available as the GTAV Face Database.
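A minimal eigenface-style sketch of the underlying idea (the data layout is an assumption: each face is an intensity image plus a registered depth map, flattened and stacked into one vector) shows how PCA recognition can incorporate the depth channel:

```python
# Sketch: PCA face recognition with intensity + depth stacked into one
# vector (illustrative data layout, not GTAV's actual pipeline).
import numpy as np

def fit_pca(X, n_components):
    # X: one face per row. Returns the mean face and principal axes via SVD.
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(x, mean, axes):
    return axes @ (x - mean)

rng = np.random.default_rng(1)
h = w = 8
# Synthetic gallery: 5 identities, intensity + depth concatenated per face.
gallery = rng.random((5, 2 * h * w))
mean, axes = fit_pca(gallery, n_components=4)
coeffs = np.array([project(g, mean, axes) for g in gallery])

# Recognition: nearest neighbour in the subspace. A slightly perturbed
# probe of identity 2 should match identity 2.
probe = gallery[2] + 0.01 * rng.standard_normal(2 * h * w)
match = np.argmin(np.linalg.norm(coeffs - project(probe, mean, axes), axis=1))
print(match)  # 2
```

The same structure accommodates LDA by replacing the SVD step with a class-aware projection; the depth channel simply contributes extra coordinates to each face vector.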

Automatic summary generation of sports video content has been an object of great interest for many years. Although semantic description techniques have been proposed, many approaches still rely on low-level video descriptors that yield quite limited results, due both to the complexity of the problem and to the low capability of those descriptors to represent semantic content. GTAV is developing a new approach for automatic summary generation of soccer videos using high-level semantic descriptions. The approach first extracts low-level audio-visual descriptors, which are later combined to define semantic events of interest in the soccer video. Once the semantic events are detected, a summary of a complete soccer match is generated by automatically filtering the set of events, using filters based on real soccer summarization practice. Results are very good and are in the process of being used in commercial applications.
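A hypothetical sketch of the descriptor-to-event step (the descriptor names, thresholds, and rules below are illustrative assumptions, not GTAV's actual method) shows how per-shot low-level descriptors might be combined into semantic events, with a filter keeping the highest-scoring events in temporal order:

```python
# Sketch: combine per-shot low-level descriptors into semantic events,
# then filter events into a chronological summary (illustrative rules).
shots = [
    # (start_s, audio_energy, is_close_up) -- assumed per-shot descriptors
    (0,   0.2, False), (30,  0.9, True),  (70,  0.3, False),
    (120, 0.8, True),  (180, 0.4, True),  (240, 0.95, True),
]

def semantic_events(shots, energy_thr=0.7):
    # Rule: a crowd-noise peak coinciding with a close-up shot suggests a
    # highlight (goal, goal attempt, or similar event of interest).
    return [(t, e) for t, e, close_up in shots if e >= energy_thr and close_up]

def summarize(shots, max_events=2):
    events = semantic_events(shots)
    top = sorted(events, key=lambda te: te[1], reverse=True)[:max_events]
    return sorted(t for t, _ in top)  # summary timeline, in match order

print(summarize(shots))  # [30, 240]
```

A production system would of course use many more descriptors and learned rather than hand-set rules; the point is only the pipeline shape: descriptors, then events, then filtered summary.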

Reverse Engineering (RE) in the semiconductor industry deals with the process of obtaining the schematic diagram of an integrated circuit from its physical representation. The application of image processing in this field eases and speeds up repetitive tasks that otherwise require considerable human effort and hardware resources. GTAV, in collaboration with the Barcelona Microelectronics Institute (IMB-CNM), is actively working in this area, which has led to an impressive number of results and a body of knowledge to be applied in different areas.

GTAV also performs research in advanced video coding approaches. In particular, much effort has recently been put into Distributed Video Coding (DVC). DVC allows the complexity of the encoder to be reduced, so that power- or complexity-limited devices such as mobile phones or surveillance cameras can transmit video. The combination of such a Simple-Encoder-Complex-Decoder system with existing Complex-Encoder-Simple-Decoder systems should allow mobile-to-mobile videoconferencing (by using a transcoder located in the fixed network).
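A toy illustration of the DVC principle (a coset-coding sketch under assumed parameters, not GTAV's actual codec): the encoder sends each pixel only as its coset index, halving the bit rate with no motion estimation, and the decoder recovers the pixel by picking the coset member closest to its side information, here the co-located pixel of the previous decoded frame:

```python
# Toy DVC / Wyner-Ziv sketch: simple encoder, complex decoder with
# side information (illustrative, not an actual video codec).
import numpy as np

STEP = 16  # coset size: encoder sends log2(16) = 4 bits/pixel instead of 8

def encode(frame):
    return frame % STEP  # cheap: no motion estimation at the encoder

def decode(cosets, side_info):
    # Candidates for each pixel share the transmitted coset index; the
    # decoder (the complex half of the system) chooses the candidate
    # nearest to its side information.
    base = side_info - (side_info % STEP)
    candidates = np.stack([base + cosets + k * STEP for k in (-1, 0, 1)])
    best = np.argmin(np.abs(candidates - side_info), axis=0)
    return np.take_along_axis(candidates, best[None], axis=0)[0]

rng = np.random.default_rng(2)
prev = rng.integers(32, 224, size=(4, 4))                        # decoded frame
curr = np.clip(prev + rng.integers(-5, 6, size=(4, 4)), 0, 255)  # small motion
recon = decode(encode(curr), prev)
print(np.array_equal(recon, curr))  # True: frames differ by < STEP/2
```

Decoding is exact here because consecutive frames differ by less than half the coset spacing; real DVC systems replace this modulo trick with channel-coded bit planes and motion-compensated side information, but the asymmetry of effort is the same.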

GTAV is also involved in a number of application projects related to the Spanish and Catalan audio-visual industry. Knowledge is being gained on mobile applications based on the iOS and Android operating systems.