Lydia Weiland, Ioana Hulpus, Simone Paolo Ponzetto, Wolfgang Effelsberg and Laura Dietz.
Knowledge-rich image gist understanding beyond literal meaning. Data & Knowledge Engineering, 117, 12, 114 - 132. We investigate the problem of understanding the message (gist) conveyed by images and their captions as found, for instance, on websites or news articles. To this end, we propose a methodology to capture the meaning of image-caption pairs on the basis of large amounts of machine-readable knowledge that have previously been shown to be highly effective for text understanding. Our method identifies the connotation of objects beyond their denotation: where most approaches to image understanding focus on the denotation of objects, i.e., their literal meaning, our work addresses the identification of connotations, i.e., iconic meanings of objects, to understand the message of images. We view image understanding as the task of representing an image-caption pair on the basis of a wide-coverage vocabulary of concepts such as the one provided by Wikipedia, and cast gist detection as a concept-ranking problem with image-caption pairs as queries. Our proposed algorithm brings together aspects of entity linking and clustering, subgraph selection, semantic relatedness, and learning-to-rank in a novel way. In addition to this novel task and a complete evaluation of our approach, we introduce a novel dataset to foster further research on this problem. To enable a thorough investigation of the problem of gist understanding, we produce a gold standard of over 300 image-caption pairs and over 8000 gist annotations covering a wide variety of topics at different levels of abstraction. We use this dataset to experimentally benchmark the contribution of different kinds of signals from heterogeneous sources, namely image and text. The best result, a Mean Average Precision (MAP) of 0.69, indicates that by combining both dimensions we are able to better understand the meaning of our image-caption pairs than when using language or vision information alone.
Our supervised approach relies on the availability of human-annotated gold standard datasets. Annotating images with possibly complex topic labels is arguably a very time-consuming task that must rely on expert human annotators. We accordingly investigate whether parts of this process could be automated using automatic image annotation and caption generation techniques. Our results indicate the general feasibility of an end-to-end approach to gist detection when replacing one of the two dimensions with automatically generated input, i.e., using automatically generated image tags or generated captions. However, we also show experimentally that state-of-the-art image and text understanding is better at understanding literal meanings of image-caption pairs, while non-literal pairs are generally more difficult to detect, thus paving the way for future work on understanding the message of images beyond their literal content.
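The abstract above reports its ranking quality as Mean Average Precision (MAP). As a reminder of how that metric is usually computed, here is a minimal sketch, assuming each query (an image-caption pair) comes with a ranked list of concepts and a set of gold-standard relevant concepts. The function names and the toy data are illustrative assumptions, not taken from the paper.

```python
def average_precision(ranked, relevant):
    """Average precision for one query.

    Averages precision@k over every rank k at which a relevant item appears,
    normalised by the total number of relevant items.
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(queries):
    """Mean of the per-query average precision over (ranked, relevant) pairs."""
    if not queries:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


# Toy data (hypothetical concept labels, not from the paper's dataset).
toy_queries = [
    (["a", "b", "c", "d"], {"a", "c"}),  # relevant items at ranks 1 and 3
    (["x", "y"], {"y"}),                 # relevant item at rank 2
]
toy_map = mean_average_precision(toy_queries)
```

A score of 1.0 would mean that every relevant concept is ranked above every irrelevant one for every query.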
Philip Mildner, Tonio Triebel, Stephan Kopf and Wolfgang Effelsberg.
Scaling online games with NetConnectors : a peer-to-peer overlay for fast-paced massively multiplayer online games. Computers in Entertainment : CIE, 15, 9, 1-21. This article presents a peer-to-peer overlay for massively multiplayer online games with a focus on fast-paced action. More than other genres, action games like first-person shooters employ fast and dynamic game mechanics. In multiplayer environments, these properties have to be reflected by the underlying network structure. At the same time, the system should be able to support a very large number of users in order to deliver a massive experience to the participating players. The capacity of current client/server systems limits the number of players in a game, preventing the desired massive experience.
To provide both a scalable and a responsive system, we use a fully distributed peer-to-peer network with a dynamic connection scheme. By exploiting local interests in the virtual world, our system supports a huge number of users. To this end, an area-of-interest mechanism is applied to the connection scheme. Users do not connect to all participating users; they only establish connections to other users they are interested in. These neighbors are determined by the user's perception of the virtual world. Instead of using a purely distance-based approach, our system uses a more flexible neighbor-based approach that supports the use of multiple metrics to determine the set of interesting nodes for each user. A second kind of connection, the so-called NetConnectors, utilizes the players' distribution in the virtual world to ensure overlay consistency. For the dissemination of messages, we use a publish/subscribe mechanism. This prevents inconsistencies introduced by unidirectional neighborhood relations that can occur with sender-oriented models. Further, the publish/subscribe mechanism models the users' interests more accurately. In addition to the regular sending mechanism, we implemented a Geocast algorithm that allows information distribution to arbitrary regions of the virtual world. While regular messages are always addressed to specific users, Geocasts cover certain geographical regions. Thus, Geocasts can be used to disseminate messages to all users that are located in the addressed region.
Simulations show that our design performs well in terms of scalability. By keeping the number of connections per user nearly constant, users do not get overloaded with too many connections. This also applies to crowded regions where the user density is much higher compared to an evenly populated virtual world. Another important aspect of fast-paced multiplayer games is the users' motion behavior. Different movement strategies are evaluated for their impact on network load and connection dynamics.
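Two ideas from this abstract can be illustrated with a small sketch: choosing a bounded set of neighbors from several interest metrics, and addressing a Geocast to a region rather than to specific users. Everything below is an assumption made for illustration (2D positions, a rectangular region, the metric functions `closeness` and `same_team`, their weights, and all function names). It is not the NetConnectors implementation.

```python
import math


def closeness(a, b):
    """Hypothetical metric: higher score for players that are nearer."""
    return 1.0 / (1.0 + math.dist(a["pos"], b["pos"]))


def same_team(a, b):
    """Hypothetical metric: 1.0 if both players are on the same team."""
    return 1.0 if a["team"] == b["team"] else 0.0


def select_neighbors(me, others, metrics, max_neighbors):
    """Rank other players by a weighted sum of metrics and keep the top few.

    Capping the result at max_neighbors keeps the connection count per user
    bounded, even in crowded regions.
    """
    def score(other):
        return sum(weight * metric(me, other) for metric, weight in metrics)

    return sorted(others, key=score, reverse=True)[:max_neighbors]


def geocast_recipients(players, region):
    """Return all players located inside an axis-aligned rectangular region."""
    x_min, y_min, x_max, y_max = region
    return [
        p for p in players
        if x_min <= p["pos"][0] <= x_max and y_min <= p["pos"][1] <= y_max
    ]


me = {"name": "me", "pos": (0.0, 0.0), "team": "red"}
others = [
    {"name": "A", "pos": (1.0, 0.0), "team": "blue"},
    {"name": "B", "pos": (5.0, 0.0), "team": "red"},
    {"name": "C", "pos": (50.0, 50.0), "team": "blue"},
]
metrics = [(closeness, 1.0), (same_team, 0.5)]
neighbors = select_neighbors(me, others, metrics, max_neighbors=2)
in_region = geocast_recipients(others, (0.0, 0.0, 10.0, 10.0))
```

With these toy weights, teammate B outranks the nearer player A, which shows how a multi-metric approach can differ from a purely distance-based one.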
Philipp Schaber, Stephan Kopf, Sina Wetzel, Tyler Ballast, Christoph Wesch and Wolfgang Effelsberg.
CamMark: Analyzing, Modeling, and Simulating Artifacts in Camcorder Copies. ACM Transactions on Multimedia Computing, Communications, and Applications : TOMCCAP, 11, 10, Article 42, 1-23. To support the development of any system that includes the generation and evaluation of camcorder copies, as well as to provide a common benchmark for robustness against camcorder copies, we present a tool to simulate digital video re-acquisition using a digital video camera. By resampling each video frame, we simulate the typical artifacts occurring in a camcorder copy: geometric modifications (aspect ratio changes, cropping, perspective and lens distortion), temporal sampling artifacts (due to different frame rates, shutter speeds, rolling shutters, or playback), spatial and color subsampling (rescaling, filtering, Bayer color filter array), and processing steps (automatic gain control, automatic white balance). We also support the simulation of camera movement (e.g., a hand-held camera) and background insertion. Furthermore, we allow for an easy setup and calibration of all the simulated artifacts, using sample/reference pairs of images and videos. In particular, temporal subsampling effects are analyzed in detail to create realistic frame blending artifacts in the simulated copies. We carefully evaluated our entire camcorder simulation system and found that the models we developed describe and match the real artifacts quite well.
Martin Mauve, Jürgen Vogel, Volker Hilt and Wolfgang Effelsberg.
Local-lag and Timewarp : Providing Consistency for Replicated Continuous Applications. IEEE Transactions on Multimedia, 6, 2, 47-57. In this paper we investigate how consistency can be established for replicated applications changing their state in reaction to user-initiated operations as well as the passing of time. Typical examples of these applications are networked computer games and distributed virtual environments. We give a formal definition of the terms consistency and correctness for this application class. Based on these definitions, it is shown that an important tradeoff relationship exists between the responsiveness of the application and the appearance of short-term inconsistencies. We propose to exploit the knowledge of this tradeoff by voluntarily decreasing the responsiveness of the application in order to eliminate short-term inconsistencies. This concept is called local-lag. Furthermore, a timewarp scheme is presented that complements local-lag by guaranteeing consistency and correctness for replicated continuous applications. The computational complexity of the timewarp algorithm is determined in theory and practice by examining a simple networked computer game. The timewarp scheme is then compared to the well-known dead-reckoning approach. It is shown that the choice between both schemes is application-dependent.
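The local-lag idea described above, deliberately delaying the execution of an operation so that all replicas can apply it at the same time, can be sketched roughly as follows. This is a simplified illustration under stated assumptions: synchronised clocks, integer millisecond timestamps, and state changes modelled as plain functions. The class name `LocalLagSite` and its methods are hypothetical. The sketch only counts operations that arrive too late; repairing them, for example with the timewarp scheme the abstract mentions, is not modelled.

```python
import heapq


class LocalLagSite:
    """One replica that delays every operation by a fixed local lag."""

    def __init__(self, local_lag_ms):
        self.local_lag_ms = local_lag_ms
        self.pending = []   # min-heap of (exec_time_ms, seq, op)
        self.seq = 0        # tie-breaker so ops are never compared directly
        self.state = 0
        self.late_ops = 0   # ops that arrived after their execution time

    def _schedule(self, exec_time_ms, op):
        heapq.heappush(self.pending, (exec_time_ms, self.seq, op))
        self.seq += 1

    def issue(self, now_ms, op):
        """Local user operation: schedule it locally and return the message
        (execution time, op) that would be sent to the other replicas."""
        exec_time_ms = now_ms + self.local_lag_ms
        self._schedule(exec_time_ms, op)
        return (exec_time_ms, op)

    def receive(self, now_ms, message):
        """Remote operation: schedule it for the sender-chosen execution time."""
        exec_time_ms, op = message
        if exec_time_ms < now_ms:
            # Network delay exceeded the local lag, so a short-term
            # inconsistency has already occurred and would need repair.
            self.late_ops += 1
        self._schedule(exec_time_ms, op)

    def advance(self, now_ms):
        """Apply every pending operation whose execution time has been reached."""
        while self.pending and self.pending[0][0] <= now_ms:
            _, _, op = heapq.heappop(self.pending)
            self.state = op(self.state)
```

As long as the network delay stays below the local lag, both replicas apply the operation at the same timestamp. This is the responsiveness-for-consistency tradeoff the abstract describes.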
Jürgen Vogel, Martin Mauve, Volker Hilt and Wolfgang Effelsberg.
Late Join Algorithms for Distributed Interactive Applications. Multimedia systems / Association for Computing Machinery, 9, 2, 327-336. Distributed interactive applications such as shared whiteboards and multiplayer games often support dynamic groups where users may join and leave at any time. A participant joining an ongoing session has missed the data that have previously been exchanged by the other session members. It is therefore necessary to initialize the application instance of the latecomer with the current state. In this paper, we propose a late join algorithm for distributed interactive applications that provides such an initialization of applications. The algorithm is scalable and robust and can be easily adapted to the needs of different applications by means of late join policies. The behavior of the late join algorithm and the impact of design alternatives are investigated in detail by means of an extensive simulation study. This study also shows that an improper handling of the late join problem can cause very high application and network load.
Martin Mauve, Hannes Hartenstein, Holger Füßler, Jörg Widmer and Wolfgang Effelsberg.
Positionsbasiertes Routing für die Kommunikation zwischen Fahrzeugen. Informationstechnik und technische Informatik : it + ti, 44, 2, 278-286. In the near future, communication between vehicles by means of wireless technology will enhance both safety and comfort of the passengers. One main challenge in realizing this communication will be the routing of messages from one sender to one or more receivers. In this paper we propose a position-based ad-hoc routing protocol which solves this problem. In this protocol all vehicles work together, thus no pre-established infrastructure is required. As a consequence, the resulting network is inexpensive and robust. In order to prove the viability of the approach, a simulation study was performed using the ns-2 network simulator. As a basis for this study realistic car movement patterns were used. The study shows that even over large distances requiring message forwarding by multiple vehicles, high success rates for the delivery of messages are achieved.
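The abstract does not spell out the forwarding rule of its protocol. To illustrate what position-based routing can look like in general, the sketch below uses plain greedy forwarding, a common strategy in which each node hands the message to the neighbor geographically closest to the destination. The choice of greedy forwarding, the function names, the 2D coordinates, and the fixed radio range are all assumptions for illustration, not the protocol proposed in the paper.

```python
import math


def greedy_next_hop(current, neighbors, destination):
    """Pick the neighbor closest to the destination, or None if no neighbor
    is strictly closer than the current node (a local maximum)."""
    best = min(neighbors, key=lambda n: math.dist(n, destination), default=None)
    if best is None:
        return None
    if math.dist(best, destination) >= math.dist(current, destination):
        return None
    return best


def route(source, destination, positions, radio_range):
    """Forward hop by hop over nodes at the given positions.

    Returns the list of visited positions, or None if greedy forwarding
    gets stuck. The destination is assumed to be one of the positions.
    """
    path = [source]
    current = source
    while current != destination:
        neighbors = [
            p for p in positions
            if p != current and math.dist(p, current) <= radio_range
        ]
        nxt = greedy_next_hop(current, neighbors, destination)
        if nxt is None:
            return None
        path.append(nxt)
        current = nxt
    return path


# Four vehicles on a straight road, 100 m apart (toy data).
vehicles = [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0), (300.0, 0.0)]
multi_hop = route((0.0, 0.0), (300.0, 0.0), vehicles, radio_range=150.0)
unreachable = route((0.0, 0.0), (300.0, 0.0), vehicles, radio_range=50.0)
```

Because every hop must strictly reduce the distance to the destination, the loop always terminates. Real protocols add a recovery strategy for the case where greedy forwarding gets stuck.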
Rainer Lienhart and Wolfgang Effelsberg.
Automatic Text Segmentation and Text Recognition for Video Indexing. Multimedia Systems, 8, 6, 69-81. Efficient indexing and retrieval of digital video is an important function of video databases. One powerful index for retrieval is the text appearing in the videos. It enables content-based browsing. We present our methods for automatic segmentation of text in digital videos. The output is directly passed to a standard OCR software package in order to translate the segmented text into ASCII. The algorithms we propose make use of typical characteristics of text in videos in order to enable and enhance segmentation performance. In particular, the inter-frame dependencies of the characters provide new possibilities for their refinement. Then, a straightforward indexing and retrieval scheme is introduced. It is used in the experiments to demonstrate that the proposed text segmentation algorithms together with existing text recognition algorithms are suitable for indexing and retrieval of relevant video sequences in and from a video database. Our experimental results are very encouraging and suggest that these algorithms can be used in video retrieval applications as well as to recognize higher semantics in videos.
Rainer Lienhart, Wolfgang Effelsberg and Ramesh Jain.
VisualGREP: A Systematic Method to Compare and Retrieve Video Sequences. Multimedia Tools and Applications, 10, 2, 47-72. In this paper, we consider the problem of similarity between video sequences. Three basic questions are raised and (partially) answered. Firstly, at what temporal duration can video sequences be compared? The frame, shot, scene and video levels are identified. Secondly, given some image or video feature, what are the requirements on its distance measure and how can it be easily transformed into the visual similarity desired by the inquirer? Thirdly, how can video sequences be compared at different levels? A general approach based on either a set or sequence representation with variable degrees of aggregation is proposed and applied recursively over the different levels of temporal resolution. It allows the inquirer to fully control the importance of temporal ordering and duration. The general approach is illustrated by introducing and discussing some of the many possible image and video features. Promising experimental results are presented.
Claudia Schremmer, Volker Hilt and Wolfgang Effelsberg.
Erfahrungen mit synchronen und asynchronen Lernszenarien an der Universität Mannheim. Praxis der Informationsverarbeitung und Kommunikation : PIK, 23, 2, 121-128. Tele-teaching projects have been carried out at the University of Mannheim since 1996. The focus is on synchronous learning scenarios, by which we mean the simultaneous (Internet) transmission of a lecture to several locations. The lecturer's audio and video streams are recorded at the same time and, as part of computer-based training, made available independently of time for asynchronous learning scenarios. The implemented synchronous and asynchronous learning scenarios have been in use since 1998 in the joint project VIROR of the Upper Rhine universities of Freiburg, Heidelberg, Karlsruhe and Mannheim. This article describes various learning scenarios for tele-lectures, their particular characteristics and their specific requirements. The underlying technology is briefly introduced, some software tools developed in-house are described, and further requirements on the technology that have not yet been met are formulated. Our main focus, however, is on the experience we have gained in what are now four years of active tele-teaching.
Silvia Pfeiffer, Rainer Lienhart, Stephan Fischer and Wolfgang Effelsberg.
Abstracting Digital Movies Automatically. Journal of Visual Communication and Image Representation, 7, 3, 345-353. Large video-on-demand databases consisting of thousands of digital movies are not easy to handle: the user must have an attractive means to retrieve his movie of choice. For analog video, movie trailers are produced to allow a quick preview and perhaps stimulate possible buyers. This paper presents techniques to automatically produce such movie abstracts of digital videos.