Stephan Kopf

Title: PD Dr. rer. nat.

Office: B 220

Office hours: Tuesdays, 12:00 - 13:00, in room A5, B 220

Phone: +49 621 181-2613, Fax: +49 621 181-2601



  • Video content analysis
  • Computer vision
  • Computer graphics
  • Multimedia applications

Specific research areas

  • Video retargeting
  • Digital watermarking for multimedia content
  • High dynamic range videos
  • Object recognition
  • New learning technologies

Curriculum vitae

Dr. Stephan Kopf received the M.Sc. degree in business informatics (2000) and the Ph.D. degree in computer science (2007), both from the University of Mannheim (Germany). He completed his 'habilitation' (postdoctoral lecture qualification) on 'Algorithms for Image and Video Processing' in 2012 and currently works as a senior researcher and lecturer in the multimedia department at the University of Mannheim. His research interests focus on multimedia content analysis, media retargeting, high dynamic range videos, shape-based object recognition, and digital watermarking for videos. He has published over 50 refereed journal and conference papers in these fields. Dr. Kopf is initiator and co-editor of the ACM SIG Multimedia Records and the ACM SIGMM eNewsletter. He serves on the program committees of several conferences and workshops and is initiator and co-chair of several workshops.



  • Daniel Schön, Melanie Klinger, Stephan Kopf and Wolfgang Effelsberg. SCITEPRESS, 2016 Customized teaching scenarios for smartphones in university lecture settings : experiences with several teaching scenarios using the MobileQuiz2. Setubal, Portugal
    Many teachers use Audience Response Systems (ARS) in lectures to re-activate their listeners and to gain insight into students’ knowledge of the current lecture contents. Plenty of such applications have been developed in recent years; they provide a wide variety of teaching scenarios that use the students’ smartphones, including quizzes, lecture feedback, and dynamic message boards. We developed a novel application based on an abstract model to enable this variety of customizable teaching scenarios within one application. After the application was presented to the first group of lecturers, the responses were quite good, and several new teaching scenarios were created and used. This paper presents first experiences with a variety of customizable teaching scenarios, the special opportunities and challenges, and the opinions of lecturers and students, which we collected with a survey at the end of the semester.
  • Daniel Schön, Melanie Klinger, Stephan Kopf, Thilo Weigold and Wolfgang Effelsberg. Springer, 2016 Customizable learning scenarios for students’ mobile devices in large university lectures : a next generation audience response system. Communications in computer and information science. Cham
  • Mariia Zrianina and Stephan Kopf. Technical reports. 2016 Classification of iconic images. Mannheim, 90170. 16-001
    Iconic images represent an abstract topic and use a presentation that is intuitively understood within a certain cultural context. For example, the abstract topic “global warming” may be represented by a polar bear standing alone on an ice floe. Such images are widely used in media, and their automatic classification can help to identify high-level semantic concepts. This paper presents a system for the classification of iconic images. It uses a variation of the Bag of Visual Words approach with enhanced feature descriptors. Our novel color pyramids feature incorporates color information into the classification scheme. It improves the average F1 measure of the classification by 0.117. The performance of our system is further evaluated under a variety of parameters.


  • Simone Paolo Ponzetto, Hartmut Wessler, Lydia Weiland, Stephan Kopf, Wolfgang Effelsberg and Heiner Stuckenschmidt. Peter Lang Edition, 2015 Automatic classification of iconic images based on a multimodal model : an interdisciplinary project. Sprache - Medien - Innovationen. Frankfurt am Main; Bern; Wien
  • Philipp Schaber, Sally Dong, Benjamin Guthier, Stephan Kopf and Wolfgang Effelsberg. ACM, 2015 Modeling temporal effects in re-captured video. New York, NY
    The re-capturing of video content poses significant challenges to algorithms in the fields of video forensics, watermarking, and near-duplicate detection. Using a camera to record a video from a display introduces a variety of artifacts, such as geometric distortions, luminance transformations, and temporal aliasing. A deep understanding of the causes and effects of such phenomena is required for their simulation, and for making the affected algorithms more robust. In this paper, we provide a detailed model of the temporal effects in re-captured video. Such effects typically result in the re-captured frames being a blend of the original video's source frames, where the specific blend ratios are difficult to predict. Our proposed parametric model captures the temporal artifacts introduced by interactions between the video renderer, display device, and camera. The validity of our model is demonstrated through experiments with real re-captured videos containing specially marked frames.
  • Daniel Schön, Melanie Klinger, Stephan Kopf and Wolfgang Effelsberg. SCITEPRESS, Science and Technology Publications, 2015 A model for customized in-class learning scenarios - an approach to enhance audience response systems with customized logic and interactivity. [Setúbal]
  • Daniel Schön, Licheng Yang, Melanie Klinger, Stephan Kopf and Wolfgang Effelsberg. Assoc. for the Advancement of Computing in Education, Universität Mannheim, 2015 On the effects of different parameters in classroom interactivity systems on students. Waynesville, NC
    Classroom Response Systems (CRS) are often used in higher education lectures. They help to activate students and to gain deeper insight into the students' knowledge base and their opinions on currently discussed topics. Many different systems have been created, offering a similar amount of functionality. We thus investigate what the important parameters of such systems are and how they influence the students’ behavior. To this end, we consider classic response systems as well as systems with a higher degree of interactivity. In a first step, we defined eight possible parameters, such as the use of pictures or a progress bar. We conducted a field study in thirty-six lectures, comparing the impact of the different parameters. As expected, the overall satisfaction with CRS is very high, but we obtained surprising results for particular parameters. We present the most interesting results and suggest which parameters merit investigation in greater depth.
  • Stefan Wilk, Stephan Kopf and Wolfgang Effelsberg. ACM, 2015 Video composition by the crowd : a system to compose user-generated videos in near real-time. New York, NY
  • Philipp Schaber, Stephan Kopf, Sina Wetzel, Tyler Ballast, Christoph Wesch and Wolfgang Effelsberg. 2015 CamMark: Analyzing, Modeling, and Simulating Artifacts in Camcorder Copies. ACM transactions on multimedia computing, communications, and applications : TOMM, 11, 2s, Article 42, 1-23
    To support the development of any system that includes the generation and evaluation of camcorder copies, as well as to provide a common benchmark for robustness against camcorder copies, we present a tool to simulate digital video re-acquisition using a digital video camera. By resampling each video frame, we simulate the typical artifacts occurring in a camcorder copy: geometric modifications (aspect ratio changes, cropping, perspective and lens distortion), temporal sampling artifacts (due to different frame rates, shutter speeds, rolling shutters, or playback), spatial and color subsampling (rescaling, filtering, Bayer color filter array), and processing steps (automatic gain control, automatic white balance). We also support the simulation of camera movement (e.g., a hand-held camera) and background insertion. Furthermore, we allow for an easy setup and calibration of all the simulated artifacts, using sample/reference pairs of images and videos. In particular, temporal subsampling effects are analyzed in detail to create realistic frame blending artifacts in the simulated copies. We carefully evaluated our entire camcorder simulation system and found that the models we developed describe and match the real artifacts quite well.


  • Johannes Kiess, Daniel Gritzner, Benjamin Guthier, Stephan Kopf and Wolfgang Effelsberg. ACM, 2014 GPU Video Retargeting with Parallelized SeamCrop. New York, NY
  • Philipp Schaber, Stephan Kopf, Christoph Wesch and Wolfgang Effelsberg. ACM, 2014 CamMark : a camcorder copy simulation as watermarking benchmark for digital video. New York, NY
    In 1998, Petitcolas et al. proposed StirMark as a benchmark for image watermarking schemes. The main idea was to introduce a re-sampling process that mimics the analog process of printing and scanning a watermarked image. For digital video, the corresponding concept is a camcorder copy, where a video displayed on a screen is (digitally) recorded using a video camera. As most commercial video streaming systems (VOD, IPTV) and offline distribution (Blu-ray, HDDs for cinemas) are strongly protected by means of DRM, filming a display is actually a relevant use case and a requirement for robust video watermarking systems to survive. We therefore present a tool to simulate content re-acquisition with a camcorder. Our goal is to support watermark development by enabling automated test cases for such camcorder copy attacks, as well as to provide a benchmark for robust video watermarking. Manually creating camcorder copies is a cumbersome process, and even more problematic, it is hardly reproducible with the same setup. By re-sampling each video frame, we simulate the typical artifacts of a camcorder copy: geometric modifications (aspect ratio changes, cropping, perspective and lens distortion), temporal modifications (unsynchronized frame rates and the resulting frame blending), sub-sampling (rescaling, filtering, Bayer color array filter), and histogram changes (AGC, AWB). We also support simulating camera movement (e.g., a hand-held camera) and background insertion.
  • Daniel Schön, Philip Mildner, Stephan Kopf and Wolfgang Effelsberg. Gesellschaft für Informatik, 2014 SMASH: Ein generisches System für interaktive Szenarien in der Vorlesung. GI-Edition / Proceedings. Freiburg, Br.
  • Daniel Schön, Steffen Sikora, Stephan Kopf and Wolfgang Effelsberg. RWTH, 2014 GLA: A Generic Analytics Tool for e-Learning. CEUR workshop proceedings. Aachen
    Several software applications are used at the University of Mannheim for learning and teaching purposes. The majority of them, like lecture feedback, quizzes, forums, and wikis, are hosted within our learning management system ILIAS. In addition, we run several prototypes of serious games and mobile feedback systems. While the data generated by students and teachers is mainly used for current courses, it could be further used for Learning Analytics if it were stored in an adequate format. Considering the variable and fast-moving nature of our learning applications, we devised a concept for a generic database structure that can handle analyses on a variety of original tools. This paper presents the prototype application GLA (Generic Learning Analytics), which tries to provide a step in the right direction. Data from wikis, forums, quizzes, and serious games are transformed into one homogeneous format that can be used for comparable analyses. Besides comparing several semesters and courses of one application, we can also match related data sets, e.g., user behavior between a wiki and a file upload.


  • Torben Dittrich, Stephan Kopf, Philipp Schaber, Benjamin Guthier and Wolfgang Effelsberg. ACM, 2013 Saliency Detection for Stereoscopic Video. New York, NY
  • Benjamin Guthier, Kalun Ho, Stephan Kopf and Wolfgang Effelsberg. ACM, 2013 Determining Exposure Values from HDR Histograms for Smartphone Photography. [New York, NY]
  • Benjamin Guthier, Johannes Kiess, Stephan Kopf and Wolfgang Effelsberg. IEEE, 2013 Seam Carving for Stereoscopic Video. Piscataway, NJ
  • Philip Mildner, Frederik Claus, Stephan Kopf and Wolfgang Effelsberg. ACM, 2013 Navigating Videos by Location. New York, NY
  • Daniel Schön, Melanie Klinger, Stephan Kopf and Wolfgang Effelsberg. AACE, 2013 HomeQuiz: Blending Paper Sheets with Mobile Self-Assessment Tests. Chesapeake, Va.
  • Stefan Wilk, Stephan Kopf, S Schulz and Wolfgang Effelsberg. AACE, 2013 Social Video: A Collaborative Video Annotation Environment to Support E-Learning. Chesapeake, Va.
    Our social video system allows users to enrich video with additional information such as external websites, hypertext, images, other videos, or communication channels. Users are able to annotate whole videos, scenes, and objects in the video. We do not focus on a single user accessing the system but on multiple users watching the video and accessing the annotations others have created. Our web-based prototype differs from classical hypervideo systems because it allows annotation (authoring) and navigation in videos by focusing on collaboration and communication between the users. The prototype is integrated into the online social network Facebook and was evaluated with more than 300 users. The evaluation analyzes the usage of the system with a learning scenario in mind and indicates that users achieved learning success.
  • Benjamin Guthier, Johannes Kiess, Stephan Kopf and Wolfgang Effelsberg. Technical reports. 2013 Stereoscopic Seam Carving With Temporal Consistency. Mannheim, 90170. 13-002
    In this paper, we present a novel technique for seam carving of stereoscopic video. It removes seams of pixels in areas that are most likely not noticed by the viewer. When applying seam carving to stereoscopic video rather than monoscopic still images, new challenges arise. The detected seams must be consistent between the left and the right view, so that no depth information is destroyed. When removing seams in two consecutive frames, temporal consistency between the removed seams must be established to avoid flicker in the resulting video. By making certain assumptions, the available depth information can be harnessed to improve the quality achieved by seam carving. Assuming that closer pixels are more important, the algorithm can focus on removing distant pixels first. Furthermore, we assume that coherent pixels belonging to the same object have similar depth. By not cutting through edges in the depth map, we can thus avoid cutting through object boundaries.
  • Jakob Huber, Stephan Kopf and Philipp Schaber. Technical reports. 2013 Analyse von Bildmerkmalen zur Identifikation wichtiger Bildregionen. Mannheim, 90170. 13-004
    Reliable detection of important image regions is the basis for many image processing methods, for example image compression, methods for adapting image resolution, or the embedding of digital watermarks in images. A system was developed that identifies feature points in images and uses them to identify important image regions. The SURF method (Speeded Up Robust Features) is used to compute the feature points. In a second step, the detected features are assigned to individual image regions. The quality of the detected regions and the runtime behavior of the different methods are analyzed using an extensive image database.
  • Stephan Kopf, Benjamin Guthier, Philipp Schaber, Torben Dittrich and Wolfgang Effelsberg. Technical reports. 2013 Analysis of Disparity Maps for Detecting Saliency in Stereoscopic Video. Mannheim, 90170. 13-003
    We present a system for automatically detecting salient image regions in stereoscopic videos. This report extends our previous system and provides additional details about its implementation. Our proposed algorithm considers information based on three dimensions: salient colors in individual frames, salient information derived from camera and object motion, and depth saliency. These three components are dynamically combined into one final saliency map based on the reliability of the individual saliency detectors. Such a combination allows using more efficient algorithms even if the quality of one detector degrades. For example, we use a computationally efficient stereo correspondence algorithm that might cause noisy disparity maps for certain scenarios. In this case, however, a more reliable saliency detection algorithm such as the image saliency is preferred. To evaluate the quality of the saliency detection, we created modified versions of stereoscopic videos with the non-salient regions blurred. Having users rate the quality of these videos, the results show that most users do not detect the blurred regions and that the automatic saliency detection is very reliable.
  • Michael Magin and Stephan Kopf. Technical reports. 2013 A Collaborative Multi-Touch UML Design Tool. Mannheim, 90170. 13-001
    The design and development of software projects is usually done in teams today. Collaborative systems based on multi-touch walls or large table-top screens could support these highly interactive tasks. We present a novel collaborative design tool which allows several developers to jointly create complex UML (Unified Modeling Language) diagrams. We have developed new algorithms to recognize the gestures drawn by the users, to create the respective elements of the diagram, to adjust the edges between classes, and to improve the layout of the classes automatically. Auxiliary lines provide the user with means to align classes precisely so a more consistent layout is achieved. Export functionality for XML and Java code skeletons completes the application; the UML diagram can thus be used in further steps of the software design process. User evaluations confirm considerable benefits of our proposed system.



  • Benjamin Guthier, Stephan Kopf and Wolfgang Effelsberg. 2011 Optimal Shutter Speed Sequences for Real-Time HDR Video. 90170.
    One technique for creating High Dynamic Range (HDR) video frames is to capture Low Dynamic Range (LDR) images at varying shutter speeds. They are then merged into a single image covering the entire brightness range of the scene. While shutter speeds are often chosen to vary by a constant factor, we propose an adaptive approach. The scene's histogram, together with functions judging the contribution of an LDR exposure to the HDR result, is used to compute a sequence of shutter speeds. This sequence allows for the estimation of the scene's radiance map with a high degree of accuracy. We show that, in comparison to the traditional approach, our algorithm achieves a higher quality of the HDR image for the same number of captured LDR exposures. Our algorithm is suited for creating HDR videos of scenes with varying brightness conditions in real time, which benefits applications such as video surveillance.
  • Stephan Kopf, Thomas Haenselmann, Johannes Kiess, Benjamin Guthier and Wolfgang Effelsberg. 2011 Algorithms for Video Retargeting. Multimedia Tools and Applications, 51, 2, 819-861






  • Stephan Kopf. 2006 Computergestützte Inhaltsanalyse von digitalen Videoarchiven.
    The transition from analog to digital video has led to major changes within film archives in recent years. The digitization of films in particular opens up new possibilities for the archives. Wear or aging of film reels is ruled out, so that the quality is preserved unchanged. In addition, network-based and thus considerably easier access to the videos in the archives becomes possible. Additional services are available to archivists and users that provide extended search capabilities and make navigation during playback easier. Searching within video archives relies on metadata that provide further information about the videos. A large part of the metadata is entered manually by archivists, which involves a great deal of time and high costs. Computer-assisted analysis of a digital video makes it possible to reduce the effort of generating metadata for video archives. The first part of this dissertation presents new methods for detecting important semantic content in videos. In particular, newly developed algorithms are presented for cut detection, camera motion analysis, object segmentation and classification, text recognition, and face recognition. The automatically extracted semantic information is very valuable because it facilitates work with digital video archives. The information not only supports searching the archives but also leads to the development of new applications, which are presented in the second part of the dissertation. For example, computer-generated video summaries can be produced, or videos can be automatically adapted to the properties of a playback device. A further focus of this dissertation is the analysis of historical films.
    Four European film archives provided a large number of historical video documentaries that were shot in the early to middle part of the last century and digitized in recent years. Because the film reels were stored and worn over several decades, many videos are very noisy and contain clearly visible image defects. The image quality of the historical black-and-white films differs significantly from that of current videos, so that reliable analysis with existing methods is often not possible. This dissertation presents new algorithms that enable reliable detection of semantic content in historical videos as well.



  • Stephan Kopf, Thomas Haenselmann, Dirk Farin and Wolfgang Effelsberg. SPIE, 2004 Automatic Generation of Summaries for the Web. Proceedings of SPIE. Bellingham, Wash.
    Many TV broadcasters and film archives are planning to make their collections available on the Web. However, a major problem with large film archives is the fact that it is difficult to search the content visually. A video summary is a sequence of video clips extracted from a longer video. Much shorter than the original, the summary preserves its essential messages. Hence, video summaries may speed up the search significantly. Videos that have full horizontal and vertical resolution will usually not be accepted on the Web, since the bandwidth required to transfer the video is generally very high. If the resolution of a video is reduced in an intelligent way, its content can still be understood. We introduce a new algorithm that reduces the resolution while preserving as much of the semantics as possible. In the MoCA (movie content analysis) project at the University of Mannheim we developed the video summarization component and tested it on a large collection of films. In this paper we discuss the particular challenges which the reduction of the video length poses, and report empirical results from the use of our summarization tool.
  • Stephan Kopf, Thomas Haenselmann, Dirk Farin and Wolfgang Effelsberg. IEEE Operations Center, 2004 Automatic generation of video summaries for historical films. Piscataway, NJ
    A video summary is a sequence of video clips extracted from a longer video. Much shorter than the original, the summary preserves its essential messages. In the ECHO (European Chronicles On-line) project, a system was developed to store and manage large collections of historical films for the preservation of cultural heritage. At the University of Mannheim, we have developed the video summarization component of the ECHO system. We discuss the particular challenges the historical film material poses, and how we have designed new video processing algorithms and modified existing ones to cope with noisy black-and-white films.