
For many audio-visual applications, integrating and synchronizing the audio and video signals is essential. The objective of this paper is to develop a system that displays the active objects in the captured video, each accompanied by its corresponding audio rendered as text. The audio and video signals are captured and processed separately, then buffered, integrated, and synchronized using a time-stamping technique. The time-stamps provide timing information for the audio and video processes, namely speech recognition and object detection, respectively. This information is needed to correlate the audio packets with the video frames, so integration is achieved without relying on visual cues such as lip movements. The reported results are based on a specific implementation of the speech recognition module, which was identified as the bottleneck process in the proposed system.
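The time-stamp correlation step can be illustrated with a minimal sketch. It assumes that both capture paths stamp their outputs from one shared clock and that the speech recognizer reports a start and end time for each recognized utterance. The names `AudioPacket`, `VideoFrame`, and `correlate` are hypothetical and are not taken from the paper. Each buffered video frame receives the text of every audio packet whose time interval covers the frame's time-stamp.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class AudioPacket:
    # Hypothetical: one recognized utterance, with the shared-clock times
    # (in seconds) at which its audio started and ended.
    start: float
    end: float
    text: str


@dataclass
class VideoFrame:
    # Hypothetical: a frame's capture time-stamp and the labels of the
    # objects the detector reported as active in that frame.
    timestamp: float
    objects: list = field(default_factory=list)


def correlate(frames, packets):
    """Pair each frame with the text of every audio packet whose
    [start, end] interval contains the frame's time-stamp.

    Assumes both buffers were time-stamped by the same clock.
    """
    packets = sorted(packets, key=lambda p: p.start)
    starts = [p.start for p in packets]
    result = []
    for frame in sorted(frames, key=lambda f: f.timestamp):
        # Only packets that started at or before the frame can cover it.
        idx = bisect_right(starts, frame.timestamp)
        texts = [p.text for p in packets[:idx] if p.end >= frame.timestamp]
        result.append((frame, texts))
    return result


if __name__ == "__main__":
    frames = [VideoFrame(0.0, ["person"]), VideoFrame(1.0, ["person"]),
              VideoFrame(2.5, ["dog"])]
    packets = [AudioPacket(0.5, 1.8, "hello"), AudioPacket(2.2, 3.0, "woof")]
    for frame, texts in correlate(frames, packets):
        print(frame.timestamp, frame.objects, texts)
```

Because the matching depends only on time-stamps, no visual features such as lip movements are needed. Since speech recognition is the slower path, its packets arrive late. Buffering the frames until the recognizer catches up is what allows the later pairing to work.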
