Intelligent Video Conferencing

Design Overview

The proposed system is composed of two major subsystems: tracking and transmission. The goal is to use standard IP networking so that the system is easily portable to a variety of computer architectures and operating systems, and usable over any of the many networks which can transport IP data. While the prototype system has been developed on a Linux platform (a Unix variant), it should be easily ported to the Apple Macintosh, Microsoft Windows, or any other operating system that supports the hardware. To this end, a minimal amount of low-level driver code is written in C, while the majority of the code is written in Java. Writing in Java allows the development team to take advantage of Java's cross-platform nature, built-in graphical tools and networking features. The following diagram summarizes these systems and their interactions.

Subsystems

The tracking system, while a completely separate project in some respects, is the essential feature that differentiates this system from others like it. The basis for the system is a transducer which outputs a 40 kHz sinusoidal signal. This transmitter is worn by the user and pulsed occasionally to provide the system with a point to lock onto. Similar transducers are used to receive the signal. The information they receive is then used to drive two stepping motors which move the tripod mount to which the camera is attached.

This product is a third-party add-on to the QuickCam, allowing users to utilize their existing equipment without alteration. They would simply need to mount the camera to the motorized platform and supply power to it in order to gain the image following capabilities. The basic video conferencing features would be available with or without the motorized mount, and could be easily distributed via the Internet.

Overview

Personal Computer-based Video Conferencing

The basis for the project is a gray-scale QuickCam by Connectix Corp. The Windows version uses a standard 25-pin parallel port and a pass-through connector on the keyboard port of computers running Windows 3.x or 95. Windows NT is scheduled for support in the near future, and OS/2 in Windows compatibility mode already functions. This same camera also works with the variety of Unix operating systems available for the Intel platform. A serial port version is available for the Macintosh, and can be used with other computers which do not have a parallel port. The QuickCam is a digital image capture system, and as such does not need an additional video-capture card. The gray-scale QuickCam comes from the factory with a fixed focus lens set for 18 inches to infinity; however, it is possible to adjust this beyond factory specifications. The camera also supports images of up to 320x240 pixels. A color version is also available, and should be easy to integrate into the video conferencing system if the end user chooses--that product has a manual focus lens for 1 inch to infinity and maximum resolution of 640x480. In either case, the frame rate depends on the resolution in use, speed of host computer and other similar factors.

While the QuickCam is an important part of the video conferencing system, the system proposed is to be marketed as an add-on for those already owning a QuickCam. In addition to the acoustical tracking system and motorized camera mount, the proposed system involves software written in Java. Due to the security features of the Java language, it is necessary to manually install the software on the host computer prior to using it, though it is possible to obtain the software online. To minimize the risk of "Trojan horse" type attacks, Java does not allow software loaded from a remote computer to access local resources such as disk drives or input/output ports, so the software cannot simply be run from a web page. The Java system implements all the user controls for the system.

While the Java environment is preferred for its built-in tools and cross-platform availability, its cross-platform nature means that it does not offer direct access to hardware devices, such as the QuickCam or a microphone, as C and C++ do. Therefore, these components are written in C and interfaced with the Java code via hooks built into the Java language. As a result of this design decision, these components will need to be adapted when the system is ported to other platforms. However, that effort should be kept to a minimum by using standardized library calls and by using the Java language whenever possible.
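The hooks mentioned above are Java's native-method mechanism (the Java Native Interface). The following is a minimal sketch of what the Java side of such a binding might look like, not the project's actual code. The class name QuickCamBridge, the library name "qcam", the method grabFrame, and the one-byte-per-pixel frame layout are all assumptions made for illustration.

```java
// Hypothetical Java-side binding for a C QuickCam driver. All names are illustrative.
final class QuickCamBridge {
    // "qcam" is an assumed library name; the real driver may be named differently.
    private static final boolean LOADED = tryLoad("qcam");

    // Implemented in C and resolved at run time through the Java Native Interface.
    private static native byte[] grabFrame(int width, int height);

    private static boolean tryLoad(String libraryName) {
        try {
            System.loadLibrary(libraryName);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false; // native driver not installed on this platform
        }
    }

    static boolean isAvailable() {
        return LOADED;
    }

    // Returns one gray-scale frame (assumed one byte per pixel), or a blank
    // frame of the same size when the native driver could not be loaded.
    static byte[] frameOrBlank(int width, int height) {
        if (width <= 0 || height <= 0) {
            throw new IllegalArgumentException("frame dimensions must be positive");
        }
        if (!LOADED) {
            return new byte[width * height];
        }
        return grabFrame(width, height);
    }
}
```

Only the small native library would need to be rebuilt for each platform; the Java class itself stays unchanged, which is the portability argument made above.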

An overview of the system can be represented by the block diagram shown. The user would interact with a Java application. This would then interface with the C-based audio and video subsystems. The video subsystem would allow communication to and from the QuickCam; the audio subsystem would integrate with the microphone and speaker. Network activity would be through the existing Java-native commands.
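As an illustration of how the Java application might hand captured data to Java's built-in networking classes, the sketch below packs a payload into a datagram. It assumes UDP transport and a made-up header (a type byte, a sequence number, and a capture time); neither the transport nor the header layout is specified in this document, and the class name MediaPacket is hypothetical.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.nio.ByteBuffer;

// Hypothetical packet layout: 1-byte type, 4-byte sequence, 8-byte capture time, payload.
final class MediaPacket {
    static final byte VIDEO = 0;
    static final byte AUDIO = 1;
    static final int HEADER_BYTES = 1 + 4 + 8;

    final byte type;
    final int sequence;
    final long captureMillis;
    final byte[] payload;

    MediaPacket(byte type, int sequence, long captureMillis, byte[] payload) {
        this.type = type;
        this.sequence = sequence;
        this.captureMillis = captureMillis;
        this.payload = payload.clone();
    }

    // Serializes header and payload; ByteBuffer is big-endian (network order) by default.
    byte[] encode() {
        ByteBuffer buffer = ByteBuffer.allocate(HEADER_BYTES + payload.length);
        buffer.put(type).putInt(sequence).putLong(captureMillis).put(payload);
        return buffer.array();
    }

    // Parses the first 'length' bytes of 'data' back into a packet.
    static MediaPacket decode(byte[] data, int length) {
        ByteBuffer buffer = ByteBuffer.wrap(data, 0, length);
        byte type = buffer.get();
        int sequence = buffer.getInt();
        long captureMillis = buffer.getLong();
        byte[] payload = new byte[buffer.remaining()];
        buffer.get(payload);
        return new MediaPacket(type, sequence, captureMillis, payload);
    }

    // Wraps the encoded bytes in a datagram addressed to the remote peer.
    DatagramPacket toDatagram(InetAddress host, int port) {
        byte[] bytes = encode();
        return new DatagramPacket(bytes, bytes.length, host, port);
    }
}
```

The resulting DatagramPacket could be passed to DatagramSocket.send. Carrying a capture time in every packet is one simple way for a receiver to line up audio with video.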

While portions of this project have been done by others, our searches have yet to find any implementation of our project as a whole. The QuickCam is heavily supported, and in some respects is the standard video capture device, for the CU-SeeMe program. CU-SeeMe, developed at Cornell University, is a point-to-point video-conferencing system that operates under the Macintosh and, to a lesser extent, Windows operating systems. It is a stand-alone application for video-conferencing, but does not support the image following capabilities of the proposed system. Pseudo-multicast use of CU-SeeMe is possible by pointing an individual client to a "Reflector" which retransmits the data to each of its clients--the "Reflector" is an additional software component and runs only on a variety of Unix platforms. Information on CU-SeeMe in general can be found at <URL:http://cu-seeme.cornell.edu/>, and information on the commercial version can be found at <URL:http://www.cu-seeme.com/>.

Another project of interest is A QuickCam Inside a Java Applet by Ralf Ackermann. While the system is no longer being demonstrated, the information on the project is still online. This implementation uses a QuickCam connected to a computer running Linux. Users from a limited set of systems can use any web browser that supports Java to pan and tilt the camera. Their commands are received by the server and translated by a C program, which drives a set of stepper motors to move the camera. In this implementation, Java is used only for the controls presented to the remote user, while a combination of custom C code and a modified web server is used on the host machine. The system does not implement audio, nor does it automatically follow a given subject--movement of the camera is the responsibility of the viewer. More information on this implementation can be found at <URL:http://alwin.informatik.tuchemnitz.de/~java/ROBOT_CAMERA/example1.html>.

The third major implementation that provides background information for this project is QuickCam for Linux or Qcam Multimedia by Scott Laird and Jerry Kalkhof. This is a series of projects which are being built up to become a platform-independent video-conferencing system. The implementation currently has a working C program for interfacing with the QuickCam, and the beginnings of a set of Java applets and applications for the video conferencing project. Among the ideas proposed by Laird and Kalkhof is a packet format that allows for good synchronization between the audio and video. More information is available from <URL:http://www.cs.odu.edu/~kalkhof/quickcam/qcam-0.3/>.

Image Following

The system can track a user within a 15 foot radius of the camera, about 86 degrees left or right and 42 degrees up or down. This requires a tracking scheme to sense position, and some robotics to rotate the camera. For the tracking scheme, a two-dimensional sensor is used, one dimension for lateral movement and the other for height adjustment. So that the user is centered in the frame, an offset is included to account for the distance between the tracking signal and the user's face, as is an angular tolerance of approximately one step (or +/-3.6 degrees) in each dimension.
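The relationship between the stated angular range and the 3.6 degree step size can be made concrete with a little arithmetic. The sketch below converts a requested angle into whole motor steps, clamped to the tracking range quoted above. The class and method names are hypothetical, and truncating toward zero (so the camera never exceeds the stated limits) is an assumption rather than a documented design choice.

```java
// Converts a desired camera angle into whole motor steps within the stated range.
final class PanTiltRange {
    static final double DEGREES_PER_STEP = 3.6;  // step size quoted in the text
    static final double MAX_PAN_DEGREES = 86.0;  // left/right limit quoted in the text
    static final double MAX_TILT_DEGREES = 42.0; // up/down limit quoted in the text

    // Clamps the angle to +/-limit, then truncates toward zero to whole steps.
    static int toSteps(double degrees, double limitDegrees) {
        double clamped = Math.max(-limitDegrees, Math.min(limitDegrees, degrees));
        return (int) (clamped / DEGREES_PER_STEP);
    }

    static int panSteps(double degrees) {
        return toSteps(degrees, MAX_PAN_DEGREES);
    }

    static int tiltSteps(double degrees) {
        return toSteps(degrees, MAX_TILT_DEGREES);
    }
}
```

Under these assumptions the full pan range is 23 steps to either side of center and the full tilt range is 11 steps up or down, and the leftover angle after truncation is always smaller than one step.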

During the research stage, it was realized that the most efficient way to implement a tracking device would be to make it a stand-alone system. That is, the tracking device is completely separate from the video conferencing and, therefore, will not impede the performance of the network in any way. It was decided to use a specific audio transmitter and receiver whose characteristics are constant. This provides a constant, deterministic homing signal rather than a stochastic one, such as the user's voice, which would be constantly changing. Because sound travels slowly and its wavelengths are comparatively large, an audio system seemed the most appropriate for this design.

For ease of design, the tracking device consists of an audio transmitter and three audio receivers. Each component is a DC device and emits or receives sound at 40 kHz (+/-1 kHz), which is well beyond the range of human hearing. The transmitter is attached to the user while the receivers are mounted at the camera in an "L" shape. To mount these receivers, an easily bendable metal was used, allowing the components to be angled slightly away from each other for directionality.

Audio signals are emitted as user-generated pulses, by the push of a "buzzer" button, and the tracking device is only enabled while the pulses are being transmitted. Each pulse reaches the three receivers at slightly different times, and the order in which it arrives determines which direction the camera must move to center the user. The process of ordering the arrival of the signals uses the three input capture pins of the HC11 microprocessor, one for each receiver. The microprocessor can detect either a rising or a falling edge, and the input is regulated to 5V using external circuitry. The order of capture is then retrieved from the capture registers and used as the data for the direction sensing.
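The direction-sensing logic described above can be sketched as a comparison of capture times. This is an illustration in Java rather than the HC11 firmware. It assumes that the "L" has one receiver at its corner, one at the end of a horizontal arm extending to the right, and one at the end of a vertical arm extending upward, and that capture values come from a 16-bit free-running counter. The geometry, the class name, and the method names are all assumptions.

```java
// Derives pan and tilt directions from the order in which a pulse reaches the receivers.
final class ArrivalOrder {
    enum Pan { LEFT, CENTERED, RIGHT }
    enum Tilt { DOWN, CENTERED, UP }

    // Signed tick count from 'earlier' to 'later' for 16-bit capture values.
    // Casting to short keeps the result correct across one counter wraparound.
    static int ticksBetween(int earlier, int later) {
        return (short) (later - earlier);
    }

    // Assumes the horizontal-arm receiver sits to the right of the corner receiver.
    static Pan pan(int cornerTicks, int horizontalTicks) {
        int lead = ticksBetween(horizontalTicks, cornerTicks); // > 0: arm heard the pulse first
        if (lead > 0) return Pan.RIGHT;
        if (lead < 0) return Pan.LEFT;
        return Pan.CENTERED;
    }

    // Assumes the vertical-arm receiver sits above the corner receiver.
    static Tilt tilt(int cornerTicks, int verticalTicks) {
        int lead = ticksBetween(verticalTicks, cornerTicks); // > 0: arm heard the pulse first
        if (lead > 0) return Tilt.UP;
        if (lead < 0) return Tilt.DOWN;
        return Tilt.CENTERED;
    }
}
```

Only the sign of each difference is used, which matches the text: the order of arrival, not the exact delay, selects the direction of motion.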

Tracking System Operations

For the purpose of motion, the choice was between a continuous motor or a servomotor, both of which require analog feedback, and a stepping motor, which can use digital feedback. Since the use of digital feedback from the microprocessor was desired, the stepping motor was the obvious choice. The specific motor was chosen in consideration of practical parameters such as angular tolerance, holding torque, and ease of control. The motor selected is a 12V unipolar motor with 3.6 degrees per step and a holding torque of 600 g-cm.

The motors have a holding torque of 600 g-cm, or approximately 0.52 in-lbs. Since the camera is considerably lighter than the motors, and the motors weigh only 0.5 lbs, the motors were found to supply ample holding torque to ensure that the camera did not slip. To maximize the torque supplied by the motor, a step sequence which drives current through two windings simultaneously was chosen. This produces about 1.4 times (the square root of 2) the torque of having current through just one winding at a time, though it needs a fairly large current supply. With current through two windings simultaneously and two motors in the design, a 12 VDC power source which could sustain at least 700 mA was required. The supply chosen is a wall-mounted transformer with an input of 110 VAC at 60 Hz and an output of 12 VDC at a maximum current of 1200 mA. The steady-state torque needs of the system were minimized by pivoting the assembly about its neutral axis.
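A two-winding step sequence of the kind described above can be written as a small table, shown here in Java for illustration even though the motor control itself runs on the microprocessor. The winding order (A, B, A', B' on bits 0 through 3) is an assumption that depends on how the motor is wired, and the class and method names are hypothetical. The sketch also reproduces the torque unit conversion quoted in the text.

```java
// Two-phase-on full-step drive table for a unipolar stepping motor (illustrative).
final class TwoPhaseDrive {
    // Bit i set means winding i is energized. Assumed order: bit0=A, bit1=B, bit2=A', bit3=B'.
    // Every state energizes exactly two windings, never both halves of one coil.
    static final int[] SEQUENCE = { 0b0011, 0b0110, 0b1100, 0b1001 };

    // Ideal torque gain of two-phase drive over one-phase drive: the two
    // winding fields are perpendicular, so they add as a vector sum.
    static final double TORQUE_GAIN = Math.sqrt(2.0);

    // Advances the table index by one step; direction is +1 or -1.
    static int nextIndex(int index, int direction) {
        return Math.floorMod(index + direction, SEQUENCE.length);
    }

    // Converts gram-centimeters to inch-pounds (453.59237 g per lb, 2.54 cm per in).
    static double gramCmToInchPounds(double gramCm) {
        return gramCm / 453.59237 / 2.54;
    }
}
```

Walking the table forward turns the motor one way and walking it backward turns it the other, one 3.6 degree step per table entry.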

In this design, all motion is rotational since the QuickCam is spherical. To implement rotation in two directions, two motors are used. Each motor receives feedback collected from the two receivers in the "L" shaped array which correspond to the motion that motor provides. The motor continues to step until the user is determined to be in the center of the frame. This is done by stepping twice and then checking to see if the user is still off in the same direction as before. If the direction has changed, the motor stops stepping, having passed the user by the tolerance distance or less.
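The step-twice-then-check procedure described above can be sketched for a single axis as follows. The sensor is modeled as a function that reports -1, 0, or +1 for the direction of the user relative to the current motor position. That interface, the maxSteps safety limit, and the class name CenteringLoop are assumptions introduced for the example.

```java
import java.util.function.IntUnaryOperator;

// One-axis centering loop: step twice, re-check direction, stop when it changes.
final class CenteringLoop {
    // directionAt maps a motor position (in steps) to -1, 0, or +1.
    // Returns the motor position at which stepping stops.
    static int center(int startStep, IntUnaryOperator directionAt, int maxSteps) {
        int position = startStep;
        int initial = directionAt.applyAsInt(position);
        if (initial == 0) {
            return position; // already centered, no motion needed
        }
        int taken = 0;
        while (taken + 2 <= maxSteps) {
            position += 2 * initial; // step twice toward the user
            taken += 2;
            if (directionAt.applyAsInt(position) != initial) {
                break; // direction changed (or centered): stop stepping
            }
        }
        return position;
    }
}
```

For example, with the user at step 7 and the motor starting at step 0, the sketch stops at step 8, one step past the user. In this model the overshoot is always less than two steps.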