Intelligent Video Conferencing

Status Report



The proposed system is composed of two major subsystems: tracking and transmission. The goal is to use standard IP networking, allowing the system to be easily ported to a variety of computer architectures and operating systems, and to run over any of the many networks that can transport IP data. While the prototype will be developed on a Windows platform, it should be easily ported to the Macintosh or to a variety of Unix systems. To that end, a minimal amount of low-level driver code will be written in C, while the majority of the code will be written in Java. Java lets the development team take advantage of the language's cross-platform nature and built-in graphical tools, and allows control via standard Java-capable web browsers rather than requiring the design and implementation of a complete application with a static set of features. The following diagram summarizes these subsystems and their interactions.

The tracking system, while a completely separate project in some respects, is the essential feature that differentiates this system from others like it. Its basis is a transducer that outputs a 40 kHz sinusoidal signal. This transmitter would be worn by the user and pulsed occasionally to give the system a point to lock onto. Similar transducers would receive the signal, and the differences in arrival times among them would determine the direction from which the signal originated. This information would then drive two stepping motors that move the tripod mount to which the camera is attached.

This product would be a third-party add-on to the QuickCam, allowing users to utilize their existing equipment without alteration. They would simply mount the camera on the motorized platform and supply power to it to gain the image-following capabilities. The basic video-conferencing features would be available with or without the motorized mount, and could be easily distributed via the Internet.


Overview of Operation

Personal Computer-based Video Conferencing

The basis for the project is a gray-scale QuickCam by Connectix Corp. The Windows version uses a standard 25-pin parallel port and a pass-through connector on the keyboard port of computers running Windows 3.x or 95. Windows NT support is scheduled for the near future, and OS/2 already functions in Windows compatibility mode. The same camera also works with the variety of Unix operating systems available for the Intel platform. A serial port version is available for the Macintosh and can be used with other computers that do not have a parallel port. The QuickCam is a digital image capture system, and as such does not need an additional video-capture card. The gray-scale QuickCam comes from the factory with a fixed-focus lens set for 18 inches to infinity; however, it is possible to adjust this beyond factory specifications. The camera supports images of up to 320x240 pixels. A color version is also available, and should be easy to integrate into the video-conferencing system if the end user chooses--that product has a manual-focus lens for 1 inch to infinity and a maximum resolution of 640x480. In either case, the frame rate depends on the resolution in use, the speed of the host computer and similar factors.

While the QuickCam is an important part of the video-conferencing system, the system proposed is to be marketed as an add-on for those already owning a QuickCam. In addition to the acoustical tracking system and motorized camera mount, the proposed system involves software written in Java. Due to the security features of the Java language, it is necessary to manually install the software on the host computer prior to using it, though it is possible to obtain the software online. Java does not allow software on a remote computer to access local resources such as disk drives, input/output ports, etc., in order to minimize the risk of "Trojan horse" attacks. The Java system will implement all the user controls for the system, including options for manual control of camera movement, changes in resolution and frame rate, and audio controls. The Java applets would be accessed via a standard Web browser, such as Netscape Navigator or Microsoft Internet Explorer, allowing the overall user interface to match and keep pace with the user's preferred Web environment.

While the Java environment is preferred for its built-in tools and cross-platform availability, it is not as fast as C or C++; speed becomes very important for the time-critical tasks of interfacing with the audio and video subsystems and sending and receiving data over the network. Therefore, these components will be written in C or C++ and interface with the Java code via hooks built into the Java language. As a result of this design decision, these components will need localization when the system is ported to other platforms. However, that effort should be kept to a minimum by using standardized library calls and by using Java wherever possible.

An overview of the system can be represented by the block diagram shown on the previous page. The user would interact with a web browser, which would present not only the video images but also the controls for operating the system. Both would be in the form of Java applets within the web browser window. These applets would then interface with the three subsystems via C or C++ code. The video subsystem would allow communication to and from the QuickCam, the audio subsystem would integrate with the microphone and speaker, and the network subsystem would handle the transmission and reception of data from the network.

While portions of this project have been done by others, our searches have yet to find any implementation of our project as a whole. The QuickCam is heavily supported and is in some respects the standard video capture device for the CU-SeeMe program. CU-SeeMe, developed at Cornell University, is a point-to-point video-conferencing system that operates under the Macintosh and, to a lesser extent, Windows operating systems. It is a stand-alone application for video-conferencing, but does not support the image-following capabilities of the proposed system. Pseudo-multicast use of CU-SeeMe is possible by pointing an individual client at a "Reflector" which retransmits the data to each of its clients--the "Reflector" is an additional software component and runs only on a variety of Unix platforms. Information on CU-SeeMe in general can be found at <URL:>, and information on the commercial version can be found at <URL:>.

Another project of interest is A QuickCam Inside a Java Applet by Ralf Ackermann. While the system is no longer available for public use, the information on the project is still online. This implementation uses a QuickCam connected to a computer running Linux (a freely distributable and fully functional Unix operating system). Users from a limited set of systems can use any web browser that supports Java to pan and tilt the camera. Their commands are received by the server, where a C program translates them and interfaces with a set of stepper motors to move the camera. In this implementation, Java is used only for the remote user's controls, while a combination of custom C code and a modified web server runs on the host machine. The system implements neither audio nor automatic following of a subject--movement of the camera is the responsibility of the viewer. More information on this implementation can be found at <URL:>.

The third major implementation providing background information for this project is QuickCam for Linux, or Qcam Multimedia, by Scott Laird and Jerry Kalkhof. This is a series of projects being built up into a platform-independent video-conferencing system. The implementation began as a simple video-only service in C/C++, grew into one with audio and video services, and is now under development as a Java-based system for use with Netscape Navigator. Among the ideas proposed by Laird and Kalkhof is a packet format allowing good synchronization between the audio and video. More information is available from <URL:>.
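To illustrate what such a synchronization-friendly packet format might look like, here is a hypothetical header sketch in C. The field names and sizes are our own illustration, not Laird and Kalkhof's actual format: the key idea is that audio and video packets share a capture clock, so the receiver can pair frames with the sound recorded at the same instant and drop video that falls behind.

```c
#include <stdint.h>

/* Hypothetical packet header: both media streams stamp each packet
   with a per-stream sequence number and a shared millisecond clock. */
struct av_packet_header {
    uint8_t  media_type;   /* 0 = audio, 1 = video */
    uint8_t  flags;        /* e.g. keyframe, end-of-frame */
    uint16_t sequence;     /* per-stream packet counter */
    uint32_t timestamp_ms; /* shared capture clock, in milliseconds */
    uint16_t payload_len;  /* bytes of media data that follow */
};

/* Returns nonzero if the video timestamp lags the audio clock by more
   than max_skew_ms, signaling the receiver to skip video frames so
   that audio (the more jitter-sensitive stream) stays continuous. */
int video_is_late(uint32_t video_ts_ms, uint32_t audio_ts_ms,
                  uint32_t max_skew_ms)
{
    return audio_ts_ms > video_ts_ms &&
           audio_ts_ms - video_ts_ms > max_skew_ms;
}
```

Favoring audio continuity over video completeness is a common design choice in conferencing systems, since dropped video frames are far less jarring than gaps in speech.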

Image Following

The I.V.C. can track a user within a 15-foot radius of the camera, about 60 degrees in either direction. It will not follow swift, sudden motion, since it would be undesirable for the viewer to see quick, jagged movement of the picture. Instead, the camera will follow the user's movement with a smooth, steady motion. This will require a tracking scheme to sense position and some robotics to rotate the camera.

During the research stage, it became clear that the most efficient way to implement a tracking device would be to make it a stand-alone system. That is, the tracking device will be completely separate from the video conferencing and therefore will not impede the performance of the network in any way. The next consideration was to use a specific transmitter and receiver with constant characteristics. This makes the system easier to implement in that there is a constant, deterministic homing signal rather than a stochastic one, such as the user's voice, which would be constantly changing. These considerations left two options: an infrared system or an audio system (both of which would be undetectable by the user). Because sound travels much more slowly than light, an audio system seemed the most appropriate, due to the ease of detecting differences in arrival time.

For ease of design, the tracking device will consist of an audio transmitter and audio receivers. Each of these components is a device based on the piezoelectric effect and emits or receives sound at 40 kHz (±1 kHz), which is well beyond the range of human hearing. The transmitter will be attached to the user while the receivers will be mounted around the camera. Audio signals will be emitted either as steady pulses or as user-generated pulses (by the push of a "buzzer" button), and the tracking device will only be enabled while the pulses are being transmitted. The receivers will pick up the pulses in the order in which they arrive at each component, and the delay between components is what will determine the position of the user. The delay data will be collected by the Motorola HC11 microcontroller, which has a machine cycle of approximately 0.5 µsec (compared to the delay, which will be on the order of 30 msec). As can easily be seen, this will provide plenty of resolution. The process of timing the delay will be implemented with the input capture pins of the HC11. The microcontroller's input capture pins can detect either a rising or a falling edge; thus, voltage comparators will be used to bring the transducer outputs to TTL levels. The delay time will then be retrieved from the capture registers and used as the data for position sensing.
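The arithmetic for turning two capture-register readings into a delay is worth spelling out, because the HC11's free-running 16-bit timer wraps around. A minimal sketch follows, written as plain host-side C rather than HC11 firmware; the 0.5 µs tick assumes the standard 2 MHz E-clock.

```c
#include <stdint.h>

#define TICK_US 0.5  /* HC11 timer tick at a 2 MHz E-clock, in microseconds */

/* Compute the arrival delay, in microseconds, between two receivers
   from their 16-bit input-capture register values. The free-running
   counter wraps at 65536, so the unsigned 16-bit subtraction below
   yields the correct tick count across a single wraparound. */
double delay_us(uint16_t first_capture, uint16_t second_capture)
{
    uint16_t ticks = (uint16_t)(second_capture - first_capture);
    return ticks * TICK_US;
}
```

For example, capture values of 100 and 160 give a 30 µs delay, and the subtraction still gives the right answer when the counter rolls over between the two pulses.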

Hardware Flowchart

For the purpose of moving the QuickCam, the choice is among a continuous motor or a servo motor, both of which require analog feedback, and a stepping motor, which can use digital feedback. Since digital feedback from the microcontroller is more desirable, the stepping motor is the obvious choice. In choosing the proper stepping motor, practical parameters such as angular tolerance, holding torque, and ease of control must be considered. With these considerations, an affordable, low-power, unipolar motor seems to be a prime candidate.

In this design, all motion will be rotational since the QuickCam is spherical. To implement rotation in two directions, two motors will be used. The motors are unipolar, rated at 6 V, with 1.8 degrees per step and a holding torque of 22.2 oz.-in. This will provide plenty of resolution, especially when half-stepped, and sufficient torque, since the motor is not actually moving the camera's position but simply rotating it about its axis. Each motor will receive feedback collected from the two receivers in the "L"-shaped array that correspond to the motion that motor provides. The motors will continue to step until the delay between the two receivers is minimized. They will be full-stepped as long as the delay is larger than some value (to be determined later), and half-stepped when the delay falls below that value, to achieve better resolution.
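The full-step/half-step decision described above can be sketched as a small selection function. This is illustrative C only; the threshold and dead-band values are placeholders for the numbers to be determined experimentally, and the dead band (no stepping once the delay is small enough) is our assumption to keep the motors from hunting around the target.

```c
/* Stepping mode chosen from the measured inter-receiver delay. */
typedef enum { STEP_NONE, STEP_HALF, STEP_FULL } step_mode_t;

/* Full steps while the camera is far off target, half steps for the
   final approach, no motion inside the dead band. The sign of the
   delay indicates direction, so only its magnitude matters here. */
step_mode_t choose_step_mode(double delay_us,
                             double half_threshold_us,
                             double dead_band_us)
{
    double mag = delay_us < 0 ? -delay_us : delay_us;
    if (mag <= dead_band_us)      return STEP_NONE;
    if (mag <= half_threshold_us) return STEP_HALF;
    return STEP_FULL;
}
```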

The motors will be controlled by the Motorola HC11 microcontroller through its parallel port. One motor will use the four least significant bits of the port while the other occupies the four most significant bits. The operation of the stepper motors can be explained by visualizing the four winding leads of a motor (a, b, c, and d), with a and b as the two ends of one winding and c and d as the two ends of the other. In this case, the sequences for full stepping would be:

a	1000100010001000	1100110011001100
b	0010001000100010	0011001100110011
c	0100010001000100	0110011001100110
d	0001000100010001	1001100110011001
where the first sequence requires less power, but the second generates more holding torque since both windings are energized at the same time. When we combine the two full-stepping sequences (alternating cases where only one winding is energized and then both are energized, and so on), we obtain:
a	110000011100000111000001
b	000111000001110000011100
c	011100000111000001110000
d	000001110000011100000111
which is the half-stepping sequence. To turn the motor in the other direction, we simply reverse the sequence. The numbers 1 and 0 correspond to bits transmitted by the microcontroller. Care must be taken in how fast these bits are sent to the motor and in how well the current in the windings is regulated. The main restriction on speed will be the limit at which the camera can be rotated without sacrificing the visual quality of the picture; this will be determined by rotating the camera at different speeds and adjusting according to visual preference. Current amplification for the motors can be implemented with bipolar transistors or TTL-level FETs.
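The half-stepping pattern above can be encoded as an eight-entry lookup table, with the two motors packed into one port byte as described. The bit assignment (lead a on bit 0 through lead d on bit 3) is an assumption made for this sketch; any consistent assignment works.

```c
#include <stdint.h>

/* The half-stepping sequence from the text, one entry per step, with
   bit 0 = lead a, bit 1 = lead b, bit 2 = lead c, bit 3 = lead d.
   Walking the table backwards reverses the direction of rotation. */
static const uint8_t half_step[8] = {
    0x1, /* a     */
    0x5, /* a + c */
    0x4, /* c     */
    0x6, /* c + b */
    0x2, /* b     */
    0xA, /* b + d */
    0x8, /* d     */
    0x9  /* d + a */
};

/* Pack one motor into the low nibble of the port byte and the other
   into the high nibble, matching the port split described above. */
uint8_t port_byte(int pan_step, int tilt_step)
{
    return (uint8_t)(half_step[pan_step & 7] |
                     (half_step[tilt_step & 7] << 4));
}
```

Advancing one motor is then just a matter of incrementing (or decrementing) its step index modulo 8 and writing the resulting byte to the port; the other motor's nibble is untouched, so both can move independently.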

The actual assembly of the system will use a very lightweight, low-cost material, since the components do not weigh much. One possibility is for each motor to have a rod connected to its rotor and a ball, much like a mouse track ball, connected to the end of the rod. These balls will touch the camera at two separate spots, the top and the back. As a motor turns, it will rotate its ball, which will in turn rotate the camera through friction. The receivers will be attached to the camera in such a way that they rotate with it; this way the delay will always be measured from the camera's perspective. A simple, cube-like truss will encompass the camera and the receivers, with the motors mounted to it. The truss must allow the camera to rotate freely while holding it securely in place. This can be implemented with a three-point contact system ("Y"-shaped) on the bottom of the camera, along with the rotating points of contact at the top and back, for a total of five contact points.

Return to the index or continue to information on the current design and implementation status