Kimmo Löytänä, Nokia Multimedia Network Terminals
Tommi Riikonen, Nokia Multimedia Network Terminals
Digitalization means not only more channels in the same bandwidth but also data services on television. In the early 90's, video-on-demand (VOD) seemed to be an obvious application for interactive television, but after a few years of trials, affordable VOD services still seem to be five years away. During the last few years, the internet and the Web have been hot topics and also economically very promising technologies, so it is not surprising that the internet discussion is also pushing its way into the interactive television world.
Television and internet engineering cultures are traditionally very different. Television broadcasting is based on long-lasting, rarely changed standards, with products created for mass markets. Internet technology is based on constantly developing standards and solutions, with services traditionally created for individuals rather than for the public at large. Television sets are expected to last for over 10 years, but PCs should be updated every second year. During the last two years, these two worlds have clearly come closer to each other. Internet access and the Web are aggressively marketed to every home, and television is experiencing its biggest change since the introduction of color TV by promising individualized and interactive services for former couch potatoes.
The first is ISO's MPEG (Moving Picture Experts Group) working group. MPEG-1 defines a standard for video and audio coding, primarily meant for stand-alone PC and Video-CD applications, with data rates of around 150 kB/s (about 1.2 Mbit/s). Picture quality is comparable to VHS video. MPEG-2 has been developed for broadcast-quality video coding with data rates of up to 15 Mbit/s and even higher. MPEG-2 defines not only video and audio coding but also how data should be included in multiplexed audio and video streams, which can form either a single program stream or a transport stream consisting of several, not necessarily related, video, audio and data streams. In addition, MPEG-2 also defines a standard for controlling the presentation of these streams with a protocol called DSM-CC (Digital Storage Media - Command and Control).
DSM-CC offers several types and levels of interface for developing interactive services for digital television. The DSM-CC object carousel can be seen as an object-based file structure for broadcast data. The object carousel also provides a mechanism for pointing to video streams as DSM-CC objects. The DSM-CC user-to-user application interface can be used to access these object carousels, although the same interface is also used with fully interactive services in bi-directional network environments. DSM-CC also defines data carousels, which can be used for broadcasting any data without a hierarchical structure. The data carousel is a lower-level protocol than the object carousel, and is meant for several kinds of downloading purposes.
The DVB (Digital Video Broadcasting) project was started in Europe in 1993. It aims to produce technical specifications for digital broadcasting, which would then be recognized by official standardization bodies such as ETSI. Representatives from all sectors of the television programme chain take part in the DVB work: broadcasters and programme producers, transmission companies and satellite operators, equipment and consumer electronics manufacturers, and representatives of government departments. There are currently over 200 members from over 25 countries. DVB has defined transmission standards for satellite and cable networks and will soon complete specifications for terrestrial and MMDS (often called wireless cable) networks. In addition to technical specifications for modulation etc. in different networks, DVB has defined, for example, the use of SI (Service Information) and PSI (Program Specific Information) in order to achieve better interoperability between different operators and terminals and to allow easy navigation through a vast number of different services. DVB is based on the MPEG-2 video coding standard and has adopted DSM-CC, including the object carousel and data carousel, in its specifications for interactive services. DVB specifications are also starting to be used in many countries in the Far East and South America, as well as by several operators in the US.
DAVIC (Digital Audio-Visual Council) is an international consortium defining common methods for creating end-to-end solutions for interactive television environments. DAVIC also has members from all over the world, but they are perhaps more manufacturing- and computer-oriented than the members of DVB. DAVIC was originally more oriented towards bi-directional services and networks than broadcast environments, whereas DVB started from broadcasting and is now moving towards more interactive services. DAVIC tries to select the most suitable solution for every part of the end-to-end concept. The idea is to use existing technologies whenever possible. DAVIC has included most of the DVB specifications as the basis of its own, along with parts of, for example, the DSM-CC, TCP/IP, OSI, MPEG and MHEG standards. HTML was chosen for presenting text with different styles such as headings and italics.
All three standardization organizations work rather closely together and many companies take an active part in the work of all three organizations. DAVIC has also contacted W3C for closer cooperation in defining how to use HTML in interactive television environments.
One possible use for HTML on television is advanced teletext functions. In analog television, teletext has been used for different services for many years, especially in Europe. In the USA it has been less popular, partly because there has been no agreed standard for its usage. In analog television, teletext is implemented by sending data in a limited-bandwidth area called the VBI (vertical blanking interval) together with the broadcast video signal. This has allowed broadcasters to send text and limited graphics for program information, news and simple advertisements. It can also be used, for example, for subtitling services. User interactivity is usually limited to selecting the desired page by entering a three-digit number with a remote control. In the US there are also some more advanced services based on the VBI and modem connections or cable return paths, but these proprietary solutions have not become very popular and they often require additional hardware to be connected to the analog TV set.
Digitalization of television changes the possibilities of teletext-type services in two ways. First, the bandwidth used for teletext services can be decided by the broadcaster, since the multiplexed stream can hold video, audio and data in any proportion. Secondly, digital receivers always contain rather powerful processors in order to handle SI information and display it in an EPG (Electronic Program Guide). This processor can also be used to support more advanced interactivity than traditional teletext offers. HTML is a very good candidate for creating these applications, for the same reasons that made it popular on the internet.
Typical advanced teletext services could include interactive news, extended EPG services and home shopping applications. An example application could be a normal TV program or advertisement on which a link to additional information is shown at a specific time. If the user selects the link, the new HTML content is shown either on top of the same TV program or on a completely new screen. This additional information can either be broadcast or fetched from the network, depending on the type of infrastructure used. However, this type of application would demand that we define common methods for pointing to different pages and other elements in broadcast environments (discussed in the chapter on URL addressing) and a definition for scheduling video-based events (discussed in the chapter on Scheduling). Since video is the most natural media element for television, we also need methods to control video streams played from local devices, such as hard disc, CD-ROM, Video-CD and DVD, as well as video streams delivered over various network structures. These requirements are discussed in the following chapter.
The usual VCR controls, namely play, stop, pause, fast forward, slow motion and rewind, are the most obviously needed control mechanisms. It would also be nice to be able to define the speed of the fast forward, slow motion and rewind functions. The ability to start playing from a defined point, play until a defined point, and loop is also needed. It should also be possible to show a selected frame, as is done with the slider in, for example, Windows Media Player. To support several simultaneous videos from the same document, it would be better to use identifiers instead of URL addresses in every tag; the identifier could be tied to a specific URL in a separate tag. We also need to define a background video tag.
We have defined and used URLs such as video:///videos/myvideo.mpg for opening a video stream, and control:///play and control:///pause for controlling it.
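In a page, these could appear, for example, as ordinary link targets. The following is only a minimal sketch of assumed usage; the link texts and exact placement are illustrative:
<!-- assumed usage of the video: and control: URL schemes as link targets -->
<A HREF="video:///videos/myvideo.mpg">Start the video</A>
<A HREF="control:///play">Play</A>
<A HREF="control:///pause">Pause</A>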
These definitions were made before discussions of the OBJECT tag had started. The OBJECT tag would seem to provide the required possibilities, at least for identifying the object.
<OBJECT ID="vid1" DECLARE DATA="dvb:///right_path/myvideo.mpg"
TYPE="application/mpeg2video" >
<!-- here we could also define the display size for the object with WIDTH and HEIGHT -->
</OBJECT>
The actual control buttons could then be declared in forms or in other object tags. The problem is how to inform the object which control option we want to use, and how to allow the user to set other possible parameters such as speed. One possible solution might be to use video players as objects and then specify the method with a technique similar to Java applets, for example #vid1.play and #vid1.pause. PARAM elements could be used for setting speed or looping parameters.
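As a rough sketch of this idea (the PARAM names SPEED and LOOP and the #id.method link targets are purely illustrative assumptions, not defined anywhere):
<OBJECT ID="vid1" DECLARE DATA="dvb:///right_path/myvideo.mpg"
        TYPE="application/mpeg2video" >
   <!-- hypothetical parameters for playback speed and looping -->
   <PARAM NAME="SPEED" VALUE="2.0">
   <PARAM NAME="LOOP" VALUE="true">
</OBJECT>
<!-- control links using the applet-like #id.method convention -->
<A HREF="#vid1.play">Play</A>
<A HREF="#vid1.pause">Pause</A>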
Here is an example of how we could define a video object with time- and space-related links. This is a modification of examples found in "Inserting objects into HTML" (WD-object-960412, http://www.w3.org/public/WWW/TR/WD-object.html).
First, an implementation using a separate, modified MAP element:
<OBJECT ID="movie1" DATA="video:///movies/myvideo.mpg" TYPE="application/mpeg2video" USEMAP="#imap1">
</OBJECT>
<MAP NAME=imap1>
<AREA SHAPE=rect HREF="video:///movie2.mpg" COORDS="0,0,100,100" TIME="0.0,10.0" TIMETYPE=SEC ALT=Movie2 >
<AREA SHAPE=rect HREF="video:///movie3.mpg" COORDS="0,0,100,100" TIME="10.0,20.0" TIMETYPE=SEC ALT=Movie3 >
</MAP>
Another possibility would be to extend the anchor element to permit four new attributes: SHAPE, COORDS, TIME and TIMETYPE:
<OBJECT ID="movie1" shapes
DATA="video:///movies/myvideo.mpg" TYPE="application/mpeg2videomap"
<A HREF="video:///movies/movie2.mpg" SHAPE=rect COORDS="0,0,100,100" TIME="0.0,10.0" TIMETYPE=SEC >Movie 2</A>
<A HREF="video:///movies/movie3.mpg" SHAPE=rect COORDS="0,0,100,100" TIME="10.0,20.0" TIMETYPE=SEC >Movie 3</A>
</OBJECT>
Instead of a TIMETYPE attribute, it might be easier to use "standard time units" (comparable to standard units for lengths), such as ms for milliseconds and s for seconds. Instead of using a real URL address in HREF, it should also be possible to use the ID of some other object. In these examples, the referenced videos cannot have any "image maps" of their own. If we wanted to use image maps with a referenced video, we could of course point to an HTML page containing an object that defines the desired functions.
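With these two changes, the first AREA of the earlier map could look roughly like this; both the time-unit notation and referring to the ID of another object are assumptions rather than agreed syntax:
<!-- assumes an object with ID "movie2" is declared elsewhere in the document -->
<AREA SHAPE=rect HREF="#movie2" COORDS="0,0,100,100" TIME="0s,10s" ALT=Movie2 >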
It would be nice to have a tag within the document itself for specifying how long the document should be shown and which page should be presented next. Actually, if we use time attributes, it might be possible to specify each object's appearance and disappearance separately. Would this lead us too near to the MHEG and HyTime standards?
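For the page-level case, the existing client-pull META refresh mechanism already offers part of this functionality and could serve as a starting point (a sketch only; the page name next_page.html is illustrative and support varies between browsers):
<!-- show this page for about 30 seconds, then load the next page -->
<META HTTP-EQUIV="Refresh" CONTENT="30; URL=next_page.html">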
The first possibility is to use HTTP addresses as commonly used on the internet. A typical address would be of the form http://www.server.net/right_path/right.file. This does not require any changes on the network side, except in the network gateway, but it does require implementing a TCP/IP stack on the client and using IP over MPEG. IP over MPEG definitions are currently under development in the DAVIC and MPEG organizations. This type of addressing would be most suitable for files that are actually fetched from the internet.
DSM-CC client software will be implemented on many DVB-compliant terminals, so the DSM-CC user-to-user type of addressing could be used for accessing broadcast object carousels and interactive service files. This would be very similar to internet addressing, since object carousels also form hierarchical structures and can have server names. It would allow us to use the same addressing mechanism for both interactive and broadcast data, and there would be no need to implement a TCP/IP stack on the terminals.
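Such an address might, purely as an illustration, follow the same pattern as the dvb:/// form used in the examples above; the scheme name, server name and path below are assumptions, since no common syntax has been agreed:
dvb://server_name/right_path/right.file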
DSM-CC data carousels can be used for broadcasting any data without a hierarchical structure. They could be used for receiving HTML pages if we agree on a common addressing mechanism. Implementing support for data carousels requires fewer resources, and the data carousel would be an attractive addressing mechanism, especially for simple receivers supporting basic advanced teletext applications.
The MPEG transport stream addresses different services with PIDs (Packet IDentifiers). It would be possible to use PID values for pointing to a certain file, but this would lead to problems similar to those caused by using raw IP numbers on the internet. For various reasons, a broadcaster might change the PID of a certain service, just as a network administrator sometimes has to change a machine's IP number while restructuring the network. For this reason it would be better to use some other addressing mechanism in DVB broadcast streams. This could be based on Service Information (SI) tables, in a similar fashion to DNS in IP networks.
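As a purely hypothetical illustration, the receiver could then resolve a service name in an address such as
dvb://news_service/magazine/front_page.html
to the service's current PID via the SI tables, just as a DNS server resolves a host name to an IP address; the scheme and names here are only illustrative.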