VCN (Video, Compression, Networking) Glossary
This is a collection of often used and misused technical terms
regarding video, compression and networking.
Many sources contributed to this list.
If you wish to contribute, correct any mistake or just send your
comments and impressions please contact :
Luigi.Filippini@crs4.it
Although sometimes used interchangeably, advanced and high-definition
television (HDTV) are not one and the same. Advanced television
(ATV) would distribute wide-screen television signals with resolution substantially
better than current systems. It requires changes to current emission regulations,
including transmission standards. In addition, ATV would offer at least two-channel,
CD-quality audio.
The a:b:c notation for sampling ratios, as found in the CCIR-601
specifications, has the following meaning:
- 4:2:2 means 2:1 horizontal downsampling, no vertical downsampling. (Think
"4 Y samples for every 2 Cb and 2 Cr samples in a scanline".)
- 4:1:1 *ought* to mean 4:1 horizontal downsampling, no vertical. (Think
"4 Y samples for every 1 Cb and 1 Cr samples in a scanline".) But it is
often misused to mean the same as:
- 4:2:0 means 2:1 horizontal and 2:1 vertical downsampling. (Think "I want
some of whatever these guys were taking.")
Not only is this notation not internally consistent, but it is incapable of
being extended to represent any unusual sampling ratios, eg different ratios
for the Cb and Cr channels.
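As a rough illustration, the three notations above can be tabulated as
horizontal and vertical chroma downsampling factors. The table name
SUBSAMPLING, the function chroma_plane_size and the 720x480 frame size are
made up for this sketch and come from no standard.

```python
# Horizontal and vertical chroma downsampling factors for each notation,
# exactly as described in the list above.
SUBSAMPLING = {
    "4:2:2": (2, 1),  # 2:1 horizontal, no vertical
    "4:1:1": (4, 1),  # 4:1 horizontal, no vertical
    "4:2:0": (2, 2),  # 2:1 horizontal and 2:1 vertical
}


def chroma_plane_size(notation, width, height):
    """Return (width, height) of each of the Cb and Cr planes."""
    h_factor, v_factor = SUBSAMPLING[notation]
    return width // h_factor, height // v_factor


# Illustrative frame size only.
for name in SUBSAMPLING:
    print(name, chroma_plane_size(name, 720, 480))
```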
Perhaps the major drawback to each of the Huffman
encoding techniques is their poor performance when processing texts where one
symbol has a probability of occurrence approaching unity. Although the entropy
associated with such symbols is extremely low, each symbol must still be encoded
as a discrete value.
Arithmetic coding removes this restriction by representing messages
as intervals of the real numbers between 0 and 1. Initially, the range of values
for coding a text is the entire interval [0, 1]. As encoding proceeds, this
range narrows while the number of bits required to represent it expands. Frequently
occurring characters reduce the range less than characters occurring infrequently,
and thus add fewer bits to the length of an encoded message.
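The narrowing of the interval can be sketched as follows. This is only the
interval arithmetic, not a complete arithmetic coder (no bit output, no
termination symbol); the function name narrow_interval and the two-symbol
probabilities are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import ceil, log2


def narrow_interval(message, probabilities):
    """Return the final [low, high) interval for message.

    probabilities maps each symbol to its probability (a Fraction);
    the values are assumed to sum to 1.
    """
    # Give each symbol a sub-range of [0, 1) proportional to its probability.
    ranges = {}
    cumulative = Fraction(0)
    for symbol, p in probabilities.items():
        ranges[symbol] = (cumulative, cumulative + p)
        cumulative += p

    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        width = high - low
        sym_low, sym_high = ranges[symbol]
        # Keep only the slice of the current interval that belongs to symbol.
        low, high = low + width * sym_low, low + width * sym_high
    return low, high


probs = {"a": Fraction(9, 10), "b": Fraction(1, 10)}
for text in ("aaaa", "abab"):
    low, high = narrow_interval(text, probs)
    width = float(high - low)
    # About -log2(width) bits are needed to name a number inside the interval.
    print(text, width, ceil(-log2(width)))
```

The frequent symbol "a" shrinks the interval only slightly at each step,
while the rare symbol "b" shrinks it tenfold, so the second message ends up
with a much narrower interval and needs more bits.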
ATM (Asynchronous Transfer Mode) is a switching/transmission
technique where data is transmitted in small, fixed-size cells (5-byte header,
48-byte payload). The cells lend themselves both to the time-division multiplexing
characteristics of the transmission media and to the packet-switching characteristics
desired of data networks. At each switching node, the ATM header identifies
the virtual path or virtual circuit that the cell contains data
for, enabling the switch to forward the cell to the correct next-hop trunk.
The virtual path is set up through the involved switches when two endpoints
wish to communicate. This type of switching can be implemented in hardware,
which is almost essential when trunk speeds range from 45 Mb/s to 1 Gb/s.
The human visual system has much less acuity for spatial variation
of colour than for brightness. Rather than conveying RGB, it is advantageous
to convey luma in one channel, and colour information that has had luma removed
in the two other channels. In an analog system, the two colour channels can
have less bandwidth, typically one-third that of luma. In a digital system each
of the two colour channels can have considerably less data rate (or data capacity)
than luma.
Green dominates the luma channel: about 59% of the luma signal
comprises green information. Therefore it is sensible, and advantageous for
signal-to-noise reasons, to base the two colour channels on blue and red. The
simplest way to remove luma from each of these is to subtract it to form
the difference between a primary colour and luma. Hence, the basic video
colour-difference pair is (B-Y), (R-Y) [pronounced "B minus Y, R minus Y"].
The (B-Y) signal reaches its extreme values at blue (R=0, G=0,
B=1; Y=0.114; B-Y=+0.886) and at yellow (R=1, G=1, B=0; Y=0.886; B-Y=-0.886).
Similarly, the extrema of (R-Y), +-0.701, occur at red and cyan. These are inconvenient
values for both digital and analog systems. The colour spaces YPbPr, YCbCr,
PhotoYCC and YUV are simply scaled versions of (Y, B-Y, R-Y)
that place the extrema of the colour difference channels at more convenient
values.
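The extrema quoted above can be checked with a few lines of arithmetic. This
sketch uses the luma weights 0.299, 0.587 and 0.114 and works on RGB values
in the range 0 to 1; the function names are illustrative only.

```python
def luma(r, g, b):
    """Luma as the weighted sum 0.299 R + 0.587 G + 0.114 B."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def colour_differences(r, g, b):
    """Return (Y, B-Y, R-Y) for one RGB triple in the range 0..1."""
    y = luma(r, g, b)
    return y, b - y, r - y


COLOURS = {
    "blue": (0, 0, 1),
    "yellow": (1, 1, 0),
    "red": (1, 0, 0),
    "cyan": (0, 1, 1),
}

for name, rgb in COLOURS.items():
    y, b_y, r_y = colour_differences(*rgb)
    print(f"{name:7s} Y={y:.3f}  B-Y={b_y:+.3f}  R-Y={r_y:+.3f}")
```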
Bridges are devices that connect similar and dissimilar LANs
at the data link layer (OSI layer 2), regardless of the physical
layer protocols or media being used. Bridges require that the networks have
consistent addressing schemes and packet frame sizes. Recent products
have been termed learning bridges since they are capable of updating node address
(tracking) tables as well as overseeing the transmission of data between two
Ethernet LANs.
Brouters are bridge/router
hybrid devices that offer the best capabilities of both devices in one unit.
Brouters are actually bridges capable of intelligent routing
and therefore are used as generic components to integrate workgroup networks.
The bridge function filters information that remains
internal to the network and is capable of supporting multiple higher-level protocols
at once.
The router component maps out the optimal
paths for the movement of data from one point on the network to another. Since
the brouter can handle the functions of both bridges and
routers, as well as bypass the need for the translation
across application protocols with gateways, the device offers significant cost
reductions in network development and integration.
Comité Consultatif International Téléphonique et Télégraphique.
A committee of the International Telecommunication Union responsible
for making technical recommendations about telephone and data communication
systems for PTTs and suppliers. Plenary sessions are held every four years to
adopt new standards.
CD-DA (Compact Disc-Digital Audio) discs are standard music CDs.
CD-DA led to CD-ROM when people realized that you could
store a whole bunch of computer data on a 12 cm optical disc (650 MB).
CD-ROM drives are simply another kind of digital storage media for computers, albeit
read-only. They are peripherals just like hard disks and floppy drives. (Incidentally,
the convention is that when referring to magnetic media, it is spelled disk.
Optical media like CDs, LaserDisc, and all the other formats are spelled disc.)
CD-I means Compact Disc Interactive. It is meant to provide
a standard platform for mass consumer interactive multimedia applications. So
it is more akin to CD-DA, in that it is a full specification
for both the data/code and standalone playback hardware: a CD-I player has a
CPU, RAM, ROM, OS, and audio/video/(MPEG) decoders built into it. Portable players
add an LCD screen and speakers/phonejacks. It has limited motion video and still
image compression capabilities. It was announced in 1986, and was in beta test
by Spring 1989.
This is a consumer electronics format that uses the optical disc
in combination with a computer to provide a home entertainment system that delivers
music, graphics, text, animation, and video in the living room. Unlike a CD-ROM
drive, a CD-I player is a standalone system that requires no external computer.
It plugs directly into a TV and stereo system and comes with a remote control
to allow the user to interact with software programs sold on discs. It looks
and feels much like a CD player except that you get images as well as music
out of it and you can actively control what happens. In fact, it is a
CD-DA player and all of your standard music CDs will play
on a CD-I player; there is just no video in that case.
For a CD-I disc, there may be as few as 1 or as many as 99 data
tracks. The sector size in the data tracks of a CD-I disc is approximately 2
kbytes. Sectors are randomly accessible, and, in the case of CD-I, sectors can
be multiplexed in up to 16 channels for audio and 32 channels for all other
data types. For audio these channels are equivalent to having 16 parallel audio
data channels instantly accessible during the playing of a disc.
If you want information about Philips CD-I products, you can call these numbers:
US: Consumer hotline: 800-845-7301
US: For nearest store: 800-223-7772
US: Developers hotline: 800-234-5484
UK: Philips CD-I hotline: 0800-885-885
Some useful references about CD-I are:
"Discovering CD-I", available for $45 from:
"Discovering CD-I", Microware Systems Corporation, 1900 NW 114th Street,
Des Moines, IA 50325-7077, 1-800-475-9000
Other books by Philips IMS and published by Addison Wesley:
"Introducing CD-I" ISBN 0-201-62748-5
"The CD-I Production Handbook" ISBN 0-201-62750-7
"The CD-I Design Handbook" ISBN 0-201-62749-3
CD-ROM means "Compact Disc Read Only Memory". A CD-ROM is physically
identical to a Digital Audio Compact Disc used in a CD player, but the bits
recorded on it are interpreted as computer data instead of music. You need to
buy a CD-ROM Drive and attach it to your computer in order to use CD-ROMs.
A CD-ROM has several advantages over other forms of data storage,
and a few disadvantages. A CD-ROM can hold about 650 megabytes of data, the
equivalent of hundreds of floppy disks. CD-ROMs are not damaged by magnetic
fields or the X-rays in airport scanners. The data on a CD-ROM can be accessed
much faster than a tape, but CD-ROMs are 10 to 20 times slower than hard disks.
You cannot write to a CD-ROM. You buy a disc with the data already
recorded on it. There are thousands of titles available.
CD-XA is a CD-ROM extension being designed
to support digital audio and still images.
Announced in August 1988 by Microsoft, Philips, and Sony, the
CD-ROM XA (for Extended Architecture) format incorporates audio from the CD-I
format. It is consistent with ISO 9660, (the volume and the structure of CD-ROM),
is an application extension of the Yellow Book, and draws on the Green Book.
CD-XA defines another way of formatting sectors on a CD-ROM,
including headers in the sectors that describe the type (audio, video, data)
and some additional info (markers, resolution in case of a video or audio sector,
file numbers, etc).
The data written on a CD-XA can still be in ISO9660 file system
format and therefore be readable by MSCDEX and Unix CD-ROM
file system translators. A CD-I player can also read CD-XA
discs even if its own `Green Book' file system only resembles ISO9660 and isn't
fully compatible. However, when a disc is inserted in a CD-I
player, the player tries to load an executable application from the CD-XA, normally
some 68000 application in the /CDI directory. Its name is stored in the disc's
primary volume descriptor. CD-XA bridge discs, like Kodak's PhotoCDs,
do have such an application, ordinary CD-XA discs don't.
A CD-XA drive is a CD-ROM drive but with some of the compressed
audio capabilities found in a CD-I player (called ADPCM). This allows interleaving
of audio and other data so that an XA drive can play audio and display pictures
(or other things) simultaneously. There is special hardware in an XA drive controller
to handle the audio playback. This format came from a desire to inject some
of the features of CD-I back into the professional market.
Cell is a compression technique developed by SMI (Sun Microsystems, Inc.).
The compression algorithms, the bit-stream definition, and the decompression
algorithms are open; that is, Sun will tell anybody who is interested about them.
Cell compression is similar to MPEG and H.261
in that there is a lot of room for value-add on the compressor end. Getting
the highest quality image from a given bit count at a reasonable amount of compute
is an art. In addition, the bit-stream completely defines the compression format
and defines what the decoder must do, so there is less art in the decoder.
There are two flavors of Cell: the original, called Cell or CellA,
and a newer flavor called CellB. CellA is designed for use-many-times video,
where one does not mind that the encoder runs at less than real time, for example
CD-ROM playback, distance learning, and video help for applications.
CellB is designed for use-once video, where the encoder must run at real-time
(interactive) rates, for example video mail and video conferencing.
Both flavors of Cell use the same basic technique of representing
each 4x4 pixel block with a 16-bit bitmask and two 8-bit vector-quantized codebook
indices. This produces a compression of 12:1 for 24-bit pixels (or 8:1 for 16-bit
pixels), since each 16-pixel block is represented by 32 bits (a 16-bit mask and
two 8-bit codebook indices), i.e. 2 bits/pixel. In
both flavors, further compression is accomplished by checking current blocks
against the spatially equivalent block in the previous frame. If the new block
is "close enough" to the old block, the block is coded as a skip code. Consecutive
skip codes are run-length encoded for further compression. Modifying the definition
of "close enough" allows one to trade off quality and compression rates.
Both versions of Cell typically compress video images down to about 0.75
to 0.5 bits/pixel.
Both flavors have many similar steps in the early part of compression.
For each 4x4 block, the compressor calculates the average luma of the 16 pixels.
It then partitions the pixels into two groups, those whose luma is above the average
and those whose luma is below the average. The compressor sets the 16-bit bitmask
based on which pixels are in each partition. The compressor then calculates
a color to represent each partition.
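The partitioning step shared by both flavors can be sketched like this. The
bit order of the mask (bit i for pixel i in row-major order) and the handling
of pixels exactly equal to the average are assumptions of this sketch, not
details taken from the Cell bitstream definition.

```python
def partition_block(luma_block):
    """Split a 4x4 block (16 luma values, row-major) into two groups.

    Returns (mask, low_pixels, high_pixels). Bit i of mask is set when
    pixel i is brighter than the block average (assumed bit order).
    """
    if len(luma_block) != 16:
        raise ValueError("expected 16 luma values")
    average = sum(luma_block) / 16
    mask = 0
    low_pixels, high_pixels = [], []
    for i, value in enumerate(luma_block):
        if value > average:
            mask |= 1 << i
            high_pixels.append(value)
        else:
            # Pixels equal to the average go to the low group (assumption).
            low_pixels.append(value)
    return mask, low_pixels, high_pixels


# A block whose top half is dark and bottom half is bright.
block = [10] * 8 + [200] * 8
mask, low_pixels, high_pixels = partition_block(block)
print(f"{mask:016b}", len(low_pixels), len(high_pixels))
```

A representative colour for each group would then be computed from
low_pixels and high_pixels, which is where Cell and CellB differ.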
In Cell, the compressor calculates an average color of each partition
and then does a vector quantization against the Cell codebook (which is just
a color-map). The encoded block is the 16-bit mask and the two 8-bit colormap
indices. The compressor maintains statistics about how much error each codebook
entry is responsible for and how many times each codebook entry is used. It
uses these numbers to adaptively refine the codebook on each frame. Changed
codebooks are sent in the bitstream.
In CellB, the compressor calculates the average luma for each
partition and the average chroma for the entire block. This gives two colors
[Y_lo, Cb_ave, Cr_ave] and [Y_hi, Cb_ave, Cr_ave]. The pair [Y_lo, Y_hi] is
vector quantized against the Y/Y codebook and the pair [Cb_ave, Cr_ave] is vector
quantized against the Cr/Cb codebook. Here the encoded block is the 16-bit mask
and the two 8-bit VQ indices. Both of CellB's codebooks are fixed. This allows
both the compressor and decompressor to run at high-speed by using table lookups.
Both codebooks are designed with the human visual system in mind. They are not
just uniform partition of the Y/Y or Cr/Cb space. Each codebook has fewer than
256 entries.
Cell (or CellA) is supported in XIL 1.0 from SMI. It is part
of Solaris 2.2. CellB is supported in XIL 1.1 from SMI. It will be part of Solaris
2.3 when that becomes available. Complete bitstream definitions for both flavors
of cell are in the XIL 1.1 programmer's guide. There is some discussion of the
CellA bitstream in the XIL 1.0 programmer's guide.
CellB was used for the SMI Scott McNealy holiday broadcast, where
he talked to the company in real time over the Sun Wide Area Network. This broadcast
reached from Tokyo, Japan to Munich, Germany, with over 3000 known viewers.
Common Image Format. The standardization of the structure of
the samples that represent the picture information of a single frame in digital
HDTV, independent of frame rate and sync/blank structure.
The uncompressed bit rate for transmitting CIF at 29.97 frames/sec
is 36.45 Mbit/sec.
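The quoted figure can be reproduced under assumptions that the entry itself
does not state: a 352x288 luma picture, two chroma planes each carrying a
quarter as many samples as luma, and 8 bits per sample. The function name and
its parameters are illustrative only.

```python
def uncompressed_bit_rate(width, height, frames_per_second,
                          bits_per_sample=8, chroma_fraction=0.25):
    """Bits per second for one luma plane plus two subsampled chroma planes.

    chroma_fraction is the size of each chroma plane relative to luma
    (0.25 assumes 2:1 subsampling both horizontally and vertically).
    """
    luma_samples = width * height
    chroma_samples = 2 * luma_samples * chroma_fraction
    total_bits = (luma_samples + chroma_samples) * bits_per_sample
    return total_bits * frames_per_second


# Assumed picture size of 352x288; result is printed in Mbit/s.
print(uncompressed_bit_rate(352, 288, 29.97) / 1e6)
```

With these assumptions the result is close to 36.46 Mbit/s, which agrees
with the figure above to within rounding.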
Differential pulse code modulation (DPCM) is a source coding
scheme that was developed for encoding sources with memory.
The reason for using the DPCM structure is that for most sources
of practical interest, the variance of the prediction error is substantially
smaller than that of the source.
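A minimal sketch of the idea, assuming the simplest possible predictor (the
previous sample) and no quantizer, so the scheme shown here is lossless. A
practical DPCM coder would quantize the prediction error; the sample values
are invented for illustration.

```python
from statistics import pvariance


def dpcm_encode(samples):
    """Return prediction errors, predicting each sample by the previous one."""
    errors = []
    previous = 0  # assumed initial prediction
    for sample in samples:
        errors.append(sample - previous)
        previous = sample
    return errors


def dpcm_decode(errors):
    """Rebuild the samples by accumulating the prediction errors."""
    samples = []
    previous = 0
    for error in errors:
        previous += error
        samples.append(previous)
    return samples


signal = [100, 102, 104, 103, 105, 107, 110, 108]
errors = dpcm_encode(signal)
assert dpcm_decode(errors) == signal

# Skip the first error, which only reflects the arbitrary initial prediction.
print("source variance:", pvariance(signal))
print("error variance: ", pvariance(errors[1:]))
```

For a slowly varying source like this one, the errors cluster near zero and
so have a much smaller variance than the samples themselves.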
Digital Video Interactive (DVI) technology brings television
to the microcomputer. DVI's concept is simple: information is digitized and
stored on a random-access device such as a hard disk or a CD-ROM,
and is accessed by a computer. DVI requires extensive compression and real-time
decompression of images. Until recently this capability was missing. DVI enables
new applications. For example, a DVI CD-ROM disc on twentieth-century
artists might consist of 20 minutes of motion video; 1,000 high-res still images,
each with a minute of audio; and 50,000 pages of text. DVI uses the YUV
system, which is also used by the European PAL color television
system. The Y channel encodes luminance and the U and V channels encode chrominance.
For DVI, we subsample 4-to-1 both vertically and horizontally in U and V, so
that each of these components requires only 1/16 the information of the Y component.
This provides a compression from the 24-bit RGB space of the original to 9-bit
YUV space.
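The 9-bit figure follows from simple arithmetic, sketched below. The
assumption of 8 bits per sample is implied by the 24-bit RGB starting point;
the function name is illustrative.

```python
def average_bits_per_pixel(bits_per_sample=8, h_factor=4, v_factor=4):
    """Average bits per pixel for full-resolution Y plus subsampled U and V."""
    # Each chroma plane keeps one sample per h_factor x v_factor pixels.
    chroma_share = 1 / (h_factor * v_factor)
    return bits_per_sample * (1 + 2 * chroma_share)


# 8 bits of Y, plus 8/16 bit each for U and V.
print(average_bits_per_pixel())
```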
The DVI concept originated in 1983 in the inventive environment
of the David Sarnoff Research Center in Princeton, New Jersey, then also known
as RCA Laboratories. The ongoing research and development of television since
the early days of the Laboratories was extending into the digital domain, with
work on digital tuners, and digital image processing algorithms that could be
reduced to cost-effective hardware for mass-market consumer television.
European Association of Consumer Electronics Manufacturers
Extended [or Enhanced] Definition Television. A television system
that offers picture quality substantially improved over conventional 525-line
or 625-line receivers, by employing techniques at the transmitter and at the
receiver that are transparent to (and cause no visible quality degradation to)
existing 525-line or 625-line receivers. One example of EDTV is the improved
separation of luminance and colour components by pre-combing the signals prior
to transmission, using techniques that have been suggested by Faroudja, Central
Dynamics and Dr William Glenn.
Entropy, the average amount of information represented by a
symbol in a message, is a function of the model used to produce that message
and can be reduced by increasing the complexity of the model so that it better
reflects the actual distribution of source symbols in the original message.
Entropy is a measure of the information contained in a message;
it is the lower bound for compression.
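As a small worked example, the entropy of a message can be estimated under
the simplest model, in which each symbol is independent and its probability
is its relative frequency in the message. That choice of model is an
assumption of this sketch; a richer model would generally give a lower figure.

```python
from collections import Counter
from math import log2


def entropy_bits_per_symbol(message):
    """Shannon entropy of message under an order-0 frequency model."""
    counts = Counter(message)
    total = len(message)
    return -sum((n / total) * log2(n / total) for n in counts.values())


for text in ("abab", "aaab", "abcd"):
    print(text, entropy_bits_per_symbol(text))
```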
Economics and Statistics Advisory Committee
European Strategic Programme for Research and Development in
Information Technology
European Telecommunication Standard Institute
Fast Fourier Transform
Gateways provide functional bridges between
networks by receiving protocol transactions on a layer-by-layer basis from one
protocol (SNA) and transforming them into comparable functions
for the other protocol (OSI). In short, the gateway provides
a connection with protocol translation between networks that use different protocols.
Interestingly enough, gateways, unlike the bridge, do
not require that the networks have consistent addressing schemes and packet
frame sizes. Most proprietary gateways (such as IBM SNA gateways)
provide protocol converter functions up through layer six of the OSI model,
while OSI gateways perform
protocol translations up through OSI layer seven.
Recognizing the need for providing ubiquitous video services
using the Integrated Services Digital Network (ISDN), CCITT
(International Telegraph and Telephone Consultative Committee) Study Group XV
established a Specialist Group on Coding for Visual Telephony in 1984 with the
objective of recommending a video coding standard for transmission at m x 384
kbit/s (m=1,2,..., 5). Later in the study period, after new discoveries in video
coding techniques, it became clear that a single standard, p x 64 kbit/s (p
= 1,2,..., 30), can cover the entire ISDN channel capacity.
After more than five years of intensive deliberation, CCITT
Recommendation H.261, Video Codec for Audiovisual Services at p x 64 kbit/s,
was completed and approved in December 1990. A slightly modified version of
this Recommendation was also adopted for use in North America.
The intended applications of this international standard are
for videophone and videoconferencing. Therefore, the recommended video coding
algorithm has to be able to operate in real time with minimum delay. For p =
1 or 2, due to severely limited available bit rate, only desktop face-to-face
visual communication (often referred to as videophone) is appropriate. For p>=6,
due to the additional available bit rate, more complex pictures can be transmitted
with better quality. This is, therefore, more suitable for videoconferencing.
High-Definition Television. A television system with approximately
twice the horizontal and twice the vertical resolution of current 525-line and
625-line systems, component colour coding (e.g. RGB or YCbCr),
a picture aspect ratio of 16:9 and a frame rate of at least 24 Hz. Currently
there are a number of proposed HDTV standards, including HD-MAC, HiVision and
others.
In the archetypal hybrid coder, an estimate of the next frame
to be processed is formed from the current frame and the difference is then
encoded by some purely intraframe mechanism. In recent years, the most attention
has been paid to the motion-compensated DCT coder where the estimate is formed
by a two-dimensional warp of the previous frame and the difference is encoded
using a block transform (the Discrete Cosine Transform).
This system is the basis for international standards for videotelephony,
is used for some HDTV demonstrations, and is the prototype
from which MPEG was designed. Its utility has
been demonstrated for video sequences, and the DCT concentrates the remaining
energy into a small number of transform coefficients that can be quantized and
compactly represented.
The key feature of this coder is the presence of a complete decoder
within it. The difference between the current frame as represented at the receiver
and the incoming frame is processed. In the basic design, therefore, the receiver
must track the transmitter precisely: the decoder at the receiver and the decoder
at the transmitter must match. The system is sensitive to channel errors and
does not permit random access. However, it is on the order of three to four
times as efficient as one that uses no prediction.
In practice, this coder is modified to suit specific applications.
The standard telephony model uses a forced update of the decoded frame so that
channel errors do not propagate. When a participant enters the conversation
late or alternates between image sources, residual errors die out and a clear
image is obtained after a few frames. Similar techniques are used in versions
of this coder being developed for direct satellite television broadcasting.
For a given character distribution, by assigning short codes
to frequently occurring characters and longer codes to infrequently occurring
characters, Huffman's minimum redundancy encoding minimizes the average number
of bits required to represent the characters in a text.
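The construction can be sketched as follows: repeatedly merge the two least
frequent subtrees, so that rarer characters end up deeper in the tree and get
longer codes. This sketch returns only the code lengths in bits, not the
actual bit patterns, and the helper name is illustrative.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a Huffman code built from text."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {symbol: 1 for symbol in counts}
    # Heap entries: (weight, tie_breaker, {symbol: depth so far}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(n, i, {s: 0}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie_breaker = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie_breaker, merged))
        tie_breaker += 1
    return heap[0][2]


text = "aaaabbc"
lengths = huffman_code_lengths(text)
total_bits = sum(lengths[ch] for ch in text)
print(lengths, total_bits / len(text), "bits per character on average")
```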
Static Huffman encoding uses a fixed set of codes, based on a
representative sample of data, for processing texts. Although encoding is achieved
in a single pass, the data on which the compression is based may bear little
resemblance to the actual text being compressed.
Dynamic Huffman encoding, on the other hand, reads each text
twice; once to determine the frequency distribution of the characters in the
text and once to encode the data. The codes used for compression are computed
on the basis of the statistics gathered during the first pass with compressed
texts being prefixed by a copy of the Huffman encoding table for use with the
decoding process.
By using a single-pass technique, where each character is encoded
on the basis of the preceding characters in a text, Gallager's adaptive Huffman
encoding avoids many of the problems associated with either the static or dynamic
method.
Improved Definition Television. A television system that offers
picture quality substantially improved over conventional receivers, for signals
originated in standard 525-line or 625-line format, by processing that involves
the use of field store and/or frame store (memory) techniques at the receiver.
One example is the use of field or frame memory to implement de-interlacing
at the receiver in order to reduce interline twitter compared to that of an
interlaced display. IDTV techniques are implemented entirely at the receiver
and involve no change to picture origination equipment and no change to emission
standards.
International Electrotechnical Commission. A standardisation body
at the same level as ISO.
Interactive video-disc is another video related technology,
using an analog approach. It has been available since the early 1980s, and is
supplied in the U.S. primarily by Pioneer, Sony, and IBM.
ISDN stands for "Integrated Services Digital Network", and
it's a CCITT term for a relatively new telecommunications
service package. ISDN is basically the telephone network turned all-digital
end to end, using existing switches and wiring (for the most part) upgraded
so that the basic call is a 64 kbps end-to-end channel, with bit-diddling
as needed (but not when not needed!). Packet and maybe frame modes are thrown
in for good measure, too, in some places. It's offered by local telephone companies,
but most readily in Australia, France, Japan, and Singapore, with the UK and
Germany somewhat behind, and USA availability rather spotty.
A Basic Rate Interface (BRI) is two 64K bearer ("B") channels
and a single delta ("D") channel. The B channels are used for voice or data,
and the D channel is used for signaling and/or X.25 packet
networking. This is the variety most likely to be found in residential service.
Another flavor of ISDN is Primary Rate Interface (PRI). Inside the US, this
consists of 24 channels, usually divided into 23 B channels and 1 D channel,
and runs over the same physical interface as T1. Outside the US, PRI
has 31 user channels, usually divided into 30 B channels and 1 D channel. It
is typically used for connections such as one between a PBX and a CO or IXC.
A television system that limits the recording or transmission
of useful picture information to about three-quarters of the available vertical
picture height of the distribution format (e.g. 525-line) in order to offer
program material that has a wide picture aspect ratio.
Video originates with linear-light (tristimulus) RGB
primary components, conventionally contained in the range 0 (black) to +1 (white).
From the RGB triple, three gamma-corrected primary signals are computed; each
is essentially the 0.45-power of the corresponding tristimulus value, similar
to a square-root function.
In a practical system such as a television camera, however, in
order to minimize noise in the dark regions of the picture it is necessary to
limit the slope (gain) of the curve near black. It is now standard to limit
gain to 4.5 below a tristimulus value of +0.018, and to stretch the remainder
of the curve to place the Y-intercept at -0.099 in order to maintain function
and tangent continuity at the breakpoint:
Rgamma = (1.099 * pow(R,0.45)) - 0.099
Ggamma = (1.099 * pow(G,0.45)) - 0.099
Bgamma = (1.099 * pow(B,0.45)) - 0.099
Luma is then computed as a weighted sum of the gamma-corrected
primaries:
Y = 0.299*Rgamma + 0.587*Ggamma + 0.114*Bgamma
The three
coefficients in this equation correspond to the sensitivity of human vision
to each of the RGB primaries standardized for video. For example, the low value
of the blue coefficient is a consequence of saturated blue colours being perceived
as having low brightness.
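Putting the pieces together, the transfer function and the luma sum might be
coded as below. The linear segment (gain 4.5 below 0.018) is taken from the
description above; the function names are illustrative, and inputs are
assumed to be linear-light values in the range 0 to 1.

```python
def gamma_correct(value):
    """Gamma-correct one linear-light component in the range 0..1."""
    if value < 0.018:
        # Linear segment near black: gain limited to 4.5.
        return 4.5 * value
    # Power-law segment, offset so the two segments meet at the breakpoint.
    return 1.099 * value ** 0.45 - 0.099


def luma(r, g, b):
    """Weighted sum of the gamma-corrected primaries."""
    return (0.299 * gamma_correct(r)
            + 0.587 * gamma_correct(g)
            + 0.114 * gamma_correct(b))


for rgb in ((0.0, 0.0, 0.0), (0.18, 0.18, 0.18), (1.0, 1.0, 1.0), (0.0, 0.0, 1.0)):
    print(rgb, round(luma(*rgb), 4))
```

The two segments agree to within about 0.0003 at the breakpoint of 0.018,
which is the continuity the offset of 0.099 is meant to provide.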
The luma coefficients are also a function of the white point
(or chromaticity of reference white). Computer users commonly have a
white point with a colour temperature in the range of 9300 K, which contains
twice as much blue as the daylight reference CIE D65 used in television. This
is reflected in pictures and monitors that look too blue.
Although television primaries have changed over the years since
the adoption of the NTSC standard in 1953, the coefficients
of the luma equation for 525 and 625 line video have remained unchanged. For
HDTV, the primaries are different and the luma coefficients
have been standardized with somewhat different values.
The algorithm used by the Unix compress command to reduce
the size of files, e.g. for archival or transmission. The algorithm relies on
repetition of byte sequences (strings) in its input. It maintains a table mapping
input strings to their associated output codes. The table initially contains
mappings for all possible strings of length one. Input is taken one byte at
a time to find the longest initial string present in the table. The code for
that string is output and then the string is extended with one more input byte,
b. A new entry is added to the table mapping the extended string to the next
unused code (obtained by incrementing a counter). The process repeats, starting
from byte b. The number of bits in an output code, and hence the maximum number
of entries in the table is usually fixed and once this limit is reached, no
more entries are added.
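The table-building procedure described above can be sketched as follows. This
covers only the mapping from input bytes to a list of integer codes; it does
not pack the codes into bits or reproduce the file format of compress. The
limit of 4096 table entries (12-bit codes) is an assumption of this sketch.

```python
def lzw_encode(data, max_codes=4096):
    """Encode bytes as a list of integer codes using a growing string table."""
    # The table initially maps every possible one-byte string to a code.
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    codes = []
    current = b""
    for byte in data:
        extended = current + bytes([byte])
        if extended in table:
            # Keep extending while the string is still known.
            current = extended
        else:
            # Emit the longest known string, then remember the extended one.
            codes.append(table[current])
            if next_code < max_codes:
                table[extended] = next_code
                next_code += 1
            # The process repeats, starting from the byte that did not fit.
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


print(lzw_encode(b"ABABABA"))
```

Seven input bytes become four codes here, because the repeated strings "AB"
and "ABA" are each replaced by a single table entry once they have been seen.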
Communicating a higher-level model of the image than pixels
is an active area of research. The idea is to have the transmitter and receiver
agree on the basic model for the image; the transmitter then sends parameters
to manipulate this model in lieu of picture elements themselves. Model-based
decoders are similar to computer graphics rendering programs.
The model-based coder trades generality for extreme efficiency
in its restricted domain. Better rendering and extending of the domain are research
themes.
An electronic device for converting between serial data (typically
RS-232) from a computer and an audio signal suitable for transmission over telephone
lines. The audio signal is usually composed of silence (no data) or one of two
frequencies representing 0 and 1. Modems are distinguished primarily by the
baud rates they support which can range from 75 baud up to 19200 and beyond.
Data to the computer is sometimes at a lower rate than data from
the computer on the assumption that the user cannot type more than a few characters
per second. Various data compression and error-correction algorithms are required to support
the highest speeds. Other optional features are auto-dial (auto-call) and auto-answer
which allow the computer to initiate and accept calls without human intervention.
National Association of Broadcasters
Nippon Hoso Kyokai, the principal Japanese broadcaster
USA video standard with image format 4:3, 525 lines, 60 Hz and
4 MHz video bandwidth with a total 6 MHz of video channel width. NTSC uses
YIQ.
The Open Systems Interconnection Reference Model was formally
initiated by the International Organization for Standardization (
ISO)
in March, 1977, in response to the international need for an open set of communications
standards. OSI's objectives are:
- To provide an architectural reference point for developing standardized
procedures
- To allow inter-networking between networks of the same type
- To serve as a common framework for the development of services and protocols
consistent with the OSI model
- To expedite the offering of interoperable, multi-vendor products and
services
The model is similar in structure to that of SNA.
It consists of seven architectural layers: the physical layer, the data link
layer, the network layer, the transport layer, the session layer, the presentation
layer, and the application layer.
The physical and data link layers provide the same functions
as their SNA counterparts (physical control and data link control layers). The
network layer selects routing services, segments blocks and messages, and provides
error detection, recovery, and notification.
The transport layer controls point-to-point information interchange,
data packet size determination and transfer, and the connection/disconnection
of session entities.
The session layer serves to organize and synchronize the application
process dialog between presentation entities, manage the exchange of data (normal
and expedited) during the session, and monitor the establishment/release of
transport connections as requested by session entities.
The presentation layer is responsible for the meaningful display
of information to application entities.
More specifically, the presentation layer identifies and negotiates
the choice of communications transfer syntax and the subsequent data conversion
or transformation as required. The application layer affords the interfacing
of application processes to system interconnection facilities to assist with
information exchange. The application layer is also responsible for the management
of application processes including initialization, maintenance and termination
of communications, allocation of costs and resources, prevention of deadlocks,
and transmission security.
European video standard with image format 4:3, 625 lines, 50
Hz and 4 MHz video bandwidth with a total 8 MHz of video channel width. PAL
uses
YUV.
Quarter Common source Intermediate Format (1/4 CIF, e.g. 176*144).
The uncompressed bit rate for transmitting QCIF at 29.97 frames/sec
is 9.115 Mbit/s.
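The 9.115 Mbit/s figure can be reproduced with a short calculation. The sketch below assumes 8 bits per sample and chroma subsampled two-to-one in both directions (so Cb and Cr together add half as many samples as luma); the function name and parameters are illustrative only.

```python
def uncompressed_bit_rate(width, height, frame_rate,
                          bits_per_sample=8, chroma_fraction=0.5):
    """Bits per second for one luma plane plus subsampled Cb and Cr.

    chroma_fraction is the combined Cb + Cr sample count relative to luma
    (0.5 for 2:1 horizontal and 2:1 vertical subsampling; an assumption here).
    """
    luma_samples = width * height
    chroma_samples = luma_samples * chroma_fraction
    return (luma_samples + chroma_samples) * bits_per_sample * frame_rate


qcif_rate = uncompressed_bit_rate(176, 144, 29.97)
print(f"QCIF: {qcif_rate / 1e6:.3f} Mbit/s")
```

Under the same assumptions, full CIF (352*288) needs four times this rate.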
Region Coding has received attention because of the ease with
which it can be decoded and the fact that a coder of this type is used in Intel's
Digital Video Interactive system (
DVI), the only commercially
available system designed expressly for low-cost, low-bandwidth multimedia video.
Its operation is relatively simple. The basic design is due to
Kunt.
Envision a decoder that can reproduce certain image primitives
well. A typical set might consist of rectangular areas of constant color, smooth
shaded patches and some textures. The image is analyzed into regions that can
be expressed in terms of these primitives. The analysis is usually performed
using a tree-structured decomposition where each part of the image is successively
divided into smaller regions until a patch that meets either the bandwidth constraints
or the quality desired can be fitted. Only the tree description and the parameters
for each leaf need then be transmitted. Since the decoder is optimized for the
reconstruction of these primitives, it is relatively simple to build.
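The tree-structured decomposition can be sketched with a quadtree. This is only an illustration, not the DVI coder: it assumes a square image whose side is a power of two, uses a single primitive (a patch of constant value), and splits a block whenever its value range exceeds a tolerance `tol`. All names are hypothetical.

```python
QUADRANTS = ((0, 0), (1, 0), (0, 1), (1, 1))  # (dx, dy) in units of half a block


def build(img, x, y, size, tol):
    """Return ("leaf", mean) or ("node", [four subtrees]) for one square block."""
    values = [img[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    if size == 1 or max(values) - min(values) <= tol:
        return ("leaf", sum(values) / len(values))
    h = size // 2
    return ("node", [build(img, x + dx * h, y + dy * h, h, tol)
                     for dx, dy in QUADRANTS])


def decode(tree, x, y, size, out):
    """Paint the constant patches described by the tree into out."""
    kind, payload = tree
    if kind == "leaf":
        for r in range(y, y + size):
            for c in range(x, x + size):
                out[r][c] = payload
        return
    h = size // 2
    for child, (dx, dy) in zip(payload, QUADRANTS):
        decode(child, x + dx * h, y + dy * h, h, out)


def count_leaves(tree):
    kind, payload = tree
    return 1 if kind == "leaf" else sum(count_leaves(c) for c in payload)


img = [[10, 10, 200, 200],
       [10, 10, 200, 200],
       [10, 10, 50, 60],
       [10, 10, 70, 80]]

exact = build(img, 0, 0, 4, tol=0)    # lossless: detailed corner is split fully
coarse = build(img, 0, 0, 4, tol=30)  # lossy: detailed corner becomes one patch
restored = [[0] * 4 for _ in range(4)]
decode(exact, 0, 0, 4, restored)
print(count_leaves(exact), count_leaves(coarse))
```

The decoder is a few lines of patch painting, while the encoder has to examine every block, which mirrors the asymmetry discussed for this method.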
To account for image data that does not encode easily using the
available primitives, actual image data can also be encoded and transmitted,
but this is not as efficient as fitting a patch.
This coder can also be combined with prediction (as it is in
DVI), and the predicted difference image can then be region
coded. A key element in the encoding operation is a region growing step
where adjacent image patches that are distinct leaves of the tree are combined
into a single patch. This approach has been considered highly asymmetric in
that significantly more processing is required for encoding/analysis than for
decoding. It is harder to grow a tree than to climb one.
While hardware implementations of the hybrid
DCT coder have been built for extremely low bandwidth teleconferencing and for
HDTV, there is no hardware for a region coder. However,
such an assessment is deceptive since much of the processing used in DVI
compression is in the motion predictor, a function common to both methods. In
fact, all compression schemes are asymmetric; the difference is a matter of
degree rather than one of essentials.
Repeaters are transparent devices used to interconnect segments
of an extended network with identical protocols and speeds at the physical layer
(
OSI layer 1). An example of a repeater connection would
be the linkage of two carrier sense multiple access/collision detection (CSMA/CD)
segments within a network.
Routers connect networks at
OSI layer 3.
Routers interpret packet contents according to specified protocol sets, serving
to connect networks with the same protocols (DECnet to DECnet, TCP/IP (Transmission
Control Protocol/Internet Protocol) to TCP/IP). Routers are protocol-dependent;
therefore, one router is needed for each protocol used by the network. Routers
are also responsible for the determination of the best path for data packets
by routing them around failed segments of the network.
European video standard with image format 4:3, 625 lines, 50
Hz and 6 MHz video bandwidth with a total 8 MHz of video channel width.
SMPTE is the Society of Motion Picture and Television Engineers.
There is an SMPTE time code standard (hr:min:sec:frame) used to identify video
frames.
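As an illustration of the hr:min:sec:frame layout, the sketch below converts a running frame count into that form. It assumes a whole number of frames per second and plain non-drop-frame counting; the drop-frame variant used with 29.97 Hz video is not handled, and the function name is hypothetical.

```python
def frames_to_timecode(frame_count, fps=30):
    """Format a frame count as hr:min:sec:frame (non-drop-frame, integer fps)."""
    frames = frame_count % fps
    total_seconds = frame_count // fps
    hours = total_seconds // 3600
    minutes = (total_seconds // 60) % 60
    seconds = total_seconds % 60
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


print(frames_to_timecode(109835))
```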
Systems Network Architecture entered the market in 1974 as a
hierarchical, single-host network structure. Since then, SNA has developed steadily
in two directions. The first direction involved tying together mainframes and
unintelligent terminals in a master-to-slave relationship. The second direction
transformed the SNA architecture to support a cooperative-processing environment,
whereby remote terminals link up with mainframes as well as each other in a
peer-to-peer relationship (termed Low Entry Networking (LEN) by IBM). LEN depends
on the implementation of two protocols: Logical Unit 6.2, also known as APPC,
and Physical Unit 2.1 which affords point-to-point connectivity between peer
nodes without requiring host computer control.
The SNA model is concerned with both logical and physical units.
Logical units (LUs) serve as points of access by which users can utilize the
network. LUs can be viewed as terminals that provide users access to application
programs and other services on the network. Physical units (PUs), like LUs, are
not defined within the SNA architecture itself but are instead representations of the
devices and communication links of the network.
Every country has a national standards body where experts from industry
and universities develop standards for all kinds of engineering problems. Among
them are, for instance,
- ANSI: American National Standards Institute (USA)
- DIN: Deutsches Institut fuer Normung (Germany)
- BSI: British Standards Institution (United Kingdom)
- AFNOR: Association francaise de normalisation (France)
- UNI: Ente Nazionale Italiano di Unificazione (Italy)
- NNI: Nederlands Normalisatie-instituut (Netherlands)
- SAA: Standards Australia (Australia)
- SANZ: Standards Association of New Zealand (New Zealand)
- NSF: Norges Standardiseringsforbund (Norway)
- DS: Dansk Standard (Denmark)
and about 80 others.
The International Organization for Standardization, ISO, in Geneva
is the head organization of all these national standardization bodies. Together
with the International Electrotechnical Commission, IEC, ISO concentrates its
efforts on harmonizing national standards all over the world. The results of
these activities are published as ISO standards. Among them are, for instance,
the metric system of units, international stationery sizes, all kinds of bolts
and nuts, rules for technical drawings, electrical connectors, security regulations,
computer protocols, file formats, bicycle components, ID cards, programming
languages, International Standard Book Numbers (ISBN), ... Over 10,000 ISO standards
have been published so far, and you surely come into contact with many things
each day that conform to ISO standards you have never heard of. By the way, "ISO"
is not an acronym for the organization in any language. It's a wordplay based
on the English/French initials and the Greek-derived prefix "iso-" meaning "same".
Within ISO, ISO/IEC Joint Technical Committee 1 (JTC1) deals
with information technology.
The International Telecommunication Union, ITU, is the United
Nations specialized agency dealing with telecommunications. At present there
are 164 member countries. One of its bodies is the International Telegraph and
Telephone Consultative Committee, CCITT. A Plenary Assembly
of the CCITT, which takes place every few years, draws up a list of 'Questions'
about possible improvements in international electronic communication. In Study
Groups, experts from different countries develop 'Recommendations' which are
published after they have been adopted. Especially relevant to computing are
the V series of recommendations on modems (e.g. V.32, V.42),
the X series on data networks and OSI (e.g. X.25,
X.400), the I and Q
series that define ISDN, the Z series that defines specification
and programming languages (SDL, CHILL), the T series on text communication (teletext,
fax, videotext, ODA) and the H series on digital sound and video encoding.
Since 1961, the European Computer Manufacturers Association,
ECMA, has been a forum for data processing experts where agreements have been
prepared and submitted for standardization to ISO, CCITT
and other standards organizations.
Sub-band coding for images has roots in work done in the 1950s
by Bedford and on Mixed Highs image compression done by Kretzmer
in 1954. Schreiber and Buckley explored general two channel coding
of still pictures where the low spatial frequency channel was coarsely sampled
and finely quantized and the high spatial frequency channel was finely sampled
and coarsely quantized. More recently, Karlsson and Vetterli have
extended this to multiple subbands. Adelson et al. have shown how a recursive
subdivision called a pyramid decomposition can be used both for compression
and other useful image processing tasks.
A pure sub-band coder performs a set of filtering operations
on an image to divide it into spectral components. Usually, the result of the
analysis phase is a set of sub-images, each of which represents some region
in spatial or spatio-temporal frequency space. For example, in a still image,
there might be a small sub-image that represents the low-frequency components
of the input picture that is directly viewable as either a minified or blurred
copy of the original. To this are added successively higher spectral bands that
contain the edge information necessary to reproduce the sharpness of
the original at successively larger scales. As with the DCT coder, to which it is
related, much of the image energy is concentrated in the lowest frequency band.
For equal visual quality, each band need not be represented with
the same signal-to-noise ratio; this is the basis for sub-band coder compression.
In many coders, some bands are eliminated entirely, and others are often compressed
with a vector or lattice quantizer. Successively higher frequency bands are
more coarsely quantized, analogous to the truncation of the high frequency coefficients
of the DCT. A sub-band decomposition can be the intraframe coder in a predictive
loop, thus minimizing the basic distinctions between DCT-based hybrid
coders and their alternatives.
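A minimal two-band split of one scanline shows the idea. The sketch assumes the simplest possible filter pair (pairwise average and pairwise difference, i.e. a Haar split), which the text does not prescribe, and an even number of samples; practical coders use longer filters and more bands.

```python
def analyze(signal):
    """Split into a half-length low band (averages) and high band (differences)."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    low = [(a + b) / 2 for a, b in pairs]
    high = [(a - b) / 2 for a, b in pairs]
    return low, high


def synthesize(low, high):
    """Invert analyze(): each (low, high) pair yields two output samples."""
    out = []
    for l, h in zip(low, high):
        out += [l + h, l - h]
    return out


def quantize(band, step):
    """Uniform quantizer; a larger step means coarser representation."""
    return [step * round(v / step) for v in band]


scanline = [10, 12, 14, 16, 80, 82, 40, 38]
low, high = analyze(scanline)
exact = synthesize(low, high)                # perfect reconstruction
lossy = synthesize(low, quantize(high, 4))   # high band coarsely quantized
max_error = max(abs(a - b) for a, b in zip(scanline, lossy))
print(low, high, max_error)
```

The low band is a blurred half-size copy of the scanline; here the whole high band quantizes to zero, yet the reconstruction is off by at most one grey level.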
The T1Q1.5 Video Teleconferencing/Video Telephony (VTC/VT)
ANSI
Subworking Group (SWG) was formed to draft a performance standard for digital
video. Important questions were asked, relating to the digital video performance
characteristics of video teleconferencing/video telephony:
- Is it possible to measure motion artifacts with VTC/VT digital transport?
- If it can be done by objective measurements, can they be matched to subjective
tests?
- Is it possible to correlate the objective measurements of analog and
digital performance specifications?
The VTC/VT Subworking Group's goal is to answer these questions, as
a first step in the process of constructing the performance standard.
Trellis coding is a source coding technique that has resulted
in numerous publications and some very effective source codes. Unfortunately,
the computational burden of these codes is tremendous and grows exponentially
with the encoding rate.
A trellis is a transition diagram for a finite state machine that takes
time into account. Populating a trellis means specifying output symbols
for each branch; specifying an initial state then yields a set of allowable output
sequences.
A trellis coder is defined as follows: given a trellis populated
with symbols from an output alphabet and an input sequence x of length n, a
trellis coder outputs the sequence of bits corresponding to the allowable output
sequence x' that maximizes the SNR of the encoding.
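The definition can be made concrete with a toy example. The sketch below assumes a hypothetical two-state trellis with one input bit per sample, and finds the best path with a Viterbi-style dynamic-programming search (a technique chosen here for illustration; the text does not name a search method). For a fixed input, minimizing the total squared error is the same as maximizing the SNR.

```python
# Hypothetical trellis: state -> [branch for bit 0, branch for bit 1],
# each branch being (next state, output symbol).
TRELLIS = {
    0: [(0, -3.0), (1, 1.0)],
    1: [(0, -1.0), (1, 3.0)],
}


def trellis_encode(x, trellis, start=0):
    """Return (squared error, bits, output sequence) of the best trellis path."""
    # One surviving path per state: state -> (error, bits, outputs)
    survivors = {start: (0.0, [], [])}
    for sample in x:
        candidates = {}
        for state, (err, bits, outs) in survivors.items():
            for bit, (next_state, symbol) in enumerate(trellis[state]):
                cand = (err + (sample - symbol) ** 2,
                        bits + [bit], outs + [symbol])
                best = candidates.get(next_state)
                if best is None or cand[0] < best[0]:
                    candidates[next_state] = cand
        survivors = candidates
    return min(survivors.values(), key=lambda path: path[0])


error, bits, outputs = trellis_encode([0.8, 2.9, -1.2, -2.7], TRELLIS)
print(bits, outputs)
```

Only the bits are transmitted; a decoder holding the same trellis and initial state replays them to regenerate the output sequence.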
A standard networking protocol suite approved by the
CCITT
and
ISO. This protocol suite defines standard physical,
link, and networking layers (
OSI layers 1 through 3). X.25
networks are in use throughout the world.
The set of
CCITT communications standards
covering mail services provided by data networks.
Kodak's PhotoYCC colour space (for PhotoCD) is similar to YCbCr,
except that Y is coded with lots of headroom and no footroom, and the scaling
of Cb and Cr is different from that of Rec. 601-1 in order to accommodate a
wider colour gamut:
Y_8bit = (255/1.402) * Y
C1_8bit = 156 + 111.40 * (Bgamma - Y)
C2_8bit = 137 + 135.64 * (Rgamma - Y)
The C1 and C2 components
are subsequently subsampled by factors of two horizontally and vertically, but
that subsampling should be considered a feature of the compression process and
not of the colour space.
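The PhotoYCC equations translate directly into code. In the sketch below the luma weights 0.299, 0.587 and 0.114 are an assumption (they are the values implied by the 0.701 and 0.886 factors used elsewhere in this glossary), inputs are gamma-corrected R, G, B with reference white at 1.0, and the final rounding is illustrative rather than part of any specification.

```python
def rgb_to_photoycc_8bit(r, g, b):
    """Gamma-corrected R, G, B -> (Y, C1, C2) eight-bit PhotoYCC codes."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # assumed luma weights
    y_8bit = (255 / 1.402) * y
    c1_8bit = 156 + 111.40 * (b - y)
    c2_8bit = 137 + 135.64 * (r - y)
    return round(y_8bit), round(c1_8bit), round(c2_8bit)


print(rgb_to_photoycc_8bit(1.0, 1.0, 1.0))  # reference white
print(rgb_to_photoycc_8bit(0.0, 0.0, 0.0))  # black
```

Reference white lands well below code 255, which is the headroom mentioned above, while black sits at code 0, i.e. no footroom.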
The international standard CCIR Rec. 601-1 specifies eight-bit digital
coding for component video, with black at luma code 16 and white at luma code
235, and chroma in eight-bit offset binary form centred on 128 with a peak
at code 240. This coding has a slightly smaller excursion for luma than for
chroma: luma has 219 risers compared to 224 for Cb and Cr. The notation
CbCr distinguishes this set from PbPr where the luma and chroma excursions are
identical.
For Rec. 601-1 coding in eight bits per component,
Y_8b = 16 + 219 * Y
Cb_8b = 128 + 224 * (0.5/0.886) * (Bgamma - Y)
Cr_8b = 128 + 224 * (0.5/0.701) * (Rgamma - Y)
Some computer applications place black
at luma code 0 and white at luma code 255. In this case, the scaling and offsets
above can be changed accordingly, although broadcast-quality video requires
the accommodation for headroom and footroom provided in the Rec. 601-1 equations.
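A small sketch of eight-bit coding with headroom and footroom follows. It scales chroma to the 224-riser excursion described in the text (plus or minus 112 about code 128) and assumes the luma weights 0.299, 0.587 and 0.114 implied by the 0.701 and 0.886 factors; the function name and the rounding step are illustrative.

```python
def rgb_to_ycbcr_8bit(r, g, b):
    """Gamma-corrected R, G, B in 0..1 -> eight-bit (Y, Cb, Cr) codes."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # assumed luma weights
    pb = (0.5 / 0.886) * (b - y)            # -0.5 .. +0.5
    pr = (0.5 / 0.701) * (r - y)            # -0.5 .. +0.5
    return round(16 + 219 * y), round(128 + 224 * pb), round(128 + 224 * pr)


for name, rgb in [("black", (0, 0, 0)), ("white", (1, 1, 1)),
                  ("blue", (0, 0, 1)), ("red", (1, 0, 0))]:
    print(name, rgb_to_ycbcr_8bit(*rgb))
```

Black and white map to luma codes 16 and 235 with neutral chroma at 128, and saturated blue and red reach the chroma peak of 240.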
CCIR Rec. 601-1 calls for two-to-one horizontal subsampling of
Cb and Cr, to achieve 2/3 the data rate of RGB with virtually no perceptible
penalty. This is denoted 4:2:2. A few digital video systems have utilized horizontal
subsampling by a factor of four, denoted 4:1:1. JPEG and MPEG normally subsample
Cb and Cr two-to-one horizontally and also two-to-one vertically, to get 1/2
the data rate of RGB. No standard nomenclature has been adopted to describe
vertical subsampling. To get good results using subsampling you should not just
drop and replicate pixels, but implement proper decimation and interpolation
filters.
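The data-rate fractions and the warning about dropping pixels can both be checked with a few lines. The [1, 2, 1]/4 low-pass kernel and the edge handling below are assumptions picked for brevity, not a recommended broadcast filter.

```python
from fractions import Fraction


def relative_data_rate(h_factor, v_factor=1):
    """Data rate of Y plus subsampled Cb and Cr, relative to full RGB."""
    chroma = Fraction(2, h_factor * v_factor)   # two chroma components
    return (1 + chroma) / 3


def decimate_2to1(samples):
    """Low-pass with [1, 2, 1]/4 (edges replicated), then keep every other sample."""
    padded = [samples[0]] + list(samples) + [samples[-1]]
    filtered = [(padded[i - 1] + 2 * padded[i] + padded[i + 1]) / 4
                for i in range(1, len(padded) - 1)]
    return filtered[::2]


print(relative_data_rate(2), relative_data_rate(4), relative_data_rate(2, 2))

chroma_row = [0, 0, 0, 100, 0, 0]   # a one-sample chroma detail
dropped = chroma_row[::2]           # naive pixel dropping
filtered = decimate_2to1(chroma_row)
print(dropped, filtered)
```

Naive dropping discards the detail completely because it falls on a skipped sample, whereas the filtered version keeps its energy spread over the neighbouring outputs.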
YCbCr coding is employed by D-1 component digital video equipment.
If three components are to be conveyed in three separate channels
with identical unity excursions, then the Pb and Pr colour difference components
are used:
Pb = (0.5/0.886) * (Bgamma - Y)
Pr = (0.5/0.701) * (Rgamma - Y)
These scale factors limit the excursion of EACH colour difference
component to -0.5 .. +0.5 with respect to unity Y excursion: 0.886 is just unity
less the luma coefficient of blue. In the analog domain Y is usually 0 mV (black)
to 700 mV (white), and Pb and Pr are usually +- 350 mV.
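The excursion claim can be verified numerically over the eight corners of the RGB cube. The luma weights 0.299, 0.587 and 0.114 are assumed here (0.886 and 0.701 are unity less the blue and red weights), and the 700 mV scaling follows the analog levels quoted above.

```python
from itertools import product


def rgb_to_ypbpr(r, g, b):
    """Gamma-corrected R, G, B in 0..1 -> (Y, Pb, Pr) with unity excursions."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # assumed luma weights
    return y, (0.5 / 0.886) * (b - y), (0.5 / 0.701) * (r - y)


corners = [rgb_to_ypbpr(*rgb) for rgb in product((0, 1), repeat=3)]
pb_range = (min(c[1] for c in corners), max(c[1] for c in corners))
pr_range = (min(c[2] for c in corners), max(c[2] for c in corners))
print(pb_range, pr_range)

# Analog levels for saturated blue, in millivolts
y, pb, pr = rgb_to_ypbpr(0, 0, 1)
print(round(700 * y), round(700 * pb), round(700 * pr))
```

Pb reaches its extremes at blue and yellow, Pr at red and cyan.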
YPbPr is part of the CCIR Rec. 709 HDTV standard,
although different luma coefficients are used, and it is denoted E'Pb and E'Pr
with subscript arrangement too complicated to be written here.
YPbPr is employed by component analog video equipment such as
M-II and BetaCam; Pb and Pr bandwidth is half that of luma.
The U and V signals above must be carried with equal bandwidth,
albeit less than that of luma. However, the human visual system has less spatial
acuity for magenta-green transitions than it does for red-cyan. Thus, if signals
I and Q are formed from a 123 degree rotation of U and V respectively [sic],
the Q signal can be more severely filtered than I (to about 600 kHz, compared
to about 1.3 MHz) without being perceptible to a viewer at typical TV viewing
distance. YIQ is equivalent to
YUV with a 33 degree rotation
and an axis flip in the UV plane. The first edition of W.K. Pratt "Digital Image
Processing", and presumably other authors that follow that bible, has a matrix
that erroneously omits the axis flip; the second edition corrects the error.
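The rotation and axis flip can be written out as a small matrix operation. The sketch below uses the commonly quoted form I = V cos 33 - U sin 33, Q = V sin 33 + U cos 33; the function name is illustrative, and the U and V scale factors 0.493 and 0.877 are those given for YUV elsewhere in this glossary.

```python
import math


def yuv_to_yiq(y, u, v, angle_deg=33.0):
    """Rotate (U, V) by 33 degrees and exchange the axes to obtain (I, Q)."""
    a = math.radians(angle_deg)
    i = -u * math.sin(a) + v * math.cos(a)
    q = u * math.cos(a) + v * math.sin(a)
    return y, i, q


# Saturated red: Y = 0.299, U = 0.493 * (B - Y), V = 0.877 * (R - Y)
y = 0.299
u = 0.493 * (0.0 - y)
v = 0.877 * (1.0 - y)
_, i, q = yuv_to_yiq(y, u, v)
print(round(i, 3), round(q, 3))
```

The 2x2 matrix taking (U, V) to (I, Q) has determinant -1, which is the axis flip: a pure rotation would have determinant +1. The chroma magnitude is unchanged by the transform.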
Since an analog NTSC decoder has no way of
knowing whether the encoder was encoding YUV or YIQ, it cannot
detect whether the encoder was running at 0 degree or 33 degree phase. In analog
usage the terms YUV and YIQ are often used somewhat interchangeably.
YIQ was important in the early days of NTSC but most broadcasting
equipment now encodes equiband U and V.
The D-2 composite digital DVTR (and the associated interface
standard) conveys NTSC modulated on the YIQ axes in the
525-line version and PAL modulated on the YUV
axes in the 625-line version.
In composite NTSC, PAL
or S-video systems, it is necessary to scale (B-Y) and (R-Y) so that the composite
NTSC or PAL
signal (luma plus modulated chroma) is contained within the range -1/3 to +4/3.
These limits reflect the capability of composite signal recording or transmission
channel. The scale factors are obtained by two simultaneous equations involving
both B-Y and R-Y, because the limits of the composite excursion are reached at
combinations of B-Y and R-Y that are intermediate to primary colours. The scale
factors are as follows:
U = 0.493 * (B - Y)
V = 0.877 * (R - Y)
U and V components are typically modulated into a chroma component:
C = U*cos(t) + V*sin(t)
where t represents the ~3.58 MHz NTSC colour sub-carrier.
PAL
coding is similar, except that the V component switches Phase on Alternate Lines
(+-1), and the sub-carrier is at a different frequency, about 4.43 MHz.
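The effect of the U and V scale factors on the composite excursion can be checked numerically. The sketch assumes luma weights 0.299, 0.587 and 0.114, no setup, and uses the fact that U*cos(t) + V*sin(t) has peak amplitude sqrt(U^2 + V^2); only the eight saturated corner colours are examined.

```python
import math
from itertools import product


def composite_range(r, g, b):
    """Lowest and highest value reached by luma plus modulated chroma."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # assumed luma weights
    u = 0.493 * (b - y)
    v = 0.877 * (r - y)
    amplitude = math.hypot(u, v)            # peak of U*cos(t) + V*sin(t)
    return y - amplitude, y + amplitude


def chroma(u, v, t):
    """Modulated chroma at sub-carrier phase t (radians)."""
    return u * math.cos(t) + v * math.sin(t)


ranges = [composite_range(*rgb) for rgb in product((0, 1), repeat=3)]
lowest = min(lo for lo, hi in ranges)
highest = max(hi for lo, hi in ranges)
print(round(lowest, 3), round(highest, 3))
```

The extremes come out close to -1/3 and +4/3, reached by saturated blue and red at the bottom and yellow and cyan at the top.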
It is conventional for an NTSC luma signal
in a composite environment (NTSC or S-video) to have 7.5% setup:
Y_setup = (3/40) + (37/40) * Y
A PAL signal has zero setup.
The two signals Y (or Y_setup) and C can be conveyed separately
across an S-video interface, or Y and C can be combined (encoded) into
composite NTSC or PAL:
NTSC = Y_setup + C
PAL = Y + C
U and V are only appropriate for
composite transmission as 1-wire NTSC or PAL,
or 2-wire S-video. The UV scaling (or the IQ set, described below) is incorrect
when the signal is conveyed as three separate components. Certain component
video equipment has connectors labelled YUV that in fact convey YPbPr
signals.
The following is a list of persons whose
material contributed to the creation of this list: Andrew Davidson,
Tom Lane, Charles A. Poynton, Lee Westover and many,
many others.
You might also want to look at the following documents :
comp.dsp.faq