
Quake DeveLS Article 02
Quake II
Cinematics Uncovered
Author: Tim Ferguson
Introduction
To improve the single
player story side of Quake, Id Software has now included cut scene cinematics
in Quake II. Several people have since been interested in how to create their
own cinematics and have discovered a program released by Id Software in their
public source dump.
This document attempts to
describe the format of a Quake II cinematic sequence (a .cin file) and includes
some source code for encoding (taken from Id Software's source) and decoding
the cinematic sequences. I will try to keep it simple enough for non-technical
people to follow.
In essence, a Quake II
cinematic is an interleaved audio/video sequence in which the audio is stored
in a raw PCM format, and the 8-bit colour lookup table based video is coded
using a two-pass lossless static Huffman coder. I will go into more detail in
the following sections.
The supplied `bin_nt/qdata.exe' by Id Software
The program supplied by Id
Software in their public source code dump allows you to easily create .cin
cinematic files. Information on using it has been supplied by Jeff Garstecki
(stecki@frag.com and http://www.frag.com/deconstruct), who also made a user
.cin sequence, and by Paul Steed (psteed@idsoftware.com). I will briefly recap
their documentation and go into a little more detail.
The cinematic sequences
are stored in the `quake2/baseq2/video' directory where they can be played from
the console using the map command (try typing `map end.cin' from the console).
To create your own
sequences, generate a series of individual frames of your animation sequence
and save them as 8-bit colour PCX files. The file names should be numbered sequentially
as [base name]000.pcx, or [base name]0000.pcx, (for example: hell000.pcx,
hell001.pcx, ... hell120.pcx) although qdata can start at any frame. These
files need to be located in the `/bin_nt/video/[base name]' directory (in the
example: /bin_nt/video/hell/).
Although you can have
different colour palettes for your sequences, there will be an improvement in
video quality if the frames share a common colour palette, or if the colour
palette is only changed during a black frame. This is due to slow palette
switching times. A suggestion is to fade to black, switch palettes, and fade to
the new palette. This palette switching can be seen in the ntro.cin sequence
where it is used several times. When adding PCX images, qdata checks to see if
the palette has changed and adds a change palette command to the sequence.
Technically, the frames
can be of any resolution; however, the standard resolution used is 320x240. I
have tried sequence resolutions of 336x240, 176x144 and 360x288 and found that
frames which are too large or too small are scaled to fill the screen.
Animations are played at 14 frames per second, and Quake II will skip frames to
maintain this playback rate on slow or heavily loaded machines (see the section
on `Audio Coding' for the 14 fps derivation). Frame skipping is used to prevent
sound from becoming choppy.
An optional sound file can
be included in the animation sequence. The source sound must be in .wav
format, can be mono or stereo, can have a sample size that is a multiple of 8
bits (usually 8 or 16-bit) and can technically have an arbitrary sampling rate
(typically 22050Hz or 11025Hz). The file must be placed in the same directory
as the PCX files, and have the same [base name] as the PCX files (in our
example: `/bin_nt/video/hell/hell.wav').
Finally, a QDT script file
(.qdt) needs to be created with the following information in it:
$video [base name] [no. of digits (3 or 4)] [start frame (optional)]
In our
example, we create the file hell.qdt with:
$video hell 3
The .qdt
file is placed in the /bin_nt directory, and qdata is run with the .qdt file
as its only argument (for example: qdata hell.qdt). After a few passes, the
resulting .cin file is created in the `/bin_nt/video/' directory, where it can
be viewed using Quake II.
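The naming rules above can be sketched in a few lines of Python. This is only an illustration: the helper names `frame_names` and `qdt_line` are hypothetical and are not part of qdata.

```python
def frame_names(base, digits, first, last):
    """List the PCX names expected for frames first..last, e.g. hell000.pcx."""
    return [f"{base}{n:0{digits}d}.pcx" for n in range(first, last + 1)]


def qdt_line(base, digits, start=None):
    """Build the $video line of the .qdt script; the start frame is optional."""
    parts = ["$video", base, str(digits)]
    if start is not None:
        parts.append(str(start))
    return " ".join(parts)


names = frame_names("hell", 3, 0, 120)
print(names[0], names[1], names[-1])   # first, second and last frame names
print(qdt_line("hell", 3))             # contents of hell.qdt
```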
Video Coding
If you venture into the
source code of `qdata.exe' distributed in Id Software's public source dump, you
will find the file `utils3/qdata/video.c'. In this file, it can be seen that Id
tried several techniques to code their video (including a few Huffman techniques
and an LZ technique) before settling on a two-pass static lossless Huffman
coder.
In the area of image,
video and audio storage, there are three techniques to reduce file sizes:
lossless coding, lossy coding and sub-sampling. Lossless techniques compress
data without any loss of audio or visual quality, but obtain only low
compression ratios, resulting in large files. Lossy techniques sacrifice some
audio or video quality, ideally not perceivable by humans, in return for
significantly higher compression ratios. The third way of reducing storage
requirements is in the same vein as lossy compression and is done through
sub-sampling. For video, this includes pixel, spatial and temporal sub-sampling
in the form of quantising the pixel colours to produce a smaller colour palette
(eg: 256 colours rather than 16.7 million colours), lower screen resolutions,
and lower video frame rates (15 frames per second (fps) rather than 25 or 30
fps) respectively.
Id Software's cinematic
video sequences use two of the three forms of compression: sub-sampling and
lossless coding. Video sequences are first sub-sampled to 8 bits per pixel
(256 colours), 320x240 pixel frames at 14 frames per second. The resulting
sequence is then losslessly coded using the Huffman algorithm to achieve
approximately a 3:1 reduction from the sub-sampled sequence. This format was
probably chosen because of the minimum platform specification on which the
video is played: a PC with a 256 colour display, a relatively slow (P90)
processor, and a cheap mass storage device (CD-ROM).
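To put rough numbers on this, the sketch below works out the uncompressed video data rate for the sub-sampled format and for a 24-bit, 25 fps alternative. The 3:1 figure is the approximate reduction quoted above, and the function name `raw_rate` is just for illustration.

```python
def raw_rate(width, height, bytes_per_pixel, fps):
    """Uncompressed video data rate in bytes per second."""
    return width * height * bytes_per_pixel * fps


cin_raw = raw_rate(320, 240, 1, 14)    # 8-bit palettised frames at 14 fps
full_raw = raw_rate(320, 240, 3, 25)   # 24-bit colour frames at 25 fps
cin_coded = cin_raw / 3                # after roughly 3:1 Huffman coding

print(cin_raw, full_raw, cin_coded)
```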
If, and most probably when
(a point release, maybe?), Id raise their minimum platform to 24-bit colour
and a slightly faster processor, they could use a lossy technique with 16.7
million colours, over twice the frame rate and a significant improvement in
compression. The improvement in colour, spatial and temporal resolution would
greatly outweigh the loss through coding. An example of this is a sequence
converted from .cin format to MPEG. The file idlog.cin plays with 8-bit colour
at 14 fps and is compressed at 2.3:1. The same file encoded using MPEG
(ftp://ftp.cdrom.com:/pub/idgames2/quake2/graphics/movies/idlog_avi.zip)
is played with 24-bit colour at 25 fps, and is compressed to approximately
13:1. The MPEG, as is expected, takes significantly more processing power to
play back in real time than the .cin format. Other, less processor demanding
forms of lossy compression that I have not experimented with include Quicktime
and AVI with codecs such as CinePak and Indeo Video. See the results
section for more .cin sequence compression results.
Huffman Coding
As stated by Peter Gutmann
in the comp.compression FAQ:
`Huffman
compression is a statistical data compression technique which gives a reduction
in the average code length used to represent the symbols of a alphabet.'
In
Huffman's coding technique, stored pixel data is assigned variable length codes
(VLC) based on the pixel's probability of occurrence. Input pixels that occur
more often are assigned shorter codes (fewer bits), while infrequent input
pixels are assigned longer codes (more bits). A static Huffman coder achieves
this by performing two passes over the video sequence. The first pass creates a
frequency histogram of the pixels in the video sequence and uses it to generate
the dictionary of VLCs. The histogram is stored so that the decoder can
reconstruct the VLC dictionary. The second pass over the sequence pixels stores
the VLC that corresponds to each input pixel. You may need to look elsewhere
for a more in-depth discussion of Huffman coding.
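As a minimal sketch of the first pass, the following Python builds a histogram and derives Huffman code lengths from it. It is a generic textbook construction, not Id's code from video.c, and the function name `huffman_code_lengths` is hypothetical.

```python
import heapq
import itertools
from collections import Counter


def huffman_code_lengths(histogram):
    """Map each symbol with a non-zero count to its Huffman code length in bits."""
    tiebreak = itertools.count()  # unique ids so the heap never compares dicts
    heap = [(n, next(tiebreak), {sym: 0}) for sym, n in histogram.items() if n > 0]
    heapq.heapify(heap)
    if len(heap) == 1:            # a lone symbol still needs one bit
        return {sym: 1 for sym in heap[0][2]}
    while len(heap) > 1:
        n1, _, a = heapq.heappop(heap)   # the two least frequent subtrees
        n2, _, b = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**a, **b}.items()}
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


pixels = bytes([0] * 8 + [1] * 4 + [2] * 2 + [3] * 2)
histogram = Counter(pixels)                      # pass one: count each value
lengths = huffman_code_lengths(histogram)
total_bits = sum(histogram[s] * lengths[s] for s in histogram)
print(lengths, total_bits)
```

The most frequent value ends up with the shortest code, which is where the saving over a fixed 8 bits per pixel comes from.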
Typically, a histogram of
256 elements is used when constructing the VLC dictionary, one histogram entry
per pixel value. However, video sequence images contain a high inter-pixel
correlation in the spatial domain (pixels next to one another are very similar
or the same in colour), and a significant improvement in compression
performance can be achieved if both the previous pixel and the current pixel
are used when generating the frequency histogram. This is the case with Id
Software's .cin video format. The result is 256 histograms of 256 elements
producing a 256 * 256 table. The rows of the histogram are referenced by the
previous pixel, and the columns of the histogram are referenced by the current
pixel. Since there is a high probability of the previous pixel being the same
as, or very similar to the current pixel, a diagonal line from the top-left
corner to the bottom-right corner is formed in the histogram indicating areas
of high probability. See the included image of the two dimensional histogram.
[Figure: two-dimensional histograms for idlog.cin and ntro.cin]
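The two dimensional histogram described above can be gathered with a loop like the following. This is a sketch only: `order1_histogram` is a hypothetical name, and frames are assumed to be plain byte strings of palette indices.

```python
def order1_histogram(frames):
    """Count (previous pixel, current pixel) pairs into a 256 x 256 table."""
    hist = [[0] * 256 for _ in range(256)]
    for frame in frames:
        prev = 0                      # previous pixel starts at zero each frame
        for pixel in frame:
            hist[prev][pixel] += 1    # row = previous pixel, column = current
            prev = pixel
    return hist


frame = bytes([5, 5, 5, 6, 6, 5])
hist = order1_histogram([frame])
print(hist[5][5], hist[5][6], hist[6][6], hist[6][5])
```

Runs of identical pixels pile up on the diagonal (`hist[5][5]`, `hist[6][6]`), which is the diagonal line visible in the histogram images.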
When decoding a sequence,
the previously decoded pixel is used to reference a row of the VLC dictionary,
while the stored variable length code is used to find the pixel value. This new
pixel value then becomes the previous pixel, and the process is repeated. The
initial `previous pixel' value is set to zero for the start of each frame.
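The decoding loop can be illustrated with a toy three-colour alphabet. The tree layout used here (nested tuples indexed by the next bit) is an assumption made for the example and is not how Id's decoder stores its tables.

```python
def decode_pixels(bits, trees, count):
    """Decode `count` pixels from a sequence of 0/1 bits.

    trees[prev] is the code tree to use after pixel `prev`: either an int
    (a decoded pixel) or a pair (branch_for_0, branch_for_1).
    """
    bit_iter = iter(bits)
    out = []
    prev = 0                              # initial previous pixel is zero
    for _ in range(count):
        node = trees[prev]                # previous pixel selects the tree
        while not isinstance(node, int):
            node = node[next(bit_iter)]   # walk one branch per stored bit
        out.append(node)
        prev = node                       # decoded pixel becomes the context
    return out


# Toy trees: repeating the previous pixel always costs a single bit.
trees = {
    0: (0, (1, 2)),
    1: (1, (0, 2)),
    2: (2, (1, 0)),
}
bits = [0, 1, 0, 0, 1, 1, 0]
print(decode_pixels(bits, trees, 5))
```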
If you are interested in
the video coding of .cin files, most of what has been said should be clearer if
you look at the supplied source code.
Audio Coding
Audio data in the .cin
cinematic sequences is stored in a raw PCM format (uncompressed). From the
sequence header, it appears that any sampling rate, sample size and number of
channels can be used; in practice this depends on which combinations of
parameters the game can play back. From the results section below, it can be
seen that sequences have used sampling rates of 22050 and 11025 Hz, sample
widths of 8 or 16 bits and either mono (1 channel) or stereo (2 channels).
Acoustically demanding sequences (speech, sound effects and music) such as the
intro and end sequences use higher quality stereo audio, while the less
demanding cut scenes (just speech and simple sound effects) use lower
quality mono audio.
When audio is coded into
the cinematic sequence, each one second clip of audio data (sample rate *
sample width * number of channels bytes) is divided into 14 chunks. Each of
these chunks is assigned to one frame of the Huffman coded video. This audio
segmentation is found in the source code supplied by Id Software, and results
in a 14 frames per second video playback rate to synchronise with the audio.
More information on the sequence format is found below.
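The chunk size per frame follows directly from this division. The sketch below uses integer sample positions so that rates which do not divide evenly by 14 (such as 11025 Hz) still add up to exactly one second; that rounding scheme is an assumption of the example, and `audio_chunk_bytes` is a hypothetical name.

```python
FPS = 14  # audio is split into 14 chunks per second


def audio_chunk_bytes(frame, rate, width_bytes, channels):
    """Bytes of raw audio stored with the given (zero-based) frame."""
    start = frame * rate // FPS          # first sample belonging to the frame
    end = (frame + 1) * rate // FPS      # first sample of the next frame
    return (end - start) * width_bytes * channels


stereo = audio_chunk_bytes(0, 22050, 2, 2)                        # 16-bit stereo
mono = [audio_chunk_bytes(f, 11025, 1, 1) for f in range(FPS)]    # 8-bit mono
print(stereo, mono, sum(mono))
```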
Coding Results
Some results taken from
both the included cinematic sequences, and a user made sequence (cave.cin by
Jeff Garstecki) are as follows:
+-----------+---------+-------+-----+------+--------+-----------+-------+
| sequence  | vid res | rate  | wid | chan | frames | file size | compr |
+-----------+---------+-------+-----+------+--------+-----------+-------+
| ntro.cin  | 320x240 | 22050 | 16  | 2    | 2945   | 82836235  | 3.5:1 |
| end.cin   | 320x240 | 22050 | 16  | 2    | 726    | 19311290  | 3.7:1 |
| idlog.cin | 320x240 | 22050 | 16  | 2    | 81     | 3159828   | 2.3:1 |
| eou#_.cin | 320x240 | 11025 | 8   | 1    | -      | -         | -     |
| cave.cin  | 320x240 | 22050 | 16  | 2    | 200    | 5453415   | 3.7:1 |
+-----------+---------+-------+-----+------+--------+-----------+-------+
Where
`rate', `wid' and `chan' are the audio sampling rate, sample width (in bits)
and number of channels respectively. `compr' is the compression obtained on
the video only. From these results it can be seen that sequences with smooth
coloured areas (ntro.cin, end.cin and cave.cin) result in compression ratios of
around 3.6:1. However, highly textured sequences such as idlog.cin (its
background) result in lower compression.
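A rough way to estimate the video-only ratio from the table is to subtract the raw audio from the file size and compare what remains with the uncompressed frames. This is only an approximation (it counts the header and per-frame fields as video), so it lands close to, but not always exactly on, the `compr' column. The function name is hypothetical.

```python
def video_compression_estimate(file_size, frames, width, height,
                               rate, sample_bytes, channels):
    """Approximate video compression ratio of a .cin file."""
    raw_video = frames * width * height                      # one byte per pixel
    audio = frames * (rate * sample_bytes * channels // 14)  # raw PCM per frame
    coded_video = file_size - audio                          # everything else
    return raw_video / coded_video


# (file size, frames) taken from the table; all are 320x240, 22050 Hz, 16-bit stereo.
table = {"ntro.cin": (82836235, 2945), "idlog.cin": (3159828, 81),
         "cave.cin": (5453415, 200)}
estimates = {name: video_compression_estimate(size, frames, 320, 240, 22050, 2, 2)
             for name, (size, frames) in table.items()}
for name, ratio in estimates.items():
    print(f"{name}: {ratio:.2f}:1")
```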
Coded Cinematic Stream
This section describes the
very simple and application specific .cin file structure. The .cin file
contains a header in little endian format as follows:
       32 ...... 2 1 0      Field Name                      Type
      +---------------+
   0  |               |     Video width                     Unsigned long
      +---------------+
   4  |               |     Video height                    Unsigned long
      +---------------+
   8  |               |     Audio sample rate               Unsigned long
      +---------------+
  12  |               |     Audio sample width (in bytes)   Unsigned long
      +---------------+
  16  |               |     Audio channels (1 or 2)         Unsigned long
      +---------------+
  20  |               |
      +-             -+
  24  |               |
      +-   . . . .   -+
      |               |     Huffman table                   Unsigned Byte
      +-             -+
65556 |               |
      +---------------+
This
header contains information on the video and audio resolution, as well as a
Huffman table used to code the video data. The Huffman table is a 256 * 256
table of byte values (65536 bytes total).
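Reading this header is straightforward. The sketch below assumes each `Unsigned long' is a 32-bit little endian value, as the 4-byte offsets in the diagram imply; `parse_cin_header` and the dictionary keys are hypothetical names.

```python
import struct

HEADER_FIELDS = struct.Struct("<5I")   # five little endian unsigned 32-bit values
TABLE_SIZE = 256 * 256                 # Huffman table of byte values


def parse_cin_header(data):
    """Split the start of a .cin file into its header fields and Huffman table."""
    width, height, rate, sample_bytes, channels = HEADER_FIELDS.unpack_from(data, 0)
    table_start = HEADER_FIELDS.size                     # offset 20
    table = data[table_start:table_start + TABLE_SIZE]
    if len(table) != TABLE_SIZE:
        raise ValueError("file too short to hold the Huffman table")
    return {
        "width": width, "height": height,
        "rate": rate, "sample_bytes": sample_bytes, "channels": channels,
        "huffman_table": table,
        "first_frame_offset": table_start + TABLE_SIZE,  # offset 65556
    }


# Synthetic header for illustration; a real file would be read from disk.
fake = struct.pack("<5I", 320, 240, 22050, 2, 2) + bytes(TABLE_SIZE)
header = parse_cin_header(fake)
print(header["width"], header["height"], header["first_frame_offset"])
```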
Following the header, and
for each frame of the video, the following is stored in the .cin sequence:
       32 ...... 2 1 0      Field Name                      Type
      +---------------+
   0  |               |     Sequence command                Unsigned long
      +---------------+
   4  |               |
      +-             -+
   8  |               |
      +-   . . . .   -+
      |               |     OPTIONAL colour palette         Unsigned Byte
      +-             -+
      |               |
      +---------------+
 772  |               |     Huffman count (H)               Unsigned long
      +---------------+
 776  |               |     Decode count (D)                Unsigned long
      +---------------+
 780  |               |
      +-             -+
 784  |               |
      +-   . . . .   -+     Encoded Huffman video data      Unsigned Byte
      |               |     (contains Huffman count - 4 bytes)
      +-             -+
H+772 |               |
      +---------------+
H+776 |               |
      +-             -+
H+780 |               |     Raw audio data                  Unsigned Byte
      +-   . . . .   -+     (contains
      |               |     audio width * audio channels * audio rate/14
      +-             -+     bytes)
      |               |
      +---------------+
As can be
seen, the sequence stores one frame of video and one chunk of audio (1/14 of a
second) per frame. The above sequence command takes on three possible values:
The
Huffman count indicates the number of coded bytes to follow (including the
decode count), and as expected, the decode count is video width * video height.
Source Code
I have put together a
small program for playing .cin files under X11. I have tested compilation
under Linux and SunOS; most other X-based operating systems should work. Click
HERE for the source archive.
Article by Tim Ferguson.