The Red Book audio CD is a pirate’s fantasy since it stores high-fidelity digital audio on unprotected media. Such CDs contain a “Table of Contents” that informs players where tracks (or songs) are located. The tracks themselves are composed of blocks (or sectors) of uncompressed and unprotected Pulse Code Modulation (PCM) digital audio content. CD players then read these blocks of PCM audio and transform them into analog waveforms that can be processed by your ears. Surprisingly, anyone can read these audio sectors and create bootleg content. Fortunately for the music industry, few people knew of this shocking weakness since early consumer devices were playback-only.
Once software manufacturers realized that CDs could also be used to hold data, CD-ROM drives began to proliferate. Initially, these drives used proprietary hardware interfaces and APIs, so operating system support was haphazard. Eventually, hardware manufacturers adopted standardized interfaces that enabled widespread operating system support.
Unbeknownst to most programmers, these operating systems added low-level device driver APIs to access audio data directly from Red Book CDs, a process known as Digital Audio Extraction (DAE) [See Robert A. Starrett's "Ripping Off Recordings: Digital Audio Extraction Do's, Don'ts, and Do'ers," July 99, pp. 34-46--Ed.]. Armed with these APIs, programmers could retrieve digital audio content from CDs and transport (or stream) it to a sound card. Since sound cards typically have better digital-to-analog (D/A) converters than a CD-ROM drive, audio streaming programs offer improved sound quality. Other legitimate uses for such low-level APIs include backup and streaming multimedia APIs.
MP3: THE KILLER APP THAT KILLED AUDIO?
Some perceptive users, however, eventually discovered that low-level APIs could be used to create perfect digital copies of audio CDs. This type of piracy was limited, however, to professional users since these APIs weren’t standardized and there wasn’t a so-called “killer app” to spark consumer interest. Unfortunately for the music industry, the killer app arrived in the form of MPEG-1/Layer 3 (or MP3) compression and the Internet.
Before MP3, audio content was either uncompressed or highly compressed. Uncompressed CD-quality audio isn’t viable for Internet delivery because of its bandwidth requirements (176KB/sec). By contrast, most audio compressor/decompressors (or codecs) were obsessed with bit-rates. (Bit-rates refer to the amount of data that must be delivered per second to ensure smooth playback.) While these compression algorithms achieved significant compression ratios, most were designed to compress voice and struggled with audio fidelity.
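The 176KB/sec figure follows directly from the Red Book parameters (44.1KHz sampling, 16 bits per sample, two channels). The sketch below works through that arithmetic; the function name and the 128Kbit/sec MP3 comparison rate are illustrative assumptions, not figures from this article.

```python
def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Return the raw (uncompressed) PCM data rate in bytes per second."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


# Red Book CD audio: 44,100 samples/sec, 16 bits per sample, stereo.
CD_RATE = pcm_bytes_per_second(44_100, 16, 2)  # 176,400 bytes/sec

# Assumed comparison point: an MP3 stream encoded at 128 Kbit/sec.
MP3_128K_RATE = 128_000 // 8  # 16,000 bytes/sec

# How many times more bandwidth the uncompressed stream needs.
ratio = CD_RATE / MP3_128K_RATE

print(f"CD audio: {CD_RATE} bytes/sec")
print(f"MP3 (assumed 128 Kbit/sec): {MP3_128K_RATE} bytes/sec")
print(f"Uncompressed audio needs about {ratio:.0f} times the bandwidth")
```

At that assumed bit-rate, uncompressed CD audio needs roughly eleven times the bandwidth of the MP3 stream.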
Unlike voice-oriented codecs, MP3 uses perceptual encoding to compress audio. Perceptual encoding removes extraneous information (i.e., audio that is ignored by your ear) from the digital stream while retaining audio fidelity. Consequently, it’s possible to create very low bit-rate MP3 files while minimizing the loss of audio fidelity. Furthermore, higher bit-rate MP3 files are virtually indistinguishable from the original audio files.
WHY AUDIO WANTS TO BE JUST LIKE DVD-VIDEO WHEN IT GROWS UP
In the meantime, Hollywood movie studios recognized the music industry’s catastrophic mistake of releasing perfect digital copies on unprotected media and, consequently, delayed the release of the DVD-Video format for over a year until they could be assured their content was secure from casual pirates. Ultimately, the Copy Protection Technical Working Group (CPTWG) was able to reassure Hollywood about the security of DVD-Video by designing copy protection solutions like encryption, authentication, and copyright preservation.
The first phase of DVD-Video copy protection involved encryption and authentication (a.k.a., CSS). Unlike Red Book CDs, multimedia content preserved on DVD may be copy-protected at the discretion of the content owner. Consequently, DVD-ROM drives will only permit applications to retrieve sectors if they are authorized. The authorization process involves the exchange of time-based 128-bit keys between the DVD-ROM drive and appropriate DVD decoder software. Although no algorithm is hacker-proof, this particular solution is robust enough to discourage the average hacker and, therefore, satisfy the studios.
The second phase of the DVD-Video copy protection scheme revolves around content rights tracking and protection. CSS protection is simplistic: either you permit copying or you don’t. However, there are circumstances where you may want your content to be copied, but you need to be able to track who is using your intellectual property and how they are using it. In addition, a pirate might break the CSS scheme, in which case you’d want to trace an illegal DVD back to its original source. The most robust solution for protecting such rights is watermarking.
Traditional watermarks consist of background images embedded in high-grade paper, which are useful for certifying original documents and detecting counterfeits. In the digital arena, watermarks are hidden identifiers embedded within multimedia content and may be used for rights management, commerce, and fraud prevention. For instance, a rights holder may permit users to make unlimited copies of a bitmap or audio file, provided they pay a licensing fee for each usage. The embedded watermark enables the rights holder to detect the presence of copyrighted content and charge the customer accordingly.
A second use for watermarks is the association of tags with specific vendors. For example, a studio could create watermarks for each DVD vendor that resells its product. These watermarks could then be traced to determine which vendor is selling the most DVDs for that studio.
The most popular use for watermarking is fraud prevention. A watermark-enabled player checks for the existence of a watermark in content before playback is permitted. If the player detects a corrupt or illegitimate watermark, it will refuse to play it.
THE KEY TO WATERMARKING
Watermarks fall into two categories: source and transactional. Source watermarks are attached to a specific form of media (e.g., a DVD-Audio disc) and are used to identify or protect resources on that media. By contrast, transactional watermarks are intended to track usage of a particular media stream and are independent of the storage format. For example, music distributors utilize transactional watermarks to monitor the number of times a song has been downloaded over the Internet.
Since watermarks have no strict definition of what they contain, nor how they should be formatted, they can morph to fit a multitude of applications. Some assets that can be stored inside a watermark are the content title, author, publisher, and copyright.
One element that all watermarks store is cryptographic keys. These keys feature a unique string of bits that identify the legitimacy of the watermark. The more bits used in the key, the more impervious the watermark is to cracking. However, larger key sizes require increased sophistication to embed without affecting audio quality.
Besides sizes, there are a variety of key escrow schemes used by watermarking solutions. Two of the most popular are symmetric and public keys. Symmetric key solutions use a single key to encrypt and decrypt the watermark. By contrast, public key techniques combine a public key and a private key to encrypt a watermark. Because the private key is known only to the decrypting party, it is less vulnerable than a symmetric solution.
All watermarks must be unobtrusive. If a hard copy watermark is too dark, it will distract the reader. Likewise, if a digital watermark is cumbersome to use or distracts from the audio/visual experience, users will become irritated.
CRACKING THE SURFACE OF VIDEO AND AUDIO WATERMARKING
Although all digital watermarks involve embedding tags inside content, there are dramatic differences between video and audio watermarking. For example, the eye is less sensitive to altered content than the ear. Furthermore, motion video has the additional advantage of multiple frames being displayed each second. Since a frame is visible for only 1/25th of a second, it is virtually impossible to detect the minor alterations caused by a quality video watermarking solution.
Even though video watermarking isn’t as arduous as audio watermarking, they share many core features. For instance, although the Data Hiding Sub-Group (DHSG) of the CPTWG is evaluating a number of watermarking solutions for DVD-Video, most of their evaluation criteria also apply to audio watermarks. This evaluation process includes transparency, permission detection, generational control, minimal false positive detection, priority, and ease of use.
A video watermark should be transparent at normal playback speeds. However, robust solutions should also be imperceptible when the video is paused or a still image is shown. Similarly, an audio watermark should not be detectable at any playback speed.
Watermarks should be able to monitor and control the number of times a video is copied. Therefore, a stream’s watermark may contain the following permission attributes: copy none, copy once, copy multiple, or unlimited copying. “Copy none” prevents all copying, while “copy once” allows a single copy. “Copy multiple” permits a specific number of copies. Once this number is exceeded, it becomes a copy none watermark. “Unlimited copying” places no limitation on the number of copies supported.
When a copy is made, the watermark in the original stream is modified to reduce the number of subsequent copies that are permissible. After the watermark in the original file reaches copy none status, copying is no longer permitted.
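The permission logic just described can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not any vendor's scheme: the `Watermark` class, the `make_copy` function, and the choice to mark every new copy as copy none are hypothetical, since the article does not say what permission a copy inherits.

```python
from dataclasses import dataclass

COPY_NONE = "copy none"
COPY_ONCE = "copy once"
COPY_MULTIPLE = "copy multiple"
COPY_UNLIMITED = "unlimited copying"


@dataclass
class Watermark:
    permission: str
    copies_left: int = 0  # only meaningful for COPY_MULTIPLE


def make_copy(original: Watermark) -> Watermark:
    """Update the original's watermark and return the new copy's watermark."""
    if original.permission == COPY_NONE:
        raise PermissionError("copying is not permitted")
    if original.permission == COPY_UNLIMITED:
        return Watermark(COPY_UNLIMITED)
    if original.permission == COPY_ONCE:
        # The single allowed copy has now been used.
        original.permission = COPY_NONE
        return Watermark(COPY_NONE)
    # COPY_MULTIPLE: refuse if the allowance is already spent.
    if original.copies_left <= 0:
        original.permission = COPY_NONE
        raise PermissionError("copy allowance exhausted")
    original.copies_left -= 1
    if original.copies_left == 0:
        original.permission = COPY_NONE
    # Assumption: copies themselves may not be copied further.
    return Watermark(COPY_NONE)
```

Note that `make_copy` has to modify the original's watermark, so a scheme like this only works when the original is stored on writable media.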
Although permission detection sounds impressive, it is worthless on read-only media. To be effective, the original watermark must be updated each time a copy is made. Since content on read-only media cannot be updated, there is no way to enforce copy once or copy multiple watermarks. Thus, watermarks on read-only media are either copy none or unlimited copy.
Watermarks also must be able to survive through successive generations. (A generation refers to the multimedia stream that emerges after performing a copy.) Thus, the watermark after a thousand generations (or copies) should be as robust as the initial watermark. Furthermore, the watermark should persist even if it is transferred to or from the analog domain. For instance, if a watermarked DVD movie is copied to a VHS tape, the watermark should transfer to VHS. Similarly, when the VHS content is transferred back into the digital realm, the watermark must be detectable.
Watermarks also must be able to survive numerous file format changes. For example, if an MP3 file contains a watermark, that watermark must be retained even if the file is converted to a PCM .WAV file, then a Mu-law .AU file and finally back to an .MP3 file.
Watermarking algorithms should also have an arbitration mechanism to detect the insertion of fraudulent watermarks. For instance, a hacker may attempt to overwrite an authentic watermark with a counterfeit watermark. A robust watermarking solution not only detects the conflicting watermarks, but also gives the original watermark precedence over the fraudulent watermark, thereby protecting the content holder’s rights.
All industries attempt to minimize “false positive” identifications, which occur when a watermarking algorithm detects a bogus watermark in legitimate content and restricts the user from either playing or copying the content. The opposite failure, a false negative, occurs when the algorithm fails to identify an illegitimate watermark and permits playback of pirated content. Since a false positive identification usually results in an irate customer, commercially successful watermarking algorithms have negligible false positive detection ratios.
Since most watermarking algorithms satisfy the basic criteria mentioned earlier, product differentiation is achieved via intangible features such as ease of use and robustness. For instance, ease-of-use may cause one solution to be preferred over another. Another solution may be chosen because of a strong patent portfolio and proven record of intellectual property rights enforcement.
ANALOG AUDIO PRIMER
Although both audio and video watermarks share the characteristics described, audio watermarks require additional safeguards to prevent detection by the human ear. To appreciate these nuances, you need to understand how audio is digitized.
Analog sounds are disturbances (or waves) detected by our ears. Two of the important characteristics of an audio wave (or waveform) are its frequency and amplitude. Frequency describes the number of cycles that pass a specific point each second. Amplitude represents the height (or volume) of the sound. Typically, analog waves are charted over time to track their movement.
You can convert these analog waves into digital audio streams by sampling (or approximating) the shape of the analog waveform. This conversion process is tedious and requires careful calculations to prevent degradation of audio quality. For instance, Nyquist’s theorem states that you must sample at a rate at least twice the highest frequency in the analog wave or phantom waves (aliases) will creep into the content. Since most humans cannot detect sounds above 22KHz, CDs and other high-quality formats sample at 44KHz to preserve audio fidelity.
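To see the phantom waves that Nyquist’s theorem warns about, the sketch below samples a tone that is too high for the sampling rate and shows that its samples match those of a lower tone. The 30KHz test tone, the rounded 44KHz rate used in this article, and the function names are illustrative assumptions.

```python
import math


def alias_frequency(tone_hz, sample_rate_hz):
    """Frequency a pure tone appears to have once it has been sampled."""
    nearest_multiple = sample_rate_hz * round(tone_hz / sample_rate_hz)
    return abs(tone_hz - nearest_multiple)


def sample_cosine(tone_hz, sample_rate_hz, count):
    """Return `count` samples of a unit-amplitude cosine tone."""
    return [
        math.cos(2 * math.pi * tone_hz * n / sample_rate_hz)
        for n in range(count)
    ]


SAMPLE_RATE = 44_000  # the article's rounded CD sampling rate
TOO_HIGH = 30_000     # above the 22KHz limit for this rate

phantom = alias_frequency(TOO_HIGH, SAMPLE_RATE)  # 14000

# The sampled 30KHz tone cannot be told apart from a 14KHz tone.
high_samples = sample_cosine(TOO_HIGH, SAMPLE_RATE, 100)
phantom_samples = sample_cosine(phantom, SAMPLE_RATE, 100)
largest_gap = max(abs(a - b) for a, b in zip(high_samples, phantom_samples))

print(f"A {TOO_HIGH}Hz tone sampled at {SAMPLE_RATE}Hz appears at {phantom}Hz")
print(f"Largest difference between the two sample sets: {largest_gap:.2e}")
```

This is why frequencies above half the sampling rate are normally filtered out before sampling rather than left in the signal.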
Sample resolution also impacts audio quality. The more bits used to capture a sample, the closer the digital waveform will approximate the original analog waveform. Sampling size is particularly crucial for musical content since it has a greater dynamic range than speech. Thus, CDs store PCM content captured at 16-bits per sample.
Alas, no matter how many bits you use to capture the sample, the digital wave will always be an approximation of the analog wave. The difference between this guess and the actual value is known as a quantization error. The ear considers these quantization errors distortions and is very intolerant of them. Thus, digital audio streams are typically run through a filter to minimize such errors.
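The relationship between sample resolution and quantization error can be checked numerically. The sketch below is an illustration only: it assumes samples normalized to the range -1.0 to 1.0 and a simple round-to-nearest quantizer, and the function names are hypothetical.

```python
import math


def quantize(value, bits):
    """Round a sample in [-1.0, 1.0] to the nearest level a signed
    integer of the given bit depth can represent."""
    steps = 2 ** (bits - 1) - 1  # 127 for 8-bit, 32767 for 16-bit
    return round(value * steps) / steps


def max_quantization_error(bits, count=1000):
    """Largest gap between one cycle of a sine wave and its quantized form."""
    samples = [math.sin(2 * math.pi * n / count) for n in range(count)]
    return max(abs(s - quantize(s, bits)) for s in samples)


for depth in (8, 16):
    print(f"{depth}-bit: worst-case error {max_quantization_error(depth):.6f}")
```

The worst-case error is bounded by half of one quantization step, so each additional bit of resolution roughly halves it. The error never reaches zero, which is the point of the paragraph above.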
THE BRAIN-TO-EAR CONNECTION
The human ear is besieged by noise. As a result, the brain filters out certain frequencies that it considers extraneous. This process is known as spectral masking and is exploited by perceptual encoders to compress digital audio streams. Although audio watermarking algorithms are proprietary and closely guarded by their designers, most operate on one or more of the following digital audio characteristics: filters, frequency, amplitude, and time.
Since digital audio streams are normally filtered to remove quantization errors, an audio watermark theoretically can be inserted during filtering without affecting sound quality. For example, a filter may monitor the audio stream to prevent clipping. When it detects clipping, the filter smoothes out the wave to prevent an irritating pop. While it is smoothing, the filter can add a digital signature (or watermark) to the stream. Watermarks can also be stashed in the amplitude phase (or volume component) of the waveform. This solution is the least desirable because the ear is sensitive to volume manipulations and because hackers can detect unusual patterns in the amplitude of the wave.
Another technique for hiding a watermark is to place it in an undetectable frequency. Because the brain will discard apparently redundant frequencies, a watermark can be placed on such a frequency without affecting fidelity. You must be careful when using this approach, however, as a perceptual encoder such as MP3 may detect this frequency as unnecessary and strip out the watermark when the stream is recompressed. Frequency-based watermarks also tend to avoid frequencies above 22KHz since they aren’t detectable by most humans and are discarded by most samplers.
Since pirates can write algorithms to detect the presence of filter and frequency watermarks, time-based watermarks are usually interspersed within the content. These watermarks are located at random (or pseudo-random) locations within the stream to eliminate predictability. Although frequency, amplitude, and time-based watermarks can be used independently, most watermarking solutions combine two or more watermark alternatives to create a more robust solution. Furthermore, most companies enhance these alternatives with proprietary techniques to differentiate themselves from competitors.
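The idea of scattering a watermark across pseudo-random positions can be illustrated with a toy example. Commercial algorithms are proprietary, so this sketch substitutes a deliberately simple technique: it hides each watermark bit in the least significant bit of a sample chosen by a key-seeded random number generator. The function names and the least-significant-bit approach are assumptions for illustration. A mark embedded this way would not survive MP3 recompression or a trip through the analog domain.

```python
import random


def pick_positions(key, stream_length, bit_count):
    """Choose distinct sample positions, reproducible from the key."""
    rng = random.Random(key)
    return rng.sample(range(stream_length), bit_count)


def embed(samples, bits, key):
    """Return a copy of the samples with the bits hidden at keyed positions."""
    marked = list(samples)
    positions = pick_positions(key, len(samples), len(bits))
    for pos, bit in zip(positions, bits):
        # Clear the lowest bit, then set it to the watermark bit.
        marked[pos] = (marked[pos] & ~1) | bit
    return marked


def extract(samples, bit_count, key):
    """Read the hidden bits back using the same key."""
    positions = pick_positions(key, len(samples), bit_count)
    return [samples[pos] & 1 for pos in positions]


# Usage: hide eight bits in a stand-in stream of 16-bit PCM sample values.
pcm = list(range(-500, 500))
mark = [1, 0, 1, 1, 0, 0, 1, 0]
marked_pcm = embed(pcm, mark, key=1999)
recovered = extract(marked_pcm, len(mark), key=1999)
```

Without the key, an attacker does not know which samples carry the mark, and each altered sample changes by at most one quantization step.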
THE BATTLE FOR DOMINANCE
The audio industry is about to make a transition from the uncompressed 44KHz, 16-bits per sample audio format established by the Red Book CD format into the 96KHz, 24-bit, multichannel audio specification defined for DVD-Audio. Greater sample resolution and higher sampling rates result in digital audio streams that more closely approximate the analog original and, consequently, sound better. Unfortunately, this transition could be painful for some audio watermarking vendors since it may expose weaknesses in their algorithms.
For example, if a company designed a hardware filter that assumed a 16-bit sample size, this filter would break if it tried to insert a watermark into a stream with 24-bit samples. Other poorly designed solutions will blow up when they have to deal with hiding watermarks in frequencies higher than 22KHz. In addition, some audio purists are concerned that the noise introduced by sub-standard watermarking algorithms will degrade sound quality to CD levels and thereby negate the benefits of increased sample resolution and sampling rates.
As we discovered with DVD-Video, watermarking is only one aspect of a complete audio copy protection solution. Content holders that deliver on physical media are likely to complement it with a CSS-like algorithm to prevent unauthorized access to the content.
It’s clear that audio piracy is a serious threat to content holders’ rights and that watermarking can be an effective tool to prevent unauthorized copying of content. Unfortunately, numerous vendors claim they have the ultimate audio watermarking algorithm; a number of standards bodies currently are evaluating these claims. Furthermore, there is an intense debate about how a watermarking solution can be deployed without infringing on consumers’ freedoms. What the fallout from all this controversy will be remains to be seen.