The current boom in inexpensive digital audio boards is opening the market for all manner of audio applications. Just as the explosion of graphics hardware a decade ago spawned a trend towards more visually dynamic and appealing programs, this current growth in digital audio will inevitably result in much better aural interfaces. It's safe to say the heyday of the time honored ASCII beep is nearing its end. With this new emphasis on sound will of course come increased demand for programmers skilled in the dark arts of audio programming.
This article will look at some simple digital signal processing (DSP) algorithms for programming several popular real time audio effects including pitch change, echo, flanging, phase shifting, and more. Now maybe you already sound a little like Darth Vader on Monday mornings anyway, but with these effects programs and an inexpensive soundcard, you'll be able to sound like Darth (or even like munchkins or robots) any time you want.
In addition to providing some cheap thrills and fun, this look at audio effects processing should serve as a straightforward introduction to some of the more general types of things you have to handle as you program digital audio hardware. And it uses very low cost hardware so you can get your feet wet in audio DSP without shelling out a lot of dough.
We'll do our audio DSP programming examples on two very different pieces of hardware, a dedicated DSP development board from Texas Instruments, and a popular PC soundcard from Microsoft.
If you're the type of person that enjoys assembly language, who enjoys feeling the proverbial bits between your toes, and you'd like to really get a taste of digital signal processing, there's no better way than to actually program a dedicated DSP. These are quirky yet powerful little beasts and the Texas Instruments DSP Starter Kit (DSK) package provides an excellent way to learn their family of 320C2x DSP chips.
For only $99 you get a little standalone circuit board with a 320C26 processor and voice grade A/D and D/A converters. Plus you get an assembler and a full featured debugger that run on a host PC. Manuals are included for the 320C2x family of processors as well as for the assembler and debugger.
The circuit board is powered from a generic 9v AC supply (Radio Shack part number 273-1611B) and connects to the host PC via a standard RS-232 cable. This is a really clever package and very nicely done. The module can also be easily adapted for other non-audio purposes. The A/D and D/A converters are direct coupled and could be used for control applications, plus all the major interfaces are brought out for easy wire wrapping if you're a hardware hacker type.
For those of you who are a little less inclined in the assembly language department, we'll also look at some C code for the Windows Sound System (WSS) soundcard from Microsoft. This card can be had for well under $200, and the only development tools you'll need are your regular C compiler and linker. The WSS soundcard consists of an Analog Devices AD1848 chip, which combines an analog to digital converter (ADC) and a digital to analog converter (DAC), plus a Yamaha FM synthesis chip and some glue logic. Since this card does not have a dedicated DSP onboard, we will use the host CPU to do all the processing. It turns out that a 486 can do quite a respectable job running some of the DSP algorithms, even in C.
It should be noted that although both these pieces of hardware provide adequate audio quality for experimental purposes, neither provides true professional quality audio. However, they are both inexpensive, and if you decide you're interested you can certainly apply the concepts described in this article to programming higher quality albeit more expensive digital audio products.
DSP-based audio effects are everywhere these days. Many home stereo systems now come with reverb or concert hall processing built in, courtesy of DSP. Some car stereos are even being similarly equipped. And you certainly can't walk into a music store without seeing racks of versatile DSP-based effects processors and synthesizers.
We'll start our little survey of audio effects by looking at an echo effect. Echo is achieved through the use of a single fixed delay element and produces the well known:
Hello Hello Hello Hello Hello Hello Hello
Delays are an important part of many other audio effects, and we'll look next at another simple delay-based effect called "flanging". This popular swooshing effect has been used and abused on numerous rock records. A classic example of this effect can be heard during the breakdown section of "Life in the Fast Lane" by the Eagles. We'll also examine chorusing and other effects which are produced by very similar means.
Next, we'll demonstrate a simple pitch changer using some of the techniques we used to create flanging and chorusing. Pitch changers allow the pitch of a sound to be dramatically altered in real time. A downward pitch change can make your voice sound like Darth Vader from Star Wars (although the actual Darth effect is a little more complex). An upward pitch change will make you sound like you've been inhaling helium or get you ready for your Alvin and the Chipmunks tryouts.
Finally, we'll look at phase shifting. This effect resembles flanging in character although it is somewhat more subdued. It can be heard all over Pink Floyd's "Dark Side of the Moon" record and numerous other records from the early '70s. This effect is based on a curious type of filter called an "all-pass" filter, and we'll demonstrate the digital implementation of this type of filter.
As we discuss these effects, we will refer to supplied code implementing the effects. The assembly language code for the TI DSK module is contained in files suffixed ".asm". The various modules of "fx.c" provide similar functionality for the WSS soundcard.
An echo is simply an identical copy of the original audio signal, but delayed by a fixed amount in time. It is extremely easy to digitally create this fixed delay.
As we read samples from the analog to digital converter, we store them in a circular buffer. When the buffer is filled, the store pointer wraps back around to the beginning of the buffer.
The delay comes from a single read pointer which is placed N slots "behind" the store pointer and marched along in step with the store pointer. As each new sample is stored, a sample is read from N samples behind it, thus creating a static delay of N * 1/Fs where Fs is the sampling rate.
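The store pointer and read pointer described above can be sketched in a few lines of C. This is only an illustration and is not taken from 'fx.c'; the names `DelayLine`, `delay_init`, `delay_tick`, and the buffer size `DELAY_BUF_LEN` are all made up for the example.

```c
#include <stddef.h>

#define DELAY_BUF_LEN 8192  /* hypothetical buffer size, in samples */

/* Hypothetical circular delay buffer (not the structure used in fx.c). */
typedef struct {
    float  buf[DELAY_BUF_LEN];
    size_t store;            /* index where the next sample is stored */
} DelayLine;

void delay_init(DelayLine *d)
{
    size_t i;
    for (i = 0; i < DELAY_BUF_LEN; i++)
        d->buf[i] = 0.0f;
    d->store = 0;
}

/* Store one new sample and return the sample stored n samples earlier.
   n must be in the range 0 .. DELAY_BUF_LEN - 1. */
float delay_tick(DelayLine *d, float in, size_t n)
{
    size_t read;
    float  out;

    d->buf[d->store] = in;

    /* The read pointer trails the store pointer by n slots, wrapping
       around the start of the buffer when necessary. */
    read = (d->store + DELAY_BUF_LEN - n) % DELAY_BUF_LEN;
    out  = d->buf[read];

    /* Advance the store pointer, wrapping at the end of the buffer. */
    d->store = (d->store + 1) % DELAY_BUF_LEN;
    return out;
}
```

With a 16 kHz sampling rate, calling `delay_tick(&d, sample, 4000)` once per sample would give a delay of 4000 * 1/16000 = 0.25 seconds.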
This delayed signal is then mixed in with the original signal, usually at a somewhat reduced volume level. This gives us one nice echo, but so far all we have is a single repeat of the original rather than a series of decaying repeats.
To get the decaying repeats alluded to above, we need to supply a feedback path around the delay element. Figure 1 shows the block diagram of an echo effect which can provide decaying repeats. The more feedback, the longer the repeats take to fade away.
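As a rough sketch of the feedback idea, the delayed sample can be scaled and written back into the buffer along with the new input. Figure 1 is not reproduced here, so the exact placement of the gain stages below is an assumption, and `Echo`, `echo_init`, `echo_tick`, and `ECHO_BUF_LEN` are hypothetical names rather than anything from 'fx.c'.

```c
#define ECHO_BUF_LEN 16384  /* hypothetical size: about 1 second at 16 kHz */

/* Hypothetical echo state (an illustration, not the fx.c patch). */
typedef struct {
    float buf[ECHO_BUF_LEN];
    int   store;     /* index where the next sample is stored */
    int   delay;     /* delay in samples, 1 .. ECHO_BUF_LEN - 1 */
    float feedback;  /* 0 = one repeat; closer to 1 = longer decay */
    float mix;       /* level of the delayed signal in the output */
} Echo;

void echo_init(Echo *e, int delay, float feedback, float mix)
{
    int i;
    for (i = 0; i < ECHO_BUF_LEN; i++)
        e->buf[i] = 0.0f;
    e->store    = 0;
    e->delay    = delay;
    e->feedback = feedback;
    e->mix      = mix;
}

float echo_tick(Echo *e, float in)
{
    int   read    = (e->store + ECHO_BUF_LEN - e->delay) % ECHO_BUF_LEN;
    float delayed = e->buf[read];

    /* Feedback path: the delayed signal is recirculated into the
       delay buffer, so each repeat is quieter than the one before. */
    e->buf[e->store] = in + e->feedback * delayed;
    e->store = (e->store + 1) % ECHO_BUF_LEN;

    /* Mix the delayed signal with the original. */
    return in + e->mix * delayed;
}
```

Keeping the magnitude of `feedback` below 1 is what makes the repeats die away; at 1 or above they would never fade.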
The first patch provided by the 'fx.c' program for WSS demonstrates this decaying echo effect. Note: no similar effect is provided for the TI module since it doesn't provide quite enough memory to get suitable delays for echo type effects.
Building on the concept of a fixed delay element, we look at another delay-based effect called "flanging". This effect, however, uses extremely short delay lengths which are not discernible to the ear as discrete echoes.
The origins of the term "flanging" are somewhat uncertain. Some sources credit George Martin, the producer for the Beatles, with coining the term in jest. Other sources suggest a practical origin. In any case, the effect was originally produced by running two tape machines with identical tapes closely in sync. Then the speed of one machine was slightly varied, possibly by a recording engineer's thumb on the flange of the tape reel. The resultant short varying delay creates the characteristic striking whooshing sound.
Why the whoosh? When a signal is mixed with a very short delay of itself, there will be certain frequencies at which the signal is 180 degrees out of phase with itself and near total cancellation will occur. For instance, with a delay of 1 millisecond, dips (or notches) will occur at 500 Hz, 1500 Hz, 2500 Hz, 3500 Hz, etc.
This frequency response shape is commonly called a "comb" filter since the notches resemble the teeth of a comb. As the delay is varied from a fraction of a millisecond to 5 milliseconds or so, the notches will sweep dramatically up and down in frequency. Our ears hear this as sounding "swooshy".
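The notch positions follow directly from the delay: cancellation happens wherever the delay equals an odd number of half periods, so the k-th notch sits at (2k + 1) / (2 * delay). The small helper below, with the made-up name `comb_notch_hz`, just evaluates that expression.

```c
/* Frequency in Hz of the k-th notch (k = 0, 1, 2, ...) produced when a
   signal is mixed with a copy of itself delayed by delay_sec seconds.
   Hypothetical helper for illustration only. */
double comb_notch_hz(double delay_sec, int k)
{
    return (2.0 * k + 1.0) / (2.0 * delay_sec);
}
```

For a 1 millisecond delay this gives 500 Hz for k = 0 and 1500 Hz for k = 1, matching the figures above. Stretching the delay to 5 milliseconds pulls the first notch down to 100 Hz, which is why sweeping the delay moves the whole comb so dramatically.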
The digital implementation of flanging is similar to that of echo except the delay time must be very short and continuously variable. Figure 2a shows a block diagram of the signal path for flanging while Figure 2b shows the "shape" of delay variation we use to create flanging. The key element is the implementation of the varying delay element.
Now it might seem like you could vary the delay by taking the fixed delay element implementation described above for echo and simply moving the read pointer in relation to the store pointer by a notch every now and then. Unfortunately, this simple approach to varying the delay creates a little click every time the delay tap is changed. Any steady movement of the delay tap results in "zipper noise", an objectionable gritty modulation noise mixed with the varying delay signal.
To avoid this, we need to implement a method of achieving non-integer delays and thereby sweep the delay more smoothly, varying it by just a little bit with each sample. This problem turns out to be closely akin to the general problem of sample rate conversion, which is discussed at length in many texts on DSP. Unfortunately, most proper methods for non-integer ratio sample rate conversion tend to be a bit computationally intensive and not always well suited for real time work.
Luckily, there's a very simple but inexact method that yields subjectively low audible distortion and yet is computationally very efficient. An averaged linear interpolation between two sample points turns out to give us very good bang-for-the-buck. Figure 3 graphically illustrates this technique. Study the file 'flange.asm' and the module 'flange_chorus()' from 'fx.c' for implementation details of this linear interpolation technique.
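One way such an interpolated read might look in C is sketched below. It is a plain two-point linear interpolation and makes no claim to match the details of 'flange.asm' or 'flange_chorus()'; the names `frac_delay_read` and `FRAC_BUF_LEN` are invented for the example.

```c
#define FRAC_BUF_LEN 1024  /* hypothetical circular buffer size */

/* Read a circular buffer at a non-integer delay.
   buf   : circular buffer of FRAC_BUF_LEN samples
   store : index of the most recently written sample
   delay : delay in samples, 0 <= delay < FRAC_BUF_LEN - 1
   The result is a straight-line blend of the two stored samples that
   bracket the requested delay. */
float frac_delay_read(const float *buf, int store, float delay)
{
    int   whole = (int)delay;            /* integer part of the delay */
    float frac  = delay - (float)whole;  /* fractional part, 0 .. 1 */

    /* Sample at the integer delay, and the one just before it in time. */
    int i0 = (store + FRAC_BUF_LEN - whole) % FRAC_BUF_LEN;
    int i1 = (i0 + FRAC_BUF_LEN - 1) % FRAC_BUF_LEN;

    return buf[i0] + frac * (buf[i1] - buf[i0]);
}
```

Because the delay is now a floating point value, it can be nudged by a tiny fraction of a sample on every call instead of jumping a whole slot at a time, which is what avoids the zipper noise.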
To create the basic flanging sound, this fine grained variable delay element is cyclically "swept" between a very short delay value of less than a millisecond and a longer delay value of 5-10 milliseconds. The rate and range of this sweep can be adjusted to achieve radically different characters of the basic flange.
Other variations on the basic flanging effect can be achieved by providing a feedback path (just as was used to create decaying echoes) and recirculating some of the delayed signal. This can dramatically intensify the flanging effect. It can impart a strong harmonic nature to the sound as the feedback creates a more resonant filtering action.
Note also that the delayed signal and the feedback can be inverted before being summed by simply reversing the sign of the gain stages. This creates some interesting variants often overlooked in commercial effects devices. Refer to Figure 2a again, noting the paths for feedback and the gain stages providing for inversion prior to summing.
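Putting the pieces together, a flanger along these lines might be sketched as follows. Since Figures 2a and 2b are not reproduced here, the triangle-shaped sweep and the placement of the gain stages are assumptions, and every name (`Flanger`, `flanger_init`, `flanger_tick`, `FL_BUF_LEN`) is hypothetical.

```c
#define FL_BUF_LEN 2048  /* hypothetical circular buffer size */

typedef struct {
    float buf[FL_BUF_LEN];
    int   store;      /* index where the next sample is stored */
    float min_delay;  /* shortest delay in samples, >= 1 */
    float max_delay;  /* longest delay in samples, < FL_BUF_LEN - 1 */
    float delay;      /* current delay in samples */
    float step;       /* delay change per sample (triangle sweep) */
    float mix;        /* delayed signal gain; negative inverts it */
    float feedback;   /* feedback gain, magnitude < 1; negative inverts */
} Flanger;

void flanger_init(Flanger *f, float min_delay, float max_delay,
                  float step, float mix, float feedback)
{
    int i;
    for (i = 0; i < FL_BUF_LEN; i++)
        f->buf[i] = 0.0f;
    f->store     = 0;
    f->min_delay = min_delay;
    f->max_delay = max_delay;
    f->delay     = min_delay;
    f->step      = step;
    f->mix       = mix;
    f->feedback  = feedback;
}

float flanger_tick(Flanger *f, float in)
{
    /* Linearly interpolated read at the current fractional delay. */
    int   whole = (int)f->delay;
    float frac  = f->delay - (float)whole;
    int   i0 = (f->store + FL_BUF_LEN - whole) % FL_BUF_LEN;
    int   i1 = (i0 + FL_BUF_LEN - 1) % FL_BUF_LEN;
    float delayed = f->buf[i0] + frac * (f->buf[i1] - f->buf[i0]);

    /* Store the input plus recirculated (possibly inverted) feedback. */
    f->buf[f->store] = in + f->feedback * delayed;
    f->store = (f->store + 1) % FL_BUF_LEN;

    /* Sweep the delay a little each sample, reversing at either end. */
    f->delay += f->step;
    if (f->delay >= f->max_delay) { f->delay = f->max_delay; f->step = -f->step; }
    if (f->delay <= f->min_delay) { f->delay = f->min_delay; f->step = -f->step; }

    return in + f->mix * delayed;
}
```

At a 16 kHz sampling rate, `flanger_init(&f, 16.0f, 80.0f, 0.004f, 0.7f, 0.5f)` would sweep between 1 and 5 milliseconds in about a second each way. Flipping the sign of `mix` or `feedback` gives the inverted variants, and a `step` of zero with equal minimum and maximum delays gives a fixed short delay.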
For another variant, the sweep is disabled entirely, reverting back to the basic fixed length delay effect only using very short delay times. The robot voice patch in 'fx.c' uses a fixed short delay with lots of feedback. This creates a static metallic resonant filter sound which was extensively used to make mechanical voices in old sci-fi movies.
It's also a short hop from flanging to an effect called "chorusing". Chorusing uses the exact same processing as flanging except the delay value is increased to somewhere around 20-40 milliseconds. This is a long enough delay that the exaggerated comb filtering action decreases but short enough that the delayed signal is not quite heard as a distinct echo. Instead, the gently undulating pitch change resulting from the varying delay just adds a subjective richness to the sound, much like a second voice singing in unison, hence the term "chorusing".
As you play around with flanging and chorusing effects, you'll probably notice how faster rates of delay change result in funny warbling pitch variations in the signal. As the delay goes from longer to shorter, you'll hear a sort of Doppler shift up in pitch. When the sweep reverses, you'll hear the reverse Doppler shift as the delay "moves away from you". It proves to be fairly easy to harness this effect of a varying pitch from a varying delay into a decent real time pitch change algorithm.
For example, to create an upward pitch change, we could start with a 30 millisecond delay and steadily decrease it at a rate to yield the desired pitch change. As the delay approaches zero, we would start a second delay channel at 30 milliseconds and sweep it as well. Then we do a quick crossfade from the first channel to the second channel making sure to have the first channel completely faded out before its delay reaches zero. We repeat this process going back and forth between the delay channels. Refer to Figure 4 for a diagram of the changing delays and the crossfading pattern.
A downward pitch change is achieved in a similar manner, only the delay channels are started with a near-zero initial delay, and the delay is increased out to around 30 milliseconds at which time the alternate channel is started and the crossfade performed.
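The two-channel scheme can be sketched in C as shown below. This is a simplified illustration rather than the 'pitch_change' module: the two delay taps run half a sweep apart and are blended with a continuous linear crossfade, instead of the quicker crossfade near the end of each sweep described above. All names (`PitchShifter`, `ps_init`, `ps_tick`, `PS_BUF_LEN`) are hypothetical.

```c
#define PS_BUF_LEN 1024  /* hypothetical circular buffer size */

typedef struct {
    float buf[PS_BUF_LEN];
    int   store;   /* index of the sample being written */
    float window;  /* longest delay in samples, <= PS_BUF_LEN - 2 */
    float phase;   /* sweep position of channel A, 0 .. 1 */
    float rate;    /* phase change per sample: (ratio - 1) / window */
} PitchShifter;

/* Linearly interpolated read, `delay` samples behind the newest sample. */
static float ps_read(const PitchShifter *p, float delay)
{
    int   whole = (int)delay;
    float frac  = delay - (float)whole;
    int   i0 = (p->store + PS_BUF_LEN - whole) % PS_BUF_LEN;
    int   i1 = (i0 + PS_BUF_LEN - 1) % PS_BUF_LEN;
    return p->buf[i0] + frac * (p->buf[i1] - p->buf[i0]);
}

/* ratio > 1 raises the pitch, ratio < 1 lowers it. */
void ps_init(PitchShifter *p, float window, float ratio)
{
    int i;
    for (i = 0; i < PS_BUF_LEN; i++)
        p->buf[i] = 0.0f;
    p->store  = 0;
    p->window = window;
    p->phase  = 0.0f;
    p->rate   = (ratio - 1.0f) / window;
}

float ps_tick(PitchShifter *p, float in)
{
    float phase_b, gain_a, gain_b, out;

    p->buf[p->store] = in;

    /* Channel B runs half a sweep away from channel A. */
    phase_b = p->phase + 0.5f;
    if (phase_b >= 1.0f) phase_b -= 1.0f;

    /* Channel A is silent at phase 0 and 1, where its delay jumps,
       and loudest mid-sweep; channel B gets the remainder. */
    gain_a = (p->phase < 0.5f) ? 2.0f * p->phase
                               : 2.0f * (1.0f - p->phase);
    gain_b = 1.0f - gain_a;

    /* Each delay runs from `window` down to 0 as its phase rises. */
    out = gain_a * ps_read(p, p->window * (1.0f - p->phase))
        + gain_b * ps_read(p, p->window * (1.0f - phase_b));

    p->store = (p->store + 1) % PS_BUF_LEN;

    /* For ratio > 1 the phase rises (delays shrink); for ratio < 1 it
       falls (delays grow). Either way it wraps back into 0 .. 1. */
    p->phase += p->rate;
    if (p->phase >= 1.0f) p->phase -= 1.0f;
    if (p->phase <  0.0f) p->phase += 1.0f;
    return out;
}
```

With a 16 kHz sampling rate, a 30 millisecond window is 480 samples, so `ps_init(&p, 480.0f, 0.7f)` would be a starting point for a deep downward shift. Each delay changes by (1 - ratio) samples per sample, which is what makes the read position advance at `ratio` times the normal speed.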
The subjective result of this approach is quite good for small to medium amounts of pitch change. As the interval becomes greater, however, the frequency of the splicing (or crossfading) increases until you start to notice a "singing through the fan" effect in the pitch-changed signal. But in spite of its simplicity and obvious limitations, this approach is quite effective and many commercial units are based on something along these lines.
Refer to 'pitch.asm' and the 'pitch_change' module of 'fx.c' for implementation details. A smooth crossfade is essential to good quality blending of the two delay channels. The 'fx.c' version uses sine and cosine lookup tables to generate ideal crossfade blends, whereas the memory limited 'pitch.asm' version uses a two piece linear approximation of the sine and cosine functions to perform the crossfade. Note that the crossfade time is one parameter that can be tinkered with to provide less noticeable splicing at certain pitch change rates and for different signal types.
You will definitely want to plug in a microphone and talk through some of these pitch change effects. It's hours of fun. Warning: you'll probably want to change your answering machine message once you hear how cool you sound talking through a serious downward pitch change!
Some really wild effects can be created by combining pitch change with echo-length delays and/or providing feedback paths around the pitch change element. Some of the patches provided with the 'fx.c' program demonstrate these additional tricks.
We'll close out our little survey of audio effects with a biggie from the early '70s. Phase shifting is not unlike flanging in that its frequency response characteristic is that of one or more notches sweeping up and down. Like flanging, the notches in the frequency spectrum result from phase cancellation between the unaffected signal and the processed signal. However, the nature of the processing is quite different.
Phase shifting uses an interesting type of filter called an all-pass filter. As the name implies, this type of filter passes all frequencies at equal amplitude; what it "filters" instead is the phase of the signal. While the magnitude of its frequency response is a flat line, the phase shift of a first order section varies over a range of 180 degrees, with a 90 degree phase shift at what would traditionally be considered the cutoff frequency of a normal filter.
The normalized transfer function of a first order all-pass filter is:
$$H(s) = \frac{s - 1}{s + 1}$$
Using a bilinear z-transform (BZT) method, we arrive at a difference equation of:
$$y(n) = A \cdot x(n) + A \cdot y(n-1) - x(n-1)$$
Where the coefficient A is described by:
$$A = \frac{1 - w_p}{1 + w_p}$$
and:
$$w_p = \frac{\pi \cdot \mathit{freq}}{F_s}$$
where Fs is the sampling rate.
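The difference equation translates almost line for line into C. The sketch below keeps one sample of input and output history per filter section; the names `AllPass`, `allpass_set_freq`, and `allpass_tick` are hypothetical and not taken from 'fx.c'.

```c
/* State for one first order all-pass section (hypothetical names). */
typedef struct {
    float a;   /* coefficient A */
    float x1;  /* previous input,  x(n-1) */
    float y1;  /* previous output, y(n-1) */
} AllPass;

/* Set A from a frequency in Hz and the sampling rate in Hz:
   wp = (PI * freq) / Fs,  A = (1 - wp) / (1 + wp). */
void allpass_set_freq(AllPass *ap, float freq, float fs)
{
    float wp = 3.14159265f * freq / fs;
    ap->a = (1.0f - wp) / (1.0f + wp);
}

/* Process one sample: y(n) = A * x(n) + A * y(n-1) - x(n-1). */
float allpass_tick(AllPass *ap, float x)
{
    float y = ap->a * x + ap->a * ap->y1 - ap->x1;
    ap->x1 = x;
    ap->y1 = y;
    return y;
}
```

Before first use, `x1` and `y1` should be cleared to zero. Feeding in a single impulse with A = 0.5 yields 0.5, -0.75, -0.375, and so on; the squares of those values sum to 1, which is the all-pass property showing up in the time domain.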
The phase shifting effect is then implemented by cascading several such all-pass filter sections and sweeping their cutoff frequencies in unison. Mixing this processed signal with the original signal results in the notching effect as the total phase delay through the filter sections causes certain frequencies to cancel. Like flanging, the effect can be varied by providing a feedback path around the filter sections, and by providing for inverting the processed signal and the feedback.
A smooth sweep function is important; the frequencies of the filters should be changed exponentially over time. This is easily accomplished using floating point in C, but the assembly version uses another linear approximation of the desired function. Refer to 'phaser.asm' and the 'phase_shift' module of 'fx.c' for more implementation details.
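A cascaded phaser with an exponential sweep might be sketched as follows. One cheap way to get an exponential sweep, used here as an assumption rather than as a description of 'fx.c', is to multiply the filter frequency by a constant ratio on every sample. The stage count and all names (`Phaser`, `phaser_init`, `phaser_tick`, `PH_STAGES`) are hypothetical.

```c
#define PH_STAGES 4  /* hypothetical number of all-pass sections */

typedef struct {
    float x1[PH_STAGES];  /* previous input of each section */
    float y1[PH_STAGES];  /* previous output of each section */
    float freq;           /* current filter frequency in Hz */
    float fmin, fmax;     /* sweep limits in Hz */
    float ratio;          /* per-sample frequency multiplier */
    float fs;             /* sampling rate in Hz */
    float mix;            /* processed signal gain; negative inverts */
    float feedback;       /* feedback gain, magnitude < 1 */
    float last;           /* previous cascade output, for feedback */
} Phaser;

void phaser_init(Phaser *p, float fmin, float fmax, float ratio,
                 float fs, float mix, float feedback)
{
    int i;
    for (i = 0; i < PH_STAGES; i++) {
        p->x1[i] = 0.0f;
        p->y1[i] = 0.0f;
    }
    p->freq = fmin;
    p->fmin = fmin;
    p->fmax = fmax;
    p->ratio = ratio;
    p->fs = fs;
    p->mix = mix;
    p->feedback = feedback;
    p->last = 0.0f;
}

float phaser_tick(Phaser *p, float in)
{
    /* All sections share one coefficient, so they sweep in unison. */
    float wp = 3.14159265f * p->freq / p->fs;
    float a  = (1.0f - wp) / (1.0f + wp);
    float s  = in + p->feedback * p->last;
    int   i;

    for (i = 0; i < PH_STAGES; i++) {
        float y = a * s + a * p->y1[i] - p->x1[i];
        p->x1[i] = s;
        p->y1[i] = y;
        s = y;
    }
    p->last = s;

    /* Exponential sweep: scale the frequency by a constant factor
       each sample and turn around at the limits. */
    p->freq *= p->ratio;
    if (p->freq >= p->fmax)      { p->freq = p->fmax; p->ratio = 1.0f / p->ratio; }
    else if (p->freq <= p->fmin) { p->freq = p->fmin; p->ratio = 1.0f / p->ratio; }

    /* Mixing with the original creates the moving notches. */
    return in + p->mix * s;
}
```

Something like `phaser_init(&p, 200.0f, 2000.0f, 1.0005f, 16000.0f, 1.0f, 0.5f)` would sweep from 200 Hz to 2 kHz and back. Recomputing the coefficient on every sample is wasteful but keeps the sketch short; updating it every few dozen samples would likely sound the same.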
The demo programs for the TI DSK module were all assembled and tested using their supplied assembler and debugger. There is no real user interface to any of the programs other than changing the suggested values in the source code (see the section in each file called "Knobs"). However, they assemble in the blink of an eye and download equally quickly so it's easy to try different things. These programs are not highly optimized but they do attempt to use features of the DSP such as its built in saturation overflow mode. This mode provides analog-like "clipping" instead of allowing overflow. Trust me, you don't want to have headphones on when you test a program that has an unchecked overflow problem. Word wrap is not a pretty thing to listen to.
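To see why saturation matters, compare a clipped 16-bit addition with one that is allowed to wrap. The C below only illustrates the arithmetic; it is not how the DSP's overflow mode is programmed, and the function names `add_saturate` and `add_wrap` are made up.

```c
#include <stdint.h>

/* Add two 16-bit samples and clip the result at full scale, much like
   an analog circuit running out of headroom. */
int16_t add_saturate(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Add two 16-bit samples and let the result wrap around. The final
   conversion back to int16_t assumes the usual two's complement
   behavior found on common compilers. */
int16_t add_wrap(int16_t a, int16_t b)
{
    return (int16_t)(uint16_t)((uint16_t)a + (uint16_t)b);
}
```

Adding 30000 and 10000 gives 32767 with saturation but -25536 with wraparound: a loud positive peak suddenly becomes a loud negative one, which is the nasty full-scale jump your ears would be treated to.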
The demo program 'fx.c' for the WSS soundcard was compiled and tested using Borland Turbo C 2.0 and Borland C++ 2.0 on a 33MHz 486. You'll need a 486 or a fast 386 with math coprocessor to run this program as it uses floating point extensively to avoid obscuring the algorithms with the fussy bit shiftings and normalization characteristic of integer DSP.
All patches work at the default sample rate of 16 kHz. The version compiled under Turbo C 2.0 will in fact run most patches at a sample rate of 27 kHz on a 33MHz 486; the Borland C++ 2.0 version, however, is somewhat slower. Your mileage may vary with other compilers, of course.
Here's the part where we wrap up with the obligatory "the future is exciting" paragraph. But the future really is exciting. The field of crunching audio with computers is still very much in its youth.
Recording studios and audio-for-video facilities are just now starting to move rapidly towards more digital implementations. As studios set aside clunky tape storage as a primary medium and adopt hard disk-based systems, the possibilities for increased digital processing abound. We will effectively be able to have access to the audio signal before it happens! This opens up whole new realms of playback audio processing possibilities. Not to mention that non-real time audio processing of a kind previously unheard of will become convenient to perform. Extremely complex audio processing can be "rendered" on hard disk and then auditioned.
Whatever your interests, whether voice recognition, music, communications, or just games, audio DSP is sure to play an ever increasing part in your future.
Ifeachor and Jervis, Digital Signal Processing - A Practical Approach, ISBN 0-201-54413-X
Pohlmann, Principles of Digital Audio, ISBN 0-672-22634-0
John Strawn, Digital Audio Signal Processing, ISBN 0-86576-082-9
TMS320C2X Digital Signal Processing Starter's Kit, Part #TMDS3200026. Available from any TI distributor such as Hamilton/Hallmark.