An
application of Discrete Fast Fourier Transform algorithm
Semester: Summer 2002
Instructor: Prof. Gonhsin
Liu
Student
Name: Lee,
Kun-Hung
Student
ID:
0567654
1.
Introduction to digital audio
2.
Frequency information in a function of time
3.
The Fourier Transform as a
mathematical concept
4.
Table 1: Symmetry Properties
of the Fourier Transform
5.
The Discrete Fast Fourier
Transform algorithm
7. Testing
8. Source code
The most common type of digital audio recording is called pulse code modulation (PCM). Pulse code modulation is what compact discs and most WAV files use. In PCM recording hardware, a microphone converts a varying air pressure (sound waves) into a varying voltage. Then an analog-to-digital converter measures (samples) the voltage at regular intervals of time. For example, in a compact disc audio recording, there are exactly 44,100 samples taken every second. Each sampled voltage gets converted into a 16-bit integer. A CD contains two channels of data: one for the left ear and one for the right ear, to produce stereo. The two channels are independent recordings placed "side by side" on the compact disc. (Actually, the data for the left and right channel alternate...left, right, left, right, ... like marching feet.)
The data that results from a PCM
recording is a function of time. It often amazes people that a sequence of
millions of integers on a compact disc recording can yield music and speech.
People tend to wonder, "How can a stream of numbers sound like an entire
orchestra?" It seems magical, and it is! Yet the magic is not in the
digital recording; it's in your ear and your brain. To understand why this is
true, imagine that you could place a microscopic movie camera in your ear to
film your ear drum in slow motion. Suppose the movie camera was so fast that it
could take a picture every 1/44,100 of a second. Also, suppose that the images
this camera captured on film were so crisp and sharp that you could discern
65,536 (64K) distinct positions of the ear drum's surface as it moved back and
forth in response to incoming sound waves. If you used this hypothetical
technology to film your ear drum while listening to your best friend saying
your name, then took the resulting movie and wrote down the numeric position of
your ear drum in every frame of the movie, you would have a digital PCM
recording. If you could later make your ear drum move back and forth in
accordance with the thousands of numbers you had written down, you would hear
your friend's voice saying your name exactly as it sounded the first time. It
really doesn't matter what the sound is - your friend, a crowded party, a
symphony - the concept still holds. When you hear more than one thing at a time,
all the distinct sounds are physically mixed together in your ears as a single
pattern of varying air pressure. Your ears and your brain work together to
analyze this signal back into separate auditory sensations. It's literally all
in your head!
An organ in our inner ears called the
cochlea enables us to detect tonality in the sounds we hear. The cochlea is
acoustically coupled to the eardrum by a series of three tiny bones. It
consists of a spiral of tissue filled with liquid and thousands of tiny hairs.
The hairs on the outside of the spiral are longer than the hairs on the inside
of the spiral. In fact, the hairs get gradually smaller as you wind your way around
the spiral to the inside. Each hair is connected to a nerve which feeds into
the auditory nerve bundle going to the brain. The longer hairs resonate with
lower frequency sounds, and the shorter hairs with higher frequencies. Thus the
cochlea serves to transform the air pressure signal experienced by the ear drum
into frequency information which can be interpreted by the brain as tonality
and texture. This way, we can tell the difference between adjacent notes on a
piano, even if they are played equally loud. The Fourier Transform is a
mathematical technique for doing a similar thing: resolving any time-domain
function into a frequency spectrum, much like a prism splitting light into a
spectrum of colors. This analogy is not perfect, but it gets the basic idea
across.
(Go to Top)
The Fourier transform, in essence, decomposes or separates a waveform or function into sinusoids of different frequency which sum to the original waveform. It identifies or distinguishes the different frequency sinusoids and their respective amplitudes. The Fourier transform of f(x) is defined as
![]()
Applying the same transform to F(s) gives
![]()
If f(x) is an even function of x, that is f(x) = f(-x), then f(w) = f(x). If f(x) is an odd function of x, that is f(x) = -f(-x), then f(w) = f(-x). When f(x) is neither even nor odd, it can often be split into even or odd parts.
To avoid confusion, it is customary to write the Fourier transform and its inverse so that they exhibit reversibility:
so that
as long as the integral exists and any discontinuities, usually
represented by multiple integrals of the form ˝[f(x+) + f(x-)],
are finite. The transform quantity F(s) is often represented as
and the Fourier
transform is often represented by the operator
.
There are functions for which the Fourier transform does not exist; however, most physical functions have a Fourier transform, especially if the transform represents a physical quantity. Other functions can be treated with Fourier theory as limiting cases. Many of the common theoretical functions are actually limiting cases in Fourier theory.
Usually functions or waveforms can be split into even and odd parts as follows
f(x) = E(x) + O(x)
where
E(x) = ˝ [f(x) + f(-x)]
O(x) = ˝ [f(x) - f(-x)]
and E(x), O(x) are, in general, complex. In this representation, the Fourier transform of f(x) reduces to
![]()
It follows then that an even function has an even transform and that
an odd function has an odd transform. Additional symmetry properties are shown
in Table 1.
(Go to Top)
4.Table 1: Symmetry Properties of
the Fourier Transform
function transform-----------------------------------------------------------real and even real and evenreal and odd imaginary and oddimaginary and even imaginary and evencomplex and even complex and evencomplex and odd complex and oddreal and asymmetrical complex and asymmetricalimaginary and asymmetrical complex and asymmetricalreal even plus imaginary odd realreal odd plus imaginary even imaginaryeven evenodd odd
An important case from Table 1 is that of an Hermitian function, one in which the real part is even and the imaginary part is odd, i.e., f(x) = f*(-x). The Fourier transform of an Hermitian function is even. In addition, the Fourier transform of the complex conjugate of a function f(x) is F*(-s), the reflection of the conjugate of the transform.
The cosine transform of a function f(x) is defined as
This is equivalent to the Fourier transform if f(x) is an even function. In general, the even part of the Fourier transform of f(x) is the cosine transform of the even part of f(x). The cosine transform has a reverse transform given by
Likewise, the sine transform of f(x) is defined by
As a result, i times the odd part of the Fourier transform of f(x) is the sine transform of the odd part of f(x).
Combining the sine and cosine transforms of the even and odd parts of f(x) leads to the Fourier transform of the whole of f(x):
where
and
stand for -i
times the Fourier transform, the cosine transform, and the sine transform
respectively, or
F(s) = ˝FC(s) - ˝iFS(s)
Since the Fourier transform F(s) is a frequency domain
representation of a function f(x), the s characterizes the
frequency of the decomposed cosinusoids and sinusoids and is equal to the
number of cycles per unit of x . If a function or waveform is not
periodic, then the Fourier transform of the function will be a continuous
function of frequency.
(Go to Top)
The discrete FFT is an algorithm that converts a sampled complex-valued function of time into a sampled complex-valued function of frequency. Most of the time, we want to operate on real-valued functions, so we set all the imaginary parts of the input to zero. In order for you to understand the algorithm, there are some specifications you need to know.
realOut array holds the coefficients of the cosine waves in the
Fourier formula. imagOut array holds the coefficients of the sine waves in the Fourier
formula. realOut and imagOut) are a
little bit strange, because they contain both positive and negative
frequencies. Both positive and negative frequencies are necessary for the
math to work when the inputs are complex-valued (i.e. when at least one of
the inputs has a non-zero imaginary component). Most of the time, the FFT
is used for strictly real-valued inputs, and this is especially the case
in digital audio analysis. The FFT, when fed real-valued inputs, gives
outputs whose positive and negative frequencies are redundant. It turns
out that they are complex conjugates of each other, meaning that
their real parts are equal and their imaginary parts are negatives of each
other. If your inputs are all real-valued, you can get all the frequency
information you need just by looking at the first half of the output
arrays. realOut[0] and imagOut[0], contains the average value of all the input samples. For the
output indicies i = 1, 2, 3, ..., n/2, the value of the
frequency expressed in Hz is f = samplingRate * i / n. The negative frequency counterpart of
every positive frequency index i = 1, 2, 3, ..., n/2 - 1, is
i' = n - i. Here's an example. Suppose the sampling
rate is 44,100 Hz, and we are using buffers of 1024 complex numbers for
both the inputs and outputs. The frequency at i = 1 would be
(44,100 Hz) * 1 / 1024 = 43.07 Hz. The negative frequency counterpart
would be at i = 1024 - 1 = 1023 (i.e. the last slot in the array),
with a frequency of -43.07 Hz. Likewise, i = 17 would correspond to
a frequency of (43.07 Hz)*17 = 732.13 Hz, while i = 1024 - 17 =
1007 would correspond to -732.13 Hz, etc. · #include <math.h>
·
· double magnitude = sqrt ( realOut[i]*realOut[i] + imagOut[i]*imagOut[i] );
· double angle = atan2 ( imagOut[i], realOut[i] );
If you are interested in doing the inverse conversion, from magnitude and angle to real and imaginary, use the following code:
#include <math.h> double real = magnitude * cos(angle); double imag = magnitude * sin(angle);

Although this formula tells you what the FFT is equivalent to, this formula is not
how the FFT algorithm is implemented. This formula requires O(n^2)
operations, whereas the FFT itself is O(n*log2(n)).
In other words, if you were to use the formula above, it would be much slower
than using the FFT algorithm. However, if you only need a small subset of the
frequency spectrum (say two or three frequency samples), or you have a number
of samples that isn't a power of 2, this formula combined with some trig optimizations
could be of practical use.
The FFT algorithm tends to be better suited to analyzing digital audio recordings than for filtering or synthesizing sounds. A common example is when you want to do the software equivalent of a device known as a spectrum analyzer, which electrical engineers use for displaying a graph of the frequency content of an electrical signal. You can use the FFT to determine the frequency of a note played in recorded music, to try to recognize different kinds of birds or insects, etc. The FFT is also useful for things which have nothing to do with audio, such as image processing (using a two-dimensional version of the FFT). The FFT also has scientific/statistical applications, like trying to detect periodic fluctuations in stock prices, animal populations, etc. FFTs are also used in analyzing seismographic information to take "sonograms" of the inside of the Earth. I have even read about Fourier methods used to analyze DNA sequences!
The main problem with using the FFT for processing sounds is that the digital recordings must be broken up into chunks of n samples, where n always has to be an integer power of 2. One would first take the FFT of a block, process the FFT output array (i.e. zero out all frequency samples outside a certain range of frequencies), then perform the inverse FFT to get a filtered time-domain signal back. When the audio is broken up into chunks like this and processed with the FFT, the filtered result will have discontinuities which cause a clicking sound in the output at each chunk boundary. For example, if the recording has a sampling rate of 44,100 Hz, and the blocks have a size n = 1024, then there will be an audible click every 1024 / (44,100 Hz) = 0.0232 seconds, which is extremely annoying to say the least.
I have had some success getting rid of the discontinuities using the
following method. Assume the buffer size is n = 2^N. On the first
iteration, read n samples from the input audio, do the FFT, processing,
and IFFT, and keep the resulting time data in a second buffer. Then, shift the
second half of the original buffer to the first half (remember that n is
a power of 2, so it is divisible by 2), and read n/2 samples into the
second half of the buffer. Do the same FFT, processing, IFFT. Now, do a linear
fade out on the second half of the old buffer that was saved from the first
(FFT,processing,IFFT) triplet by multiplying each sample by a value that varies
from 1 (for sample number n/2) to 0 (for sample number n - 1). Do
a linear fade in on the first half of the new output buffer (going linearly
from 0 to 1), and add the two halves together to get output which is a smooth
transition. Note that the areas surrounding each discontinuity are virtually
erased from the output, though a consistent volume level is maintained. This
technique works best when the processing does not disturb the phase information
of the frequency spectrum. For example, a bandpass filter will work very well,
but you may encounter distortion with pitch shifting.
(Go to Top)
My testing signal is composed of three Cosine waveform. Their functions and graphics are below:
dlg.m_amplitude*cos( 2*PI*dlg.m_frequency*i/pow(2,dlg.m_sampling) )

dlg.m_amplitude/2*cos( 4*PI*dlg.m_frequency*i/pow(2,dlg.m_sampling) )

(dlg.m_amplitude/3)*cos( 6*PI*dlg.m_frequency*i/pow(2,dlg.m_sampling) )

-(dlg.m_amplitude/4)*cos( 8*PI*dlg.m_frequency*i/pow(2,dlg.m_sampling) ) ;

The actual signal’s function is like this:
a = dlg.m_amplitude*
cos( 2*PI*dlg.m_frequency*i/pow(2,dlg.m_sampling) )
+(dlg.m_amplitude/2)*
cos( 4*PI*dlg.m_frequency*i/pow(2,dlg.m_sampling) )
+(dlg.m_amplitude/3)*
cos( 6*PI*dlg.m_frequency*i/pow(2,dlg.m_sampling) )
-(dlg.m_amplitude/4)*
cos( 8*PI*dlg.m_frequency*i/pow(2,dlg.m_sampling) ) ;

It’s FFT view is like this

From this FFT view, we can know that signal is composed of four waveforms. By the FFT data we can get its original form like this

And its power view like this

This algorithm works exactly
as what we think.
(Go to Top)
(Go to Top)
Last
Updated: 2002/9/17
Write by: Lee, Kun-Hung
© copywrite 2002