ECEN 1200 - Telecommunications 1

Peter Mathys, Fall 2006, 11/03/06


Homework/Computer Lab 9: Analyzing and Editing Audio Files, Writing a "Sound Story"


How to Submit Solutions for this Homework/Computer Lab
Due Date: Friday 11-10-06
E-mail To: no-spam e-mail
Subject Line: HWCL 09
1st Line: Your name and student number

This computer lab is written for PCs running Windows 9x/2000/NT/XP with a Mozilla/Firefox or Internet Explorer browser. Those portions of the lab that are not platform-specific can also be run on a Mac.

To run the experiments in this lab your computer needs to have a soundcard and speakers or headphones. Make sure that you plug the speakers/headphones into the soundcard output (often located at the rear end of the computer) and not into the headphone output of the CD-ROM drive (usually located more conveniently at the front end of the computer).


Goals of this Lab

The goals of this computer lab are:


Programs to Download

For this lab you will need the following programs. If necessary, download them, unzip them, and then install them on your computer.

7-Zip is a file archiver with a high compression ratio. It supports the 7z, ZIP, GZIP, BZIP2 and TAR formats for both packing and unpacking. Unpacking only is supported for the RAR, CAB, ISO, ARJ, LZH, CHM, Z, CPIO, RPM, DEB, and NSIS formats. The home page of 7-Zip is at http://www.7-zip.org.

GoldWave V4.26 is a sound editor, player, recorder, and converter. It can create entertaining sound files for Web pages, answering machines, or Windows sounds. A rich set of effects and editing features is included for professional sound production. High quality audio CDs can be created by using GoldWave in conjunction with CD writer software. The home page for GoldWave is at http://www.goldwave.com. Note: If you cannot install GoldWave because you have no administrator privileges, use the files in gwave426run.7z after unpacking them (with 7-Zip) to a directory of your choice.

Spectrogram 4.1.2 (use 7-Zip to unpack the archive). Most ordinary sounds are complex combinations of individual frequency components or harmonics with a wide range of frequency and intensity. A spectrogram is simply a plot of the frequency components of such an audio signal as a function of time. With this Spectrogram program, digital audio recordings are analyzed to produce a plot of frequency versus time, with harmonic intensity represented by a variable color scale. These spectrograms reveal the fascinating hidden frequency structure of audio signals and can be used for identifying or classifying particular sounds.

EditPlus V2.21 is an Internet-ready, 32-bit text editor for Windows. It offers many powerful features for Web page authors and programmers and has powerful syntax highlighting for HTML, JavaScript, Perl, Java, C/C++ and any other programming language, based on the default syntax files or user-defined syntax files, for both the screen display and the printing. A spell checker (US version) is also available. The home page for EditPlus is at http://www.editplus.com.
Note: If you cannot install EditPlus because you have no administrator privileges, use the files in epp221run.7z after unpacking them (with the 7-Zip program) to a directory of your choice.

wav-files02.rar. An archive with all the .wav files that you need to complete this homework/computer lab (unpack using 7-Zip).


Sounds in the Time Domain

In the time domain, one plots the amplitude of a sound wave on the vertical axis versus time on the horizontal axis. For a pure sine wave (single tone) like sin900.wav this looks as follows:

900 Hz sine signal in time domain

To determine the frequency of a pure tone in Hertz (Hz), one counts the number of periods of the waveform per second. As a practical matter, periods are usually counted over a shorter time interval and from that the number of periods per second is computed. In the above plot there are 9 complete periods in 0.01 seconds or 10 milliseconds. Thus, in one second there must be 9*100 = 900 periods and the above waveform therefore has a frequency of 900 Hz.
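The period-counting rule above can be written as a one-line calculation. The following is a minimal sketch; the function name frequency_from_periods is chosen here for illustration and is not part of any program used in this lab.

```python
def frequency_from_periods(periods, interval_ms):
    """Return the frequency in Hz, given the number of periods
    counted within an interval measured in milliseconds."""
    # periods per millisecond times 1000 gives periods per second (Hz)
    return periods * 1000.0 / interval_ms


# 9 complete periods counted in 10 ms, as in the sin900.wav plot
print(frequency_from_periods(9, 10))  # 900.0
```

The same function works for fractional period counts, for example when a plot shows ten and a half periods in the measurement window.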

Musical instruments usually generate several harmonically related tones, i.e., tones which consist of a fundamental frequency f1, the second harmonic at frequency f2=2*f1, the third harmonic at frequency f3=3*f1, etc. The tone C6 (two octaves above middle C) recorded from a piano is shown in the time domain in the next graph.

Tone C6 recorded from piano, time domain representation

This view of the graph does not have enough resolution for reading off the fundamental frequency of the tone. But if the highlighted portion of the graph is expanded, the figure below is obtained, in which the periodic nature of this tone is easy to see.

Tone C6 recorded from piano, zoomed into time domain representation

Counting periods in this graph, one finds that there are 10.5 periods in 10 milliseconds, and therefore the fundamental frequency of note C6 on the piano is 1050 Hz. The second harmonic is then at 2*1050=2100 Hz, the third harmonic at 3*1050=3150 Hz, the fourth harmonic at 4*1050=4200 Hz, etc. This is quite clearly visible in the following spectrogram.

Tone C6 recorded from piano, spectrogram representation
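The harmonic frequencies quoted for the piano tone follow directly from the fundamental. A small sketch of that arithmetic (the function name harmonics is an illustrative choice):

```python
def harmonics(fundamental_hz, count):
    """Return the first `count` harmonic frequencies, where the
    n-th harmonic is n times the fundamental (n = 1 is the
    fundamental itself)."""
    return [n * fundamental_hz for n in range(1, count + 1)]


# Fundamental of 1050 Hz, as read off the zoomed time domain plot
print(harmonics(1050, 4))  # [1050, 2100, 3150, 4200]
```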

The next figure shows the settings that were chosen to display the above spectrogram. The controls marked with red ovals were used to select the frequency and time resolution (use "File", "Parameters", "Change" in the Spectrogram program to open the control panel).

Settings for spectrogram representation of tone C6 from piano

The next graph shows a "chord" consisting of the sum of three tones, from the file chord.wav.

Chord signal consisting of 3 tones, time domain representation

In this case it is not obvious what the three frequencies of the tones in the chord are, and one would have to resort to frequency domain techniques to determine those frequencies (see next section). A speech signal like bernie22k.wav is best recognized in the time domain when you zoom out and look at several seconds (instead of milliseconds) of the signal. In this way you can easily see the pauses between words and syllables that are characteristic of speech, as shown in the next plot.

Speech signal, time domain representation

The music signal in no5122k.wav also shows changes in intensity over the span of several seconds, but it is quite uncommon in a music piece to have several pauses of almost complete silence in the time span of a few seconds. A time domain plot of the music signal in no5122k.wav is shown in the figure below.

Music signal, time domain representation

Sounds in the Frequency Domain

In the frequency domain it is of interest to measure the power or intensity of a signal along the vertical axis versus the frequency along the horizontal axis. The power along the vertical axis is often measured in decibels (dB), a unit that is proportional to the logarithm of the power. One reason for doing this is that the human ear also has an approximately logarithmic characteristic with respect to how it perceives the loudness of sounds. The graph of power versus frequency is also called the spectrum of a signal or waveform. For a pure sinusoid only one tone is present and, ideally, the spectrum of a sinusoid is just a single spectral line at the frequency of the sinusoid. In practice this spectral line widens a bit at the bottom, as shown for the signal sin900.wav in the following graph:

900 Hz sine signal, frequency domain representation
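To make the logarithmic dB scale concrete, here is a minimal sketch using the standard definition of a power ratio in decibels, 10*log10(P/Pref). The function name and the reference power of 1.0 are illustrative assumptions; the spectrum plots in this lab may use a different reference level.

```python
import math


def power_ratio_to_db(power, reference_power=1.0):
    """Convert a power ratio to decibels: 10 * log10(P / Pref)."""
    return 10.0 * math.log10(power / reference_power)


# Each factor of 10 in power adds 10 dB; doubling adds about 3 dB.
for power in (1.0, 2.0, 10.0, 100.0):
    print(power, round(power_ratio_to_db(power), 2))
```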

If you look at the spectrum of the signal in chord.wav, then you can see quite clearly that it is made up of 3 frequency components, as shown in the following graph:

Chord signal consisting of 3 tones, frequency domain representation
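The idea of finding the tones of a chord in the frequency domain can be sketched with a plain discrete Fourier transform (DFT). This is only an illustration: the three tone frequencies below are hypothetical stand-ins (they are not the frequencies in chord.wav), and the analysis programs used in this lab may compute their spectra differently.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return DFT magnitudes for bins 0 .. N/2 (direct O(N^2) sum)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(samples[i] * cmath.exp(-2j * math.pi * k * i / n)
                  for i in range(n))
        mags.append(abs(acc))
    return mags


def strongest_frequencies(samples, fs, count):
    """Return the frequencies (Hz) of the `count` largest DFT bins."""
    mags = dft_magnitudes(samples)
    n = len(samples)
    top_bins = sorted(range(len(mags)), key=lambda k: mags[k],
                      reverse=True)[:count]
    return sorted(k * fs / n for k in top_bins)


FS = 8000                      # assumed sampling rate in Hz
N = 400                        # 400 samples -> bins are 20 Hz apart
TONES = (440.0, 660.0, 880.0)  # hypothetical chord frequencies

chord = [sum(math.sin(2 * math.pi * f * i / FS) for f in TONES)
         for i in range(N)]
print(strongest_frequencies(chord, FS, 3))  # [440.0, 660.0, 880.0]
```

The tone frequencies were picked to fall exactly on DFT bins; for tones between bins the energy spreads over neighbouring bins, which is one reason a spectral line appears widened in practice.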

A speech signal like bernie22k.wav generally consists of a more continuous spectrum, with the main power contained in the frequency range from about 500 to 2500 Hz, as shown in the next plot.

Speech signal, frequency domain representation

The music signal in no5122k.wav also has a more continuous spectrum. The main difference from the speech signal is that the low frequency components now extend with fairly high power down to about 50 Hz.

Music signal, frequency domain representation

Digital Representation of Sounds

How does a computer "hear" sound? Sound as the human ear hears it consists of waves of changing air pressure which vary continuously (i.e., analog) in both amplitude (or loudness) and time. Computers can only store digital information at discrete time instants. The soundcard in a computer converts analog sound into digital sound by sampling the analog sound waveform at discrete time instants and by quantizing the amplitude of each sample to a finite number of bits. The following graph shows a short piece of an analog sound signal s(t).

Analog sound signal s(t)

To sample this signal with a sampling rate or sampling frequency Fs in Hz, the time axis (horizontal axis) is subdivided into sampling time intervals Ts = 1/Fs. In this example Fs = 8000 Hz and thus Ts = 1/8000 sec or 0.125 ms. The graph below shows the original analog signal s(t) in blue and the sampled signal in the form of a stem plot in red. Note that so far only time has been discretized; the amplitude is still continuous (analog).

Sampled sound signal s(n*Ts), discrete time, continuous amplitude
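The sampling step can be sketched in a few lines. The signal s(t) used here is a hypothetical 900 Hz sine standing in for the analog waveform in the figure, and the function name sample_signal is an illustrative choice.

```python
import math


def sample_signal(signal, fs, duration):
    """Evaluate signal(t) at t = n*Ts for n = 0, 1, 2, ...
    where Ts = 1/fs, covering `duration` seconds.
    Returns a list of (time, amplitude) pairs."""
    ts = 1.0 / fs
    count = int(round(duration * fs))
    return [(n * ts, signal(n * ts)) for n in range(count)]


def s(t):
    """Hypothetical analog signal: a 900 Hz sine wave."""
    return math.sin(2 * math.pi * 900 * t)


FS = 8000  # sampling rate in Hz, so Ts = 0.125 ms
samples = sample_signal(s, FS, 0.0025)  # 2.5 ms of signal
print(len(samples), samples[1][0])
```

At this stage the amplitudes are still ordinary floating point numbers, which plays the role of the continuous amplitude in the stem plot.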

The next step is to quantize the amplitude of each sample to a finite number of bits. For reasonable sound quality at least 8 bits per sample are needed, and for good sound quality 16 or more bits are used. The next graph shows quantization to 4 bits (or 16 levels). To achieve this, the vertical axis is subdivided into 16 intervals of equal height. Each of the "bins" obtained in this way has a unique binary number associated with it. Any sound amplitude value that falls within a particular bin is assigned the binary representation of this bin. In the graph below the bins were numbered (in binary) starting from 0000 at the bottom to 1111 at the top. The binary representation of the digital sound signal shown below is

   0101,0100,0101,0111,1010,1100,1101,1101,1100,1011,1011,
   1011,1001,0101,0010,0000,0010,0100,0110,0111,0110
Digital sound signal sq(n*Ts), discrete time and amplitude
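The binning procedure described above can be sketched as follows. The amplitude range of -1.0 to +1.0 is an assumption made for this example (the graph may use a different vertical scale), and the function names are illustrative.

```python
import math


def quantize(value, bits, lo=-1.0, hi=1.0):
    """Return the index of the bin that `value` falls into when the
    range lo..hi is divided into 2**bits intervals of equal height.
    Bin 0 is at the bottom; values outside the range are clamped."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    index = math.floor((value - lo) / step)
    return max(0, min(levels - 1, index))


def to_binary(index, bits):
    """Return the bin index as a zero-padded binary string."""
    return format(index, "0{}b".format(bits))


# Quantize a few sample amplitudes to 4 bits (16 levels)
for amplitude in (-1.0, -0.3, 0.0, 0.55, 1.0):
    print(amplitude, to_binary(quantize(amplitude, 4), 4))
```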

When sound is played through a sound card, it is converted back to an analog signal as shown below (bold cyan curve). But note that the quantization introduces irreversible errors (the original sound waveform is shown dashed in blue). However, by choosing enough bits per sample, this error can be made as small as desired, so that it is not perceptible by the human ear.

Interpolation of digital sound signal sq(n*Ts), resulting in sq(t)
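How quickly the quantization error shrinks with the number of bits can be estimated with a short calculation. This sketch assumes an amplitude range of -1.0 to +1.0 and assumes that each sample is played back at the centre of its bin, so the worst-case error is half a bin height; neither assumption is stated in the lab text.

```python
def max_quantization_error(bits, lo=-1.0, hi=1.0):
    """Worst-case amplitude error when each sample is reconstructed
    at the centre of its bin: half of one bin height."""
    step = (hi - lo) / (2 ** bits)
    return step / 2.0


# Every additional bit halves the worst-case error.
for bits in (4, 8, 16):
    print(bits, max_quantization_error(bits))
```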

Typical sampling rates and quantizations are

  1. 8000 samples/sec, 8 bits per sample, mono, for speech signals
  2. 11025 samples/sec, 8 bits per sample, mono, for low quality music signals or higher quality speech signals
  3. 22050 samples/sec, 8 bits per sample, mono, for medium quality music signals
  4. 22050 samples/sec, 16 bits per sample, mono, for medium quality music signals with improved dynamic range
  5. 44100 samples/sec, 16 bits per sample, stereo, for Hi-Fi music signals (CD)
  6. 48000 samples/sec, 16 bits per sample, stereo, for Hi-Fi music signals (DVD)
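The storage needed for each of these formats follows from multiplying sampling rate, bits per sample, and number of channels. A minimal sketch for uncompressed audio (file header overhead is ignored, and the function name is an illustrative choice):

```python
def bytes_per_second(sample_rate, bits_per_sample, channels):
    """Data rate of uncompressed audio in bytes per second."""
    return sample_rate * bits_per_sample * channels // 8


formats = [
    (8000, 8, 1),    # speech
    (11025, 8, 1),   # low quality music
    (22050, 8, 1),   # medium quality music
    (22050, 16, 1),  # medium quality, improved dynamic range
    (44100, 16, 2),  # CD
    (48000, 16, 2),  # DVD
]
for rate, bits, channels in formats:
    print(rate, bits, channels, bytes_per_second(rate, bits, channels))
```

For example, CD-quality stereo needs 176400 bytes for every second of sound, about 22 times as much as 8-bit mono speech at 8000 samples/sec.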

There are quite a few different file formats for audio signals. In this lab you will only use uncompressed .wav ("wave", native Windows format) files. The GoldWave sound editor can be used to convert sound files to and from many other formats.


Sound Editor

If you have not already done so, download the GoldWave sound editor. Run gwave426.exe to extract the program files and then run GoldWave.exe from the directory to which the files were installed. Once GoldWave is running and you have opened a .wav file, e.g., bernie22k.wav, you should see a screen similar to the following:

GoldWave Sound Editor with a .wav file loaded

Use the help command in GoldWave to familiarize yourself with the different commands and buttons. The most important features of the sound editor are:

To look at sounds in the time domain, just load the corresponding sound file into GoldWave. Under "View" zoom in or out to display either small or large portions of the sound file. The graph which you see in the main window of GoldWave is sound intensity or amplitude (on the vertical axis) versus time (on the horizontal axis). Such graphs are called waveform graphs in the time domain (as opposed to the frequency domain).


Spectrogram Analyzer

The Spectrogram analyzer displays the frequency content of a signal as a function of time. The horizontal axis is the time axis and the vertical axis is the frequency axis. This type of display enables you to see which frequency components are present during which time intervals. The MP3 (MPEG, layer 3) audio standard, for example, uses a similar analysis technique to compress audio files.
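The idea of analyzing frequency content one time interval at a time can be sketched as follows. This is a heavily reduced illustration, not the algorithm of the Spectrogram program: it cuts the signal into non-overlapping frames and reports only the single strongest frequency in each frame, whereas a real spectrogram shows the intensity of all frequencies. The test signal, sampling rate, and frame length are assumptions chosen for the example.

```python
import cmath
import math


def dominant_frequency(frame, fs):
    """Return the frequency (Hz) of the strongest DFT bin in a frame,
    ignoring the DC bin."""
    n = len(frame)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):
        acc = sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n)
                  for i in range(n))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * fs / n


def spectrogram_peaks(samples, fs, frame_size):
    """Split the signal into consecutive frames and return a list of
    (frame start time in seconds, dominant frequency in Hz)."""
    peaks = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        peaks.append((start / fs, dominant_frequency(frame, fs)))
    return peaks


FS = 8000    # assumed sampling rate in Hz
FRAME = 80   # 10 ms per frame -> frequency bins are 100 Hz apart

# Hypothetical test signal: 20 ms of 500 Hz followed by 20 ms of 1000 Hz
signal = ([math.sin(2 * math.pi * 500 * i / FS) for i in range(160)] +
          [math.sin(2 * math.pi * 1000 * i / FS) for i in range(160)])

for start_time, freq in spectrogram_peaks(signal, FS, FRAME):
    print(start_time, freq)
```

The printout shows the dominant frequency changing from 500 Hz to 1000 Hz halfway through, which is the kind of time-versus-frequency information a spectrogram makes visible.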

If you have not already done this, download the Spectrogram analyzer. Unpack the file gram412.7z into a directory of your choice (using the 7-Zip program). Then run Gram.exe from the directory into which you unpacked the files. When Spectrogram is running and you click on "File" and then on "Analyze File" and select a filename, e.g., bernie22k.wav, you will get a screen similar to the following, where you can specify a number of parameters.

Specification of Parameters in Spectrogram Analyzer

Start out by using the default values and look at the outcome. Then go back, click on "File", "Parameters" and "Change" to adjust one or more of the parameters encircled in the above figure. If the displayed spectrogram doesn't cover the whole screen from left to right, then decrease the Time Scale value (marked with yellow in the above figure), e.g., by going from a value of 12 msec to 6 msec for the file bernie22k.wav. Conversely, if the signal is too long from left to right to fit on the screen, increase this value. If you need more resolution in the vertical direction (which represents frequency), then change the frequency resolution (marked with red in the above figure) to a smaller value. The resulting display for the file bernie22k.wav (with time scale set to 6 msec, and frequency resolution set to 43.1 Hz) looks like this:

Spectrogram Analyzer with a .wav file loaded
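For FFT-based analysis in general, the spacing between frequency bins equals the sampling rate divided by the number of samples per analysis block. The sketch below applies that general relation; it assumes a sampling rate of 22050 Hz for bernie22k.wav and is not a description of the Spectrogram program's internals. Under these assumptions a block of 512 samples gives a value that matches the 43.1 Hz setting mentioned above.

```python
def frequency_resolution(fs, fft_size):
    """Spacing between adjacent frequency bins in Hz for an FFT of
    `fft_size` samples taken at sampling rate `fs`."""
    return fs / fft_size


FS = 22050  # assumed sampling rate of the file in Hz
for fft_size in (256, 512, 1024, 2048):
    print(fft_size, round(frequency_resolution(FS, fft_size), 1))
```

Longer analysis blocks give finer frequency resolution, but each block then covers a longer stretch of time, so the time resolution becomes coarser.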

To display markers along the horizontal time axis and the vertical frequency axis, click on "Toggle Grid" in the lower right corner of the Spectrogram display. You will then get a screen similar to the following:

Spectrogram Analyzer with a .wav file loaded and the Grid activated

If you click on "Play Wdw" at the bottom right of the Spectrogram display, then the sound that is currently displayed in the window is played and you see a cursor moving across the spectrogram in synchronism with what you hear.


Sounds on Your Home Page

Suppose now that you have created some absolutely incredible sound effects that you would like to share with the rest of the world. Two steps are needed to do this:

  1. Upload your sound file (e.g., with a .wav extension) to the www directory on your WebFiles account.
  2. Put an anchor (that is, an <a> tag) with a reference to your sound file and a linktext in the place where you want users to click to listen to your sound file. Examples of this anchor tag are shown below, each as an HTML element followed by its resulting display (note that the <ol> tag is used to create an ordered list).

HTML Element:

<a href="sound_address">linktext</a>,

where sound_address can be a filename (e.g., a .wav file) or a fully qualified URL for a sound file on another WWW server.

Resulting Display:

linktext

HTML Element:

<p>Hear the difference:</p>
<ol>
  <li><a href="rit44_16.wav">
  16 bits/sample, stereo, 44100
  samples/sec</a> (1843 kB).</li>
  <li><a href="rit8_8.wav">
  8 bits/sample, mono, 8000
  samples/sec</a> (84 kB).</li>
</ol>

Resulting Display:

Hear the difference:

  1. 16 bits/sample, stereo, 44100 samples/sec (1843 kB).
  2. 8 bits/sample, mono, 8000 samples/sec (84 kB).

Here is an example of a home page with sound: The Amelia Earhart story.


A "Sound" Story

The .wav file bmaLelttiL.wav contains the first recording of human voice in history, but to hide it a little, it is stored backwards. You have to start thinking about your final project. To give you practice in collecting materials about a topic and turning them into a report, you will write "A Short Story about the First Recording of Human Voice" and then post it as a Web page. As an example of the scope and the organization of such a story, look at the Amelia Earhart story about the first woman who crossed the Atlantic solo in a plane. At a minimum, your story needs to have the following elements:

  1. One paragraph of writing about who made the first recording of human voice, and when and how it was made.
  2. One image that has some connection with the topic of the first recording of human voice.
  3. The sound file bmaLelttiL.wav, but played forward and not backward (in the form of a link on which you can click, and then the .wav file is delivered from your www directory).
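In this lab the reversal is meant to be done in the sound editor. For readers curious about what "playing a file backward" means at the level of the stored samples, here is a minimal sketch using the wave module from the Python standard library. It assumes an uncompressed PCM .wav file; the function names and the output filename in the usage note are illustrative choices.

```python
import wave


def reverse_frames(frames, frame_size):
    """Reverse the order of the frames in a byte string, keeping the
    bytes inside each frame (one sample per channel) in order."""
    chunks = [frames[i:i + frame_size]
              for i in range(0, len(frames), frame_size)]
    return b"".join(reversed(chunks))


def reverse_wav(src_path, dst_path):
    """Write a copy of the .wav file at src_path to dst_path with
    its frames in reverse time order."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    # One frame holds one sample for every channel
    frame_size = params.sampwidth * params.nchannels
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(reverse_frames(frames, frame_size))
```

A call such as reverse_wav("bmaLelttiL.wav", "forward.wav") would then produce the forward version, where forward.wav is a hypothetical output name. Reversing whole frames rather than single bytes matters because a 16-bit or stereo sample spans several bytes that must stay together.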

Click here for a list of search engines on the WWW that you may want to use to find suitable materials.


Questions you Have to Answer

To obtain credit for this homework/computer lab you need to answer the questions stated below. Send your solution to no-spam e-mail.

Note: If you have trouble downloading the individual .wav files, try downloading the zipped archive that contains all the .wav files you need for this assignment.

Rules for the Submission of Homework/Computerlab Solutions

Format: E-mail your solution as a plain text (ASCII) file. Do not use word processor files like Microsoft Word or "rich-text" HTML files.

Corrections: If you need to make corrections after you submitted your solution, resubmit all your answers (not only the ones that changed) since only your last submission of each homework/computer lab will be graded.

Teamwork: Teamwork is fine for the homework/computer labs, but the solutions must be turned in individually. In particular, copy and paste of entire solutions from other students is not acceptable.

Questions:

  1. The .wav files contain different kinds of test signals that can be used, for example, to test a Hi-Fi system. Listen to each of the signals and describe what you hear (e.g., a single tone, noise, a distorted tone, etc.). Then open them in GoldWave and describe what these test signals look like in the time domain, e.g., do they have a constant amplitude, what does the waveform look like (sinusoid, rectangular, triangular, sawtooth, irregular). Note: You will have to look at an enlarged portion (approximately 10 to 20 milliseconds) of these signals to see the detailed waveforms. One of the signals is a sinusoid. Determine its frequency. Next, open each of the .wav files in the Spectrogram analyzer. Which of the three signals uses the smallest portion of the frequency spectrum? Which uses the largest one?
  2. Download the .wav files One is a music signal and one is a speech signal. Listen to each file and open each of them in the GoldWave editor. How would you characterize the difference between a speech and a music signal in the time domain plots that you see in GoldWave?
  3. The sound file in sample6.wav is a simple tune that you will easily recognize. Play it back at normal speed first. Then, under "Effects" and "Pitch" change from a scale of 1.000 to 2.000 and save the resulting sound as sample6fast.wav. Listen to this .wav file. Then open both sample6.wav and sample6fast.wav in the Spectrogram analyzer (open up two copies of the program so that you can see the results side by side). How have the frequencies of the individual tones changed from the original .wav file to the sped-up version? Hint: Compare the frequencies of the two highest tones in the tune.
  4. Listen to the .wav files Have you ever heard such signals before? Describe what they sound like and identify them. Hint: Among the four signals is one AM (amplitude modulation) and one FM (frequency modulation) signal. The other two you should recognize from having used them in your daily lives.
  5. The following .wav files were recorded from a piano Listen to each signal and order them from lowest to highest tone. Then determine the fundamental frequency of each tone, e.g. by counting the number of periods in 10 ms, or by using the cursor in the spectrogram analyzer.
  6. Listen to the .wav files in One of these signals has two distinct frequency components, one has three distinct frequency components, and one has five distinct frequency components. Can you determine which is which by listening to the signals? If not, how else are you going to find out?
  7. The .wav file in contains a speech signal, but it is buried underneath a strong "police siren" sound. Use a filter (under "Effects" in GoldWave) to remove the disturbing siren sound. Then maximize the volume to hear the speech signal. Write down the text of the speech signal in your solution. Hints: Speech signals have most of their energy concentrated in the range from 700 to 2000 Hz. Thus, if the interfering siren sound does not fall into this range, it can be filtered out. Use the Spectrogram analyzer to find out which frequency band the siren uses. Then use the "Filter" command (under "Effects" in GoldWave) to pass only those frequencies that are not used by the siren. A filter that passes all frequencies below a cutoff frequency is called a lowpass filter (LPF). Conversely, a filter that passes all frequencies above a cutoff frequency is called a highpass filter (HPF). A filter that passes all frequencies between a lower and an upper cutoff frequency is called a bandpass filter (BPF).
  8. The following graph shows a short piece of a music signal. Sample it with a sampling rate of 8000 Hz (==> time between samples is 1/8000 sec or 0.125 ms), and quantize each sample to 4 bits. In your solution, return the string of bits that corresponds to the digital representation of this sound signal. Hint: The binary string starts as 0110,0101,0010,....

    Analog Sound Signal s(t)

  9. Write a "sound" story as outlined in the description above and post it as a Web page on the WWW. Make sure your page can be accessed by a browser and then send in the URL of where it can be found. As a reminder, this Web page must contain at least one image, one paragraph of writing, and a link on which you can click and listen to the sound file of the first recording of human voice in history (after reversing the given .wav file bmaLelttiL.wav).
  10. Information about the Group Project. As you know, one of the requirements of this course is the completion of a group project.
    The group project is due Friday, Dec. 8, 2006, 5 pm. Work in groups of 2-3 students on a topic in telecommunications and/or data networking and publish the result as one or more Web pages. For each member of the group you must clearly state the contribution of that member to the project (e.g., student A wrote the introduction and part 2, student B wrote parts 1, 3, and the conclusion, or student A was responsible for the design of the Web page and student B was responsible for the content, etc.).
    As part of the solution to this computer lab, you need to form a group with other students from the class and list the names of the students (including your own) in this group. You also need to give a (tentative) title and short outline of the project that you and your group are planning to work on. For a list of students who are looking for partners, see the Teaming Up page. To get onto (or off) the list, send e-mail to no-spam e-mail.