SLabView Introduction
Dave Davies, 2013

SLabView is a standalone Java application designed for visual exploration of speech signals. It uses the same approach for analysing the signal as its sister package SLab. It displays the raw speech signal and can also display a frequency domain spectrum of chunks of speech along with a wide array of parameters that have been used in speech recognition systems.
Description


Some of SLabView's features are:

• Signal view for viewing the time domain speech signal
• Frequency (spectral) view for viewing speech spectra
• Phonemic labelling for ANDOSL speech files
• Source Synchronous or fixed length signal framing
• Overlaid signal parameter plots (over 24 parameters)
• Point and click selection of frames for spectrum viewing
• Drag selection of multiple frames for spectral animation
• Formant markers in the frequency view
• Saving text files of spectral data (disabled)
• Saving GIF files of the speech signal and spectra (local operation only)
• Saving Quicktime movies of spectral sequences (disabled)

Signal View 1 is an example of the SLabView Signal View showing a full speech utterance in low resolution

Signal View 2 is an example of the SLabView Signal View showing a segment of a speech utterance in high resolution

Frame Frequencies 3 is an example of the SLabView Spectrum for the highlighted frame in Signal View 2



An Unnecessary Distortion

Signal View 2 above can be seen as four repeated epochs marked by the vertical green lines (with the second epoch highlighted). Each of these represents a single vibration of the vocal folds at the glottis - the flaps of tissue in our throat that vibrate to produce voiced speech. The brain doesn't have a clock measuring absolute time at a millisecond scale. It operates with an episodic view of time and it readily recognises each glottal epoch as a distinct episode defining a time frame.

Conventional speech analysis makes an error at the very start, in the initial signal processing step that breaks an utterance into short frames for spectral analysis. Rather than using the natural framing, the standard approach is to chunk the signal into fixed length frames - typically ten milliseconds long, as shown by the vertical purple lines in Signal View 2. Doing this introduces a distortion in the resulting spectrum that is illustrated in the two following images.

(4a) Spectrum Produced From Source Synchronous Framing

(4b) Spectrum Produced From Fixed Length Framing


The first is produced using glottal, or Source Synchronous, framing - the green dividing lines in Signal View 2. It is smooth and regular from frame to frame. The second shows the result of fixed length framing - the purple divisions. It has a serrated artifact superimposed and varies more between frames than the synchronous results do.
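The difference between the two framing schemes can be sketched in a few lines of Java. This is only an illustration, not SLab's actual code: the class and method names are hypothetical, the 16 kHz sample rate is an assumption, and the epoch marks are made-up numbers standing in for whatever glottal epoch detection produces.

```java
import java.util.ArrayList;
import java.util.List;

public class FramingSketch {
    /** Fixed length framing: back-to-back frames of frameLen samples, ignoring the signal. */
    public static List<int[]> fixedFrames(int numSamples, int frameLen) {
        List<int[]> frames = new ArrayList<>();
        for (int start = 0; start + frameLen <= numSamples; start += frameLen) {
            frames.add(new int[] {start, start + frameLen});   // {start, end) in samples
        }
        return frames;
    }

    /** Source synchronous framing: one frame per pair of consecutive epoch marks. */
    public static List<int[]> synchronousFrames(int[] epochMarks) {
        List<int[]> frames = new ArrayList<>();
        for (int i = 0; i + 1 < epochMarks.length; i++) {
            frames.add(new int[] {epochMarks[i], epochMarks[i + 1]});
        }
        return frames;
    }

    public static void main(String[] args) {
        int sampleRate = 16000;                      // assumed sample rate in Hz
        int frameLen = sampleRate / 100;             // 10 ms -> 160 samples
        int[] epochMarks = {0, 120, 245, 372, 500};  // made-up glottal epoch boundaries
        for (int[] f : fixedFrames(500, frameLen)) {
            System.out.println("fixed       " + f[0] + ".." + f[1]);
        }
        for (int[] f : synchronousFrames(epochMarks)) {
            System.out.println("synchronous " + f[0] + ".." + f[1] + " (" + (f[1] - f[0]) + " samples)");
        }
    }
}
```

The fixed frames all have the same length and fall wherever the clock puts them, while the synchronous frames vary in length and each one holds exactly one epoch.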

The reason for the serration is theoretically trivial - the signal in the fixed frames (between purple lines in View 2) doesn't start and finish near zero values. Artificially forcing the ends to zero (as is usually done) helps a little but the distortion and irregularity persist. All subsequent analysis is unnecessarily compromised. Since the length of glottal epochs varies, fixed frames can never consistently capture a single epoch.
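The effect described above can be reproduced with a toy signal. The sketch below is an illustration under stated assumptions, not SLab's analysis: it uses a perfectly periodic three-harmonic signal in place of real speech, an assumed epoch length of 125 samples, and a hypothetical measure (the number of DFT bins holding more than a millionth of the peak energy) to show how much the spectrum is smeared.

```java
public class LeakageSketch {
    /** Energy |X[k]|^2 of the DFT of the frame for bins 0..n/2, via a direct O(n^2) sum. */
    public static double[] dftEnergy(double[] frame) {
        int n = frame.length;
        double[] energy = new double[n / 2 + 1];
        for (int k = 0; k < energy.length; k++) {
            double re = 0.0, im = 0.0;
            for (int t = 0; t < n; t++) {
                double phase = 2.0 * Math.PI * k * t / n;
                re += frame[t] * Math.cos(phase);
                im -= frame[t] * Math.sin(phase);
            }
            energy[k] = re * re + im * im;
        }
        return energy;
    }

    /** Number of bins holding more than 1e-6 of the strongest bin's energy. */
    public static int occupiedBins(double[] frame) {
        double[] e = dftEnergy(frame);
        double peak = 0.0;
        for (double v : e) peak = Math.max(peak, v);
        int count = 0;
        for (double v : e) if (v > 1e-6 * peak) count++;
        return count;
    }

    /** Toy "voiced" signal: three harmonics repeating every `period` samples, zero at each epoch start. */
    public static double[] frame(int start, int length, int period) {
        double[] x = new double[length];
        for (int i = 0; i < length; i++) {
            double t = 2.0 * Math.PI * (start + i) / period;
            x[i] = Math.sin(t) + 0.5 * Math.sin(2 * t) + 0.25 * Math.sin(3 * t);
        }
        return x;
    }

    public static void main(String[] args) {
        int period = 125;  // assumed epoch length in samples (128 Hz pitch at 16 kHz)
        System.out.println("synchronous frame: " + occupiedBins(frame(0, period, period)) + " occupied bins");
        System.out.println("fixed 10 ms frame: " + occupiedBins(frame(0, 160, period)) + " occupied bins");
    }
}
```

A frame that spans exactly one epoch keeps the energy in the three harmonic bins. The 160 sample frame cuts the second epoch part way through, so its ends do not line up and the energy spreads across many more bins. Real glottal epochs are neither identical nor perfectly periodic, so this shows only the mechanism, not the size of the effect in speech.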

The SLab package uses Source Synchronous framing that allows it to easily determine the position of peaks in the spectrum. It can also treat each frame as a basic unit, rather than having to add a smoothing step to even out the artificial irregularities between adjacent frames. Timing is a crucial element in many of the subtle cues that help us recognise speech. Having a good grasp of timing is bound to be an important element of automating the process.

I was surprised to find that many ASR researchers assumed that 4b - all they had ever seen - was the real speech spectrum and insisted that I had just smoothed it, losing information in the process. The truth is easily demonstrated with SLabView. Changing the fixed, and arbitrary, sampling length used in 4b changes the separation of the narrow artifacts - just as signal theory says it should.
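One piece of signal theory that fits this observation: cutting a signal into a frame of duration T imprints the spectrum of a rectangular window, whose nulls lie 1/T apart. The sketch below checks that numerically. It assumes, as a reading of the text rather than something the page states, that the serrations are these window side lobes. The 16 kHz sample rate and all names are hypothetical.

```java
public class WindowLobeSketch {
    /** Magnitude of the spectrum of a rectangular frame of frameLen samples at frequency hz. */
    public static double windowMagnitude(int frameLen, double sampleRate, double hz) {
        double re = 0.0, im = 0.0;
        for (int t = 0; t < frameLen; t++) {
            double phase = 2.0 * Math.PI * hz * t / sampleRate;
            re += Math.cos(phase);
            im -= Math.sin(phase);
        }
        return Math.hypot(re, im);
    }

    /** Spacing in Hz between the first two spectral nulls, searched on a 1 Hz grid. */
    public static double nullSpacingHz(int frameLen, double sampleRate) {
        double firstNull = -1.0;
        for (int hz = 1; hz < sampleRate / 2; hz++) {
            double prev = windowMagnitude(frameLen, sampleRate, hz - 1);
            double here = windowMagnitude(frameLen, sampleRate, hz);
            double next = windowMagnitude(frameLen, sampleRate, hz + 1);
            if (here < prev && here < next) {        // local minimum on the grid = null
                if (firstNull < 0) firstNull = hz;
                else return hz - firstNull;
            }
        }
        return Double.NaN;                           // fewer than two nulls found
    }

    public static void main(String[] args) {
        double sampleRate = 16000.0;                 // assumed sample rate in Hz
        for (int frameLen : new int[] {80, 160, 320}) {   // 5 ms, 10 ms, 20 ms frames
            System.out.println(frameLen + " samples -> nulls "
                    + nullSpacingHz(frameLen, sampleRate) + " Hz apart");
        }
    }
}
```

Doubling the frame length halves the spacing of the nulls and halving it doubles the spacing, so a pattern that moves this way when the frame length changes comes from the framing rather than from the speech.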



Download SLabView source code and Mac Application here and change the SLabView.html file as instructed to point to your user directory.