What is an audio spectrogram?


An audio spectrogram is a visual representation of sound. More generally, a spectrogram is a time-varying spectral representation that shows how the spectral density of a signal changes over time. Spectrograms (also called voicegrams, sonograms, or spectral waterfalls) are commonly used to identify phonetic sounds. They are also used in speech processing, sonar, seismology, and other fields.


Creation methods


Spectrograms can be created in two ways: by approximating a filter bank with a series of bandpass filters, or by calculating the short-time Fourier transform (STFT) of the time-domain signal. This thesis concentrates on using the STFT to obtain the audio spectrogram. Specifically, the objective of this research is a real-time implementation that displays the spectrogram of an audio signal on a video monitor using the Xilinx Virtex-5 ML506 Evaluation Board, which has powerful audio and video capabilities. The FPGA processes the input audio signal to calculate its STFT. Once the STFT is calculated, it must be converted into a form suitable for display on the video monitor. This research has several applications in scientific and commercial devices.
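The STFT method can be sketched in a few lines of Python. This is a minimal, software-only illustration, not the FPGA design described above: the function names (`hann`, `fft`, `stft_magnitude`), the Hann window, and the default frame size of 256 samples with a hop of 128 samples are assumptions chosen for clarity. The signal is cut into overlapping frames, each frame is windowed and passed through a radix-2 FFT, and the magnitudes of the non-negative frequency bins form one column of the spectrogram.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n; tapers frame edges to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT. len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def stft_magnitude(signal, fft_size=256, hop=128):
    """Return a list of spectrogram columns (one per frame).

    Each column holds |X[k]| for bins k = 0 .. fft_size/2, i.e. from 0 Hz
    up to half the sample rate. Frames that would run past the end of the
    signal are skipped.
    """
    window = hann(fft_size)
    columns = []
    for start in range(0, len(signal) - fft_size + 1, hop):
        frame = signal[start:start + fft_size]
        windowed = [s * w for s, w in zip(frame, window)]
        spectrum = fft(windowed)
        columns.append([abs(c) for c in spectrum[: fft_size // 2 + 1]])
    return columns
```

For example, for one second of a 1 kHz sine sampled at 8 kHz, each bin is 8000 / 256 = 31.25 Hz wide, so the brightest bin in every column is bin 32 (32 × 31.25 Hz = 1000 Hz).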


Sound restoration


Sound engineers often use audio spectrograms in the sound restoration process. The key to successful audio restoration lies in your ability to analyze the problem correctly, much like a doctor recognizing symptoms that point to a certain illness. Spectrograms make this task easier by providing a visual representation of the audio. The aim of any good visualization tool for audio repair and restoration is to give you more information about an audible problem. This not only informs your editing decisions; in the case of a spectrogram display, it can also open up new ways to edit audio. A spectrogram can be used in tandem with a waveform display.


Sound analysis


A spectrogram can be described as a very sophisticated audio analyzer: a detailed image of your audio, displayed in either 2D or 3D. The graph plots the audio against time and frequency, with brightness (in 2D) or height (in 3D) indicating amplitude. Whereas a waveform shows how your signal’s amplitude changes over time, the spectrogram shows this change for every frequency component in the signal. If you are used to the waveform display, it may take a while to get your head around this way to “see” audio.
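The brightness mapping can be illustrated with a short sketch. It assumes, hypothetically, that brightness follows a decibel scale relative to the loudest bin in a column, clipped at an assumed floor of -80 dB and scaled to 8-bit values (0 to 255); real analyzers choose their own scales and color maps. The function name `magnitudes_to_pixels` is illustrative only.

```python
import math


def magnitudes_to_pixels(column, floor_db=-80.0):
    """Map one spectrogram column of linear magnitudes to 0..255 brightness values.

    Magnitudes are converted to decibels relative to the loudest bin, clipped
    at floor_db, then scaled linearly so 0 dB -> 255 (brightest) and
    floor_db -> 0 (black). An all-zero column maps to all black.
    """
    peak = max(column)
    if peak <= 0:
        return [0] * len(column)
    pixels = []
    for m in column:
        db = 20 * math.log10(m / peak) if m > 0 else floor_db
        db = max(db, floor_db)  # anything quieter than the floor is drawn black
        pixels.append(round(255 * (db - floor_db) / -floor_db))
    return pixels
```

Using a logarithmic (dB) scale rather than raw magnitude keeps quiet components visible next to loud ones, which is why the -20 dB bin in a column such as `[1.0, 0.1, 0.0]` still maps to a fairly bright value of 191.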


FFT algorithm


Not all spectrograms are created equal. The display is computed with the Fast Fourier Transform (FFT) algorithm. Many products that feature a spectrogram display let you adjust the FFT size, but what does this mean for audio repair and restoration? The FFT size is the number of samples analyzed in each frame, so changing it changes how the algorithm computes the spectrogram and makes the display look different. Depending on the type of audio you are working with, adjusting it can make the problem you are looking for easier to see.

As a rule, larger FFT sizes give you more detail in frequency (frequency resolution), while smaller FFT sizes give you more detail in time (time resolution). This is because each frequency bin spans the sample rate divided by the FFT size, while each frame covers the FFT size divided by the sample rate. If you are trying to identify a plosive, mic handling noise, or other muddy low-frequency information, a larger FFT size in your spectrogram settings will help. If you are trying to identify a high-frequency event, or working with a transient signal (such as a percussion or drum loop), choose a smaller FFT size.
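The trade-off follows from two simple ratios: each FFT bin spans sample_rate / fft_size hertz, and each analysis frame covers fft_size / sample_rate seconds. The sketch below computes both for a few FFT sizes; the 44.1 kHz default sample rate and the function name `resolution` are assumptions for illustration.

```python
def resolution(fft_size, sample_rate=44100):
    """Return (frequency bin spacing in Hz, frame length in ms) for one FFT frame."""
    bin_hz = sample_rate / fft_size               # narrower bins with larger FFT sizes
    frame_ms = 1000.0 * fft_size / sample_rate    # longer frames (coarser timing) with larger FFT sizes
    return bin_hz, frame_ms


if __name__ == "__main__":
    for n in (256, 1024, 4096):
        bin_hz, frame_ms = resolution(n)
        print(f"FFT size {n:5d}: {bin_hz:7.2f} Hz per bin, {frame_ms:6.2f} ms per frame")
```

At 44.1 kHz, a 256-point FFT gives bins about 172 Hz wide over frames of about 5.8 ms, whereas a 4096-point FFT narrows the bins to about 10.8 Hz but stretches each frame to about 92.9 ms. That is why a large size suits sustained low-frequency problems and a small size suits short transients.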

