Audio Cheatsheet

 

Here are a few things to remember when dealing with audio files in Python.

Loading Audio

Scipy

There are multiple packages that one can use to load audio files. For .wav files, the first one that people usually reach for is scipy.io. More specifically, one may use the following template to load the audio data along with its sampling rate:

import scipy.io.wavfile as wavfile
fs,audio = wavfile.read("audio1.wav")

Then if one plots it, one will see the following

However, if there is one thing I could tell someone, it is to NOT use Scipy's wavfile module to read .wav files, because it returns integer PCM data as raw integers (for example, int16 for a 16-bit file). This is a problem because, unless one checks the bit resolution, one does not know how to scale the samples properly. Usually, it's best to keep the values between -1 and 1.
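If one is stuck with scipy.io.wavfile anyway, the scaling can be recovered from the dtype of the returned array. Below is a minimal sketch under that assumption. The helper name to_float and the synthetic 440 Hz test tone are made up for illustration and are not part of Scipy.

```python
import os
import tempfile

import numpy as np
import scipy.io.wavfile as wavfile


def to_float(audio):
    """Scale integer PCM samples to floats in roughly [-1, 1].

    Hypothetical helper for illustration; not part of scipy.
    """
    if np.issubdtype(audio.dtype, np.floating):
        # Float WAV files are already scaled; just return a float copy.
        return audio.astype(np.float64)
    if audio.dtype == np.uint8:
        # 8-bit WAV data is unsigned and centered at 128.
        return (audio.astype(np.float64) - 128.0) / 128.0
    # Signed integer PCM: divide by the full-scale value of the dtype.
    full_scale = float(np.iinfo(audio.dtype).max) + 1.0
    return audio.astype(np.float64) / full_scale


# Write a synthetic 16-bit tone at half of full scale, then read it back.
fs = 16000
t = np.arange(fs) / fs
tone_int16 = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.wav")
    wavfile.write(path, fs, tone_int16)
    fs_read, raw = wavfile.read(path)

scaled = to_float(raw)
print(raw.dtype, raw.min(), raw.max())   # raw integer samples
print(scaled.dtype, scaled.min(), scaled.max())  # scaled floats
```

The dtype check is the important part: the same division by 32768 that is right for int16 data would be wrong for int32 or uint8 data.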

Soundfile

Alternatively, one may use soundfile with the following template:

import soundfile as sf
audio,fs = sf.read("audio1.wav")

And if one plots it, one will observe the following plot

One can clearly see that the amplitude values range between -1 and 1, whereas Scipy keeps the values as integers. With soundfile, the data comes properly scaled.

Librosa

A third option, and the one I actually prefer, is to use librosa. Below is a template to load audio files.

import librosa
audio,fs = librosa.load("audio1.wav",sr=None)

And if one is to plot the result, one would see the following plot:

I prefer librosa because the folks who built this library wrote it to be adaptable, so it accepts almost every audio file type. For example, it even works with .mp3 files.

audio, fs = librosa.load("audio2.mp3",sr=None)

Note that I set the sampling rate parameter sr to None. The reason is that librosa was built to work well within its own ecosystem, and by default it resamples the audio to 22050 Hz. To load the audio at its original sampling rate, one needs to pass None to the sr parameter.