Visualizing audio as a chromagram with Python
According to Hassan Ezzaidi, a chromagram is the whole spectral content of an audio signal mapped onto a single octave, with that octave divided into 12 bins, one per semitone. Creating a chromagram is straightforward with the Python library librosa. Librosa is a library for music and audio analysis that provides the building blocks necessary to create music information retrieval systems. The commands below install librosa. We also need NumPy, because the code performs a calculation on the audio data.
pip install librosa
pip install numpy
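Before turning to librosa, the idea of mapping every octave onto 12 bins can be illustrated in plain Python. This is only a minimal sketch: the function name pitch_class and the choice of equal temperament with A4 = 440 Hz are assumptions made for illustration, and librosa's own chroma computation is more elaborate.

```python
import math

# Pitch-class names, starting from C (index 0) up to B (index 11).
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_class(freq_hz, a4_hz=440.0):
    """Return the index (0-11) of the pitch class nearest to freq_hz.

    Assumes equal temperament tuned to A4 = 440 Hz (an assumption of this sketch).
    """
    # MIDI note number: 69 is A4, and one semitone is one step.
    midi = 69 + 12 * math.log2(freq_hz / a4_hz)
    # Fold all octaves onto one by keeping the note number modulo 12.
    return int(round(midi)) % 12

# Notes one or more octaves apart land in the same bin.
for f in (261.63, 440.0, 880.0, 1760.0):
    print(f, "Hz ->", PITCH_CLASSES[pitch_class(f)])
```

The three A notes (440, 880 and 1760 Hz) all fall into the same bin, which is what "mapped into one octave" means in the definition above.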
The code below displays an audio file as a chromagram. We are using an .au file, but you can use other audio formats too.
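import librosa
import librosa.display
import numpy as np
y, sr = librosa.load('')  # the path to the audio file goes between the quotes; returns the signal and its sample rate
D = np.abs(librosa.stft(y))  # magnitude of the short-time Fourier transform
c = librosa.feature.chroma_stft(S=D, sr=sr)  # chromagram with shape (12, n_frames)
librosa.display.specshow(c, y_axis='chroma', x_axis='time')  # plot pitch class against time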
First we import the libraries. The librosa.load call reads the audio and returns the audio signal together with its sample rate. librosa.stft(y) returns an array of complex numbers, as one would expect from a Discrete Fourier Transform (DFT) applied to successive short frames of the signal. These complex numbers encode the amplitude and phase of the signal at each frequency; np.abs keeps only the amplitude. librosa.feature.chroma_stft returns a chromagram with shape (12, n_frames): one row for each of the 12 semitones in an octave (C, C#, D, …, B). Finally, we plot the chromagram with librosa.display.specshow. The y_axis and x_axis arguments accept several values, which are listed below.
Categorical types:
  • ‘chroma’ : pitches are determined by the chroma filters. Pitch classes are arranged at integer locations (0-11) according to a given key.
  • ‘chroma_h’, ‘chroma_c’ : pitches are determined by chroma filters, and labeled as svara in the Hindustani (chroma_h) or Carnatic (chroma_c) according to a given thaat (Hindustani) or melakarta raga (Carnatic).
  • ‘tonnetz’ : axes are labeled by Tonnetz dimensions (0-5)
  • ‘frames’ : markers are shown as frame counts
Time types:
  • ‘time’ : markers are shown as milliseconds, seconds, minutes, or hours. Values are plotted in units of seconds.
  • ‘s’ : markers are shown as seconds.
  • ‘ms’ : markers are shown as milliseconds.
  • ‘lag’ : like time, but past the halfway point counts as negative values.
  • ‘lag_s’ : same as lag, but in seconds.
  • ‘lag_ms’ : same as lag, but in milliseconds.
The audio visualization code, written in Python, is available at the link below. I highly recommend running the program in Jupyter Notebook, which you can install by following the instructions in the link below. You can also see another audio visualization in the link below.
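To make the steps described earlier more concrete (complex STFT values, their amplitude and phase, and a result with 12 rows), here is a rough NumPy-only sketch run on a synthetic 440 Hz tone instead of a real audio file. It is a simplification written for illustration and not librosa's implementation: the frame size, hop length, Hann window, and the nearest-semitone folding with A4 = 440 Hz are all assumptions of this sketch, whereas librosa pads the signal and uses a proper chroma filter bank.

```python
import numpy as np

sr = 22050                          # sample rate in Hz (assumed for this sketch)
t = np.arange(sr) / sr              # one second of sample times
y = np.sin(2 * np.pi * 440.0 * t)   # synthetic A4 tone standing in for real audio

n_fft, hop = 2048, 512              # frame length and hop size (assumed values)

# Cut the signal into overlapping windowed frames and take the FFT of each frame.
starts = range(0, len(y) - n_fft + 1, hop)
frames = np.stack([y[s:s + n_fft] * np.hanning(n_fft) for s in starts], axis=1)
spectrum = np.fft.rfft(frames, axis=0)   # complex array, shape (1 + n_fft // 2, n_frames)

magnitude = np.abs(spectrum)        # amplitude of each frequency bin
phase = np.angle(spectrum)          # phase of each frequency bin (not used below)

# Fold every frequency bin onto one of 12 pitch classes (0 = C, ..., 11 = B).
freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
chroma = np.zeros((12, magnitude.shape[1]))
for k in range(1, len(freqs)):      # skip bin 0, since 0 Hz has no pitch
    midi = 69 + 12 * np.log2(freqs[k] / 440.0)
    chroma[int(round(midi)) % 12] += magnitude[k]

print(spectrum.dtype, chroma.shape)
print("strongest pitch class index:", int(chroma.sum(axis=1).argmax()))
```

For this pure A tone the strongest row is index 9, the position of A when counting from C = 0. The array chroma has the same (12, n_frames) layout that librosa.feature.chroma_stft returns, although the values will differ from librosa's.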
