Data exploration

Each clip is 5 seconds long.

Raw audio spectrogram

This clip begins with almost no sound at all. The sound file is included below for cross-reference.

To avoid empty (near-silent) regions in the spectrogram and to feed the model more effectively, a random cut function crops the track to the desired length at a random offset, so that the excerpt does not start at frame 0.

Code implementation

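    def _random_cut(self, signal):
        # signal has shape (channels, samples); requires `import numpy as np`.
        if signal.shape[1] > self.num_samples:
            # Largest start index that still leaves room for a full window.
            crop_size = signal.shape[1] - self.num_samples
            # Draw the offset from [1, crop_size] so the cut never starts at
            # frame 0 and never runs past the end of the signal.
            offset = np.random.randint(1, crop_size + 1)
            signal = signal[:, offset: offset + self.num_samples]
        return signal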
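
As a standalone illustration of the cropping idea, the sketch below applies a random cut to a synthetic one-channel signal. The function name `random_cut`, the `rng` parameter, the 16 kHz sample rate and the 8-second input are assumptions made for this example, not values taken from the project. It assumes the signal is a NumPy array of shape (channels, samples).

```python
import numpy as np


def random_cut(signal, num_samples, rng=None):
    """Return a random window of num_samples from a (channels, samples) array."""
    rng = np.random.default_rng() if rng is None else rng
    total = signal.shape[1]
    if total <= num_samples:
        # Nothing to crop; padding short clips is out of scope for this sketch.
        return signal
    max_offset = total - num_samples  # last start index that still fits
    # Draw from [1, max_offset] so the window never starts at frame 0.
    offset = int(rng.integers(1, max_offset + 1))
    return signal[:, offset:offset + num_samples]


# Hypothetical input: 1 channel, 8 seconds at an assumed 16 kHz sample rate.
sample_rate = 16_000
clip = np.arange(8 * sample_rate, dtype=np.float32).reshape(1, -1)
cut = random_cut(clip, num_samples=5 * sample_rate, rng=np.random.default_rng(0))
print(cut.shape)  # (1, 80000)
```

Passing a seeded generator makes the cut reproducible, which may help when comparing spectrograms across runs. Signals that are not longer than the target length are returned unchanged.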

Descriptive Statistics Feature Extraction