Data exploration
Each clip is 5 seconds long.
Raw audio spectrogram
This clip begins with almost no sound at all. The sound file below is provided for cross-reference.
To avoid feeding the model near-silent segments, a random crop function cuts each track to the desired length at a random offset, so the crop does not start at frame 0.
Code implementation
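import numpy as np

def _random_cut(self, signal):
    # signal has shape (channels, samples); crop only when it is longer than the target
    if signal.shape[1] > self.num_samples:
        # last start index that still leaves num_samples inside the signal
        max_offset = signal.shape[1] - self.num_samples
        # draw from 1..max_offset so the crop never starts at frame 0 and never runs past the end
        offset = np.random.randint(1, max_offset + 1)
        signal = signal[:, offset: offset + self.num_samples]
    return signal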
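
For experimenting outside the dataset class, the same idea can be sketched as a standalone function. This is only an illustration under assumptions: the name random_cut, the rng parameter, the 16 kHz sample rate and the 1-second target length are hypothetical and do not come from the project. The offset is drawn from 1 up to the last start index that still fits, which may differ from the range used in the method above.

```python
import numpy as np


def random_cut(signal, num_samples, rng=None):
    """Return a random window of num_samples from a (channels, samples) array."""
    rng = np.random.default_rng() if rng is None else rng
    total = signal.shape[1]
    if total <= num_samples:
        # Nothing to crop; shorter clips would need padding elsewhere.
        return signal
    # Last start index that still leaves num_samples inside the signal.
    max_offset = total - num_samples
    # integers() excludes the upper bound, so this yields 1..max_offset (never frame 0).
    offset = int(rng.integers(1, max_offset + 1))
    return signal[:, offset:offset + num_samples]


# Hypothetical numbers: a mono 5-second clip at 16 kHz, cropped to 1 second.
sample_rate = 16_000
clip = np.zeros((1, 5 * sample_rate), dtype=np.float32)
crop = random_cut(clip, num_samples=sample_rate, rng=np.random.default_rng(0))
print(crop.shape)  # (1, 16000)
```

Passing a seeded generator makes the crop reproducible, which helps when comparing spectrograms of the same clip across runs.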