Data exploration
Each clip is 5 seconds long.
Raw audio spectrogram
data:image/s3,"s3://crabby-images/71c05/71c055733fbd2fe353b725750f1bdec162b7cdab" alt=""
This clip is almost silent at the start. The sound file below is included for cross-reference.
data:image/s3,"s3://crabby-images/4171e/4171ee05df015f46adad2c3a48329b1c6e321a30" alt=""
To avoid feeding the model near-silent (empty) segments, a random-crop function cuts each track to the desired length starting at a random offset, so the crop does not always begin at frame 0.
Code implementation
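```python
import numpy as np
def _random_cut(self, signal):
    # signal has shape (channels, samples); crop it to self.num_samples
    if signal.shape[1] > self.num_samples:
        # largest valid start index that keeps the crop inside the signal
        crop_size = signal.shape[1] - self.num_samples
        # random start in [0, crop_size], so the crop rarely begins at frame 0
        offset = np.random.randint(0, crop_size + 1)
        signal = signal[:, offset: offset + self.num_samples]
    return signal
```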
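To try the crop outside the dataset class, here is a standalone sketch. `RandomCropper` is a hypothetical stand-in for the class that owns `num_samples`. It draws the offset uniformly from every valid start index `[0, crop_size]` and uses a seedable `np.random.default_rng` generator instead of the global `np.random` state, which makes crops reproducible in tests.

```python
import numpy as np

class RandomCropper:
    """Hypothetical minimal holder for num_samples (stands in for the dataset class)."""

    def __init__(self, num_samples, rng=None):
        self.num_samples = num_samples
        # Seedable generator so crops can be reproduced
        self.rng = rng if rng is not None else np.random.default_rng()

    def random_cut(self, signal):
        # signal: (channels, samples); crop only when longer than the target length
        if signal.shape[1] > self.num_samples:
            crop_size = signal.shape[1] - self.num_samples  # largest valid start index
            offset = self.rng.integers(0, crop_size + 1)    # uniform over [0, crop_size]
            signal = signal[:, offset:offset + self.num_samples]
        return signal

cropper = RandomCropper(num_samples=16000, rng=np.random.default_rng(0))
clip = np.arange(40000).reshape(1, -1)  # fake 1-channel signal whose values are sample indices
cut = cropper.random_cut(clip)
print(cut.shape, int(cut[0, 0]))  # shape is (1, 16000); start index varies with the seed
```

Because the fake signal stores each sample's index as its value, `cut[0, 0]` is the chosen offset. Signals that are already `num_samples` long or shorter are returned unchanged.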