Research
Audio Analysis

Data exploration

Each clip is 5 seconds long.

Raw audio spectrogram


This clip starts with almost no sound at all. The sound file below is provided for cross-reference.



To avoid empty frequencies and to feed our model effectively, a random-cut function crops each track to the desired length at a random offset, so that the crop does not start at frame 0.



Code implementation

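import numpy as np

def _random_cut(self, signal):
    # signal has shape (channels, samples); crop only if it is longer than the target.
    if signal.shape[1] > self.num_samples:
        crop_size = signal.shape[1] - self.num_samples  # largest offset that still fits a full window
        # Offsets start at 1 so the window never begins at frame 0; the original bounds
        # (num_samples, crop_size - num_samples) raise ValueError for signals up to 3x num_samples.
        offset = np.random.randint(1, crop_size + 1)
        signal = signal[:, offset: offset + self.num_samples]
    return signal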
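
Usage sketch

The random cut can be tried outside the class on synthetic data. The following is a minimal sketch, not the project's actual pipeline: the standalone function random_cut, the rng parameter, and the 16 kHz sample rate are assumptions made for illustration, and the signal is assumed to have shape (channels, samples) as in the method above.

```python
import numpy as np

def random_cut(signal, num_samples, rng=None):
    """Return a random window of num_samples frames from a (channels, frames) array."""
    rng = np.random.default_rng() if rng is None else rng
    total = signal.shape[1]
    if total <= num_samples:
        # Too short to crop; returned unchanged (padding is assumed to happen elsewhere).
        return signal
    max_offset = total - num_samples
    # integers() excludes the upper bound, so offset lies in [1, max_offset]
    # and the window never starts at frame 0.
    offset = int(rng.integers(1, max_offset + 1))
    return signal[:, offset:offset + num_samples]

# Hypothetical input: 5 s of mono noise at an assumed 16 kHz, cropped to 1 s.
sample_rate = 16_000
clip = np.random.default_rng(0).standard_normal((1, 5 * sample_rate))
window = random_cut(clip, num_samples=sample_rate, rng=np.random.default_rng(1))
print(window.shape)  # (1, 16000)
```

Passing a seeded generator makes the crop reproducible, which helps when comparing spectrograms of the same clip across runs.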