Experiment 2
Binary classification of music clips (grime vs. jazz) with a VGGish model architecture. Logging and metrics are implemented with TensorBoard and PyTorch.
Parameters
Basic settings
clip_length: 1.0 # [sec]
Preprocessing parameters
sample_rate: 22050
hop_length: 512
n_fft: 1024
n_mels: 64
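As a rough guide to the input size these settings imply, the sketch below computes the mel-spectrogram shape for one clip. It assumes a mono signal and centered STFT frames (`center=True`, the default in `torchaudio.transforms.MelSpectrogram` and `librosa.feature.melspectrogram`). The source does not say which library or padding mode is used, so both framings are computed.

```python
# Shape of the log-mel input implied by the preprocessing parameters.
# Assumption: mono audio; frame count shown for both centered and uncentered STFTs.
sample_rate = 22050   # [Hz]
clip_length = 1.0     # [sec]
hop_length = 512      # [samples]
n_fft = 1024          # [samples]
n_mels = 64

num_samples = int(sample_rate * clip_length)                   # samples per clip
n_frames_centered = 1 + num_samples // hop_length              # center=True
n_frames_uncentered = 1 + (num_samples - n_fft) // hop_length  # center=False
n_freq_bins = n_fft // 2 + 1                                   # STFT bins before mel mapping

window_ms = 1000 * n_fft / sample_rate   # duration of one analysis window
hop_ms = 1000 * hop_length / sample_rate # time step between frames

print(f"samples per clip: {num_samples}")
print(f"STFT bins: {n_freq_bins} -> mel bands: {n_mels}")
print(f"frames: {n_frames_centered} (center=True), {n_frames_uncentered} (center=False)")
print(f"window: {window_ms:.1f} ms, hop: {hop_ms:.1f} ms")
```

Under these assumptions each 1-second clip becomes a single-channel input of 64 mel bands by 44 frames (42 without centering), with a 46.4 ms window advancing in 23.2 ms steps.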
Training parameters
number of audio samples per clip: 22050 # sample_rate * clip_length
learning rate: 0.001
batch size: 20
number of epochs: 30
number of clips: 60 # 30 per class
balanced dataset: True
random clip cut: False
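The `random clip cut` flag presumably chooses between taking each clip from the start of the audio and taking it from a random offset. A minimal sketch of such a cut, using a hypothetical helper `cut_or_pad` on plain Python lists (not this project's code), with zero-padding of short clips as a further assumption:

```python
import random

def cut_or_pad(signal, num_samples, random_cut=False, rng=None):
    """Return exactly num_samples values from signal (hypothetical helper).

    Assumptions: longer signals are cropped from the start, or at a random
    offset when random_cut is True; shorter signals are zero-padded at the end.
    """
    if len(signal) > num_samples:
        if random_cut:
            rng = rng or random.Random()
            start = rng.randint(0, len(signal) - num_samples)  # inclusive bounds
        else:
            start = 0
        return signal[start:start + num_samples]
    # Pad short clips with silence so every input has the same length
    return signal + [0.0] * (num_samples - len(signal))

NUM_SAMPLES = 22050  # "number of audio samples per clip" from the training parameters

long_clip = [float(i) for i in range(30000)]
short_clip = [1.0] * 100

fixed = cut_or_pad(long_clip, NUM_SAMPLES)  # random clip cut: False
shifted = cut_or_pad(long_clip, NUM_SAMPLES, random_cut=True, rng=random.Random(0))
padded = cut_or_pad(short_clip, NUM_SAMPLES)

print(len(fixed), fixed[0])     # 22050 0.0
print(len(shifted))             # 22050
print(len(padded), padded[-1])  # 22050 0.0
```

With `random clip cut: False`, as in this experiment, every epoch sees the same 1-second excerpt of each track; a random cut would act as a simple form of data augmentation.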
Classes
Labels
grime
jazz
Class distribution
Class | Class ID | Samples |
---|---|---|
grime | 0 | 30 |
jazz | 1 | 30 |
Results
Logs
```
...
Epoch: [30/30] started...
Num samples: 22050
Epoch: 30 [30/60 (50%)] Loss: 0.407079 Accuracy: 90.0%
Epoch: 30 [60/60 (100%)] Loss: 0.719881 Accuracy: 56.666666666666664%
[Class: grime] accuracy: 76.7 %
[Class: jazz ] accuracy: 70.0 %
Epoch: [30/30] Loss: 0.563480 Accuracy: 73.33333333333333%
[[=============================================================================================]]
Training is done!
```
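The final-epoch numbers are mutually consistent. Assuming each per-class accuracy is measured over all 30 clips of that class, 76.7 % and 70.0 % correspond to 23 and 21 correct clips, i.e. 44 of 60, which reproduces the logged 73.33 %. The arithmetic, as a small sketch:

```python
# Cross-check of the final-epoch log, assuming each per-class accuracy is
# measured over all 30 clips of that class (see the class distribution table).
samples_per_class = {"grime": 30, "jazz": 30}
class_accuracy = {"grime": 76.7, "jazz": 70.0}  # [%], copied from the log

# Recover the integer number of correctly classified clips per class
correct = {
    name: round(class_accuracy[name] / 100 * n)
    for name, n in samples_per_class.items()
}
total_correct = sum(correct.values())
total = sum(samples_per_class.values())
overall = 100 * total_correct / total

print(correct)             # {'grime': 23, 'jazz': 21}
print(f"{overall:.2f} %")  # 73.33 %
```

Because the dataset is balanced, the overall accuracy also equals the mean of the exact per-class accuracies; averaging the rounded log values gives 73.35 %, which differs only because 76.7 % is itself rounded from 23/30.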
Metrics
Cross entropy

Accuracy

Loss/Accuracy per batch

Accuracy per class

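The loss and accuracy curves above are the usual metrics for two-class logits. As a framework-free sketch, assumed to mirror `torch.nn.CrossEntropyLoss` with its default mean reduction rather than copied from this project, the per-batch values can be computed like this:

```python
import math

def cross_entropy(logits, targets):
    """Mean cross-entropy of integer targets under softmax(logits)."""
    total = 0.0
    for row, t in zip(logits, targets):
        m = max(row)  # subtract the max logit for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[t]  # equals -log(softmax(row)[t])
    return total / len(targets)

def accuracy(logits, targets):
    """Percentage of rows whose highest logit is at the target class ID."""
    hits = sum(
        max(range(len(row)), key=row.__getitem__) == t
        for row, t in zip(logits, targets)
    )
    return 100 * hits / len(targets)

# Hypothetical batch of three clips; class IDs: grime = 0, jazz = 1
logits = [[2.0, 0.0], [0.0, 2.0], [1.0, 0.0]]
targets = [0, 1, 1]
print(f"Loss: {cross_entropy(logits, targets):.6f} Accuracy: {accuracy(logits, targets):.1f}%")
```

For reference, a model that outputs equal logits for both classes has a loss of ln 2 ≈ 0.693, which is a useful baseline when reading the cross-entropy curve.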
Conv4 Layer

Conv4 parameters
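```python
import torch.nn as nn

# Fourth convolutional block; this assignment lives inside the model's __init__
self.conv4 = nn.Sequential(
    nn.Conv2d(
        in_channels=64,    # feature maps produced by the previous block
        out_channels=128,
        kernel_size=3,
        stride=1,
        padding=2
    ),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2)  # stride defaults to kernel_size, so H and W are halved
)
```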
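To see what `conv4` does to its input, the sketch below applies the output-size formulas from the PyTorch `nn.Conv2d` and `nn.MaxPool2d` documentation and counts the block's learnable parameters. The height and width reaching `conv4` depend on earlier layers that are not shown here, so the example uses an arbitrary hypothetical input size.

```python
def conv2d_out(size, kernel_size=3, stride=1, padding=2, dilation=1):
    """Output length along one spatial axis of nn.Conv2d (defaults match conv4)."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

def maxpool2d_out(size, kernel_size=2):
    """nn.MaxPool2d output length with stride = kernel_size, no padding, floor mode."""
    return (size - kernel_size) // kernel_size + 1

def conv4_out_shape(h, w):
    """(channels, height, width) after conv4 for a 64-channel h x w input."""
    h, w = conv2d_out(h), conv2d_out(w)  # ReLU does not change the shape
    return 128, maxpool2d_out(h), maxpool2d_out(w)

# One 3x3 kernel per (input, output) channel pair, plus one bias per output channel
in_ch, out_ch, k = 64, 128, 3
n_params = in_ch * out_ch * k * k + out_ch

print(n_params)               # 73856
print(conv4_out_shape(6, 4))  # (128, 4, 3) for a hypothetical 6 x 4 input
```

With `kernel_size=3` and `padding=2`, the convolution grows each spatial dimension by 2 before the max-pool halves it; `padding=1` would preserve the size through the convolution instead.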
Linear layer

Confusion Matrix

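A confusion matrix for the two classes can be built from true and predicted class IDs; dividing each diagonal entry by its row sum gives the per-class accuracy (recall), presumably the quantity reported as `[Class: grime] accuracy` in the log. A minimal sketch with hypothetical toy predictions, not results from this experiment:

```python
CLASSES = ["grime", "jazz"]  # class IDs 0 and 1, as in the class distribution table

def confusion_matrix(targets, predictions, n_classes=len(CLASSES)):
    """Rows are true class IDs, columns are predicted class IDs."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(targets, predictions):
        matrix[t][p] += 1
    return matrix

def per_class_accuracy(matrix):
    """Correct predictions divided by the number of true samples, per class [%]."""
    return [100 * row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]

# Hypothetical toy labels and predictions
targets     = [0, 0, 0, 0, 1, 1, 1, 1]
predictions = [0, 0, 0, 1, 1, 1, 0, 0]
cm = confusion_matrix(targets, predictions)

print(cm)                      # [[3, 1], [2, 2]]
print(per_class_accuracy(cm))  # [75.0, 50.0]
```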