Experiment 1
Binary classification with the VGGish model architecture. Logging and metrics are implemented with TensorBoard and PyTorch.
Parameters
Basic settings
clip_length: 1.0 # [sec]
Preprocessing parameters
sample_rate: 22050
hop_length: 512
n_fft: 1024
n_mels: 64
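As a rough sanity check on these settings, the sketch below computes how many STFT frames a 1.0 s clip yields and the resulting mel-spectrogram shape. It assumes a centered STFT (the default in librosa and torchaudio), which this experiment does not state explicitly. `n_frames` and `mel_shape` are hypothetical helper names, not part of the project code.

```python
def n_frames(n_samples: int, n_fft: int, hop_length: int, center: bool = True) -> int:
    """Number of STFT frames for a signal of n_samples samples."""
    if center:
        # A centered STFT pads n_fft // 2 samples on each side,
        # so the frame count depends only on the hop length.
        return 1 + n_samples // hop_length
    # Without padding, only full windows inside the signal are counted.
    return 1 + (n_samples - n_fft) // hop_length


def mel_shape(clip_length: float, sample_rate: int, n_fft: int,
              hop_length: int, n_mels: int, center: bool = True) -> tuple:
    """(n_mels, n_frames) shape of the mel spectrogram of one clip."""
    n_samples = int(round(clip_length * sample_rate))
    return n_mels, n_frames(n_samples, n_fft, hop_length, center)


# Values taken from the parameter list above.
shape = mel_shape(clip_length=1.0, sample_rate=22050, n_fft=1024,
                  hop_length=512, n_mels=64)
print(shape)  # (64, 44)
```

With `center=False` the same clip would give 42 frames instead of 44, so the input size seen by the network depends on this assumption.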
Training parameters
number of audio samples per clip: 22050
learning rate: 0.001
batch size: 10
number of epochs: 10
number of samples (clips) in the dataset: 60
balanced dataset: True
random clip cut: False
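The parameters above are linked by simple arithmetic: the audio samples per clip follow from the clip length and sample rate, and the batches per epoch follow from the dataset size and batch size. A minimal sketch, using a hypothetical `ExperimentConfig` dataclass whose name and field names are illustrative and not taken from the project code:

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    # Field names are assumptions; values come from the parameter list.
    clip_length: float = 1.0      # [sec]
    sample_rate: int = 22050
    learning_rate: float = 0.001
    batch_size: int = 10
    num_epochs: int = 10
    num_samples: int = 60
    balanced: bool = True
    random_clip_cut: bool = False

    @property
    def samples_per_clip(self) -> int:
        """Audio samples in one clip: clip length times sample rate."""
        return int(round(self.clip_length * self.sample_rate))

    @property
    def batches_per_epoch(self) -> int:
        """Batches needed to see every clip once (last batch may be partial)."""
        return math.ceil(self.num_samples / self.batch_size)


cfg = ExperimentConfig()
print(cfg.samples_per_clip)                  # 22050
print(cfg.batches_per_epoch)                 # 6
print(cfg.batches_per_epoch * cfg.num_epochs)  # 60
```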
Classes
Labels
grime
jazz
Class distribution
| Class | Class ID | Samples |
|---|---|---|
| grime | 0 | 30 |
| jazz | 1 | 30 |
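One way a balanced 30/30 split like this could be drawn is by sampling the same number of clips from each class. The sketch below is an assumption about how balancing might work, not the project's actual code. `balance_by_class` is a hypothetical helper, and the file names and the 45-clip grime pool are toy data.

```python
import random
from collections import defaultdict


def balance_by_class(items, n_per_class, seed=0):
    """Pick n_per_class (path, label) pairs for every label, reproducibly."""
    by_label = defaultdict(list)
    for path, label in items:
        by_label[label].append((path, label))

    rng = random.Random(seed)  # local RNG so the global state is untouched
    balanced = []
    for label in sorted(by_label):
        group = by_label[label]
        if len(group) < n_per_class:
            raise ValueError(f"class {label} has only {len(group)} items")
        balanced.extend(rng.sample(group, n_per_class))
    return balanced


# Toy, unbalanced input: placeholder file names, not real dataset paths.
items = [(f"grime_{i}.wav", 0) for i in range(45)]
items += [(f"jazz_{i}.wav", 1) for i in range(30)]

subset = balance_by_class(items, n_per_class=30)
counts = {label: sum(1 for _, l in subset if l == label) for label in (0, 1)}
print(counts)  # {0: 30, 1: 30}
```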
Results
Logs
...
Epoch: 10 [30/60 (50%)] Loss: 0.591532 Accuracy: 73.33333333333333%
Epoch: 10 [60/60 (100%)] Loss: 0.585686 Accuracy: 70.0%
[Class: grime] accuracy: 86.7 %
[Class: jazz] accuracy: 56.7 %
Epoch: [10/10] Loss: 0.588609 Accuracy: 71.66666666666667%
[[=============================================================================================]]
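The final-epoch numbers are internally consistent: the epoch summary matches the mean of the two logged intervals. The sketch below parses the log lines shown above and checks this. The regular expressions are written for this excerpt only, and how the training script actually aggregates its metrics is not stated, so treat the averaging as an observation rather than the implementation.

```python
import re

# Final-epoch lines copied from the logs above.
LOG = """\
Epoch: 10 [30/60 (50%)] Loss: 0.591532 Accuracy: 73.33333333333333%
Epoch: 10 [60/60 (100%)] Loss: 0.585686 Accuracy: 70.0%
Epoch: [10/10] Loss: 0.588609 Accuracy: 71.66666666666667%
"""

# Interval line: "Epoch: E [seen/total (p%)] Loss: x Accuracy: y%"
STEP = re.compile(
    r"Epoch: (\d+) \[(\d+)/(\d+) \(\d+%\)\] Loss: ([\d.]+) Accuracy: ([\d.]+)%"
)
# Summary line: "Epoch: [E/N] Loss: x Accuracy: y%"
SUMMARY = re.compile(r"Epoch: \[(\d+)/(\d+)\] Loss: ([\d.]+) Accuracy: ([\d.]+)%")

steps = [(float(m.group(4)), float(m.group(5))) for m in STEP.finditer(LOG)]
summary = SUMMARY.search(LOG)
summary_loss = float(summary.group(3))
summary_acc = float(summary.group(4))

# Mean over the logged intervals of the epoch.
mean_loss = sum(loss for loss, _ in steps) / len(steps)
mean_acc = sum(acc for _, acc in steps) / len(steps)

print(round(mean_loss, 6), summary_loss)
print(round(mean_acc, 2), round(summary_acc, 2))
```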
Metrics
Cross entropy
[Figure: cross-entropy plot]
Accuracy
[Figure: accuracy plot]
Loss/Accuracy per batch
[Figure: loss and accuracy per batch]
Accuracy per class
[Figure: accuracy per class]
Conv4 layer
[Figure: Conv4 layer]
Linear layer
[Figure: linear layer]
Confusion matrix
[Figure: confusion matrix]
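The per-class accuracies in the logs can be turned back into counts. The sketch below does so under two assumptions: that each per-class accuracy was computed over all 30 clips of that class in the final epoch, and that rows are true classes and columns are predicted classes. The matrix is derived from the logged percentages, not read from the plot, so it may differ from the figure if the figure was computed differently.

```python
samples = {"grime": 30, "jazz": 30}        # from the class distribution table
class_acc = {"grime": 86.7, "jazz": 56.7}  # final-epoch per-class accuracy [%]

# Correctly classified clips per class, rounded to whole clips.
correct = {c: round(class_acc[c] / 100 * samples[c]) for c in samples}

# Rows: true class (grime, jazz). Columns: predicted class (grime, jazz).
matrix = [
    [correct["grime"], samples["grime"] - correct["grime"]],
    [samples["jazz"] - correct["jazz"], correct["jazz"]],
]

# Overall accuracy implied by the counts.
overall = 100 * sum(correct.values()) / sum(samples.values())

print(matrix)             # [[26, 4], [13, 17]]
print(round(overall, 2))  # 71.67
```

The implied overall accuracy agrees with the 71.67% reported in the epoch summary.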