Research Contribution

As the survey showed, users are not satisfied with the classification of music by genre. The fact that even the streaming services leave the meaning of tags in their interfaces open for users to assign confirms that the musical genre is slowly fading as the dominant term of interpretation and classification in music. It also confirms the trend, especially among Gen-Z listeners, of interpreting music in sensory terms and assigning meanings that project a state of mind.

In the phase of analysing the genealogy of the musical genre, the research identified examples where even AI software has established new musical genres as sub-genres. These sub-genres cover mental-state characteristics while acting as extensions of existing traditional genres. In addition, they carry features of seasonality and localisation; LoFi House, for example, was created as a playlist by YouTube's DL algorithm. Such an artificially created sub-genre can, on the one hand, describe aesthetic features in music and, on the other, enrich the semantics of the main genre from which it originated. These sub-genres therefore function as a miniature of the term vibe that this study investigates.

The theoretical framing of the study extended to the way we perceive the sound experience in sensory terms. It analysed the concept of sound as vibration and how it interacts with our body, whether through hearing or through the environment as the dominant mediator of the sound event. There was extensive analysis of the formation of human auditory perception and of the approaches by which we distinguish features in a sound experience or musical event. This work even extended to the effects of sound vibrations in parts of the frequency spectrum that are not perceptible to human hearing, and it examined applications of sound signals used as crowd-control and crowd-suppression tools.

This research confirmed that, whether we refer to music or to a sound event, our interpretation of a sound experience is multifactorial. Factors such as memory (familiarity), space as a modulator of the sound signal, place as a geographically defined memory, and content, which we evaluate through the dual listening approach (compositional and sensory), come together to compose the overall aesthetics of a sound event. Hence, the features that shape the aesthetic experience of a sound event are not only musical.

Following this, the study implemented a machine learning model to predict musical genre. This implementation addresses a small part of the larger problem, which is the generation of vibe captions. A thorough review of DL models operating on digital audio signals showed that many examples can detect human speech; indeed, many speech-recognition applications are in daily use today. It also showed that several studies address the recognition of sounds within the urban landscape or environment, aiming at models that can detect events from sound for safety in workplaces. In music, it confirmed that there is research dealing with music information retrieval (MIR) and with classification in music using DL models.
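For illustration only, the minimal sketch below shows the kind of feature extraction that MIR and audio-classification pipelines typically start from: converting a raw audio signal into a log-scaled mel-spectrogram. It assumes the librosa library; the file path, sample rate, and mel-band count are placeholder values, not the preprocessing actually used in this study.

    import librosa
    import numpy as np

    def audio_to_mel(path, sr=22050, n_mels=128):
        """Load an audio file and return a log-scaled mel-spectrogram."""
        y, _ = librosa.load(path, sr=sr)  # resample to a fixed rate
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
        return librosa.power_to_db(mel, ref=np.max)  # dB scale for numerical stability

    # Hypothetical usage:
    # features = audio_to_mel("track.wav")  # shape: (n_mels, time_frames)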

Through the implementation of ACLF, this research has demonstrated that predicting a musical genre using DL algorithms and CNNs is feasible with very high accuracy. The DL model created in this study can distinguish the genres jazz and grime with an accuracy of 95%, which confirms the value of continuing this research towards the larger goal that has been set: real-time VC production.
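As a point of reference only, the sketch below shows one minimal way such a binary genre classifier could be structured in Keras, assuming fixed-size mel-spectrogram inputs such as those produced above. The architecture, input shape, and hyperparameters are illustrative placeholders and are not the ACLF model described in this study.

    from tensorflow.keras import layers, models

    def build_genre_cnn(input_shape=(128, 128, 1)):
        """A small CNN mapping a mel-spectrogram to a jazz-vs-grime probability."""
        model = models.Sequential([
            layers.Input(shape=input_shape),
            layers.Conv2D(16, 3, activation="relu"),  # low-level spectro-temporal features
            layers.MaxPooling2D(),
            layers.Conv2D(32, 3, activation="relu"),  # higher-level texture features
            layers.MaxPooling2D(),
            layers.Flatten(),
            layers.Dense(64, activation="relu"),
            layers.Dense(1, activation="sigmoid"),  # binary output: jazz vs grime
        ])
        model.compile(optimizer="adam",
                      loss="binary_crossentropy",
                      metrics=["accuracy"])
        return model

A sigmoid output with binary cross-entropy is the standard choice for a two-class problem such as jazz versus grime; extending to more genres would mean a softmax output with one unit per class.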