In the process of automatically annotating songs with descriptive labels, multiple types of input information can be used: keyword appearances in web documents, acoustic features of the song's audio content, and similarity to other tagged songs. Given these individual data sources, we examine how best to aggregate them. We find that fixed-combination approaches like sum and max perform well, but that trained linear regression models work better. Retrieval performance improves as more data sources are added. On the other hand, for large numbers of training songs, Bayesian hierarchical models that aim to share information across individual tag regressions offer no advantage.
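To make the two aggregation styles concrete, here is a minimal Python sketch of fixed versus trained combination. The scores, labels, and source names below are invented for illustration; this is not the paper's actual data or pipeline.

# Sketch of the two aggregation styles described above. The toy scores
# and labels are hypothetical, not the paper's data.
import numpy as np
from sklearn.linear_model import LinearRegression

# Rows are songs, columns are data sources (web text, audio, neighbor
# tags); each cell is that source's relevance score for a single tag.
source_scores = np.array([
    [0.8, 0.3, 0.6],
    [0.1, 0.2, 0.4],
    [0.7, 0.9, 0.5],
    [0.2, 0.1, 0.3],
])
relevance = np.array([1.0, 0.0, 1.0, 0.0])  # ground-truth tag labels

# Fixed combinations: no training required.
sum_combined = source_scores.sum(axis=1)
max_combined = source_scores.max(axis=1)

# Trained combination: a per-tag linear regression over source scores.
model = LinearRegression().fit(source_scores, relevance)
learned_combined = model.predict(source_scores)

In the trained case, one regression is fit per tag, so each tag learns its own weighting of the sources.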
Manuscript
Six-page ISMIR paper (PDF, TeX source)
Data
Here are the per-tag results files for Genre and Acoustic tags for each of four model types:
Acoustic_All3andP.tab_.txt
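To inspect one of these files programmatically, a minimal Python sketch like the following should work; it assumes the files are plain tab-separated text, since the exact column layout isn't documented on this page.

# Hypothetical loader for one of the results files above. The layout
# (tab-separated, one tag per row) is an assumption.
import csv

with open("Acoustic_All3andP.tab_.txt", newline="") as f:
    for row in csv.reader(f, delimiter="\t"):
        print(row)  # e.g. [tag_name, score, ...] per the assumed layout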
Code
The code is on GitHub.
Alternatively, here's a zip file containing the Python code used to combine data sources. It won't run out of the box because the folder doesn't include the requisite data-source files, but it at least shows the exact algorithms and functions used. These are the individual files: