Welcome to UnitedMasters. This challenge helps us assess engineering expertise and creative thinking, while giving you a better understanding of the music domain. We also think this challenge is just a fun exercise for anyone who loves to write code. Feel free to ask questions or request clarification on anything.
The dataset for this challenge, dataset.tar, is an archive containing three gzipped JSON Lines files:
- sc_tracks.json.gz: contains ~6000 Soundcloud track objects from @corpus, our internal datastore. The track objects mirror the [Soundcloud Track API](https://developers.soundcloud.com/docs/api/reference#tracks), with UnitedMasters-specific fields denoted by a leading "_" character in the field name.
- track_ratings.json.gz: contains human-curated quality ratings for the tracks specified in sc_tracks.json.gz. Ratings range from -1 (spam, with the spamtype field providing further classification) to 5 (this could be the next Prince).
- test_sc_tracks.json.gz: contains 500 Soundcloud track objects that have not been rated.
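A minimal Python sketch for reading the archive and attaching spam labels to the rated tracks. It assumes each member can be found by its file name (possibly under a directory prefix inside the tar), that ratings join to tracks on a shared `id` field, and that the numeric score lives in a `rating` field. The ratings schema is not described here, so `id_field` and `rating_field` are hypothetical parameters to adjust once you inspect the data.

```python
import gzip
import json
import os
import tarfile


def read_jsonl_gz_from_tar(tar_path, member_name):
    """Yield one dict per non-empty line of a gzipped JSON Lines member of a tar archive."""
    with tarfile.open(tar_path) as tar:
        # Match on basename in case the archive stores files under a directory prefix.
        member = None
        for m in tar.getmembers():
            if os.path.basename(m.name) == member_name:
                member = m
                break
        if member is None:
            raise KeyError(f"{member_name} not found in {tar_path}")
        raw = tar.extractfile(member)
        # gzip.open accepts an existing binary file object; "rt" decodes to text lines.
        with gzip.open(raw, "rt", encoding="utf-8") as lines:
            for line in lines:
                line = line.strip()
                if line:
                    yield json.loads(line)


def label_tracks(tracks, ratings, id_field="id", rating_field="rating"):
    """Pair each rated track with a boolean spam label (True when the rating is -1).

    id_field and rating_field are hypothetical names; the ratings schema is an assumption.
    Tracks without a rating are skipped.
    """
    rating_by_id = {r[id_field]: r[rating_field] for r in ratings}
    labeled = []
    for track in tracks:
        key = track.get(id_field)
        if key in rating_by_id:
            labeled.append((track, rating_by_id[key] == -1))
    return labeled
```

Usage would look like `tracks = list(read_jsonl_gz_from_tar("dataset.tar", "sc_tracks.json.gz"))`, followed by the same call for the ratings file and then `label_tracks(tracks, ratings)`. Collapsing the 0 to 5 scale into "not spam" keeps the task binary. The spamtype field could instead support a multi-class variant.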
Build a spam detection mechanism for the unrated tracks in test_sc_tracks.json.gz using your toolchain of choice. Measure the effectiveness of your approach against the rated tracks. Show your code. Be prepared to suggest ways your approach could be improved. Most of all, have fun!
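As one possible baseline (not a prescribed solution), here is a stdlib-only multinomial naive Bayes classifier over the tracks' free text, scored by precision, recall and F1 on a held-out slice of the rated tracks. It assumes the input is a list of `(track, is_spam)` pairs and that `title`, `description` and `tag_list` are the useful text fields, taken from the Soundcloud track schema. The tokenizer, smoothing and 80/20 split are illustrative choices.

```python
import math
import random
import re
from collections import Counter

# Assumed Soundcloud text fields; missing or null fields are treated as empty.
TEXT_FIELDS = ("title", "description", "tag_list")


def tokens(track):
    """Lowercased word tokens from the track's free-text fields."""
    text = " ".join(str(track.get(f) or "") for f in TEXT_FIELDS)
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpam:
    """Multinomial naive Bayes with add-one (Laplace) smoothing over track tokens."""

    def fit(self, tracks, labels):
        labels = list(labels)
        self.docs = Counter(labels)  # number of training tracks per class
        self.counts = {True: Counter(), False: Counter()}  # token counts per class
        for track, is_spam in zip(tracks, labels):
            self.counts[is_spam].update(tokens(track))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        self.totals = {c: sum(self.counts[c].values()) for c in (True, False)}
        return self

    def log_score(self, track, cls):
        # log P(cls) + sum over known tokens of log P(token | cls), both smoothed.
        n_docs = sum(self.docs.values())
        score = math.log((self.docs[cls] + 1) / (n_docs + 2))
        denom = self.totals[cls] + len(self.vocab)
        for tok in tokens(track):
            if tok in self.vocab:
                score += math.log((self.counts[cls][tok] + 1) / denom)
        return score

    def predict(self, track):
        """Return True when the spam class has the higher posterior score."""
        return self.log_score(track, True) > self.log_score(track, False)


def evaluate(y_true, y_pred):
    """Precision, recall and F1 for the spam (positive) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def holdout_eval(labeled, test_frac=0.2, seed=0):
    """Shuffle (track, is_spam) pairs, train on one part and score the rest."""
    rows = list(labeled)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_frac))
    train, test = rows[:cut], rows[cut:]
    model = NaiveBayesSpam().fit([t for t, _ in train], [s for _, s in train])
    preds = [model.predict(t) for t, _ in test]
    return evaluate([s for _, s in test], preds)
```

Once the held-out numbers look reasonable, you could refit on all rated tracks and flag the unrated ones with `[t for t in test_tracks if model.predict(t)]`. Precision and recall are reported for the spam class rather than plain accuracy because spam is likely a minority of the ratings, so a model that never flags anything could still score high accuracy.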