Can't trust the feeling? How open data reveals unexpected behavior of high-level music descriptors

When run 'in the wild' by the community, high-level music descriptors may not perform the way they did in the lab.

Validation & Validity in Data Science

To what extent can we trust our automated data processing pipelines?

Oracle Issues in Machine Learning and Where to Find Them

To what extent can we trust 'ground truth' in supervised machine learning to be a reliable oracle?