Real-world stories for training dataset crafters

With the recent breakthroughs in Deep Learning, we are entering a new era in which artificial intelligence has the potential to transform whole industries. By providing ready-to-use ML frameworks in the cloud, tech giants are accelerating this trend, making algorithms and computing power a commodity.

Yet, one thing remains rare and hard to create: tailored training data for your specific problem. Public datasets exist, but they are too generic: they won't help you train a bone metastasis classifier, or detect defects on an automotive part conveyor.

And even if you have access to specialized raw data, you are still far from having an industry-grade training and validation dataset. You will almost certainly have to crowdsource tasks, cross-validate annotators' work, fine-tune labels...

Fortunately, you are not alone. The mission of the Ground Truth blog is to help training dataset crafters with real-world stories, tips and best practices from industry use cases, shared by people like you in other companies and in academia.

Reading list.

A selection of interesting resources on training dataset construction to keep you warm: