Training Data
Data used to train a machine learning model to recognize patterns, make predictions, or generate outputs.
Also known as: training corpus, training set, labeled data (in supervised learning; training data can also be unlabeled)
Training data is the dataset used to teach a machine learning model. The model learns patterns, relationships, and representations from examples in the training data, which it then applies to new inputs during inference.
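As a minimal sketch of the train-then-infer cycle described above, the Python snippet below fits a straight line to a tiny training set and then applies it to an input the model never saw. The dataset (hours studied vs. exam score) and the function names `fit_line` and `predict` are made up for illustration; real models and datasets are far larger, but the division between learning from examples and applying the result is the same.

```python
# Hypothetical toy training set: (hours_studied, exam_score) pairs.
training_data = [(1.0, 52.0), (2.0, 54.0), (3.0, 56.0), (4.0, 58.0)]


def fit_line(examples):
    """Training step: estimate slope and intercept by ordinary least squares."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    var_x = sum((x - mean_x) ** 2 for x, _ in examples)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Inference step: apply the learned parameters to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Learn the pattern from the training examples...
model = fit_line(training_data)

# ...then apply it to an input that was not in the training data.
prediction = predict(model, 6.0)
print(prediction)
```

The model here is just two numbers (slope and intercept), so everything it "knows" comes from the four training examples. Changing or corrupting those examples changes every later prediction.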
The quality, quantity, and domain specificity of training data are primary determinants of model performance. In the AI marketplace, training data is one of the most commercially valuable asset types, particularly labeled, domain-specific, or hard-to-obtain datasets that give model builders a competitive edge.