AI Glossary: What Is Holdout Set? Definition & Meaning

A ホールドアウトセット is a subset of your data that is set aside and not used during the training of a 機械学習 model. This separation ensures that the model’s performance can be evaluated on unseen data, providing a more accurate assessment of its generalization 能力。

In typical machine learning workflows, data is often divided into at least two main parts: the training set and the holdout set (also known as the test set). The training set is used to train the model, while the holdout set is reserved exclusively for testing the model’s performance after training has been completed. This helps to mitigate issues like overfitting, where a model performs exceptionally well on 訓練データしかし、新しい未見のデータではパフォーマンスが低い。

ホールドアウトセットを作成するために、データサイエンティストは通常、ランダムサンプリング or stratified sampling to ensure that both the training and holdout sets are representative of the overall dataset. The size of the holdout set may vary depending on the total amount of data available, but common practices recommend using around 20-30% of the data as the holdout set.

Evaluating a model against the holdout set allows practitioners to calculate various 性能指標, such as accuracy, precision, recall, and F1 score, providing insights into how well the model might perform in real-world applications.