Machine Learning Challenges: Bias, Variance, Overfitting & Model Evaluation

100+ MCQs on Machine Learning Challenges and Evaluation Techniques


Q. 1. What is prediction error in machine learning?
A. The number of parameters in a model
B. The difference between expected and actual outcomes
C. The number of iterations required to train a model
D. The training time of the model
Answer: B. The difference between expected and actual outcomes


Q. 2. Which of the following is not part of the reducible error?
A. Bias
B. Variance
C. Bayes Error
D. Model Mis-specification
Answer: C. Bayes Error


Q. 3. What type of error is considered irreducible in any model?
A. Variance
B. Bias
C. Bayes Error
D. Overfitting
Answer: C. Bayes Error


Q. 4. What does high bias in a model indicate?
A. The model is too flexible
B. The model is capturing noise
C. The model is systematically wrong
D. The model varies too much with data
Answer: C. The model is systematically wrong


Q. 5. Variance error refers to:
A. The average error over all predictions
B. The error due to model inflexibility
C. The sensitivity to small changes in training data
D. The mislabeling of classes
Answer: C. The sensitivity to small changes in training data


Q. 6. Which scenario indicates overfitting?
A. High training and test accuracy
B. Low training accuracy and low test accuracy
C. High training accuracy and low test accuracy
D. Equal training and test accuracy
Answer: C. High training accuracy and low test accuracy
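A quick way to see this pattern in practice is the sketch below (a minimal illustration assuming scikit-learn; the synthetic dataset and parameter values are arbitrary). An unpruned decision tree memorizes the training set, so its training accuracy sits near 100% while its test accuracy lags behind.

```python
# Hedged sketch: an unpruned tree overfits, so train accuracy >> test accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1,
                           random_state=0)  # flip_y injects label noise
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)  # no depth limit
print("train accuracy:", tree.score(X_tr, y_tr))  # close to 1.0
print("test accuracy:", tree.score(X_te, y_te))   # noticeably lower
```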


Q. 7. Underfitting occurs when the model:
A. Learns noise instead of pattern
B. Is too complex
C. Has high variance
D. Fails to learn the underlying pattern
Answer: D. Fails to learn the underlying pattern


Q. 8. A model with low bias and high variance is likely to:
A. Underfit
B. Generalize well
C. Overfit
D. Have no error
Answer: C. Overfit


Q. 9. Which type of model is likely to generalize well on unseen data?
A. High bias, low variance
B. Low bias, high variance
C. Balanced bias and variance
D. High bias, high variance
Answer: C. Balanced bias and variance


Q. 10. The Bayes error rate is defined as:
A. The sum of training and testing errors
B. The minimum error achievable by any classifier
C. The average prediction error
D. The model complexity limit
Answer: B. The minimum error achievable by any classifier


Q. 11. What is the purpose of cross-validation?
A. To reduce training time
B. To increase training error
C. To evaluate model performance on unseen data
D. To tune hyperparameters randomly
Answer: C. To evaluate model performance on unseen data


Q. 12. What is LOOCV?
A. Leave-One-Out Cross Validation
B. Logistic Optimization of Output Classes
C. Linear Output Optimization CV
D. Loss Objective Output Control
Answer: A. Leave-One-Out Cross Validation


Q. 13. Which validation technique uses sampling with replacement?
A. Holdout
B. k-Fold
C. LOOCV
D. Bootstrap
Answer: D. Bootstrap


Q. 14. In the 0.632 bootstrap, approximately what percentage of the original instances appear in the training sample?
A. 63.2%
B. 36.8%
C. 50%
D. 100%
Answer: A. 63.2%
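The 63.2% figure is easy to check empirically, as in the sketch below (numpy assumed; the value of n is arbitrary). Drawing n indices with replacement and counting the distinct ones gives a fraction near 1 - 1/e ≈ 0.632.

```python
# Hedged sketch: fraction of distinct instances in a bootstrap sample of size n.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
sample = rng.integers(0, n, size=n)   # n draws with replacement
print(np.unique(sample).size / n)     # ~0.632
```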


Q. 15. In model selection, what is the primary goal?
A. Minimizing dataset size
B. Maximizing training time
C. Choosing the best model for generalization
D. Selecting the most complex model
Answer: C. Choosing the best model for generalization


Q. 16. Which of the following techniques helps to reduce overfitting?
A. Increasing model complexity
B. Using the entire dataset for training
C. Regularization
D. Ignoring variance
Answer: C. Regularization
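As a minimal sketch of regularization in action (scikit-learn assumed; the dataset, noise level, and alpha are illustrative), an L2-penalized Ridge fit usually holds up better on held-out data than plain least squares when there are nearly as many features as samples.

```python
# Hedged sketch: L2 regularization (Ridge) vs. an unregularized linear fit.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=80, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for model in (LinearRegression(), Ridge(alpha=10.0)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, "test R^2:", round(model.score(X_te, y_te), 3))
```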


Q. 17. High bias and low variance typically result in:
A. Overfitting
B. Underfitting
C. Perfect accuracy
D. Noise modeling
Answer: B. Underfitting


Q. 18. Generalization error is measured on:
A. Training set
B. Validation set
C. Test set
D. Both B and C
Answer: D. Both B and C


Q. 19. What aspect is most critical when tuning a model?
A. Reducing dataset
B. Increasing training error
C. Balancing bias and variance
D. Maximizing variance
Answer: C. Balancing bias and variance


Q. 20. What does the bias-variance tradeoff graph illustrate?
A. Training speed
B. Model accuracy vs. runtime
C. Error vs. model complexity
D. Data size vs. prediction error
Answer: C. Error vs. model complexity


Q. 21. What type of error arises from assumptions in the learning algorithm?
A. Variance
B. Bias
C. Irreducible Error
D. Prediction Error
Answer: B. Bias


Q. 22. What causes high variance in a machine learning model?
A. Not enough features
B. Too little data
C. Excessive sensitivity to training data
D. Linear relationships only
Answer: C. Excessive sensitivity to training data


Q. 23. Which model evaluation method splits data into fixed training and test sets?
A. Cross-validation
B. LOOCV
C. Holdout
D. Bootstrap
Answer: C. Holdout
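The holdout method is a single fixed split, as in this minimal sketch (scikit-learn and its built-in iris data assumed; the 70/30 ratio is a common but arbitrary choice).

```python
# Hedged sketch: one fixed 70/30 holdout split, drawn once and reused.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=42)
print(len(X_tr), len(X_te))  # 105 train / 45 test
```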


Q. 24. Which error can be reduced by increasing model complexity?
A. Irreducible error
B. Variance
C. Bias
D. Noise
Answer: C. Bias


Q. 25. A model with high variance will typically:
A. Underfit
B. Have stable predictions
C. Be very flexible
D. Ignore training data
Answer: C. Be very flexible


Q. 26. What is the effect of using a very complex model?
A. Underfitting
B. High bias
C. Overfitting
D. Low variance
Answer: C. Overfitting


Q. 27. Which type of error is directly influenced by noise in the data?
A. Bias
B. Variance
C. Bayes Error
D. None of the above
Answer: C. Bayes Error


Q. 28. Which method involves repeated random train-test splits?
A. K-Fold CV
B. Random Subsampling
C. Bootstrap
D. LOOCV
Answer: B. Random Subsampling
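Random subsampling is sketched below (scikit-learn assumed; 5 splits and a 30% test fraction are illustrative). ShuffleSplit draws a fresh random train/test partition on every iteration.

```python
# Hedged sketch: repeated random train/test splits via ShuffleSplit.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

X, y = load_iris(return_X_y=True)
splits = ShuffleSplit(n_splits=5, test_size=0.3, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=splits)
print(scores, scores.mean())  # one accuracy per random split, then the average
```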


Q. 29. LOOCV uses how many data points in the test set per iteration?
A. 10%
B. 1%
C. 1 sample
D. Half the dataset
Answer: C. 1 sample
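In LOOCV each fold's test set is a single sample, so n models are trained in total. A minimal sketch (scikit-learn assumed; iris has n = 150):

```python
# Hedged sketch: LOOCV -- one held-out sample per fold, n folds in total.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=LeaveOneOut())
print(len(scores), scores.mean())  # 150 folds; mean is the LOOCV accuracy
```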


Q. 30. Which of the following is a method to assess generalization performance?
A. R-squared on training data
B. Mean Absolute Error on training
C. Accuracy on test data
D. Learning rate
Answer: C. Accuracy on test data


Q. 31. In k-Fold Cross-Validation, how many models are trained?
A. 1
B. k
C. n/k
D. 2k
Answer: B. k
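The k models are easy to see with scikit-learn's cross_val_score, which returns one score per fold. A sketch with k = 5 (dataset and model are illustrative):

```python
# Hedged sketch: 5-fold CV trains 5 models, one per held-out fold.
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=kf)
print(len(scores))  # 5 -- one trained model per fold
```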


Q. 32. Which model selection question addresses interpretability?
A. Does it use many resources?
B. Is it easy to use?
C. Can it determine feature importance?
D. How fast does it converge?
Answer: C. Can it determine feature importance?


Q. 33. If a model performs poorly on both training and test sets, it's likely:
A. Overfitting
B. Underfitting
C. Well-balanced
D. Noise-tolerant
Answer: B. Underfitting


Q. 34. What technique is often used to reduce variance in models?
A. Dropout
B. Regularization
C. Ensembling
D. Feature Scaling
Answer: C. Ensembling
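A minimal sketch of the variance-reduction effect (scikit-learn assumed; the dataset and n_estimators are illustrative): bagging averages many deep, high-variance trees, which typically improves cross-validated accuracy over a single tree.

```python
# Hedged sketch: bagging many unstable trees to reduce variance.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, flip_y=0.1, random_state=0)
single = DecisionTreeClassifier(random_state=0)
bagged = BaggingClassifier(single, n_estimators=100, random_state=0)
print("single tree :", cross_val_score(single, X, y, cv=5).mean())
print("bagged trees:", cross_val_score(bagged, X, y, cv=5).mean())
```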


Q. 35. Which of the following is true about the bias-variance trade-off?
A. Reducing bias always increases variance
B. It’s impossible to reduce both
C. An optimal model minimizes both
D. Variance is not affected by model complexity
Answer: C. An optimal model minimizes both


Q. 36. What happens when a model has low bias and low variance?
A. Overfits
B. Underfits
C. Generalizes well
D. Has high prediction error
Answer: C. Generalizes well


Q. 37. Which technique resamples data without replacement?
A. Bootstrap
B. Random Subsampling
C. Holdout
D. k-Fold CV
Answer: D. k-Fold CV


Q. 38. What is the key assumption in the Holdout method?
A. The data is time-dependent
B. Train and test sets are drawn from the same distribution
C. Samples are drawn from different distributions
D. Test data is larger than training data
Answer: B. Train and test sets are drawn from the same distribution


Q. 39. A model that learns noise in the training data is:
A. Underfit
B. Cross-validated
C. Overfit
D. Bootstrapped
Answer: C. Overfit


Q. 40. Which error cannot be eliminated even with perfect model selection?
A. Variance
B. Bias
C. Irreducible Error
D. Reducible Error
Answer: C. Irreducible Error


Q. 41. Which part of the total prediction error is controllable?
A. Noise
B. Bayes Error
C. Reducible Error
D. Overhead
Answer: C. Reducible Error


Q. 42. Which method is most computationally expensive?
A. Holdout
B. 10-Fold CV
C. LOOCV
D. Bootstrap
Answer: C. LOOCV


Q. 43. Which technique is preferred when data is limited?
A. Holdout
B. LOOCV
C. Random split
D. Train on all data
Answer: B. LOOCV


Q. 44. Why is the 0.632 bootstrap so named?
A. 63.2% of features are used
B. 63.2% of the data is duplicated
C. About 63.2% of the original instances appear in the training set
D. None of the above
Answer: C. About 63.2% of the original instances appear in the training set


Q. 45. A good model selection process includes:
A. Only training evaluation
B. Only variance analysis
C. Consideration of model accuracy, bias, and interpretability
D. Ignoring feature importance
Answer: C. Consideration of model accuracy, bias, and interpretability


Q. 46. The key to avoiding both underfitting and overfitting is:
A. High model complexity
B. Less training data
C. Proper validation
D. Ignoring regularization
Answer: C. Proper validation


Q. 47. High training accuracy combined with low test accuracy is a symptom of:
A. Bias
B. Underfitting
C. Overfitting
D. Optimal performance
Answer: C. Overfitting


Q. 48. Bias is primarily caused by:
A. Large datasets
B. Noise in the test set
C. Inflexible models
D. Redundant features
Answer: C. Inflexible models


Q. 49. Variance can be reduced using:
A. Simpler models
B. Noisy features
C. Adding polynomial terms
D. Overtraining
Answer: A. Simpler models


Q. 50. Which error type reflects poor model generalization due to oversensitivity?
A. Bias
B. Variance
C. Bayes Error
D. Holdout error
Answer: B. Variance


Q. 51. Which validation method results in the smallest test set per split?
A. Holdout
B. k-Fold CV
C. LOOCV
D. Bootstrap
Answer: C. LOOCV


Q. 52. The bias-variance trade-off helps to:
A. Optimize test time
B. Balance underfitting and overfitting
C. Eliminate all errors
D. Select best hyperparameters automatically
Answer: B. Balance underfitting and overfitting


Q. 53. A model with low training error but high test error is likely:
A. Well-regularized
B. Underfit
C. Overfit
D. Cross-validated
Answer: C. Overfit


Q. 54. The main goal of model evaluation is to:
A. Fit the training set perfectly
B. Minimize training loss
C. Measure generalization performance
D. Maximize number of features
Answer: C. Measure generalization performance


Q. 55. Which model selection factor checks whether the model captures complex patterns without overfitting?
A. Usability
B. Speed
C. Pattern Generalization
D. Bias tuning
Answer: C. Pattern Generalization


Q. 56. A model's inability to learn the training data well indicates:
A. High variance
B. High complexity
C. Underfitting
D. Overfitting
Answer: C. Underfitting


Q. 57. Which validation method may give high variance if the data is not representative?
A. Holdout
B. k-Fold CV
C. LOOCV
D. Bootstrap
Answer: A. Holdout


Q. 58. A classifier achieving the Bayes error rate is considered:
A. Overfitted
B. Irreducible
C. Optimal
D. Biased
Answer: C. Optimal


Q. 59. Which of the following increases variance in the model?
A. Increasing training data
B. Regularization
C. More features and deeper trees
D. Dimensionality reduction
Answer: C. More features and deeper trees


Q. 60. What causes underfitting most directly?
A. High data noise
B. High model complexity
C. Low model complexity
D. Irreducible error
Answer: C. Low model complexity


Q. 61. Generalization error is lowest when:
A. Bias is high and variance is low
B. Both bias and variance are high
C. Both bias and variance are low
D. Training accuracy is low
Answer: C. Both bias and variance are low


Q. 62. A model achieves 100% accuracy on both training and test sets. Which is most likely?
A. Generalization
B. Bias problem
C. Data leakage or memorization
D. Model underfitting
Answer: C. Data leakage or memorization


Q. 63. Which step is essential before evaluating models?
A. Hyperparameter tuning
B. Data shuffling
C. Feature selection
D. Separating unseen test data
Answer: D. Separating unseen test data


Q. 64. The “bias error” reflects:
A. Sensitivity to data sampling
B. Over-reaction to small changes
C. Deviation from true values due to model assumptions
D. Number of training samples used
Answer: C. Deviation from true values due to model assumptions


Q. 65. Increasing model complexity will usually:
A. Increase bias
B. Reduce variance
C. Increase variance
D. Always improve accuracy
Answer: C. Increase variance


Q. 66. What’s a common cause of high test error in overfitting?
A. Too few features
B. Too simple a model
C. No cross-validation
D. Capturing noise in the training data
Answer: D. Capturing noise in the training data


Q. 67. A good ML model must:
A. Learn from all data points
B. Fit training data perfectly
C. Generalize well to new data
D. Be simple and shallow
Answer: C. Generalize well to new data


Q. 68. The term "overfitting" refers to:
A. Not enough features
B. Too much training
C. Fitting data noise and outliers
D. Undertraining
Answer: C. Fitting data noise and outliers


Q. 69. Which statement is false about cross-validation?
A. It helps in hyperparameter tuning
B. It estimates performance on unseen data
C. It increases overfitting
D. It’s more reliable than holdout
Answer: C. It increases overfitting


Q. 70. A model's ability to fit training data but not generalize means:
A. It has low bias and high variance
B. It is well regularized
C. It is underfitting
D. It has high bias and low variance
Answer: A. It has low bias and high variance


Q. 71. In bootstrapping, what is the approximate probability that a given instance is never picked?
A. 0.632
B. 1/e or ~0.368
C. 0.5
D. 1
Answer: B. 1/e or ~0.368
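This follows from sampling with replacement: each of the n draws misses a given instance with probability 1 - 1/n, so the chance of never being picked is (1 - 1/n)^n, which tends to 1/e as n grows. A quick numeric check in plain Python:

```python
# Hedged sketch: (1 - 1/n)^n approaches 1/e ~ 0.368 as n grows.
import math

for n in (10, 100, 10_000):
    print(n, (1 - 1 / n) ** n)
print("1/e =", 1 / math.e)
```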


Q. 72. An example of irreducible error source is:
A. Model overfitting
B. Random noise in observations
C. Poor feature selection
D. Unoptimized hyperparameters
Answer: B. Random noise in observations


Q. 73. A large variance suggests:
A. Good generalization
B. Poor training
C. Overly flexible model
D. Low training error
Answer: C. Overly flexible model


Q. 74. The bias-variance curve demonstrates:
A. Error vs. data noise
B. Error vs. training time
C. Error vs. model complexity
D. Training vs. test performance
Answer: C. Error vs. model complexity


Q. 75. Model tuning involves optimizing:
A. Data size
B. Label encoding
C. Hyperparameters
D. Cross-validation folds only
Answer: C. Hyperparameters
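A minimal tuning sketch (scikit-learn assumed; the grid values are illustrative only) using cross-validated grid search over two decision-tree hyperparameters:

```python
# Hedged sketch: hyperparameter tuning via cross-validated grid search.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 3, 5, None], "min_samples_leaf": [1, 5, 10]},
    cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```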


Q. 76. Regularization is used to:
A. Increase variance
B. Decrease bias
C. Reduce model complexity
D. Increase overfitting
Answer: C. Reduce model complexity


Q. 77. Which models are prone to high variance?
A. Linear regression
B. Decision Trees without pruning
C. Naive Bayes
D. Logistic regression
Answer: B. Decision Trees without pruning


Q. 78. Evaluating a model’s bias/variance is useful for:
A. Reducing training data
B. Choosing between under/overfitting
C. Avoiding regularization
D. Increasing training speed
Answer: B. Choosing between under/overfitting


Q. 79. Which strategy is most effective to handle high variance?
A. Increase learning rate
B. Reduce dataset
C. Apply regularization
D. Remove model assumptions
Answer: C. Apply regularization


Q. 80. Which is not a model evaluation metric?
A. Confusion matrix
B. Accuracy
C. Variance
D. Precision
Answer: C. Variance


Q. 81. What is a "noise-fitting" model most likely suffering from?
A. High bias
B. Low complexity
C. Overfitting
D. Feature irrelevance
Answer: C. Overfitting


Q. 82. What reduces model sensitivity to data variation?
A. Dropout
B. Increasing model depth
C. Simpler models
D. Removing validation
Answer: C. Simpler models


Q. 83. Which factor is not considered in model selection?
A. Training speed
B. Feature interpretability
C. Ability to memorize
D. Generalization capability
Answer: C. Ability to memorize


Q. 84. The total prediction error is composed of:
A. Bias + Variance + Irreducible error
B. Bias × Variance
C. Noise + Bias
D. Only variance
Answer: A. Bias + Variance + Irreducible error
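For squared loss this is the standard pointwise decomposition (f is the true function, f̂ the learned one, and σ² the noise variance):

$$
\mathbb{E}\big[(y - \hat f(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{Bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]}_{\text{Variance}}
+ \underbrace{\sigma^2}_{\text{Irreducible error}}
$$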


Q. 85. Training a very deep neural network without regularization may cause:
A. High bias
B. Overfitting
C. LOOCV
D. Underfitting
Answer: B. Overfitting


Q. 86. What does model generalization mean?
A. How fast the model trains
B. How well it performs on training data
C. How well it performs on unseen data
D. How well it handles GPU
Answer: C. How well it performs on unseen data


Q. 87. Cross-validation helps in:
A. Improving model speed
B. Feature generation
C. Estimating test accuracy
D. Noise removal
Answer: C. Estimating test accuracy


Q. 88. Which of these increases bias?
A. Deep neural nets
B. Complex models
C. Simple models
D. Large datasets
Answer: C. Simple models


Q. 89. What does "validation set" help with?
A. Measuring irreducible error
B. Evaluating test performance
C. Tuning model parameters
D. Decreasing feature size
Answer: C. Tuning model parameters


Q. 90. Which is true about variance error?
A. Related to data assumptions
B. Is irreducible
C. Increases with model flexibility
D. Not linked to generalization
Answer: C. Increases with model flexibility


Q. 91. What best describes overfitting?
A. Training error > Test error
B. Test error > Training error
C. Equal test and training error
D. Test error = 0
Answer: B. Test error > Training error


Q. 92. Model evaluation must be done on:
A. The training set
B. The same data used for tuning
C. Unseen test set
D. Irrelevant features
Answer: C. Unseen test set


Q. 93. High bias and low variance result in:
A. Overfitting
B. Perfect fit
C. Underfitting
D. No model error
Answer: C. Underfitting


Q. 94. A perfect classifier may still have error due to:
A. Bias
B. Variance
C. Bayes error
D. LOOCV
Answer: C. Bayes error


Q. 95. What should be minimized for better generalization?
A. Only bias
B. Only variance
C. Total prediction error
D. Number of models
Answer: C. Total prediction error


Q. 96. Which factor is important in model usability?
A. Accuracy
B. Training time
C. Complexity of interface
D. Ease of implementation
Answer: D. Ease of implementation


Q. 97. Bias is related to:
A. Model's flexibility
B. Number of features
C. Amount of data
D. Noise in labels
Answer: A. Model's flexibility


Q. 98. The optimal model has:
A. Low bias, low variance
B. High bias, low variance
C. Low bias, high variance
D. High bias, high variance
Answer: A. Low bias, low variance


Q. 99. High variance leads to:
A. Better generalization
B. Consistent predictions
C. Unstable outputs across training sets
D. High interpretability
Answer: C. Unstable outputs across training sets


Q. 100. What is cross-validation mainly used for?
A. Data augmentation
B. Improving bias
C. Estimating test performance
D. Dataset balancing
Answer: C. Estimating test performance


Q. 101. What helps diagnose underfitting?
A. High training and test error
B. Low training and high test error
C. No error
D. High precision only
Answer: A. High training and test error
