Customer Segmentation and Clustering Techniques - K-Means, DBSCAN, Hierarchical | MCQs

MCQs on Customer Segmentation and Clustering


Q.1. What is the primary purpose of customer segmentation?
A. Improve employee performance
B. Group customers based on shared characteristics
C. Combine all customers into one profile
D. Predict stock prices
✅ Answer: B. Group customers based on shared characteristics


Q.2. The 'T' in STP strategy stands for:
A. Testing
B. Targeting
C. Trading
D. Tracking
✅ Answer: B. Targeting


Q.3. Rule-based segmentation fails when:
A. We use too much automation
B. Data is in structured form
C. Goals or knowledge change
D. We use demographic data
✅ Answer: C. Goals or knowledge change


Q.4. Which technique groups customers into bins like income tiers?
A. Rule-based segmentation
B. K-means clustering
C. Binning
D. Hierarchical clustering
✅ Answer: C. Binning
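For illustration, a minimal pandas sketch of binning customers into income tiers; the column name, sample values, and tier edges are all assumptions chosen for the example, not taken from the text:

import pandas as pd

# Illustrative customer data; the column name and values are assumptions.
df = pd.DataFrame({"annual_income": [18_000, 42_000, 75_000, 120_000, 260_000]})

# Binning: assign each customer to a predefined income tier.
df["income_tier"] = pd.cut(
    df["annual_income"],
    bins=[0, 30_000, 80_000, 150_000, float("inf")],
    labels=["low", "middle", "upper-middle", "high"],
)
print(df)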


Q.5. Zero-knowledge segmentation is best associated with:
A. Decision trees
B. Clustering algorithms
C. Regression models
D. Rule sets
✅ Answer: B. Clustering algorithms


Q.6. In clustering, intra-cluster distance should be:
A. Maximized
B. Ignored
C. Minimized
D. Constant
✅ Answer: C. Minimized


Q.7. A dendrogram is used in:
A. Linear regression
B. Decision trees
C. Hierarchical clustering
D. Logistic regression
✅ Answer: C. Hierarchical clustering


Q.8. What does a cluster represent in data?
A. A single outlier
B. Random distribution
C. A group of similar items
D. A regression line
✅ Answer: C. A group of similar items


Q.9. What is one key benefit of hierarchical clustering?
A. Fixed number of clusters
B. Always produces K clusters
C. Allows dynamic cluster selection
D. Requires no computation
✅ Answer: C. Allows dynamic cluster selection


Q.10. In agglomerative clustering, you start with:
A. One large cluster
B. Outliers only
C. No clusters
D. Each point as its own cluster
✅ Answer: D. Each point as its own cluster


Q.11. In divisive clustering, the process starts with:
A. Zero clusters
B. One all-inclusive cluster
C. Predefined centroids
D. Noise removal
✅ Answer: B. One all-inclusive cluster


Q.12. Which clustering algorithm is partitional?
A. K-means
B. Divisive hierarchical
C. Agglomerative hierarchical
D. DBSCAN
✅ Answer: A. K-means


Q.13. Which metric measures cluster cohesion?
A. Mean squared error
B. SSE (Sum of Squared Errors)
C. Accuracy
D. Precision
✅ Answer: B. SSE (Sum of Squared Errors)


Q.14. Cluster separation is best defined as:
A. Distance within a cluster
B. Spread of individual values
C. Distance between clusters
D. Distance from the origin
✅ Answer: C. Distance between clusters


Q.15. A silhouette coefficient close to 1 means:
A. Poor clustering
B. Overfitting
C. Good clustering
D. Irrelevant distance measure
✅ Answer: C. Good clustering


Q.16. The silhouette value is calculated using:
A. SSE and precision
B. Average distance to own cluster vs. others
C. Regression residuals
D. Entropy
✅ Answer: B. Average distance to own cluster vs. others
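Concretely, for each point the silhouette is s = (b - a) / max(a, b), where a is the mean distance to the point's own cluster and b is the mean distance to the nearest other cluster. A minimal scikit-learn sketch (synthetic data; all parameters are arbitrary choices):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Per-point silhouette s = (b - a) / max(a, b):
#   a = mean distance to points in the same cluster,
#   b = mean distance to points in the nearest other cluster.
per_point = silhouette_samples(X, labels)
print("mean silhouette:", silhouette_score(X, labels))  # average of per_point
print("worst point:", per_point.min())                  # can be negative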


Q.17. In K-means, the centroid is:
A. The oldest point
B. A random sample
C. The mean of the cluster
D. Always fixed
✅ Answer: C. The mean of the cluster


Q.18. K-means converges when:
A. Centroids stop changing
B. Loss increases
C. New clusters are added
D. No clusters remain
✅ Answer: A. Centroids stop changing
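A plain-NumPy sketch of the K-means loop makes the convergence test explicit. It assumes no cluster ever becomes empty and uses an arbitrary random initialization; a production implementation would handle both more carefully:

import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal K-means sketch: stops when centroids stop moving."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: centroid = mean of the points assigned to it.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):  # convergence test
            break
        centroids = new_centroids
    return labels, centroids

X = np.random.default_rng(1).normal(size=(100, 2))
labels, centers = kmeans(X, k=3)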


Q.19. K-means struggles when clusters have:
A. Equal sizes
B. Globular shapes
C. Outliers or differing densities
D. Only numeric data
✅ Answer: C. Outliers or differing densities


Q.20. The time complexity of K-means is:
A. O(n)
B. O(K)
C. O(n * K * I * d)
D. O(n³)
✅ Answer: C. O(n * K * I * d), where n = number of points, K = clusters, I = iterations, d = attributes


Q.21. What is the objective function of K-means?
A. Entropy maximization
B. Sum of Squared Error minimization
C. Variance normalization
D. Cluster separation maximization
✅ Answer: B. Sum of Squared Error minimization
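In scikit-learn this objective is exposed on the fitted model as the inertia_ attribute: the sum of squared distances from each point to its assigned centroid. A quick sketch on synthetic data:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# inertia_ is the K-means objective: the SSE of points to their centroids.
print("SSE (inertia):", km.inertia_)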


Q.22. Which scenario illustrates a limitation of K-means?
A. Equal-sized clusters
B. Non-globular clusters
C. Low-dimensional data
D. Clean datasets
✅ Answer: B. Non-globular clusters


Q.23. What can help improve K-means clustering performance?
A. Adding noise
B. Using fixed centroids
C. Removing outliers
D. Decreasing dimensionality
✅ Answer: C. Removing outliers


Q.24. What shape of clusters does K-means assume?
A. Circular/globular
B. Rectangular
C. Irregular
D. Spiral
✅ Answer: A. Circular/globular


Q.25. In hierarchical clustering, once merged, clusters:
A. Can be split again
B. Cannot be split
C. Must include outliers
D. Are re-evaluated every iteration
✅ Answer: B. Cannot be split


Q.26. What does the height of a dendrogram linkage represent?
A. Number of outliers
B. Distance between merged clusters
C. Data point count
D. Cluster density
✅ Answer: B. Distance between merged clusters
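A SciPy sketch shows where those heights come from: each row of the linkage matrix records one merge, and its third column is the merge distance that is drawn as the linkage height (toy data; the linkage method is an arbitrary choice):

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))  # toy data, values are arbitrary

Z = linkage(X, method="ward")  # agglomerative merges, bottom-up
# Each row of Z is one merge: [cluster_i, cluster_j, distance, size].
# The third column is the merge distance drawn as the linkage height.
print(Z[:, 2])
dendrogram(Z)
plt.show()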


Q.27. Agglomerative clustering starts with:
A. A full tree
B. Multiple random centroids
C. Singleton clusters
D. One universal cluster
✅ Answer: C. Singleton clusters


Q.28. Divisive clustering ends with:
A. Singleton clusters
B. One cluster
C. Two clusters only
D. No clusters
✅ Answer: A. Singleton clusters


Q.29. Which clustering method produces a nested hierarchy of clusters?
A. DBSCAN
B. K-means
C. Hierarchical clustering
D. Binning
✅ Answer: C. Hierarchical clustering


Q.30. A drawback of hierarchical clustering is:
A. Low time complexity
B. Easily handles noise
C. Decisions are irreversible
D. Requires number of clusters upfront
✅ Answer: C. Decisions are irreversible


Q.31. Cluster cohesion is best when:
A. Points are far apart
B. Points are close together
C. Clusters overlap
D. SSE is high
✅ Answer: B. Points are close together


Q.32. Cluster separation improves when:
A. SSE increases
B. Silhouette values decrease
C. Inter-cluster distance increases
D. Clusters merge
✅ Answer: C. Inter-cluster distance increases


Q.33. The silhouette coefficient can be negative when:
A. Clusters are perfect
B. Points are poorly assigned
C. SSE is minimized
D. All clusters are equal
✅ Answer: B. Points are poorly assigned


Q.34. What is the range of silhouette coefficient?
A. 0 to 10
B. -1 to 1
C. 1 to 100
D. 0 to infinity
✅ Answer: B. -1 to 1


Q.35. Which clustering method is sensitive to initialization?
A. Hierarchical
B. Binning
C. K-means
D. DBSCAN
✅ Answer: C. K-means


Q.36. What happens to SSE with each K-means iteration?
A. Increases
B. Decreases or stays the same
C. Randomly fluctuates
D. Resets
✅ Answer: B. Decreases or stays the same


Q.37. One method to determine the number of clusters is:
A. Confusion matrix
B. Elbow method using SSE
C. Pearson correlation
D. ROC curve
✅ Answer: B. Elbow method using SSE
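A minimal sketch of the Elbow Method with scikit-learn: fit K-means over a range of K and look for where the SSE curve stops dropping sharply (synthetic data; the range of K tried here is an arbitrary choice):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=1)

# SSE always falls as K grows; the "elbow" is where the drop levels off.
for k in range(1, 9):
    sse = KMeans(n_clusters=k, n_init=10, random_state=1).fit(X).inertia_
    print(k, round(sse, 1))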


Q.38. Cluster validation helps to:
A. Increase overfitting
B. Justify classification error
C. Evaluate clustering quality
D. Improve regression accuracy
✅ Answer: C. Evaluate clustering quality


Q.39. Which of the following is a proximity (similarity or distance) measure used in clustering?
A. Cosine similarity
B. Regression error
C. Gini index
D. Variance
✅ Answer: A. Cosine similarity


Q.40. What is SSB in clustering?
A. Sum of Squared Bias
B. Supervised Separation Baseline
C. Between-cluster sum of squares
D. Single sample bootstrap
✅ Answer: C. Between-cluster sum of squares


Q.41. What does a high Silhouette score indicate?
A. Poor clustering
B. Random groupings
C. Well-separated, compact clusters
D. Overlapping clusters
✅ Answer: C. Well-separated, compact clusters


Q.42. Which algorithm does NOT require specifying the number of clusters beforehand?
A. K-means
B. Hierarchical Clustering
C. Binning
D. Gaussian Mixture Model
✅ Answer: B. Hierarchical Clustering
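For example, scikit-learn's AgglomerativeClustering can be run with a distance threshold instead of a preset K, cutting the tree wherever merges would exceed that distance; the threshold value below is an arbitrary illustration, not a tuned choice:

from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=7)

# No n_clusters up front: stop merging once the merge distance would
# exceed the threshold, and report however many clusters remain.
model = AgglomerativeClustering(n_clusters=None, distance_threshold=15.0)
labels = model.fit_predict(X)
print("clusters found:", model.n_clusters_)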


Q.43. The time complexity of basic Hierarchical Clustering is:
A. O(n)
B. O(n log n)
C. O(n²)
D. O(n³)
✅ Answer: D. O(n³)


Q.44. Which clustering method uses a proximity matrix?
A. K-means
B. Agglomerative Hierarchical Clustering
C. Rule-based Segmentation
D. Binning
✅ Answer: B. Agglomerative Hierarchical Clustering
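A small sketch of a proximity matrix with SciPy; agglomerative clustering repeatedly looks up and merges the closest pair in exactly this kind of matrix (the points are toy values):

import numpy as np
from scipy.spatial.distance import pdist, squareform

X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])  # toy points

# The proximity matrix holds the pairwise distance between every two
# points; entry D[i, j] is the distance between point i and point j.
D = squareform(pdist(X, metric="euclidean"))
print(D)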


Q.45. What does the "elbow point" in the Elbow Method represent?
A. Max SSE
B. Optimal number of clusters
C. Highest silhouette score
D. Most compact cluster
✅ Answer: B. Optimal number of clusters


Q.46. Rule-based segmentation is generally:
A. Scalable and dynamic
B. Static and dependent on expert rules
C. Based on zero-knowledge models
D. Only used for images
✅ Answer: B. Static and dependent on expert rules


Q.47. A major drawback of rule-based segmentation is:
A. Too much computation
B. Needs redesign for each new objective
C. Non-interpretable results
D. Always supervised
✅ Answer: B. Needs redesign for each new objective


Q.48. Binning segmentation requires:
A. Zero domain knowledge
B. Random assignment
C. A clear business goal
D. Hierarchical trees
✅ Answer: C. A clear business goal


Q.49. Clustering based on zero knowledge typically uses:
A. Rule engines
B. Graph networks
C. Clustering algorithms like K-means
D. Decision trees
✅ Answer: C. Clustering algorithms like K-means


Q.50. Which type of clustering creates a tree structure?
A. Partitional
B. Binning
C. Hierarchical
D. Centroid-based
✅ Answer: C. Hierarchical


Q.51. Which of the following is a partitional clustering method?
A. Divisive clustering
B. Hierarchical clustering
C. K-means
D. Dendrogram
✅ Answer: C. K-means


Q.52. Which is true for partitional clustering?
A. Generates overlapping clusters
B. Outputs nested clusters
C. Produces non-overlapping subsets
D. Always uses decision trees
✅ Answer: C. Produces non-overlapping subsets


Q.53. A cluster with high cohesion will have:
A. High within-cluster distance
B. High SSE
C. Low within-cluster distance
D. Low silhouette score
✅ Answer: C. Low within-cluster distance


Q.54. The proximity matrix stores:
A. Cluster labels
B. Distance between points
C. Error values
D. Tree heights
✅ Answer: B. Distance between points


Q.55. Hierarchical clustering’s dendrogram helps in:
A. Reducing features
B. Selecting regression models
C. Visualizing cluster merges
D. Training classifiers
✅ Answer: C. Visualizing cluster merges


Q.56. The Silhouette score compares:
A. SSE and SSB
B. Intra-cluster and inter-cluster distances
C. Initial and final centroids
D. Centroids and medoids
✅ Answer: B. Intra-cluster and inter-cluster distances


Q.57. What is the key disadvantage of divisive clustering?
A. Slow runtime
B. Complexity in splitting
C. Cannot visualize as dendrogram
D. Needs many initial centroids
✅ Answer: B. Complexity in splitting


Q.58. What is the centroid in K-means?
A. Point with minimum error
B. Middle point of a dataset
C. Mean of all points in the cluster
D. Closest point to origin
✅ Answer: C. Mean of all points in the cluster


Q.59. When should clustering be used?
A. To predict future values
B. To group similar items without labels
C. For classifying known categories
D. For deep learning
✅ Answer: B. To group similar items without labels


Q.60. A real-world example of clustering is:
A. Loan approval
B. Image classification
C. Customer segmentation
D. Spam filtering
✅ Answer: C. Customer segmentation


Q.61. In cluster analysis, inter-cluster distance should be:
A. Minimized
B. Ignored
C. Maximized
D. Same as intra-cluster distance
✅ Answer: C. Maximized


Q.62. Which metric measures within-cluster similarity?
A. SSB
B. SSE
C. Entropy
D. ROC AUC
✅ Answer: B. SSE


Q.63. The term ‘partitional clustering’ refers to:
A. Creating overlapping clusters
B. Creating non-overlapping groups
C. Creating nested sub-clusters
D. Creating probabilistic groupings
✅ Answer: B. Creating non-overlapping groups


Q.64. Which of the following is not a type of clustering?
A. Partitional
B. Hierarchical
C. Relational
D. Density-based
✅ Answer: C. Relational


Q.65. Which of these algorithms can create a dendrogram?
A. K-means
B. DBSCAN
C. Hierarchical clustering
D. SVM
✅ Answer: C. Hierarchical clustering


Q.66. In clustering, high cohesion combined with low separation indicates:
A. Poor clusters
B. Effective clusters
C. Random groups
D. Need for normalization
✅ Answer: A. Poor clusters


Q.67. Which clustering algorithm is sensitive to outliers?
A. K-means
B. DBSCAN
C. Hierarchical (complete linkage)
D. Spectral clustering
✅ Answer: A. K-means


Q.68. DBSCAN stands for:
A. Density-Based Spatial Clustering of Applications with Noise
B. Distributed Batch Spatial Clustering Algorithm for Networks
C. Data Binary Sampling and Clustering Analysis Node
D. Dynamic Batch Sampling for Clustering and Navigation
✅ Answer: A. Density-Based Spatial Clustering of Applications with Noise
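A minimal DBSCAN sketch in scikit-learn; points that fall in no dense region receive the noise label -1 (the data, eps, and min_samples below are illustrative assumptions, not tuned values):

import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)),  # dense blob
               rng.normal(5, 0.3, (50, 2)),  # second dense blob
               [[10.0, 10.0]]])              # an isolated outlier

labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)
print("noise points (label -1):", np.sum(labels == -1))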


Q.69. A key limitation of K-means is:
A. It works only on text data
B. It requires a pre-specified number of clusters
C. It is not iterative
D. It performs poorly on numerical data
✅ Answer: B. It requires a pre-specified number of clusters


Q.70. Which of the following is true for agglomerative clustering?
A. Starts with one large cluster
B. Splits clusters at each iteration
C. Merges closest clusters step-by-step
D. Produces non-overlapping partitions
✅ Answer: C. Merges closest clusters step-by-step


Q.71. The silhouette coefficient is best when it is:
A. Closer to 0
B. Closer to -1
C. Close to 1
D. Negative
✅ Answer: C. Close to 1


Q.72. What’s a major computational drawback of hierarchical clustering?
A. Cannot be visualized
B. Requires labeled data
C. High time and space complexity
D. Cannot handle numeric attributes
✅ Answer: C. High time and space complexity


Q.73. The centroid in clustering is:
A. The most frequent label
B. Median value of the dataset
C. Geometric center of cluster data
D. A random data point
✅ Answer: C. Geometric center of cluster data


Q.74. Which clustering type can result in nested clusters?
A. K-means
B. Rule-based
C. Hierarchical
D. DBSCAN
✅ Answer: C. Hierarchical


Q.75. Cluster separation is typically measured by:
A. SSE
B. SSB
C. Gini index
D. Entropy
✅ Answer: B. SSB
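By definition, SSB is the size-weighted squared distance of each cluster centroid from the overall mean: SSB = sum over clusters k of n_k * ||c_k - c||^2. A NumPy sketch on synthetic data:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=2)
labels = KMeans(n_clusters=3, n_init=10, random_state=2).fit_predict(X)

# SSB = sum over clusters of n_k * ||centroid_k - overall_mean||^2.
overall = X.mean(axis=0)
ssb = sum(
    (labels == k).sum() * np.linalg.norm(X[labels == k].mean(axis=0) - overall) ** 2
    for k in np.unique(labels)
)
print("SSB:", ssb)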


Q.76. Which segmentation method is least dependent on prior knowledge?
A. Rule-based
B. Binning
C. Clustering (zero-knowledge)
D. Decision-tree segmentation
✅ Answer: C. Clustering (zero-knowledge)


Q.77. K-means clustering updates centroids by:
A. Selecting random points
B. Selecting closest points
C. Calculating cluster mean
D. Selecting furthest point
✅ Answer: C. Calculating cluster mean


Q.78. Cluster analysis can be used for:
A. Classification only
B. Regression models
C. Discovering hidden groups in data
D. Creating labels
✅ Answer: C. Discovering hidden groups in data


Q.79. Which clustering method is good for non-globular shapes?
A. K-means
B. Hierarchical (single linkage)
C. Rule-based
D. PCA
✅ Answer: B. Hierarchical (single linkage)


Q.80. What happens when K-means converges?
A. Labels change frequently
B. Centroids change randomly
C. Centroids stabilize and do not change
D. Clusters are dissolved
✅ Answer: C. Centroids stabilize and do not change


Q.81. Which of the following is NOT an advantage of hierarchical clustering?
A. Dendrogram visualization
B. No need to pre-define cluster number
C. Less sensitive to outliers
D. High memory usage
✅ Answer: D. High memory usage


Q.82. The combination of clustering and visualization is most visible in:
A. DBSCAN
B. Dendrograms
C. Decision Trees
D. Heatmaps
✅ Answer: B. Dendrograms


Q.83. Which of these can measure an individual point’s clustering quality?
A. SSE
B. Silhouette coefficient
C. Centroid error
D. Accuracy
✅ Answer: B. Silhouette coefficient


Q.84. A low silhouette score for a point suggests:
A. Good cluster placement
B. Unclear cluster membership
C. High confidence
D. High cohesion
✅ Answer: B. Unclear cluster membership


Q.85. Cluster cohesion is highest when:
A. SSE is low
B. SSE is high
C. Points are distant
D. SSB is high
✅ Answer: A. SSE is low


Q.86. Which measure helps you validate the separation of clusters?
A. Inertia
B. Confusion matrix
C. SSB
D. Precision
✅ Answer: C. SSB


Q.87. What happens to SSE as K increases in K-means?
A. Increases
B. Decreases
C. Remains constant
D. First increases then decreases
✅ Answer: B. Decreases


Q.88. One method to estimate optimal K is:
A. Z-score
B. Elbow Method
C. Confusion Matrix
D. Mean Imputation
✅ Answer: B. Elbow Method


Q.89. Customer segmentation helps in:
A. Random assignment of clusters
B. Forecasting GDP
C. Targeted marketing strategies
D. Solving supervised problems
✅ Answer: C. Targeted marketing strategies


Q.90. Which clustering technique is most useful when you don’t know the data distribution?
A. Rule-based
B. Binning
C. Zero-knowledge clustering
D. Regression
✅ Answer: C. Zero-knowledge clustering


Q.91. The main goal of clustering in customer segmentation is to:
A. Predict purchase value
B. Group customers with similar behavior
C. Classify customers by name
D. Track customer emails
✅ Answer: B. Group customers with similar behavior


Q.92. Which clustering method can work well without knowing the number of clusters?
A. K-means
B. DBSCAN
C. PCA
D. Logistic Regression
✅ Answer: B. DBSCAN


Q.93. In clustering, a silhouette score of -1 indicates:
A. Perfectly clustered point
B. Wrong cluster assignment
C. No distance between clusters
D. Ideal cohesion
✅ Answer: B. Wrong cluster assignment


Q.94. A good clustering model should have:
A. High intra-cluster distance
B. High inter-cluster distance
C. Low inter-cluster similarity
D. Low SSE and low SSB
✅ Answer: B. High inter-cluster distance


Q.95. Market basket analysis is most similar to:
A. Regression
B. Classification
C. Association rules
D. Clustering
✅ Answer: C. Association rules


Q.96. When applying clustering to marketing, the result helps in:
A. Spam detection
B. Churn prediction
C. Segment-based campaigns
D. Database normalization
✅ Answer: C. Segment-based campaigns


Q.97. Which metric decreases when clusters are tight and compact?
A. Silhouette Score
B. SSE
C. SSB
D. Entropy
✅ Answer: B. SSE


Q.98. What does a dendrogram represent?
A. Frequency of values
B. Decision boundaries
C. Hierarchical clustering process
D. Data normalization steps
✅ Answer: C. Hierarchical clustering process


Q.99. An advantage of DBSCAN over K-means is:
A. Faster training time
B. Works well with high-dimensional data
C. Handles noise and outliers
D. Always produces same clusters
✅ Answer: C. Handles noise and outliers


Q.100. Which of the following clustering algorithms can discover clusters of arbitrary shape?
A. K-means
B. DBSCAN
C. Linear SVM
D. Random Forest
✅ Answer: B. DBSCAN
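A quick comparison on scikit-learn's two-moons dataset illustrates this: K-means imposes a globular split, while DBSCAN traces the curved dense regions (the parameters are illustrative, and ARI is used here only as a convenient agreement score against the generating labels):

from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving half-moons: non-globular clusters.
X, y = make_moons(n_samples=300, noise=0.05, random_state=0)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
db = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

# K-means cuts the moons with a straight boundary; DBSCAN follows the
# dense curves (1.0 = perfect agreement with the generating labels).
print("K-means ARI:", adjusted_rand_score(y, km))
print("DBSCAN  ARI:", adjusted_rand_score(y, db))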


Q.101. In clustering, choosing the right value of K (in K-means) impacts:
A. Overfitting
B. Interpretability
C. Cluster validity
D. Labeling accuracy
✅ Answer: C. Cluster validity


Q.102. A key difference between classification and clustering is:
A. Clustering uses labeled data
B. Classification is unsupervised
C. Clustering is unsupervised
D. Both are supervised
✅ Answer: C. Clustering is unsupervised


Q.103. The Davies-Bouldin Index is used to:
A. Calculate classification error
B. Evaluate clustering performance
C. Optimize learning rate
D. Choose model architecture
✅ Answer: B. Evaluate clustering performance
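scikit-learn exposes this as davies_bouldin_score; note that, unlike the silhouette, lower values are better. A short sketch on synthetic data (all parameters arbitrary):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=4)
labels = KMeans(n_clusters=3, n_init=10, random_state=4).fit_predict(X)

# Davies-Bouldin averages each cluster's worst-case similarity to another
# cluster; unlike the silhouette score, LOWER means better clustering.
print("Davies-Bouldin index:", davies_bouldin_score(X, labels))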


Q.104. Clusters that are too small and fragmented are often a result of:
A. Large K
B. Small K
C. High bias
D. High inertia
✅ Answer: A. Large K


Q.105. A high SSB (Sum of Squares Between) implies:
A. Overlapping clusters
B. Well-separated clusters
C. Poor cohesion
D. High variance within clusters
✅ Answer: B. Well-separated clusters
