Sample and Population in Statistics

Introduction

The distinction between a sample and a population is fundamental in statistics. Most statistical analysis uses data from a sample to draw conclusions about a population, so understanding both concepts is essential for making accurate inferences.

Definitions

Population: A population includes all elements (individuals, items, or data points) that we are interested in studying. It is the entire set of subjects or events of interest. For example, if we are studying the heights of all adults in a city, the population would be all adults in that city.

Sample: A sample is a subset of the population. It includes some, but not all, elements from the population. A sample is used to make inferences about the population because it is often impractical or impossible to collect data from the entire population. For example, if we select 100 adults from the city to measure their heights, this group of 100 adults is the sample.

Key Differences

Size:

Population: Usually large or infinite.
Sample: Smaller, manageable subset of the population.

Parameters vs. Statistics:

Population characteristics are called parameters (e.g., population mean $\mu$ , population variance $\sigma^2$ ).
Sample characteristics are called statistics (e.g., sample mean $\bar{x}$ , sample variance $s^2$ ).
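The difference between parameters and statistics can be sketched in a few lines of Python. The values below are made up for illustration, and the variable names (`mu`, `sigma_sq`, `x_bar`, `s_sq`) are simply chosen to mirror the symbols above. Note that the population variance divides by $N$, while the usual sample variance divides by $n - 1$.

```python
import statistics

# Hypothetical population of N = 8 values (made up for illustration).
population = [4, 8, 6, 5, 3, 7, 9, 6]
# One possible sample of n = 4 values taken from that population.
sample = [4, 6, 3, 9]

# Parameters: computed from every element of the population.
mu = statistics.fmean(population)             # population mean
sigma_sq = statistics.pvariance(population)   # population variance, divides by N

# Statistics: computed from the sample only.
x_bar = statistics.fmean(sample)              # sample mean
s_sq = statistics.variance(sample)            # sample variance, divides by n - 1

print(f"mu = {mu:.2f}, sigma^2 = {sigma_sq:.2f}")    # mu = 6.00, sigma^2 = 3.50
print(f"x_bar = {x_bar:.2f}, s^2 = {s_sq:.2f}")      # x_bar = 5.50, s^2 = 7.00
```

In practice the population values are not available, so only the second pair of numbers can actually be computed.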

Representation:

Population: Complete data set.
Sample: Representative subset intended to reflect the population.

Importance of Sampling

Sampling is essential because:

It is often impractical or impossible to collect data from the entire population.
Sampling saves time and resources.
Properly chosen samples allow for reliable inferences about the population.

Types of Sampling Methods

Simple Random Sampling: Every element in the population has an equal chance of being selected, and every possible sample of a given size is equally likely. This can be achieved using random number generators or drawing lots.
Stratified Sampling: The population is divided into strata (groups) based on a characteristic (e.g., age, gender), and random samples are taken from each stratum. This ensures representation from all strata.
Cluster Sampling: The population is divided into clusters (usually based on geography or organization), and entire clusters are randomly selected. This is useful when the population is large and widely dispersed.
Systematic Sampling: Every k-th element from a list is selected after a random starting point. This method is simpler but can introduce bias if the list has a periodic pattern that lines up with the sampling interval k.
Convenience Sampling: Samples are chosen based on ease of access. This method is quick and easy but often biased and not representative of the population.
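The four probability-based methods above can be sketched roughly as follows. This is a minimal illustration, not a production sampling routine: the sampling frame, its field names (`age_group`, `district`), and the helper functions are all hypothetical. Convenience sampling is left out because it involves no random selection; taking the first 100 entries of the list would be one example of it.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: 1,000 people with a made-up age group and district.
population = [
    {"id": i, "age_group": random.choice(["18-39", "40-64", "65+"]), "district": i % 20}
    for i in range(1000)
]

def simple_random(pop, n):
    """Draw n elements without replacement; every subset of size n is equally likely."""
    return random.sample(pop, n)

def stratified(pop, key, fraction):
    """Draw the same fraction at random from every stratum defined by `key`."""
    strata = {}
    for person in pop:
        strata.setdefault(person[key], []).append(person)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(random.sample(members, k))
    return sample

def cluster(pop, key, n_clusters):
    """Pick whole clusters at random and keep every element in them."""
    cluster_ids = sorted({person[key] for person in pop})
    chosen = set(random.sample(cluster_ids, n_clusters))
    return [person for person in pop if person[key] in chosen]

def systematic(pop, step):
    """Take every `step`-th element after a random start in [0, step)."""
    start = random.randrange(step)
    return pop[start::step]

srs = simple_random(population, 100)
strat = stratified(population, "age_group", 0.10)
clus = cluster(population, "district", 2)
syst = systematic(population, 10)
print(len(srs), len(strat), len(clus), len(syst))
```

The stratified sample is guaranteed to contain members of every age group, while the cluster sample contains only people from the two chosen districts. That is the trade-off between representation and convenience described in the list above.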

Examples and Numerical Illustrations

Example 1: Estimating the Average Height of Adults in a City

Population: All adults in the city.

Sample: 100 randomly selected adults from the city.

Objective: Estimate the average height of all adults in the city.

Suppose the heights (in cm) of the sample are: 165, 170, 172, 168, 160, ..., 175 (100 values in total).
Sample Mean ( $\bar{x}$ ) : The average height of the sample. $\bar{x} = \frac{\sum_{i=1}^{100} x_i}{100} = 167 \, \text{cm}$
Population Mean ( $\mu$ ) : The average height of the population, which we estimate using the sample mean.
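Because the full list of 100 heights is not given, the idea behind Example 1 can instead be sketched with simulated data. Everything below is an assumption made for illustration: the city of 50,000 adults and the normal distribution of heights are invented, and the population mean is known only because the population was simulated.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical city: 50,000 simulated adult heights in cm (not real data).
city_heights = [random.gauss(167, 8) for _ in range(50_000)]
mu = statistics.fmean(city_heights)   # population mean, normally unknown

# Simple random sample of n = 100 adults.
sample = random.sample(city_heights, 100)

# Sample mean: the sum of the sampled values divided by n.
x_bar = sum(sample) / len(sample)

print(f"population mean mu = {mu:.2f} cm")
print(f"sample mean x_bar  = {x_bar:.2f} cm")
print(f"estimation error   = {x_bar - mu:+.2f} cm")
```

Running the sketch with different seeds gives a slightly different sample mean each time, which is the sampling variability that later sections quantify with the standard error.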

Example 2: Determining the Proportion of Defective Items in a Batch

Population: All items in a batch.

Sample: 50 randomly selected items from the batch.

Objective: Estimate the proportion of defective items in the entire batch.

Suppose 5 out of 50 sampled items are defective.
Sample Proportion ( $\hat{p}$ ) : The proportion of defective items in the sample. $\hat{p} = \frac{5}{50} = 0.1$
Population Proportion ( $p$ ) : The proportion of defective items in the population, estimated using the sample proportion.
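The calculation in Example 2 is short enough to check directly. The sketch below also reports an approximate standard error for the sample proportion, $\sqrt{\hat{p}(1-\hat{p})/n}$. That formula is a standard one but is not part of the example above, and it assumes the sample is small relative to the batch.

```python
import math

defective = 5   # defective items found in the sample (from Example 2)
n = 50          # sample size (from Example 2)

# Sample proportion of defective items.
p_hat = defective / n

# Approximate standard error of the sample proportion
# (an added illustration, not stated in Example 2).
se = math.sqrt(p_hat * (1 - p_hat) / n)

print(f"p_hat = {p_hat:.2f}, SE = {se:.3f}")   # p_hat = 0.10, SE = 0.042
```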

Central Limit Theorem and Sampling

The Central Limit Theorem (CLT) states that, for independent observations drawn from a population with finite mean $\mu$ and finite variance $\sigma^2$ , the sampling distribution of the sample mean is approximately normal when the sample size is large enough, regardless of the shape of the population distribution. This allows us to use the sample mean, together with its standard error, to make inferences about the population mean.
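A small simulation can make the theorem concrete. The sketch below assumes a strongly skewed population, an exponential distribution with mean 1 and standard deviation 1, chosen only for illustration. It draws many samples of each size and summarizes the resulting sample means. The helper names `sample_means` and `skewness` are hypothetical.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is reproducible

def sample_means(n, repeats=3000):
    """Return the means of `repeats` independent samples of size n from Exp(1)."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(repeats)
    ]

def skewness(values):
    """Sample skewness: close to 0 for a symmetric distribution such as the normal."""
    centre = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return statistics.fmean([((v - centre) / spread) ** 3 for v in values])

results = {}
for n in (2, 30, 200):
    means = sample_means(n)
    results[n] = {
        "mean": statistics.fmean(means),    # should be close to mu = 1
        "sd": statistics.stdev(means),      # should be close to sigma / sqrt(n)
        "sd_theory": 1.0 / n ** 0.5,        # sigma / sqrt(n) with sigma = 1
        "skew": skewness(means),            # shrinks toward 0 as n grows
    }
    r = results[n]
    print(f"n={n:>3}: mean={r['mean']:.3f}  sd={r['sd']:.3f} "
          f"(sigma/sqrt(n)={r['sd_theory']:.3f})  skew={r['skew']:.2f}")
```

As the sample size grows, the sample means stay centred on the population mean, their spread tracks $\sigma/\sqrt{n}$ , and their skewness moves toward zero, which is what the approximation to a normal distribution predicts.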

Example 3: Applying CLT

Population: Test scores of all students in a school, with unknown mean $\mu$ and variance $\sigma^2$ .

Sample: Scores of 30 randomly selected students.

Suppose the sample mean score is 75 and the sample standard deviation is 10.
According to the CLT, the distribution of the sample mean will be approximately normal with:
- Mean $\mu_{\bar{x}} = \mu$
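- Standard Error $SE_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ . Because $\sigma$ is unknown, we estimate it with the sample standard deviation $s$ : $SE_{\bar{x}} \approx \frac{s}{\sqrt{n}} = \frac{10}{\sqrt{30}} \approx 1.83$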

If we want to estimate the population mean score, we can construct a confidence interval around the sample mean using the standard error.
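As a sketch of that last step, the interval can be computed from the numbers in Example 3. The 95% confidence level is an assumption, since the example does not specify one, and the normal critical value is used for simplicity.

```python
import math
from statistics import NormalDist

x_bar = 75.0        # sample mean (from Example 3)
s = 10.0            # sample standard deviation (from Example 3)
n = 30              # sample size (from Example 3)
confidence = 0.95   # assumed level; Example 3 does not state one

# Estimated standard error of the sample mean.
se = s / math.sqrt(n)

# Two-sided critical value from the standard normal distribution.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

lower = x_bar - z * se
upper = x_bar + z * se
print(f"SE = {se:.2f}, z = {z:.2f}, CI = ({lower:.2f}, {upper:.2f})")
# SE = 1.83, z = 1.96, CI = (71.42, 78.58)
```

With a sample of only 30 and an estimated standard deviation, a critical value from the t distribution with 29 degrees of freedom (about 2.045) is often preferred, which would make the interval slightly wider.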

Conclusion

Understanding the concepts of sample and population is crucial in statistics for making reliable inferences. Sampling allows us to gather information efficiently and make predictions about the population. Proper sampling methods and the application of the Central Limit Theorem help ensure that our inferences are accurate and meaningful.
