11-13-2025, 02:14 PM
Statistics & Probability Essentials — Data, Averages, Chance & Distributions
Statistics and probability help us understand data, make predictions, and measure uncertainty.
From science and business to medicine and machine learning — these ideas appear everywhere.
This thread introduces the fundamentals in a simple, beginner-friendly way.
-----------------------------------------------------------------------
1. Types of Data
Qualitative (categorical):
• colours
• names
• types
Quantitative (numerical):
Separated into:
• Discrete data (countable values, usually whole numbers)
• Continuous data (any value in a range)
Examples:
• number of goals scored → discrete
• height, weight, temperature → continuous
-----------------------------------------------------------------------
2. Averages (Measures of Central Tendency)
Mean:
Add all values, divide by number of values.
Median:
Middle value when the data is ordered (with an even number of values, take the mean of the two middle values).
Mode:
Most frequent value.
Range (a measure of spread, not an average):
Difference between the maximum and minimum values.
Example:
Data: 4, 7, 9, 9, 12
• mean = 41 / 5 = 8.2
• median = 9
• mode = 9
• range = 12 − 4 = 8
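The same four values can be checked in a few lines of Python. This is only a quick sketch using the standard `statistics` module; the variable names are illustrative.

```python
import statistics

data = [4, 7, 9, 9, 12]

mean = statistics.mean(data)        # sum of values / number of values
median = statistics.median(data)    # middle value of the sorted data
mode = statistics.mode(data)        # most frequent value
data_range = max(data) - min(data)  # largest value minus smallest value

print(mean, median, mode, data_range)  # 8.2 9 9 8
```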
-----------------------------------------------------------------------
3. Frequency Tables
Example:
| Score | Frequency |
|-------|-----------|
| 1 | 3 |
| 2 | 7 |
| 3 | 5 |
Total responses = 3 + 7 + 5 = 15
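The table can also be tallied in code. A minimal Python sketch (the dictionary name `frequencies` is just an illustrative choice) that recomputes the total and then weights each score by its frequency to get the mean:

```python
# score -> frequency, copied from the table above
frequencies = {1: 3, 2: 7, 3: 5}

total = sum(frequencies.values())  # 3 + 7 + 5
weighted_sum = sum(score * count for score, count in frequencies.items())
mean = weighted_sum / total        # (1×3 + 2×7 + 3×5) / 15

print(total, weighted_sum, round(mean, 2))  # 15 32 2.13
```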
Mean from table:
Code:
(1×3 + 2×7 + 3×5) / 15 = 32 / 15 ≈ 2.13
-----------------------------------------------------------------------
4. Representing Data
• bar charts
• pie charts
• histograms
• line graphs
• scatter graphs
Scatter graphs:
Used to show correlation.
Types of correlation:
• positive
• negative
• none
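One common way to put a number on correlation is the Pearson correlation coefficient r, which this thread does not cover in detail: r near +1 suggests positive correlation, r near -1 negative, and r near 0 little or no linear correlation. The sketch below is only an illustration, and the `hours` and `marks` figures are made up for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: hours revised vs. test mark
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 64, 70]

r_up = pearson_r(hours, marks)          # points trend upward
r_down = pearson_r(hours, marks[::-1])  # same marks reversed, trend downward

print(r_up > 0, r_down < 0)  # True True
```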
-----------------------------------------------------------------------
5. Probability Basics
Probability always lies between 0 and 1.
0 = impossible
1 = certain
Probability of an event:
Code:
P(event) = number of favourable outcomes / total number of outcomes
(this assumes all outcomes are equally likely)
Example:
Rolling an even number on a die = 3/6 = 1/2
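The counting rule can be sketched in Python by listing the outcomes and counting the favourable ones. This assumes a fair six-sided die, and uses `fractions.Fraction` so the answer stays exact.

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]                         # fair six-sided die
favourable = [n for n in outcomes if n % 2 == 0]      # 2, 4, 6

p_even = Fraction(len(favourable), len(outcomes))     # favourable / total

print(p_even)  # 1/2
```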
-----------------------------------------------------------------------
6. Mutually Exclusive Events
If events cannot happen at the same time:
Code:
P(A or B) = P(A) + P(B)
Example:
Rolling a 1 or a 6 on a die = 1/6 + 1/6 = 2/6 = 1/3
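As a quick check, the addition rule can be compared against a direct count of outcomes. A minimal sketch, again assuming a fair six-sided die:

```python
from fractions import Fraction

outcomes = list(range(1, 7))  # fair six-sided die

p_one = Fraction(outcomes.count(1), len(outcomes))
p_six = Fraction(outcomes.count(6), len(outcomes))

# Addition rule for mutually exclusive events
p_rule = p_one + p_six

# Direct count of outcomes that are a 1 or a 6
p_direct = Fraction(sum(1 for n in outcomes if n in (1, 6)), len(outcomes))

print(p_rule, p_direct)  # 1/3 1/3
```

The two answers agree only because a single roll cannot be both a 1 and a 6.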
-----------------------------------------------------------------------
7. Independent Events
Events that do NOT affect each other:
Code:
P(A and B) = P(A) × P(B)
Example:
Flipping two coins:
P(Heads then Heads) = 1/2 × 1/2 = 1/4
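The two-coin example can be confirmed by listing every equally likely outcome. A small sketch, assuming fair coins:

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes of two fair coin flips: HH, HT, TH, TT
flips = list(product("HT", repeat=2))

p_two_heads = Fraction(flips.count(("H", "H")), len(flips))

# Multiplication rule for independent events
p_rule = Fraction(1, 2) * Fraction(1, 2)

print(p_two_heads, p_rule)  # 1/4 1/4
```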
-----------------------------------------------------------------------
8. Conditional Probability (Simple Version)
Probability of A happening given B has already happened:
Code:
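P(A|B) = P(A and B) / P(B)
Example:
There are 5 cards and 3 of them are red, so P(red) = 3/5. If one red card is removed, 2 of the remaining 4 cards are red, so P(next card is red | a red card was removed) = 2/4 = 1/2.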
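To see the card example in numbers, here is a rough sketch. It assumes two cards are drawn one after the other without replacement, with B = "first card is red" and A = "second card is red"; the two non-red cards are simply labelled "other".

```python
from fractions import Fraction
from itertools import permutations

cards = ["red", "red", "red", "other", "other"]  # 3 red out of 5

# Every ordered way to draw two cards without replacement
draws = list(permutations(cards, 2))

first_red = [d for d in draws if d[0] == "red"]          # event B
both_red = [d for d in first_red if d[1] == "red"]       # event A and B

p_b = Fraction(len(first_red), len(draws))
p_a_and_b = Fraction(len(both_red), len(draws))
p_a_given_b = p_a_and_b / p_b                            # P(A|B)

print(p_b, p_a_and_b, p_a_given_b)  # 3/5 3/10 1/2
```

Before any card is removed the chance of red is 3/5; once a red card is known to be gone it drops to 1/2.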
-----------------------------------------------------------------------
9. Distributions (Beginner Overview)
Uniform distribution:
All outcomes are equally likely (e.g., rolling a fair die).
Normal distribution (bell curve):
Many real-life measurements follow this approximately:
• height
• test scores
• measurement errors
Characteristics:
• symmetric
• mean = median = mode
• about 68% of data within 1 standard deviation (SD) of the mean
• about 95% within 2 SD
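The 68% and 95% figures are rounded values. A short sketch using `statistics.NormalDist` from the Python standard library (Python 3.8+) shows where they come from, assuming a standard normal distribution with mean 0 and SD 1:

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)

# Area under the bell curve between -k and +k standard deviations
within_1_sd = standard.cdf(1) - standard.cdf(-1)
within_2_sd = standard.cdf(2) - standard.cdf(-2)

print(round(within_1_sd, 4), round(within_2_sd, 4))  # 0.6827 0.9545
```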
-----------------------------------------------------------------------
10. Standard Deviation (Spread of Data)
Measures how much data varies from the mean.
Small SD → data close together
Large SD → data spread out
Simple example:
Data: 8, 9, 10
Mean = 9
SD ≈ 0.82 (population SD), which is small because all values are close to the mean.
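For comparison, the sketch below computes the population standard deviation of this data set next to a more spread-out one with the same mean. The second data set is made up for illustration.

```python
import statistics

close = [8, 9, 10]   # data from the example
spread = [1, 9, 17]  # hypothetical data with the same mean of 9

sd_close = statistics.pstdev(close)    # population SD
sd_spread = statistics.pstdev(spread)

print(round(sd_close, 2), round(sd_spread, 2))  # 0.82 6.53
```

`statistics.pstdev` divides by n, while `statistics.stdev` divides by n − 1 and is used when the data is a sample from a larger population.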
-----------------------------------------------------------------------
11. Common Mistakes
❌ Thinking probability can be over 1
✔ It must be between 0 and 1
❌ Forgetting to divide by total outcomes
✔ Always count the full set
❌ Confusing independent with mutually exclusive
✔ independent = do not affect each other
✔ mutually exclusive = cannot both happen
❌ Mixing up histograms and bar charts
✔ histograms = continuous data, bars touch
✔ bar charts = categorical or discrete data, bars have gaps
❌ Thinking the mean is always best
✔ median is better for skewed data
-----------------------------------------------------------------------
12. Practice Questions
1. Calculate the mean of: 5, 12, 7, 7, 9
2. A bag has 3 blue, 5 red, 2 green. P(red)?
3. Roll two dice. P(sum = 8)?
4. Data: 14, 18, 22, 25, 27. Find median.
5. Identify the type of correlation for points trending upward on a scatter graph.
6. Two classes both have a mean test score of 60. Class A has SD = 2 and class B has SD = 10. Which class is more consistent?
-----------------------------------------------------------------------
Summary
This post covered:
• types of data
• averages
• frequency tables
• probability rules
• independence
• conditional probability
• distributions
• standard deviation
• practice questions
Statistics & probability help us understand uncertainty, patterns, and real-world behaviour — essential for science, computing, economics, and research.
