CENTRAL TENDENCY AND DISPERSION:
- Central Tendency:
- Arithmetic Mean: Average of values; sensitive to outliers.
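- Median: Middle value of the sorted data (the average of the two middle values when n is even); more robust than the mean when the data contain outliers.
- Mode: Most frequent value; datasets can be unimodal, bimodal, or trimodal, or have no mode if no value repeats.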
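A minimal Python sketch of the three measures, using a small hypothetical sample (the values are illustrative, not from the notes). `statistics.multimode` (Python 3.8+) returns every most-frequent value, so bimodal or trimodal data come back as a list of modes. Appending one extreme value shows the mean moving much further than the median or mode; `central_tendency` is a hypothetical helper name.

```python
from statistics import mean, median, multimode

def central_tendency(data):
    """Return (arithmetic mean, median, list of modes) for a sample."""
    # multimode lists all values tied for the highest frequency
    return mean(data), median(data), multimode(data)

# Hypothetical sample (illustrative values only)
base = [2, 3, 3, 4, 5]
with_outlier = base + [40]  # append one extreme observation

print(central_tendency(base))          # (3.4, 3, [3])
print(central_tendency(with_outlier))  # (9.5, 3.5, [3])
```

The single outlier moves the mean from 3.4 to 9.5, while the median only shifts from 3 to 3.5 and the mode is unchanged.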
- Dealing with Outliers:
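- Trimmed Mean: Discards an equal share of the smallest and largest values, then averages the rest.
- Winsorized Mean: Replaces values beyond a chosen lower and upper percentile with the value at that percentile, then averages.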
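The two outlier adjustments can be sketched in Python as below. Assumption: the share `prop` applies to each end and is converted to a whole number of observations with `int()`; sources differ on rounding and on whether a stated percentage refers to each tail or to both tails combined. The helper names and the sample are hypothetical.

```python
from statistics import mean

def _cut_count(n, prop):
    # Assumption: prop is the share trimmed/capped at EACH end, rounded down
    if not 0 <= prop < 0.5:
        raise ValueError("prop must be in [0, 0.5)")
    return int(prop * n)

def trimmed_mean(data, prop):
    """Mean after dropping the k smallest and k largest observations."""
    xs = sorted(data)
    k = _cut_count(len(xs), prop)
    return mean(xs[k:len(xs) - k])

def winsorized_mean(data, prop):
    """Mean after capping the k smallest/largest values at the nearest retained value."""
    xs = sorted(data)
    k = _cut_count(len(xs), prop)
    low, high = xs[k], xs[-k - 1]  # cutoff values at each end
    return mean([min(max(x, low), high) for x in xs])

data = [1, 2, 3, 4, 5, 6, 7, 8, 20, 100]  # hypothetical sample with outliers
print(mean(data), trimmed_mean(data, 0.1), winsorized_mean(data, 0.1))  # 15.6 6.875 7.7
```

With k = 1, trimming drops 1 and 100, while winsorizing replaces them with 2 and 20, so the winsorized mean keeps all ten observations.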
- Measures of Location:
- Quantiles: General term (e.g., quartiles, quintiles, percentiles).
- Interquartile Range: Difference between 75th and 25th percentiles.
- Box-and-Whisker Plot: Visual tool for showing data spread and potential outliers.
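A short Python sketch of quartiles, the IQR, and box-plot whisker fences, using the standard library's `statistics.quantiles` (Python 3.8+). Its default `method="exclusive"` locates the p-th quantile at position (n + 1)p in the sorted data, interpolating when that position falls between observations. The 1.5 × IQR fence rule is an assumed convention (Tukey's), not something stated in these notes, and the sample is hypothetical.

```python
from statistics import quantiles

data = [1, 3, 4, 5, 6, 7, 8, 9, 10, 12, 30]  # hypothetical sample

# Quartiles via the (n + 1) * p positioning rule ("exclusive" method)
q1, q2, q3 = quantiles(data, n=4, method="exclusive")
iqr = q3 - q1  # interquartile range = 75th minus 25th percentile

# Assumption: whiskers end at 1.5 x IQR beyond the quartiles
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
flagged = [x for x in data if x < lower or x > upper]

print(q1, q2, q3, iqr, flagged)
```

For this sample Q1 = 4, the median is 7, Q3 = 10, and IQR = 6, so the upper fence is 19 and only 30 is flagged as a potential outlier.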
- Dispersion (Risk):
- Range: Difference between maximum and minimum values.
- Mean Absolute Deviation (MAD): Average absolute difference from mean.
- Sample Variance: Sum of squared deviations from the mean, divided by n − 1.
- Standard Deviation: Square root of variance; interpretable in original units.
- Coefficient of Variation (CV): Standard deviation divided by mean; unitless, so it allows comparison of dispersion across datasets with different scales or units.
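The dispersion measures above, sketched in Python for a hypothetical sample. `statistics.variance` and `statistics.stdev` use the n − 1 sample divisor; MAD is computed directly from its definition. The `dispersion` helper is a hypothetical name.

```python
from statistics import mean, stdev, variance

def dispersion(data):
    """Return common dispersion measures for a sample (hypothetical helper)."""
    m = mean(data)
    s = stdev(data)  # sample standard deviation, n - 1 divisor
    return {
        "range": max(data) - min(data),
        "mad": sum(abs(x - m) for x in data) / len(data),  # mean absolute deviation
        "sample_variance": variance(data),  # sum of squared deviations / (n - 1)
        "sample_std": s,
        "cv": s / m,  # meaningful only when the mean is positive
    }

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
for name, value in dispersion(data).items():
    print(f"{name}: {value:.4f}")
```

Here the mean is 5, so MAD = 12 / 8 = 1.5 and the sample variance is 32 / 7 ≈ 4.57.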
- Downside Risk:
- Target Downside Deviation (target semideviation): Square root of the sum of squared deviations below a specified target (e.g., the mean), divided by n − 1; captures only downside risk.
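Target downside deviation, sketched in Python under the common textbook definition: square the shortfalls of observations at or below the target B, sum them, divide by n − 1 (counting all observations), and take the square root. The function name and returns are hypothetical, and some sources use a different divisor.

```python
from math import sqrt

def target_downside_deviation(returns, target):
    """Target semideviation: only observations at or below `target` contribute."""
    n = len(returns)
    shortfall_sq = sum((r - target) ** 2 for r in returns if r <= target)
    # Divisor n - 1 counts ALL observations, not just those below the target
    return sqrt(shortfall_sq / (n - 1))

returns = [-5, 2, 8, -1, 10, 3]  # hypothetical periodic returns, %
print(f"{target_downside_deviation(returns, target=2):.2f}")  # 3.41
```

Only −5, 2, and −1 fall at or below the target of 2, giving squared shortfalls 49 + 0 + 9 = 58; 58 / 5 = 11.6, and √11.6 ≈ 3.41.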
SKEWNESS, KURTOSIS, AND CORRELATION:
- Symmetry in Distributions: In a symmetrical distribution, deviations of a given size above and below the mean (gains and losses) occur with equal frequency; asymmetry indicates skewness.
- Skewness Types:
- Positive Skew (Right-skewed): Outliers are above the mean; mean > median > mode.
- Negative Skew (Left-skewed): Outliers are below the mean; mean < median < mode.
- Outliers affect the mean more than the median or mode, pulling the mean toward the longer tail.
- Sample Skewness:
- Measures asymmetry using cubed deviations from the mean.
- Positive skewness means right-skewed; negative means left-skewed.
- An absolute value greater than 0.5 is generally considered significant.
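Sample skewness in Python, assuming the small-sample adjusted formula n / ((n − 1)(n − 2)) × Σ(Xᵢ − X̄)³ / s³; for large n this is close to the simpler (1/n) Σ(Xᵢ − X̄)³ / s³. The function name is hypothetical, and the sample is hypothetical data chosen to have a long right tail.

```python
from statistics import mean, median, mode, stdev

def sample_skewness(data):
    """Sample skewness with the n / ((n - 1)(n - 2)) adjustment; needs n >= 3."""
    n = len(data)
    m, s = mean(data), stdev(data)
    cubed = sum((x - m) ** 3 for x in data)  # cubing keeps the sign of each deviation
    return n / ((n - 1) * (n - 2)) * cubed / s ** 3

data = [1, 2, 2, 2, 3, 3, 4, 12]  # hypothetical right-skewed sample
sk = sample_skewness(data)
print(mean(data), median(data), mode(data))  # 3.625 2.5 2
print(round(sk, 2), "significant" if abs(sk) > 0.5 else "not significant")
```

The single large value (12) dominates the cubed deviations, so skewness is positive, and the ordering mean > median > mode matches the right-skew pattern.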
- Kurtosis:
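- Measures the combined weight of the tails relative to the rest of the distribution (traditionally described as peakedness).
- Leptokurtic: Fatter tails (and typically a higher peak) than the normal distribution; Platykurtic: Thinner tails and a flatter peak; Mesokurtic: Same kurtosis as the normal distribution (3).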
- Excess Kurtosis = Kurtosis − 3; used to compare with normal distribution.
- Higher excess kurtosis and negative skew increase investment risk, since both imply a greater likelihood of extreme losses than a normal distribution would suggest.
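Sample excess kurtosis in Python, assuming one common small-sample estimator: n(n + 1) / ((n − 1)(n − 2)(n − 3)) × Σ(Xᵢ − X̄)⁴ / s⁴ − 3(n − 1)² / ((n − 2)(n − 3)). Both samples are hypothetical: one has mostly small values plus two extremes (fat tails), the other is evenly spread (thin tails).

```python
from statistics import mean, stdev

def excess_kurtosis(data):
    """Sample excess kurtosis (kurtosis minus 3) with small-sample adjustments; needs n >= 4."""
    n = len(data)
    m, s = mean(data), stdev(data)
    fourth = sum((x - m) ** 4 for x in data) / s ** 4  # fourth power weights the tails heavily
    adj = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    return adj * fourth - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))

fat_tails = [-1, 0, 1] * 20 + [-10, 10]  # hypothetical: calm values plus two extremes
flat = list(range(100))                  # hypothetical: evenly spread values

print(round(excess_kurtosis(fat_tails), 2))  # positive -> leptokurtic
print(round(excess_kurtosis(flat), 2))       # negative -> platykurtic
```

An evenly spread sample lands near −1.2, the excess kurtosis of a uniform distribution, while two extreme values among 60 calm ones push excess kurtosis well above zero.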
- Scatter Plots & Correlation:
- Scatter plots visualize variable relationships; can show linear and nonlinear patterns.
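- The correlation coefficient standardizes covariance by dividing it by the product of the two standard deviations; it ranges from −1 to +1 (ρ for a population, r for a sample).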
- ρ = +1 (perfect positive), ρ = −1 (perfect negative), ρ = 0 (no linear relationship).
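Computing the sample correlation directly from its definition, r = cov(X, Y) / (s_X s_Y), in Python. The helper name and data are hypothetical. The third case is a perfect U-shape: Y is fully determined by X, yet r is 0 because the relationship is not linear.

```python
from statistics import mean, stdev

def sample_correlation(x, y):
    """r = sample covariance / (s_x * s_y); every term uses the n - 1 divisor."""
    n = len(x)
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return cov / (stdev(x) * stdev(y))

x = [-3, -2, -1, 0, 1, 2, 3]
pos = [2 * v + 1 for v in x]  # exact increasing line
neg = [-v for v in x]         # exact decreasing line
ushape = [v * v for v in x]   # deterministic but nonlinear

print(round(sample_correlation(x, pos), 6))     # 1.0
print(round(sample_correlation(x, neg), 6))     # -1.0
print(round(sample_correlation(x, ushape), 6))  # 0.0
```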
- Correlation Considerations:
- Correlation ≠ causation.
- Outliers can distort correlation.
- Spurious correlation may occur due to a third variable or random chance (e.g., humorous examples from Tyler Vigen).
KEY CONCEPTS:
- Measures of Central Tendency:
- Mean: Arithmetic average; the sample mean is computed from a sample, the population mean from the entire population.
- Median: Middle value in ordered data.
- Mode: Most frequent value; modal interval used for continuous data.
- Trimmed/Winsorized Mean: Reduce outlier impact by omitting or capping them.
- Quantiles:
- Values dividing data: quartiles (4 parts), quintiles (5), deciles (10), percentiles (100).
- Measures of Dispersion:
- Range: Difference between max and min.
- MAD: Average absolute deviation from the mean.
- Variance/Standard Deviation: Average squared deviation from the mean (n − 1 divisor for a sample) and its square root.
- Coefficient of Variation (CV): Ratio of standard deviation to mean.
- Semideviation: Measures downside risk using only observations below the mean (or below a chosen target).
- Skewness and Kurtosis:
- Skewness: Right-skewed (mean > median > mode); left-skewed is the reverse.
- Kurtosis: Measures tail weight; leptokurtic (fat tails), platykurtic (thin tails).
- Excess Kurtosis: Kurtosis minus 3, the kurtosis of the normal distribution.
- Correlation:
- Measures linear association (−1 to +1).
- Scatter plots can reveal nonlinear relationships that the correlation coefficient does not capture.
- Correlation ≠ causation; spurious correlations may arise.