[K-MOOC] Data Analytics for Forecasting and Classification: 1-1. Regression analysis, Simple regression model, Model estimation - Hubert Life (out2)

Friday, March 29, 2019

  • Regression Analysis
    • A method that explains a variable by analyzing its statistical causal relationships with related variables
    • independent variable: causes
    • dependent variable: outcomes
  • Regression Model
    • Simple Regression Model
      • 𝑿 ⇨ 𝒀
      • Observations: (𝑿₁,𝒀₁), (𝑿₂,𝒀₂), ... , (𝑿𝑛,𝒀𝑛) (𝑛 is the number of observations)
      • Simple Regression Model: 
        • 𝒀𝑖 = 𝜷₀ + 𝜷₁𝑿𝑖 + 𝜺𝑖,    𝑖 = 1,2, ... , 𝑛
          • 𝜺𝑖: error term
            • Assume that it follows a normal distribution with mean 0 and variance 𝛔²
            • 𝜺𝑖 ~ 𝙉𝙤𝙧(0,𝛔²)
          • 𝑿 is not a random variable but a given (fixed) value
          • So three parameters need to be estimated:
            • 𝜷₁: slope of the linear equation
            • 𝜷₀: intercept
            • 𝛔²: variance of the error term
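To make the model concrete, here is a minimal Python sketch that generates observations from 𝒀𝑖 = 𝜷₀ + 𝜷₁𝑿𝑖 + 𝜺𝑖. The function name `simulate_observations`, the parameter values, and the seed are assumptions chosen only for illustration; they do not come from the lecture.

```python
import random


def simulate_observations(xs, beta0, beta1, sigma, seed=0):
    """Draw Y_i = beta0 + beta1 * X_i + eps_i with eps_i ~ Normal(0, sigma^2).

    The X values are treated as given (not random); only the error term is random.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical settings, made up for illustration only.
xs = [float(i) for i in range(1, 11)]
ys = simulate_observations(xs, beta0=1.0, beta1=2.0, sigma=0.5)
```

In practice only the pairs (𝑿𝑖,𝒀𝑖) are observed; 𝜷₀, 𝜷₁, and 𝛔² are unknown and have to be estimated from them.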
      • Estimation of intercept 𝜷₀ and slope 𝜷₁
        • Use the least squares method
        • choose 𝜷₀ and 𝜷₁ to minimize the objective function 𝐐
        • objective function 𝐐
          • the sum of squared differences between each observed value 𝒀𝑖 of the dependent variable and the fitted value 𝜷₀ + 𝜷₁𝑿𝑖 on the regression line
          • 𝐐 = ∑(𝒀𝑖 - 𝜷₀ - 𝜷₁𝑿𝑖)²
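As a rough illustration of the objective function, the sketch below evaluates 𝐐 for a candidate pair (𝜷₀, 𝜷₁). The function name `objective_q` and the toy data are assumptions made up for this example.

```python
def objective_q(xs, ys, beta0, beta1):
    """Least squares objective Q: sum over i of (Y_i - beta0 - beta1 * X_i)^2."""
    return sum((y - beta0 - beta1 * x) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data that lies exactly on the line Y = 1 + 2X.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

q_exact = objective_q(xs, ys, beta0=1.0, beta1=2.0)  # line passes through every point
q_off = objective_q(xs, ys, beta0=0.0, beta1=2.0)    # intercept is off by 1 at every point
```

Here `q_exact` is 0 because every point is on the line, while `q_off` is 4 because each of the four points misses by 1. Least squares picks the pair that makes 𝐐 as small as possible.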
      • How to?
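        • (𝑿𝑖,𝒀𝑖) are observed values, so 𝐐 is a function of 𝜷₀ and 𝜷₁ only
        • set the partial derivative of 𝐐 with respect to 𝜷₀ to zero
          ∂𝐐/∂𝜷₀ = -2∑(𝒀𝑖 - 𝜷₀ - 𝜷₁𝑿𝑖) = 0
        • set the partial derivative of 𝐐 with respect to 𝜷₁ to zero
          ∂𝐐/∂𝜷₁ = -2∑(𝒀𝑖 - 𝜷₀ - 𝜷₁𝑿𝑖)𝑿𝑖 = 0
        • solving these two equations for 𝜷₀ and 𝜷₁ gives the estimates
          𝜷₁-hat = ∑(𝑿𝑖 - 𝑿-bar)(𝒀𝑖 - 𝒀-bar) / ∑(𝑿𝑖 - 𝑿-bar)²
          𝜷₀-hat = 𝒀-bar - 𝜷₁-hat * 𝑿-bar
          (𝑿-bar and 𝒀-bar are the sample means of 𝑿 and 𝒀)
        • estimated equation: 𝒀-hat = 𝜷₀-hat + 𝜷₁-hat * 𝑿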
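The two equations obtained from the partial derivatives can be solved in closed form using the sample means of 𝑿 and 𝒀. The following is a minimal sketch of that solution in plain Python; the function name `fit_simple_regression` and the toy data are assumptions for illustration, not part of the lecture.

```python
def fit_simple_regression(xs, ys):
    """Return (beta0_hat, beta1_hat) that minimize Q for a simple regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squares around the sample means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all X values are identical, so the slope is undefined")
    beta1_hat = s_xy / s_xx                # slope
    beta0_hat = y_bar - beta1_hat * x_bar  # intercept
    return beta0_hat, beta1_hat


# Hypothetical toy data, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
beta0_hat, beta1_hat = fit_simple_regression(xs, ys)

# Fitted values on the estimated line Y-hat = beta0_hat + beta1_hat * X.
y_hat = [beta0_hat + beta1_hat * x for x in xs]
```

For this toy data the estimates come out to roughly 2.2 for the intercept and 0.6 for the slope. At these values both derivative equations hold: the residuals sum to zero, and so do the residuals multiplied by 𝑿𝑖.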
      • Estimation of variance of the error term 𝛔²
        • Using sample variance of the residuals
          • residual
            subtract the fitted value from the observed value of 𝒀
            𝒆𝑖 = 𝒀𝑖 - 𝒀𝑖-hat = 𝒀𝑖 - (𝜷₀-hat + 𝜷₁-hat * 𝑿𝑖)
          • SSE
            residual (error) sum of squares
            = ∑(𝒀𝑖 - 𝒀𝑖-hat)²
          • estimate 𝛔² by using MSE
            𝛔²-hat = MSE (Mean Squared Error) = SSE / (𝑛 - 2)
            𝑛 - 2 is the degrees of freedom, because two parameters (𝜷₀ and 𝜷₁) are estimated from the 𝑛 observations
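The variance estimate can be sketched in a few lines of Python: compute the residuals, sum their squares to get SSE, then divide by 𝑛 - 2. The function name `estimate_error_variance` and the toy data are assumptions for illustration; the coefficients 2.2 and 0.6 are the least squares estimates for this particular toy data set.

```python
def estimate_error_variance(xs, ys, beta0_hat, beta1_hat):
    """Return (residuals, sse, mse) for a fitted simple regression line."""
    n = len(xs)
    if n <= 2:
        raise ValueError("need more than 2 observations for n - 2 degrees of freedom")
    # Residual e_i = observed Y_i minus fitted value beta0_hat + beta1_hat * X_i.
    residuals = [y - (beta0_hat + beta1_hat * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in residuals)  # residual (error) sum of squares
    mse = sse / (n - 2)                   # estimate of sigma^2
    return residuals, sse, mse


# Hypothetical toy data, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
residuals, sse, mse = estimate_error_variance(xs, ys, beta0_hat=2.2, beta1_hat=0.6)
```

With these five points, SSE is about 2.4 and the degrees of freedom are 5 - 2 = 3, so the estimate of 𝛔² is about 0.8.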
