In a multiple regression analysis, SSR = 1000 and SSE = 200. What is the F statistic for this model?

We've covered quite a bit of ground. Let's review the analysis of variance table for the example concerning skin cancer mortality and latitude (Skin Cancer data).

Analysis of Variance

Source           DF  Adj SS  Adj MS  F-Value  P-Value
Regression        1   36464   36464    99.80    0.000
Residual Error   47   17173     365
Total            48   53637

Model Summary

S      R-sq   R-sq(adj)
19.12  68.0%  67.3%

Coefficients

Predictor   Coef     SE Coef  T-Value  P-Value
Constant    389.19   23.81      16.34    0.000
Lat         -5.9776  0.5984     -9.99    0.000

Regression Equation

Mort = 389 - 5.98 Lat

Recall that there were 49 states in the data set.

  • The degrees of freedom associated with SSR will always be 1 for the simple linear regression model. The degrees of freedom associated with SSTO are n-1 = 49-1 = 48. The degrees of freedom associated with SSE are n-2 = 49-2 = 47. And the degrees of freedom add up: 1 + 47 = 48.
  • The sums of squares add up: SSTO = SSR + SSE. That is, here: 53637 = 36464 + 17173. (A quick numeric check of both identities appears just after this list.)
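
The following short Python snippet simply re-verifies both identities using the values reported in the Minitab output above; it is only a numeric check of the bookkeeping.

```python
# Degrees of freedom for the skin cancer example (n = 49 states).
n = 49
df_regression = 1           # always 1 in simple linear regression
df_error = n - 2            # 47
df_total = n - 1            # 48
assert df_regression + df_error == df_total   # 1 + 47 = 48

# Sums of squares from the analysis of variance table.
SSR, SSE, SSTO = 36464, 17173, 53637
assert SSR + SSE == SSTO                      # 36464 + 17173 = 53637
print("Both ANOVA identities check out.")
```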

Let's tackle a few more columns of the analysis of variance table, namely the "mean square" column, labeled MS, and the F-statistic column labeled F.

Definitions of mean squares

We already know the "mean square error (MSE)" is defined as:

\(MSE=\dfrac{\sum(y_i-\hat{y}_i)^2}{n-2}=\dfrac{SSE}{n-2}\)

That is, we obtain the mean square error by dividing the error sum of squares by its associated degrees of freedom n-2. Similarly, we obtain the "regression mean square (MSR)" by dividing the regression sum of squares by its degrees of freedom 1:

\(MSR=\dfrac{\sum(\hat{y}_i-\bar{y})^2}{1}=\dfrac{SSR}{1}\)

Of course, that means the regression sum of squares (SSR) and the regression mean square (MSR) are always identical for the simple linear regression model.
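
For the skin cancer example, plugging the sums of squares from the analysis of variance table above into these definitions gives \(MSR=\dfrac{36464}{1}=36464\) and \(MSE=\dfrac{17173}{47}\approx 365\), which match the Adj MS column of the Minitab output (up to rounding).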

Now, why do we care about mean squares? Because their expected values suggest how to test the null hypothesis \(H_{0} \colon \beta_{1} = 0\) against the alternative hypothesis \(H_{A} \colon \beta_{1} ≠ 0\).

Expected mean squares

Imagine taking many, many random samples of size n from some population, estimating the regression line, and determining MSR and MSE for each data set obtained. It has been shown that the average (that is, the expected value) of all of the MSRs you can obtain equals:

\(E(MSR)=\sigma^2+\beta_{1}^{2}\sum_{i=1}^{n}(X_i-\bar{X})^2\)

Similarly, it has been shown that the average (that is, the expected value) of all of the MSEs you can obtain equals:

\(E(MSE)=\sigma^2\)

These expected values suggest how to test \(H_{0} \colon \beta_{1} = 0\) versus \(H_{A} \colon \beta_{1} ≠ 0\):

  • If \(\beta_{1} = 0\), then we'd expect the ratio MSR/MSE to equal 1.
  • If \(\beta_{1} ≠ 0\), then we'd expect the ratio MSR/MSE to be greater than 1.

These two facts suggest that we should use the ratio, MSR/MSE, to determine whether or not \(\beta_{1} = 0\).

Note! Because \(\beta_{1}\) is squared in E(MSR), we cannot use the ratio MSR/MSE:

  • to test \(H_{0} \colon \beta_{1} = 0\) versus \(H_{A} \colon \beta_{1} < 0\)
  • or to test \(H_{0} \colon \beta_{1} = 0\) versus \(H_{A} \colon \beta_{1} > 0\).

We can only use MSR/MSE to test \(H_{0} \colon \beta_{1} = 0\) versus \(H_{A} \colon \beta_{1} ≠ 0\).

We have now completed our investigation of all of the entries of a standard analysis of variance table. The formula for each entry is summarized for you in the following analysis of variance table:

Source of Variation   DF    SS    MS    F
Regression            1     \(SSR=\sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2\)     \(MSR=\dfrac{SSR}{1}\)     \(F^*=\dfrac{MSR}{MSE}\)
Residual error        n-2   \(SSE=\sum_{i=1}^{n}(y_i-\hat{y}_i)^2\)     \(MSE=\dfrac{SSE}{n-2}\)
Total                 n-1   \(SSTO=\sum_{i=1}^{n}(y_i-\bar{y})^2\)
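
To see these formulas in action, here is a minimal Python sketch that computes every entry of the table from a vector of responses and its fitted values. The data are simulated to roughly mimic the skin cancer example (using the fitted equation and S reported above), so the exact numbers will differ from the Minitab output.

```python
import numpy as np

def anova_table(y, y_hat):
    """Compute the simple linear regression ANOVA entries from responses and fitted values."""
    n = len(y)
    y_bar = np.mean(y)
    ssr = np.sum((y_hat - y_bar) ** 2)   # regression sum of squares
    sse = np.sum((y - y_hat) ** 2)       # error sum of squares
    ssto = np.sum((y - y_bar) ** 2)      # total sum of squares (ssr + sse)
    msr = ssr / 1                        # regression mean square
    mse = sse / (n - 2)                  # mean square error
    f_star = msr / mse                   # analysis of variance F-statistic
    return ssr, sse, ssto, msr, mse, f_star

# Illustrative data only: simulate responses that roughly mimic Mort = 389 - 5.98 Lat
# and fit a straight line by least squares.
rng = np.random.default_rng(0)
lat = rng.uniform(25, 50, size=49)
mort = 389 - 5.98 * lat + rng.normal(scale=19, size=49)
slope, intercept = np.polyfit(lat, mort, 1)
mort_hat = intercept + slope * lat

ssr, sse, ssto, msr, mse, f_star = anova_table(mort, mort_hat)
print(f"SSR={ssr:.0f}  SSE={sse:.0f}  SSTO={ssto:.0f}  MSR={msr:.0f}  MSE={mse:.1f}  F*={f_star:.1f}")
```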

However, we will always let Minitab do the dirty work of calculating the values for us. Why is the ratio MSR/MSE labeled F* in the analysis of variance table? That's because the ratio is known to follow an F distribution with 1 numerator degree of freedom and n-2 denominator degrees of freedom. For this reason, it is often referred to as the analysis of variance F-test. The following section summarizes the formal F-test.

The formal F-test for the slope parameter \(\beta_{1}\)

The null hypothesis is \(H_{0} \colon \beta_{1} = 0\).

The alternative hypothesis is \(H_{A} \colon \beta_{1} ≠ 0\).

The test statistic is \(F^*=\dfrac{MSR}{MSE}\).

As always, the P-value is obtained by answering the question: "What is the probability that we’d get an F* statistic as large as we did if the null hypothesis is true?"

The P-value is determined by comparing F* to an F distribution with 1 numerator degree of freedom and n-2 denominator degrees of freedom.
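
If you want to verify the calculation by hand (or in code), here is a short Python sketch that reproduces F* and its P-value for the skin cancer example using SciPy's F distribution. Minitab reports the same F-Value (99.80) and a P-value of 0.000 because the upper-tail area is tiny.

```python
from scipy import stats

# Sums of squares from the skin cancer ANOVA table above.
msr = 36464 / 1          # regression mean square
mse = 17173 / 47         # mean square error, about 365
f_star = msr / mse       # about 99.8, matching Minitab's F-Value
n = 49
p_value = stats.f.sf(f_star, 1, n - 2)   # upper-tail area of F(1, 47)
print(f"F* = {f_star:.2f}, P-value = {p_value:.2e}")
```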

In reality, we are going to let Minitab calculate the F* statistic and the P-value for us. Let's try it out on a new example!

What is SSE in multiple regression?

The difference between SST and SSR is the variability of Y that remains unexplained after fitting the regression model; this quantity is called the sum of squared errors (SSE).
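
For instance, with the numbers in the question at the top of this page, \(SST = SSR + SSE = 1000 + 200 = 1200\). Turning these sums of squares into an F statistic also requires the degrees of freedom: in general, for a multiple regression with p predictors fit to n observations, \(F^*=\dfrac{MSR}{MSE}=\dfrac{SSR/p}{SSE/(n-p-1)}\), so p and n must be known as well.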

Which equation describes the multiple regression model?

The multiple regression equation takes the following form: \(y = b_1x_1 + b_2x_2 + \cdots + b_nx_n + c\). Here the \(b_i\) (i = 1, 2, …, n) are the regression coefficients, each representing the amount by which the criterion variable changes when the corresponding predictor variable changes by one unit, holding the other predictors fixed.
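
As a concrete (and entirely made-up) illustration of this model form, the following Python sketch fits a two-predictor regression by ordinary least squares with NumPy:

```python
import numpy as np

# Purely illustrative data: two made-up predictors plus noise.
rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3.0 * x1 - 2.0 * x2 + 5.0 + rng.normal(scale=0.5, size=n)

# Design matrix with a column of ones for the constant term c.
X = np.column_stack([x1, x2, np.ones(n)])
b1, b2, c = np.linalg.lstsq(X, y, rcond=None)[0]
print(f"estimated model: y = {b1:.2f} x1 + {b2:.2f} x2 + {c:.2f}")
```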

What is the purpose of multiple regression?

Multiple regression is a statistical technique that can be used to analyze the relationship between a single dependent variable and several independent variables. The objective of multiple regression analysis is to use the independent variables, whose values are known, to predict the value of the single dependent variable.

What is one of the problems related to multicollinearity in multiple regression?

Multicollinearity causes two basic types of problems: the coefficient estimates can swing wildly depending on which other independent variables are in the model, and the coefficients become very sensitive to small changes in the model.
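
A small simulation makes the first problem concrete: when two predictors are nearly identical, the estimated coefficient of one of them changes dramatically depending on whether the other is included. Everything below is simulated and purely illustrative.

```python
import numpy as np

# Simulated data with two nearly identical predictors.
rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)        # x2 is almost a copy of x1
y = 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

def ols_coefs(X, y):
    """Ordinary least squares coefficients; the last entry is the intercept."""
    X = np.column_stack([X, np.ones(len(y))])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# The estimated coefficient of x1 swings depending on whether x2 is in the model.
b_alone = ols_coefs(x1[:, None], y)
b_both = ols_coefs(np.column_stack([x1, x2]), y)
print("x1 coefficient with x1 only:   ", round(b_alone[0], 2))  # near 3, since x1 absorbs x2's effect
print("x1 coefficient with x1 and x2: ", round(b_both[0], 2))   # unstable; often far from 2
```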
