How many data points are necessary in order to use linear regression in predictive analytics?

$\begingroup$

What would be a "reasonable" minimum number of observations to look for a trend over time with a linear regression? What about fitting a quadratic model?

I work with composite indices of inequality in health (SII,RII), and have only 4 waves of the survey, so 4 points (1997,2001,2004,2008).

I am not a statistician, but my intuition is that 4 points are not sufficient. Do you have an answer and/or references?

Thanks a lot,

Françoise

Peter Flom


asked Sep 23, 2012 at 12:02

$\endgroup$


$\begingroup$

Peter's rule of thumb of 10 observations per covariate is reasonable. A straight line can be fit perfectly to any two points, regardless of the amount of noise in the response values, and a quadratic can be fit perfectly to just three points. So in almost any circumstance it is fair to say that 4 points are insufficient. However, like most rules of thumb, it does not cover every situation. Cases where the noise term in the model has a large variance will require more observations than otherwise similar cases where the error variance is small.
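To make the "perfect fit" point concrete, here is a minimal sketch using NumPy. Only the survey years come from the question; the index values are invented for illustration and deliberately noisy, and the helper name `max_abs_residual` is hypothetical.

```python
import numpy as np

# Survey years from the question; the index values are made up for illustration.
years = np.array([1997.0, 2001.0, 2004.0, 2008.0])
index = np.array([0.31, 0.12, 0.45, 0.27])  # hypothetical, deliberately noisy

# Centre the years so the polynomial fit stays well conditioned.
x = years - years.mean()

def max_abs_residual(x, y, degree):
    """Fit a polynomial of the given degree and return the largest absolute residual."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.max(np.abs(np.polyval(coeffs, x) - y)))

# A line through 2 points and a quadratic through 3 points leave no residual at all,
# however noisy the data are, so the fit says nothing about the noise.
line_two_points = max_abs_residual(x[:2], index[:2], 1)
quad_three_points = max_abs_residual(x[:3], index[:3], 2)

# With all 4 points, a line finally leaves residuals, but only 2 degrees of freedom.
line_four_points = max_abs_residual(x, index, 1)

print(line_two_points, quad_three_points, line_four_points)
```

The first two values are zero up to floating-point rounding, while the third is clearly non-zero. With 4 points and a quadratic, only a single residual degree of freedom would remain.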

The required number of sample points also depends on your objectives. If you are doing exploratory analysis just to see whether one model (say, linear in a covariate) looks better than another (say, a quadratic function of the covariate), fewer than 10 points may be enough. But if you want very accurate estimates of the correlation and of the regression coefficients for the covariates, you could need more than 10 per covariate. A prediction-accuracy criterion could require even more samples than accurate parameter estimates. Note that the variances of the estimates and of the predictions all involve the variance of the model's error term.
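The dependence on the error variance can be seen in the textbook standard error of the slope in simple linear regression: the estimated error variance (residual sum of squares divided by n - 2) divided by the sum of squared deviations of x, then square-rooted. The sketch below applies that formula to the four survey years. The index values are again invented, and the function name `slope_with_standard_error` is hypothetical.

```python
import numpy as np

def slope_with_standard_error(x, y):
    """Least-squares slope, its standard error, and the residual degrees of freedom."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    sxx = float(np.sum(dx ** 2))
    slope = float(np.sum(dx * (y - y.mean())) / sxx)
    intercept = float(y.mean() - slope * x.mean())
    residuals = y - (intercept + slope * x)
    df = x.size - 2  # two parameters estimated: intercept and slope
    error_variance = float(np.sum(residuals ** 2)) / df
    se = float(np.sqrt(error_variance / sxx))
    return slope, se, df

years = [1997, 2001, 2004, 2008]   # survey waves from the question
index = [0.31, 0.12, 0.45, 0.27]   # hypothetical index values

slope, se, df = slope_with_standard_error(years, index)

# Two-sided 95% t quantile for 2 degrees of freedom (about 4.303, versus 1.96 for large samples).
T_CRIT_DF2 = 4.303
half_width = T_CRIT_DF2 * se

print(f"slope={slope:.5f}, se={se:.5f}, df={df}, 95% CI half-width={half_width:.5f}")
```

With only 2 residual degrees of freedom, the t multiplier is more than twice the large-sample value, so the interval for the trend is wide unless the error variance is very small. For these made-up values it comfortably includes zero.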

answered Sep 23, 2012 at 12:47

$\endgroup$



How many data points do you need for linear regression?

The usual rule of thumb is 10 points for each independent variable. How are your indices measured? If they include estimates of variability, then two could be enough (using a t-test or its analog).

Can you do a regression with 3 data points?

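Yes, it can be appropriate, but with three points a straight-line fit leaves only one residual degree of freedom, so the estimates will be very imprecise.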

Is linear regression used in predictive analytics?

Linear regression is one of the most commonly used methods in predictive analytics. It models a linear relationship between a dependent variable (the target) and one or more independent variables (the predictors) in order to predict future values of the target.

How many variables are needed for regression analysis?

Simple linear regression involves two variables: one independent variable (the predictor) and one continuous dependent variable (the outcome). The independent variable is used to predict the dependent variable. A multiple regression model extends this to several explanatory variables.
