The Pearson product moment correlation examines the relationship between what type of variables

Correlation, Pearson

The Pearson correlation coefficient r (also known as the Pearson product-moment correlation coefficient, or Pearson's r) measures the relationship (rather than the difference) between two quantitative variables (interval/ratio), that is, the extent to which the two variables are linearly related: changes in one variable correspond to changes in the other. A variety of other correlation coefficients (such as the phi correlation coefficient, point-biserial correlation, Spearman's rho, partial correlation, and part correlation) have been developed over the years for measuring relationships between sets of data, but the Pearson correlation coefficient is the most common measure of correlation and has been widely used in the sciences as a measure of ...

You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License. Please cite as follows: Hartmann, K., Krois, J., Waske, B. (2018): E-Learning Project SOGA: Statistics and Geospatial Data Analysis. Department of Earth Sciences, Freie Universitaet Berlin.

Adapted From: Introductory Statistics: Concepts, Models, and Applications
David W. Stockburger

Comments in italics are from Linda Woolf


DEFINITION

The Pearson Product-Moment Correlation Coefficient (r), or correlation coefficient for short, is a measure of the degree of linear relationship between two variables, usually labeled X and Y. While in regression the emphasis is on predicting one variable from the other, in correlation the emphasis is on the degree to which a linear model may describe the relationship between two variables. In regression the interest is directional: one variable is predicted and the other is the predictor. In correlation the interest is non-directional: the relationship itself is the critical aspect.

The correlation coefficient may take on any value between plus and minus one.
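The text does not spell out a computing formula, so the following is only a minimal sketch using the standard deviation-score form: the sum of the products of the deviations of X and Y from their means, divided by the square root of the product of the two sums of squared deviations. The function name pearson_r and the two small data sets are illustrative assumptions, not part of the original material.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences (sketch)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]          # deviations of X from its mean
    dy = [y - mean_y for y in ys]          # deviations of Y from its mean
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_xx = sum(a * a for a in dx)
    sum_yy = sum(b * b for b in dy)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sum_xy / sqrt(sum_xx * sum_yy)

# Hypothetical data: points lying exactly on a rising and on a falling line.
r_up = pearson_r([1, 2, 3, 4], [10, 20, 30, 40])
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(r_up, r_down)
```

The two calls sit at the extremes of the range: a perfect rising line gives plus one and a perfect falling line gives minus one, and any other data set falls somewhere between them.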

The sign of the correlation coefficient (+ , -) defines the direction of the relationship, either positive or negative. A positive correlation coefficient means that as the value of one variable increases, the value of the other variable increases; as one decreases the other decreases. A negative correlation coefficient indicates that as one variable increases, the other decreases, and vice-versa.

Remember that correlation does not mean causation. One cannot draw cause-and-effect conclusions based on correlation.

There are two reasons why we cannot make causal statements:

  1. We don't know the direction of the cause - Does X cause Y or does Y cause X?
  2. A third variable "Z" may be involved that is responsible for the covariance between X and Y
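The third-variable problem can be illustrated with a small simulation. This is a sketch under stated assumptions: the data are artificial, Z is drawn at random, and X and Y are each built as Z plus independent noise, so neither X nor Y influences the other. The helper pearson_r and all variable names are hypothetical.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation via deviation scores (sketch)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Z drives both X and Y; X and Y never influence each other.
z = [rng.gauss(0, 1) for _ in range(1000)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

r_xy = pearson_r(x, y)

# Subtracting Z out (possible here only because we built the data ourselves)
# leaves the independent noise terms, which should be nearly uncorrelated.
x_noise = [xi - zi for xi, zi in zip(x, z)]
y_noise = [yi - zi for yi, zi in zip(y, z)]
r_resid = pearson_r(x_noise, y_noise)

print(round(r_xy, 2), round(r_resid, 2))
```

X and Y correlate strongly even though the only causal paths run from Z, and the correlation largely disappears once Z's contribution is removed.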

The absolute value of the correlation coefficient measures the strength of the relationship. A correlation coefficient of r=.50 indicates a stronger degree of linear relationship than one of r=.40. Likewise, a correlation coefficient of r=-.50 shows a greater degree of relationship than one of r=.40, because only the magnitude matters for strength, not the sign. At the extremes, a correlation coefficient of zero (r=0.0) indicates the absence of a linear relationship, and correlation coefficients of r=+1.0 and r=-1.0 indicate a perfect linear relationship.

UNDERSTANDING AND INTERPRETING THE CORRELATION COEFFICIENT

The correlation coefficient may be understood by various means, each of which will now be examined in turn.

Scatterplots

We can graph the data used in computing a correlation coefficient. Essentially, with the Pearson Product Moment Correlation, we are examining the relationship between two variables - X and Y. By plotting each data pair (you will have sets of scores for X and Y), you will have created a graph called a scatterplot or scatterdiagram. Thus, if you were graphing height and weight for all of the children in a first grade class with twenty students, you would place a single dot on the graph at the point where each student's height and weight intersect. For correlation, it does not make any difference which variable goes on the x-axis and which variable goes on the y-axis. However, for linear regression, the variable that is the predictor goes on the x-axis. The variable being predicted goes on the y-axis.

To evaluate the degree of relationship, one looks at both the slope of the line that would best fit through the data points as well as the degree of scatter from that same line. In correlation, we do not draw the line; in linear regression, we compute the position of the line. The more dispersed the data points, the lower the correlation. The closer all of the data points are to the line, in other words the less scatter, the higher the degree of correlation.
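The link between scatter and the size of the correlation can be checked numerically. The sketch below uses artificial data (an assumption, not data from the text): the same X values are paired once with Y values that hug the line Y = X and once with Y values scattered widely around it. The names tight and loose are illustrative only.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation via deviation scores (sketch)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(1)  # fixed seed so the run is repeatable
x = [rng.uniform(0, 10) for _ in range(300)]

# Same underlying line, different amounts of scatter around it.
tight = [xi + rng.gauss(0, 0.5) for xi in x]   # points close to the line
loose = [xi + rng.gauss(0, 5.0) for xi in x]   # points widely dispersed

r_tight = pearson_r(x, tight)
r_loose = pearson_r(x, loose)
print(round(r_tight, 2), round(r_loose, 2))
```

Both data sets share the same best-fitting line, yet the widely dispersed set yields a clearly lower correlation.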

The scatterplots presented below perhaps best illustrate how the correlation coefficient changes as the linear relationship between the two variables is altered. When r=0.0 the points scatter widely about the plot, the majority falling roughly in the shape of a circle. As the linear relationship increases, the circle becomes more and more elliptical in shape until the limiting case is reached (r=1.00 or r=-1.00) and all the points fall on a straight line.

A number of scatterplots and their associated correlation coefficients are presented below in order that the student may better estimate the value of the correlation coefficient based on a scatterplot in the associated computer exercise.

When examining the scatterplots below, examine both the size (degree of the relationship) as well as sign (positive or negative correlation).


[Scatterplots not reproduced here. The panels showed, in order: r = 1.00, r = -.54, r = .85, r = -.94, r = .42, r = -.33, r = .17, r = .39, followed by further examples.]


Slope of the Regression Line of z-scores

The correlation coefficient is the slope (b) of the regression line when both the X and Y variables have been converted to z-scores. The larger the size of the correlation coefficient, the steeper the slope. This is related to the difference between the intuitive regression line and the actual regression line discussed above.

This interpretation of the correlation coefficient is perhaps best illustrated with an example involving numbers. The raw score values of the X and Y variables are presented in the first two columns of the following table. The second two columns are the X and Y columns transformed using the z-score transformation.

That is, the mean is subtracted from each raw score in the X and Y columns and then the result is divided by the sample standard deviation. The table appears as follows:

                          X        Y       zX       zY

                         12       33    -1.07    -0.61
                         15       31    -0.70    -1.38
                         19       35    -0.20     0.15
                         25       37     0.55     0.92
                         32       37     1.42     0.92

Mean                  20.60    34.60     0.0      0.0
Standard deviation     8.02     2.61     1.0      1.0

There are two points to be made with the above numbers: (1) the correlation coefficient is invariant under a linear transformation of either X and/or Y, and (2) the slope of the regression line when both X and Y have been transformed to z-scores is the correlation coefficient.

Computing the correlation coefficient first with the raw scores X and Y yields r=0.85. Next computing the correlation coefficient with zX and zY yields the same value, r=0.85. Since the z-score transformation is a special case of a linear transformation (X' = a + bX), it may be proven that the correlation coefficient is invariant (doesn't change) under a linear transformation of either X and/or Y. The reader may verify this by computing the correlation coefficient using X and zY or Y and zX. What this means essentially is that changing the scale of either the X or the Y variable will not change the size of the correlation coefficient, as long as the transformation conforms to the requirements of a linear transformation.
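Both points can be verified numerically with the five raw-score pairs from the table above. This is a minimal sketch: the helper names pearson_r, z_scores, and slope are hypothetical, and the slope is computed with the usual least-squares formula (sum of cross-products divided by the sum of squares of the predictor), which the text does not state explicitly.

```python
from math import sqrt
from statistics import mean, stdev

x = [12, 15, 19, 25, 32]
y = [33, 31, 35, 37, 37]

def pearson_r(xs, ys):
    """Pearson correlation via deviation scores (sketch)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)

def z_scores(values):
    """Subtract the mean, then divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)   # stdev uses the n - 1 denominator
    return [(v - m) / s for v in values]

def slope(xs, ys):
    """Least-squares slope for predicting ys from xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

zx, zy = z_scores(x), z_scores(y)

r_raw = pearson_r(x, y)      # raw scores
r_z = pearson_r(zx, zy)      # both variables as z-scores
r_mixed = pearson_r(x, zy)   # one raw, one transformed
b_z = slope(zx, zy)          # regression slope in z-score units

print(round(r_raw, 2), round(r_z, 2), round(r_mixed, 2), round(b_z, 2))
# prints 0.85 0.85 0.85 0.85
```

All four values agree up to floating-point rounding, matching the r=0.85 reported in the text.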

Variance Interpretation

The squared correlation coefficient (r²) is the proportion of variance in Y that can be accounted for by knowing X. Conversely, it is the proportion of variance in X that can be accounted for by knowing Y.

The squared correlation coefficient is also known as the coefficient of determination. It is one of the best means for evaluating the strength of a relationship. For example, we know that the correlation between height and weight is approximately r=.70. Squaring this number gives the coefficient of determination, r-squared=.49. Thus, 49 percent of the variance in weight is accounted for by height, and vice versa.
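The variance interpretation can be checked against a fitted regression line. The sketch below reuses the five X and Y raw scores from the z-score example earlier in this section, fits the least-squares line for predicting Y from X, and compares the proportion of Y's variation removed by the line with the squared correlation. The variable names and the use of sums of squares are illustrative assumptions.

```python
from math import sqrt

x = [12, 15, 19, 25, 32]
y = [33, 31, 35, 37, 37]

mx, my = sum(x) / len(x), sum(y) / len(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)   # total variation in Y

r = sxy / sqrt(sxx * syy)

# Least-squares line for predicting Y from X.
b = sxy / sxx
a = my - b * mx
predicted = [a + b * xi for xi in x]

# Variation in Y left over after using X to predict it.
ss_resid = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))

explained = 1 - ss_resid / syy   # proportion of Y variation accounted for
print(round(r ** 2, 2), round(explained, 2))
# prints 0.72 0.72
```

For these data r is about .85, so roughly 72 percent of the variance in Y is accounted for by X.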

What kind of variables are needed for a Pearson's correlation?

You should use the Pearson correlation coefficient when (1) the relationship is linear and (2) both variables are quantitative and (3) normally distributed and (4) have no outliers.
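The no-outliers condition matters because a single extreme point can dominate the coefficient. The following sketch uses made-up numbers (an assumption, chosen so that the eight ordinary points have no linear relationship at all) and then appends one extreme pair.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation via deviation scores (sketch)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores with no linear relationship.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [5, 3, 6, 2, 7, 4, 6, 3]
r_clean = pearson_r(x, y)

# The same data plus one extreme pair far from the rest.
r_outlier = pearson_r(x + [30], y + [30])

print(round(r_clean, 2), round(r_outlier, 2))
# prints 0.0 0.95
```

One added point moves the coefficient from zero to about .95, which is why a scatterplot should be inspected before the coefficient is interpreted.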

What does the Pearson product

Pearson's Product Moment Correlation Coefficient measures the degree of correlation there may be between two variables. It is best used when results have already been plotted on a scatter graph and there is an indication of a linear relationship between the two factors.

What type of relationships does a Pearson's product moment of correlation assess?

The Pearson product-moment correlation tests only for a linear relationship (positive or negative) between the two variables; it is not designed to detect curvilinear relationships.
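A short sketch shows how a strong curvilinear relationship can go unnoticed. The data are hypothetical: Y is exactly the square of X, with X values placed symmetrically around zero, so Y is completely determined by X and yet the linear correlation vanishes.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation via deviation scores (sketch)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]    # perfect U-shaped (curvilinear) relationship

r_curve = pearson_r(x, y)
print(r_curve)
```

The coefficient is zero here because the falling left half of the curve cancels the rising right half, so a zero correlation rules out only a linear relationship, not a relationship of any kind.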

What is Pearson correlation analysis used for?

Pearson's correlation is used when you want to see if there is a linear relationship between two quantitative variables. The research hypothesis is simply that such a linear relationship exists.