Linear Regression: Rising Tuition Costs

Introduction

This project investigates how the average cost of college has changed by year. The data used in this project is publicly available here.

Libraries

This linear regression was created using the ordinary least squares (OLS) method and the statsmodels package.

Import Data

This code imports the data and displays the first five rows of the dataset.
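As a minimal sketch of this step, the snippet below reads a CSV into a pandas DataFrame and shows its first five rows. The column name 'Year', the helper `load_tuition`, and the inline sample text are assumptions made for illustration. The sample values are placeholders, not the project's tuition data.

```python
import io

import pandas as pd

# Placeholder CSV text standing in for the real file. These values are
# made up for illustration and are NOT the project's tuition data.
SAMPLE_CSV = """Year,Tuition
2000,10000
2001,10400
2002,10900
2003,11300
2004,11900
2005,12300
"""


def load_tuition(source):
    """Read a CSV with 'Year' and 'Tuition' columns into a DataFrame.

    `source` can be a file path or any file-like object.
    """
    return pd.read_csv(source)


df = load_tuition(io.StringIO(SAMPLE_CSV))
print(df.head())  # head() returns the first five rows by default
```

With the real dataset, the path to the downloaded file would be passed to `load_tuition` in place of the `StringIO` object.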

Simple Linear Regression

The simple linear regression coefficients are estimated using the least squares criterion: we find the line that minimizes the sum of squared residuals, also called the sum of squared errors. This code estimates the model coefficients for the college tuition data.
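A minimal sketch of such a fit with the statsmodels formula interface might look like the following. The column name 'Year' and the generated values are assumptions for illustration only. They are placeholders, not the project's data, so the numbers printed will not match the results discussed in this notebook.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Placeholder data: a made-up upward trend with a small alternating offset.
# These are NOT the project's tuition figures.
years = list(range(2000, 2020))
tuition = [10000 + 500 * i + (200 if i % 2 else -200) for i in range(20)]
df = pd.DataFrame({"Year": years, "Tuition": tuition})

# Fit Tuition = b0 + b1 * Year by ordinary least squares.
results = smf.ols("Tuition ~ Year", data=df).fit()

print(results.params)     # estimated intercept (b0) and Year slope (b1)
print(results.summary())  # full summary table, including R-squared
```

The formula string 'Tuition ~ Year' adds the intercept automatically, which is why no constant column is built by hand here.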

The R-squared shown above is possibly the most important measurement produced in the summary table. It is the proportion of the variance in the response that the model accounts for. Here, R-squared indicates that our model explains 57% of the variance in our 'Tuition' variable.
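To make the definition concrete, R-squared can be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below does this with NumPy. The helper `r_squared` and the data are hypothetical placeholders, and `np.polyfit` is used here only as a stand-in for the least squares fit.

```python
import numpy as np


def r_squared(y, y_hat):
    """Return 1 - SS_res / SS_tot for observed y and fitted y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)      # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)   # total variation around the mean
    return 1.0 - ss_res / ss_tot


# Placeholder data, made up for illustration; NOT the project's data.
years = np.arange(2000, 2020)
tuition = np.array(
    [10000 + 500 * i + (200 if i % 2 else -200) for i in range(20)],
    dtype=float,
)

# Degree-1 polynomial fit is a least squares line: returns slope, intercept.
slope, intercept = np.polyfit(years, tuition, 1)
fitted = intercept + slope * years

r2 = r_squared(tuition, fitted)
print(f"R-squared: {r2:.3f}")
```

For a simple linear regression with an intercept, this value equals the squared correlation between the predictor and the response.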

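The Prob (F-statistic) is the p-value for the overall F-test of the model. It is the probability of seeing an F-statistic at least this large if the null hypothesis were true, that is, if our variable year's effect on tuition were 0. It is not the probability that the null hypothesis itself is true. For our model the value is 0.0000000000233%, so a relationship this strong would be extremely unlikely to appear by chance alone.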
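These quantities can also be read directly from a fitted statsmodels results object instead of the printed table. The following sketch assumes a 'Year' column and uses made-up placeholder data, so the printed values will differ from the ones quoted in this notebook.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Placeholder data, made up for illustration; NOT the project's data.
years = list(range(2000, 2020))
tuition = [10000 + 500 * i + (200 if i % 2 else -200) for i in range(20)]
df = pd.DataFrame({"Year": years, "Tuition": tuition})

results = smf.ols("Tuition ~ Year", data=df).fit()

f_stat = results.fvalue          # F-statistic for the overall model
f_pvalue = results.f_pvalue      # the "Prob (F-statistic)" entry
slope_pvalue = results.pvalues["Year"]  # t-test p-value for the slope

print(f"F-statistic:        {f_stat:.2f}")
print(f"Prob (F-statistic): {f_pvalue:.3e}")
print(f"p-value for Year:   {slope_pvalue:.3e}")
```

With a single predictor, the F-test and the t-test on the slope test the same hypothesis, so the last two values agree up to floating-point error.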

Model Visualization

This code creates a basic scatterplot of our data and then adds our linear regression line to the plot.
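One way to draw such a plot with matplotlib is sketched below. The column name 'Year', the output filename, and the data are assumptions for illustration. The values are placeholders rather than the project's tuition figures.

```python
import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.formula.api as smf

# Placeholder data, made up for illustration; NOT the project's data.
years = list(range(2000, 2020))
tuition = [10000 + 500 * i + (200 if i % 2 else -200) for i in range(20)]
df = pd.DataFrame({"Year": years, "Tuition": tuition})

results = smf.ols("Tuition ~ Year", data=df).fit()

fig, ax = plt.subplots()
# Observed points.
ax.scatter(df["Year"], df["Tuition"], label="Observed")
# Fitted values from the model trace out the regression line.
ax.plot(df["Year"], results.fittedvalues, color="red", label="OLS fit")
ax.set_xlabel("Year")
ax.set_ylabel("Tuition")
ax.legend()

# Hypothetical output filename; in a notebook, plt.show() displays the plot.
fig.savefig("tuition_fit.png")
```

Plotting `results.fittedvalues` against the same x values keeps the line consistent with whatever model was fitted, rather than recomputing it from the coefficients by hand.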