# Introduction to Linear Regression Using Spreadsheets with Real Estate Data

ModelingEast 2024posted by ODSC Community March 4, 2024

Editor’s note: Roberto Reif is a speaker for ODSC East this April 23-25. Be sure to check out his talk, “Introduction...

Editor’s note: Roberto Reif is a speaker for ODSC East this April 23-25. Be sure to check out his talk, “Introduction to Linear Regression using Spreadsheets with Real Estate Data,” there!

In the real estate industry, predicting property sales prices presents a significant challenge due to several influencing factors.  These factors include square footage, number of bathrooms, year the property was built, year of sale, quality of the schools in the area, crime rate, the recentness of the kitchen and bathroom renovations, and more.

#### In-Person and Virtual Conference

April 23rd to 25th, 2024

Join us for a deep dive into the latest data science and AI trends, tools, and techniques, from LLMs to data analytics and from machine learning to responsible AI.

A method for estimating the sales price for a property is using linear regression models.  These models predict a continuous variable (e.g. price of sale) based on one or more independent variables (e.g. square feet).  The equation for a linear regression model that uses only one variable, known as a simple linear regression model, is expressed by:

Price=m×SquareFeet+b

Here, the m denotes the slope for the independent variable (square feet) and b represents the intercept.  Building such a model involves collecting historical data of sold properties, determining optimal values for m and b that minimize the price prediction errors, and then applying the model to predict the sales price of new properties.

Below is a table that shows a sample dataset displaying various property variables alongside their respective sale prices.  Each row represents a property that has been sold.

TABLE 1: Example of a dataset containing several properties with the year built, the number of square feet, the number of baths, the property type, and the sales price.

In this blog the focus is on a simple linear regression model, using only the square footage, as depicted in Figure 1.  Each blue dot represents a property from our dataset (a row in Table 1), displaying a general trend where higher square footage correlates with higher sale prices. The reason the dots do not follow a perfect straight line is because we have not considered other variables in the model such as the number of bathrooms or year built.

FIGURE 1: Price of sale as a function of the square footage

In Figure 1 we have added a straight line to indicate the line of best fit to the data.  The equation is presented in the graph and should be read as the price of a property is approximated by \$458 times the square footage of the property plus an offset of \$143,427.   In simple terms, each additional square foot adds \$458 to the price of the property.  Also, a property with zero square feet will cost \$143,427.  Although this last statement does not make practical sense, it helps the model make better predictions.  For instance, using the model, a 2,000 square feet property would be predicted to sell for:

Price=\$458×2,000+\$143,427=\$1,059,427

To determine the value for the slope of \$458 and the intercept of \$143,427 various methods such as the least-squares method or gradient descent can be employed.  Spreadsheet tools such as Excel and Google Sheets have implemented functions such as LINEST for this purpose.  The LINEST function receives two parameters as input, the known values of y (sales price) and the known values of x (square feet).  It is important to note that there are other parameters in x that can be included such as number of bathrooms, year built, etc.  For this example, the expression in the spreadsheet tool would be written like this:

=LINEST(PRICE,SQFT)

The output of this expression is shown below.

FIGURE 2: Results from the LINEST Function displaying the slope and intercept

To learn more about linear regression and its application in real estate data, come check out this Workshop at ODSC East 2024. In this hands-on workshop, you will work with spreadsheet tools (Excel or Google Sheets), to build simple and multiple linear regression models using real estate data and make predictions that minimize the determined price error.  For more courses in data science and data analytics, check out ScholarU.

#### In-Person Data Engineering Conference

April 23rd to 24th, 2024 – Boston, MA

At our second annual Data Engineering Summit, Ai+ and ODSC are partnering to bring together the leading experts in data engineering and thousands of practitioners to explore different strategies for making data actionable.