# Machine Learning – Regression

Date: August 2017. Location: Quito, Pichincha, Ecuador.

Framework

WBS Activity (Linear Regression – )

This material is a verbatim copy of:

### Linear Regression

So the first subject we’re going to discuss in this course is regression. In this lecture in particular, we will discuss something called linear regression.

This is the most basic form of regression. To introduce it, I’ll use a very simple problem: Old Faithful. In this problem we have a geyser that erupts for a certain amount of time and then stays silent for a certain amount of time before erupting again.

What we want is a way to predict when the next eruption is going to happen. One way to do this is to collect data, that is, pairs of inputs and outputs, where the input is how long the geyser erupted for, in minutes, and the output is how long we then had to wait for the next eruption, in minutes.

So what linear regression is going to do is take the input, in this case how long the geyser erupted for, and try to predict the output: how long the geyser will now be silent until the next eruption. So let’s look at the data we have and at how we might want to model it. Here we clearly see a trend. On the x axis we have how long the geyser erupted for, and on the y axis we have how long we had to wait for the next eruption.

So now the question is can we meaningfully predict the time between eruptions using only the duration of the last eruption?

So by looking at this data, we might want to come up with a simplifying function: we take the input, which is how long the geyser erupted for, apply a function to it, and the output is a prediction of how long we think we will have to wait for the next eruption. Looking at the data, the most natural choice is to model this as a linear function. This is the simplest regression setting: we have an input that we believe is related to the output through some linear function. The model we’re going to consider is called linear regression, where the output, the waiting time, is some constant $w_0$ plus the duration of the last eruption times a weight $w_1$:

$$y \approx w_0 + x\,w_1$$

Here $x$ is how long the last eruption lasted and $y$ is the waiting time. $w_0$ and $w_1$ are two values that we’re now going to learn from the data. This is an example of linear regression because we are simply learning a line to fit the data.
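A minimal Python sketch of the one-input model follows. The function names `predict_wait` and `fit_line` are hypothetical, and the duration/wait pairs are made-up toy numbers, not Old Faithful measurements. To show one way $w_0$ and $w_1$ can be learned from data, `fit_line` assumes the standard closed-form least squares solution for a single input (simple linear regression); it is an illustration, not necessarily the method the lecture goes on to derive.

```python
from typing import Sequence, Tuple


def predict_wait(duration: float, w0: float, w1: float) -> float:
    """Linear model y ≈ w0 + x * w1: predicted wait (minutes) from eruption duration x (minutes)."""
    return w0 + duration * w1


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Learn (w0, w1) with the closed-form least squares solution for one input.

    w1 = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
    w0 = y_mean - w1 * x_mean
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of matching length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("all durations are identical, so the slope cannot be determined")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    w1 = sxy / sxx
    w0 = y_mean - w1 * x_mean
    return w0, w1


# Made-up toy pairs (duration, wait) for illustration only; NOT real geyser data.
durations = [2.0, 3.0, 4.0, 5.0]
waits = [54.0, 66.0, 74.0, 86.0]

w0, w1 = fit_line(durations, waits)
print(f"w0 = {w0:.2f}, w1 = {w1:.2f}")  # w0 = 33.60, w1 = 10.40
print(f"predicted wait after a 3-minute eruption: {predict_wait(3.0, w0, w1):.2f}")  # 64.80
```

The slope $w_1$ is the number of extra minutes of waiting the model predicts per extra minute of eruption, and $w_0$ is the intercept of the fitted line.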

So how does this translate to higher dimensions? To give some intuition, imagine that we have two inputs and we want to predict an output based on both of them. In this case we would also use a linear function, where the output is approximately equal to some offset, or bias, plus the first input times a weight on the first input plus the second input times a weight on the second input:

$$y \approx w_0 + x_1 w_1 + x_2 w_2$$

In this plot, the inputs are two-dimensional and the output is again one-dimensional, so instead of a line we are learning a plane through the data.

So let’s look at a more formal definition of the regression problem. Our data consist of inputs, which we’ll call $x$, that are $d$-dimensional: $x \in \mathbb{R}^d$. These inputs have many names: measurements, covariates, features, or independent variables, and I’ll probably switch back and forth between these names throughout the lectures. The outputs are the corresponding response, or dependent variable, $y \in \mathbb{R}$: real-valued numbers that we want to predict. The goal of the regression problem is to define a function $f: \mathbb{R}^d \to \mathbb{R}$ that maps an input in the input space to a value in the output space, such that the output can reasonably be assumed to be approximately equal to the input mapped through the function, $y \approx f(x; w)$, where $w$ denotes the parameters, or free variables, of the model… And the goal is now to learn those parameters given data.
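The two-input case can be sketched in a few lines of Python. The function name `predict_two_inputs` and the parameter values are hypothetical, chosen only to make the arithmetic easy to follow; they are not learned from any data.

```python
def predict_two_inputs(x1: float, x2: float, w0: float, w1: float, w2: float) -> float:
    """Two-input linear model: y ≈ w0 + x1 * w1 + x2 * w2 (a plane over the (x1, x2) inputs)."""
    return w0 + x1 * w1 + x2 * w2


# Hypothetical parameters, chosen only for illustration.
w0, w1, w2 = 1.0, 2.0, -0.5

print(predict_two_inputs(0.0, 0.0, w0, w1, w2))  # 1.0: only the bias remains
print(predict_two_inputs(1.0, 0.0, w0, w1, w2))  # 3.0: one unit of x1 adds w1 = 2.0
print(predict_two_inputs(0.0, 2.0, w0, w1, w2))  # 0.0: two units of x2 add 2 * w2 = -1.0
```

Holding $x_2$ fixed and increasing $x_1$ changes the prediction by $w_1$ per unit, and likewise $w_2$ for $x_2$; because these rates are constant everywhere, the predictions form a flat plane rather than a curved surface.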

Okay, so let’s look at the simplest linear regression model and a way we can learn it through least squares. The model takes an input $x_i$, passes it through a function with parameters $w$, and predicts the output $y_i$ associated with $x_i$ according to this function, where we have the bias weight $w_0$ … and then the dot product of the weights with the input, that is, the sum of the element-wise products of each weight $w_j$ with the $j$th dimension of $x_i$. Since $x_i \in \mathbb{R}^d$, we take the $j$th dimension of $x_i$, multiply it by a weight $w_j$, do this for each dimension, sum those values up, and add the bias. We then predict the output to be approximately equal to this linear function:

$$y_i \approx f(x_i; w) = w_0 + \sum_{j=1}^{d} x_{ij} w_j$$

Having defined the model, we now collect some training data, meaning pairs of inputs and outputs that we know because we measured or otherwise obtained them. We have $n$ pairs, $(x_1, y_1)$ through $(x_n, y_n)$, where $y_1$ is the response for input $x_1$, and so on. The goal now, and this is pervasive throughout machine learning, is to use this data to learn the vector $w$ so that we can make predictions, or approximate outputs, according to that function.

So what does it mean to find the vector $w$? How can we find values for $w$ that give good predictions under this linear function? To know what a good value for the vector $w$ is, we need to define an objective function, which tells us which values of $w$ are good and which are bad.
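The general $d$-dimensional model, and the way an objective function ranks candidate values of $w$, can be sketched in Python. The names `linear_predict` and `sum_squared_errors`, the three-pair training set, and both candidate weight settings are hypothetical illustrations with made-up values. The objective used here is the sum of squared errors, the least squares criterion this lecture uses.

```python
from typing import List, Sequence, Tuple


def linear_predict(x: Sequence[float], w0: float, w: Sequence[float]) -> float:
    """f(x; w) = w0 + sum_j x[j] * w[j] for an input x in R^d."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same dimension d")
    # Multiply each dimension of x by its weight, sum the products (a dot product), add the bias.
    return w0 + sum(x_j * w_j for x_j, w_j in zip(x, w))


def sum_squared_errors(
    data: Sequence[Tuple[Sequence[float], float]], w0: float, w: Sequence[float]
) -> float:
    """Least squares objective: sum over the n pairs of (y_i - f(x_i; w)) ** 2."""
    return sum((y_i - linear_predict(x_i, w0, w)) ** 2 for x_i, y_i in data)


# Hypothetical training set of n = 3 pairs (x_i, y_i) with x_i in R^2; values are made up.
training_data: List[Tuple[List[float], float]] = [
    ([1.0, 2.0], 5.0),
    ([0.0, 1.0], 2.5),
    ([3.0, 0.0], 7.0),
]

# Two hypothetical candidate parameter settings: the lower objective value is the better fit.
print(sum_squared_errors(training_data, 1.0, [2.0, 1.0]))  # 0.25
print(sum_squared_errors(training_data, 0.0, [1.0, 1.0]))  # 22.25
```

The objective turns "good" and "bad" values of $w$ into a single number to compare: the first candidate misses only the second pair, by 0.5, while the second candidate misses every pair by a wide margin.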

So for least squares, the objective function is the most straightforward one you could think of: the sum of squared errors. For each training pair, we take the output $y_i$ from our training dataset and subtract our prediction of $y_i$ according to the linear regression model; that difference is the error of our prediction. We then square that value so that it is never negative, and sum these squared values over all $n$ pairs to get the total sum of squared errors of our model:

$$\sum_{i=1}^{n} \Big(y_i - \big(w_0 + \sum_{j=1}^{d} x_{ij} w_j\big)\Big)^2$$

Written by: Larry Francis Obando – Technical Specialist

School of Electrical Engineering, Universidad Central de Venezuela, Caracas.

School of Electronic Engineering, Universidad Simón Bolívar, Valle de Sartenejas.

School of Tourism, Universidad Simón Bolívar, Núcleo Litoral.