# Linear Model

The scientific method is frequently used as a guided approach to learning. Linear statistical methods are widely used as part of this learning process.

Linear models describe a continuous response variable as a function of one or more predictor variables. They can help you understand and predict the behavior of complex systems or analyze experimental, financial, and biological data. Linear regression is a statistical method used to create a linear model.

#### Mathematical Modelling:

Let $Y$ be the dependent variable of dimension $n \times 1$, $X$ be the $n \times p$ matrix of independent variables, and $\theta$ ($p \times 1$) and $\epsilon$ ($n \times 1$) be the unknown parameter vector and the error vector respectively; then the linear model can be written as:

$$Y_{n \times 1} = X_{n \times p}\,\theta_{p \times 1} + \epsilon_{n \times 1}$$

where $E(\epsilon) = 0$ and $D(\epsilon) = \sigma^2 I_n$ {$D(\cdot)$ implying dispersion}.
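As a quick illustration of this setup, here is a minimal R sketch that simulates data from the model above (the sample size, the design matrix, and the "true" $\theta$ are arbitrary choices for the demo):

```r
set.seed(42)

n <- 100                                             # sample size (arbitrary)
p <- 3                                               # number of parameters (arbitrary)
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))  # design matrix with an intercept column
theta <- c(2, -1, 0.5)                               # "true" parameter vector, assumed for the demo
eps <- rnorm(n)                                      # errors with E(eps) = 0, D(eps) = I_n
y <- X %*% theta + eps                               # the model Y = X theta + eps

dim(y)                                               # an n x 1 response vector
```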

#### Usage:

• Prediction: Estimates of the individual parameters are of less importance for prediction than the overall influence of the x variables on y. However, good estimates are needed to achieve good prediction performance.
• Data Description or Explanation: The scientist or engineer uses the estimated model to summarize or describe the observed data.
• Parameter Estimation: The values of the estimated parameters may have theoretical implications for a postulated model.
• Variable Selection or Screening: The emphasis is on determining the importance of each predictor variable in modeling the variation in $y$. The predictors that are associated with an important amount of variation in $y$ are retained; those that contribute little are deleted.
• Control of Output: A cause-and-effect relationship between $y$ and the x variables is assumed. The estimated model might then be used to control the output of a process by varying the inputs. By systematic experimentation, it may be possible to achieve the optimal output.

#### Classification:

Linear models can mainly be classified into 3 types:

• Simple linear regression: models using only one predictor
• Multiple linear regression: models using multiple predictors
• Multivariate linear regression: models for multiple response variables
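All three types can be fit with `lm()` in R; the sketch below uses a made-up data frame `df` with hypothetical columns `y`, `y2`, `x1`, `x2` (simulated here just to show the formula syntax):

```r
set.seed(1)
df <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
df$y  <- 1 + 2 * df$x1 + rnorm(50)                     # simulated response (assumed relationship)
df$y2 <- 3 - df$x2 + rnorm(50)                         # a second simulated response

fit_simple   <- lm(y ~ x1, data = df)                  # simple: one predictor
fit_multiple <- lm(y ~ x1 + x2, data = df)             # multiple: several predictors
fit_mv       <- lm(cbind(y, y2) ~ x1 + x2, data = df)  # multivariate: several responses
```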

#### Parameter Estimation:

We apply the method of least squares to estimate the parameter $\theta$, which involves minimizing the error sum of squares $L$, given by:

$$L = (y - X\theta)'(y - X\theta) = \sum_{i=1}^n \Big(y_i - \sum_{j=1}^p x_{ij}\theta_j\Big)^2$$

Differentiating $L$ w.r.t. $\theta$ and equating the derivative to 0, we obtain the following set of linear equations, also called the Normal Equations:

$$X'X\hat{\theta} = X'y$$

where $\hat{\theta}$ is an estimator of $\theta$, referred to as the least squares estimate.

Predicted values are $\hat{y} = X\hat{\theta} = Hy$, where $H = X(X'X)^{-1}X'$ is the hat matrix, which is symmetric and idempotent, i.e. $HH = H$.
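These identities can be checked numerically; here is a minimal sketch with simulated data (the particular $X$ and $y$ are arbitrary):

```r
set.seed(1)
n <- 30; p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))  # design matrix of full column rank
y <- drop(X %*% c(1, 2, -1) + rnorm(n))              # simulated response

theta_hat <- drop(solve(t(X) %*% X, t(X) %*% y))     # solves the normal equations X'X theta = X'y
H <- X %*% solve(t(X) %*% X) %*% t(X)                # hat matrix
y_hat <- drop(H %*% y)                               # predicted values H y

max(abs(H %*% H - H))                                # idempotence HH = H, up to rounding error
max(abs(y_hat - X %*% theta_hat))                    # H y agrees with X theta_hat
```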

Exercise: Check that the normal equations are consistent (i.e. admit a solution) whatever be the rank of $X$.

(Hint: $X'y \in C(X')$ and $C(X') = C(X'X)$, where $C(A)$ means the column space of $A$.)

Now, suppose $Z \sim N_n(0, I_n)$. Then $Z'AZ \sim \chi^2_k$ iff $A$ is idempotent, where $k = \operatorname{rank}(A)$.

Now, we may compute the MLE (maximum likelihood estimator) of $\theta$ and $\sigma^2$. After some straightforward calculations, we arrive at the following estimates:

$$\hat{\theta} = (X'X)^{-1}X'y, \qquad \hat{\sigma}^2 = \frac{(y - X\hat{\theta})'(y - X\hat{\theta})}{n},$$

assuming the rank of $X$ is $p$. (Otherwise, we can use the generalized inverse, but let's not go into that in this post.)
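A sketch of these estimates in R, compared against `lm()` (data simulated for the illustration; note that `summary(fit)$sigma^2` uses the unbiased divisor $n - p$ rather than the MLE's $n$):

```r
set.seed(7)
n <- 200; p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))  # full-column-rank design matrix
y <- drop(X %*% c(1, 0.5, -2) + rnorm(n))            # simulated response

theta_hat  <- drop(solve(t(X) %*% X, t(X) %*% y))    # MLE of theta = least squares estimate
res        <- y - drop(X %*% theta_hat)              # residuals
sigma2_mle <- sum(res^2) / n                         # MLE of sigma^2 (divides by n)
sigma2_unb <- sum(res^2) / (n - p)                   # unbiased version (divides by n - p)
```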

Now, we can comment on the distributions of the estimates obtained: under the normality assumption, $\hat{\theta} \sim N_p(\theta, \sigma^2 (X'X)^{-1})$ and $n\hat{\sigma}^2/\sigma^2 \sim \chi^2_{n-p}$, independently of $\hat{\theta}$.

#### R codes

Here are some useful commands in R:

Multiple linear regression example:

```r
fit <- lm(y ~ x1 + x2 + x3, data = mydata)  # fit the model
summary(fit)                                # show results
```

Other useful functions:

```r
coefficients(fit)           # model coefficients
confint(fit, level = 0.95)  # confidence intervals for model parameters
fitted(fit)                 # predicted values
residuals(fit)              # residuals
anova(fit)                  # ANOVA table
vcov(fit)                   # covariance matrix for model parameters
influence(fit)              # regression diagnostics
```
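To see these functions on real data, here is a quick run on `mtcars`, a dataset that ships with R (the choice of predictors is just for illustration):

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)  # miles per gallon vs. weight and horsepower
summary(fit)                             # coefficients, R-squared, F-statistic
confint(fit, level = 0.95)               # confidence intervals for the parameters
head(fitted(fit))                        # first few predicted values
anova(fit)                               # ANOVA table
```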

No topic in statistics is fully understood until it is applied to some real data. So anyone reading this should try to apply the given method to a real dataset for complete comprehension. You can use R, MATLAB, or Python, whichever suits you better.

You may find datasets in the