Submitted By Funtic

Words 1818

Pages 8

Multi-Variate Modelling including Lagged variables and Dummy Variables

Topics for Today

• Multi-variate relationships
• Correlation matrices
• Doing a multiple regression in Excel
• Multi-collinearity
• Lagged variables
• Dummy variables
  ▫ For modelling qualitative data
  ▫ For modelling seasonality

Multi-Variate Relationships

• So far we have only looked at Time Series. These are where:
  . . . . one dependent variable, eg: sales, temperature
  . . . . varies with time
• We have identified no underlying drivers of the relationship
• We just made forecasts one or more periods ahead
• These are commonly used business models . . . . but the business world is not that simple:
  ▫ The variables we need to forecast do not just depend on time
  ▫ Multi-variate models are required
  ▫ We can then identify the ‘levers’ to pull to ‘drive’ our variable
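A minimal sketch of such a multi-variate model, combining the lagged and dummy variables listed in the topics. Everything here is an assumption made for illustration: the `advertising` series, the quarterly layout, the coefficient values and the helper name `build_design_matrix` are hypothetical and do not come from the lecture. The dependent variable is generated without noise from known coefficients, so the fit only checks the mechanics.

```python
import numpy as np

def build_design_matrix(driver, quarter, lag=1):
    """Columns: intercept, driver_t, driver_{t-lag}, dummies for quarters 2-4."""
    driver = np.asarray(driver, dtype=float)
    quarter = np.asarray(quarter)
    rows = np.arange(lag, len(driver))      # the first `lag` periods have no lagged value
    lagged = driver[rows - lag]
    # Four quarters need only three dummies; quarter 1 is the baseline.
    dummies = np.column_stack(
        [(quarter[rows] == q).astype(float) for q in (2, 3, 4)]
    )
    X = np.column_stack([np.ones(len(rows)), driver[rows], lagged, dummies])
    return X, rows

# Hypothetical data: 12 quarters of an advertising 'lever'.
advertising = [10, 12, 9, 14, 13, 11, 12, 15, 9, 16, 10, 13]
quarter = np.tile([1, 2, 3, 4], 3)

X, rows = build_design_matrix(advertising, quarter, lag=1)

# Hypothetical 'true' coefficients, used only to generate noise-free sales.
true_coefs = np.array([50.0, 2.0, 1.0, 5.0, -3.0, 8.0])
sales = X @ true_coefs

# Least-squares fit of the multi-variate model.
coefs, *_ = np.linalg.lstsq(X, sales, rcond=None)
print(np.round(coefs, 3))
```

Using one dummy fewer than the number of categories keeps the dummy columns from summing to the intercept column, which would make the coefficients impossible to estimate separately.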

An Example

In previous years this was a double module

• Attendance at tutorials varied as the year progressed
• Time is one factor but other factors could be:
  ▫ Students’ perception that the tutorial will help them pass
  ▫ Weather conditions: eg temperature on morning of tutorial
  ▫ Time of day for the tutorial (9am tutorials are not popular)
  ▫ Students dropping out of the module or the university
  ▫ Volume of background reading in the recommended texts
  ▫ Assignment marks achieved – low marks produce attendance

An Example

How suitable are these other factors?

Suitable

• Tutorial week: this is equivalent to the passage of time
• Students’ perception that the tutorial will help them pass
• Weather conditions: seems likely to have an impact
• Amount of reading required . . . . large amounts may increase or decrease attendance

Not Suitable

• Time of day: tutorials are the same time each week
• Students dropping out is…...

...Regression Models
Student Name
Grantham University
BA/520 – Quantitative Analysis
Instructor Name
April 6, 2013

Abstract
This paper discusses regression models and the benefits that variables provide when developing and examining such models. It also explains why scatter diagrams are used, describes the simple linear regression model, and covers multiple regression analysis and its potential uses.

Regression Models
A regression model is a statistical tool that attempts to determine the strength of the relationship between one dependent variable (usually denoted by Y) and a series of other changing variables (known as independent variables). Regression models provide the scientist with a powerful tool, allowing predictions about past, present, or future events to be made with information about past or present events. Inference based on such models is known as regression analysis. The main purpose of regression analysis is to predict the value of a dependent or response variable based on values of the independent or explanatory variables. According to Render, Stair, and Hanna (2011), there are two reasons for using regression analysis: one is to understand the relationship between various variables, and the second is to predict one variable's value based on the value of the other. Variables provide many advantages when creating models. One of the......

Words: 1282 - Pages: 6

...Regression Analysis: Basic Concepts
Allin Cottrell

1 The simple linear model

Suppose we reckon that some variable of interest, y, is ‘driven by’ some other variable x. We then call y the dependent variable and x the independent variable. In addition, suppose that the relationship between y and x is basically linear, but is inexact: besides its determination by x, y has a random component, u, which we call the ‘disturbance’ or ‘error’. Let i index the observations on the data pairs (x, y). The simple linear model formalizes the ideas just stated:

$y_i = \beta_0 + \beta_1 x_i + u_i$

The parameters $\beta_0$ and $\beta_1$ represent the y-intercept and the slope of the relationship, respectively. In order to work with this model we need to make some assumptions about the behavior of the error term. For now we’ll assume three things:

$E(u_i) = 0$ : u has a mean of zero for all i
$E(u_i^2) = \sigma_u^2$ : it has the same variance for all i
$E(u_i u_j) = 0,\ i \neq j$ : no correlation across observations

We’ll see later how to check whether these assumptions are met, and also what resources we have for dealing with a situation where they’re not met. We have just made a bunch of assumptions about what is ‘really going on’ between y and x, but we’d like to put numbers on the parameters $\beta_0$ and $\beta_1$. Well, suppose we’re able to gather a sample of data on x and y. The task of estimation is then to come up with coefficients (numbers that we can calculate from the data, call them $\hat\beta_0$ and $\hat\beta_1$) which serve as estimates of the unknown......

Words: 1464 - Pages: 6
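The Cottrell excerpt above breaks off before it names an estimator. Assuming it goes on to ordinary least squares (an assumption; the truncated text does not confirm it), the estimates for the simple linear model would be the values that minimize the sum of squared residuals. The sample size $n$ and the sample means $\bar x$, $\bar y$ are notation introduced here, not taken from the excerpt.

```latex
\begin{aligned}
(\hat\beta_0, \hat\beta_1) &= \arg\min_{b_0,\, b_1} \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2 \\
\hat\beta_1 &= \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2},
\qquad
\hat\beta_0 = \bar y - \hat\beta_1 \bar x
\end{aligned}
```

The second line follows from setting the two partial derivatives of the sum of squares to zero, which also forces the fitted line through the point $(\bar x, \bar y)$.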

...Multi-regression Analysis
Summer 2013
EC315: Quantitative Research Methods
Professor Scott Sowder

Introduction
One day I was sitting in class with my classmates. Our GPAs, the number of classes we are taking, our ages, IQs and the amount of time we spend studying were all different. I became curious and wanted to know what effect, if any, the different variables had on a student’s GPA. So I decided to conduct a survey of 30 students with various GPAs, IQs, ages, numbers of classes being taken and amounts of time spent studying. My null hypothesis is that the independent variables have no effect on the dependent variable. The alternative hypothesis is that at least one of the independent variables has an effect on the dependent variable. I have applied a 95% confidence level, which means I will reject the null hypothesis only if the test's p-value is below 0.05.

Variable Identification
My dependent variable is the students’ GPA. I chose GPA as my dependent variable because it relies on the other variables. The remaining variables are my independent variables. I chose them because they could all have an effect on a student’s GPA.

Student | GPA (4.0) | # Classes | Age | IQ | Study time |
1 | 3.2 | 4 | 29 | 119 | 12 |
2 | 3.1 | 2 | 31 | 118 | 8 |
3 | 3.7 | 1 | 28 | 135 | 6 |
4 | 3.5 | 3 | 22 | 129 | 13 |
5 | 2.8 | 4 | 22 | 110 | 15 |
6 | 3.0 | 3 | 24 | 115 | 15 |
7 | 3.8 | 2 | 24 |...

Words: 1525 - Pages: 7

...Q1: All the regressions were performed. Output can be made available if needed. See outputs for Q2 in appendix.

Q2: Select the model you are going to keep for each brand and explain WHY. Report the corresponding output in an appendix attached to your report (hence, 1 output per brand)

We use Adjusted R Squared to compare the Linear and Semilog Regressions. R^2 is a statistic that gives some information about the goodness of fit of a model. In regression, the Adjusted R^2 coefficient of determination is a statistical measure of how well the regression line approximates the real data points. An R^2 of 1 indicates that the regression line perfectly fits the data.

Brand 1:
Linear Regression R^2 | 0.594 |
SemiLog Regression R^2 | 0.563 |
We use the Linear Regression Model since R-squared is higher.

Brand 2:
Linear Regression R^2 | 0.758 |
SemiLog Regression R^2 | 0.588 |
We use the Linear Regression Model since R-squared is higher.

Brand 3:
Linear Regression R^2 | 0.352 |
SemiLog Regression R^2 | 0.571 |
We use the Semilog Regression Model since R-squared is higher.

Brand 4:
Linear Regression R^2 | 0.864 |
SemiLog Regression R^2 | 0.603 |
We use the Linear Regression Model since R-squared is higher.

Q3: Here we compute the cross-price elasticity. Depending on whether we use linear or semi-log model,
[Formulas: Linear Model, Semi-Log Model]
...

Words: 609 - Pages: 3

...relationships between the variables. The relationships can either be negative or positive. This is told by whether the graph increases or decreases.

Benefits and Intrinsic Job Satisfaction
Regression output from Excel

SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.069642247 |
R Square | 0.004850043 |
Adjusted R Square | -0.00471871 |
Standard Error | 0.893876875 |
Observations | 106 |

ANOVA
 | df | SS | MS | F | Significance F |
Regression | 1 | 0.404991362 | 0.404991 | 0.50686 | 0.478094147 |
Residual | 104 | 83.09765015 | 0.799016 | | |
Total | 105 | 83.50264151 | | | |

 | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
Intercept | 5.506191723 | 0.363736853 | 15.13784 | 4.8E-28 | 4.784887893 | 6.2274956 | 4.7848879 | 6.22749555 |
Benefits | -0.05716561 | 0.080295211 | -0.711943 | 0.47809 | -0.21639402 | 0.1020628 | -0.216394 | 0.10206281 |

Y = 5.5062 - 0.0572x

[Graph]

Benefits and Extrinsic Job Satisfaction
Regression output from Excel

SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.161906 |
R Square | 0.026214 |
Adjusted R Square | 0.01685 |
Standard Error | 1.001305 |
Observations | 106 |

ANOVA
 | df | SS | MS | F | Significance F |
Regression | 1 | 2.806919 | 2.806919 | 2.799606 | 0.097293 |
Residual | 104 | 104.2717 | 1.002612 | | |
Total | 105 | 107.0786 | | | |

 | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper......

Words: 653 - Pages: 3
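The figures in the first Excel summary output above (Benefits and Intrinsic Job Satisfaction) are tied together by a few standard identities, and recomputing them is a quick way to read such a table. The sketch below uses only numbers quoted in that output; the function and variable names are illustrative.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Values copied from the Benefits / Intrinsic Job Satisfaction output.
n_obs, n_predictors = 106, 1
ss_regression, ss_residual, ss_total = 0.404991362, 83.09765015, 83.50264151
df_regression, df_residual = 1, 104
slope, slope_se = -0.05716561, 0.080295211

r2 = ss_regression / ss_total                       # R Square
adj_r2 = adjusted_r2(r2, n_obs, n_predictors)       # Adjusted R Square
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)  # F = MSR / MSE
t_stat = slope / slope_se                           # t Stat for the Benefits slope

print(f"R^2={r2:.6f}  adj R^2={adj_r2:.6f}  F={f_stat:.5f}  t={t_stat:.6f}")
```

With a single predictor the F statistic equals the square of the slope's t statistic, which is why the slope's P-value and Significance F agree in the output.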

...STATISTICS FOR ENGINEERS (EQT 373)
TUTORIAL CHAPTER 3 – INTRODUCTORY LINEAR REGRESSION

1) Given 5 observations for two variables, x and y.

x | 3 | 12 | 6 | 20 | 14 |
y | 55 | 40 | 55 | 10 | 15 |

a. Develop a scatter diagram for these data.
b. What does the scatter diagram developed in part (a) indicate about the relationship between the two variables?
c. Develop the estimated regression equation by computing the values and.
d. Use the estimated regression equation to predict the value of y when x=10.
e. Compute the coefficient of determination. Comment on the goodness of fit.
f. Compute the sample correlation coefficient (r) and explain the result.

2) The Tenaga Elektik MN Company is studying the relationship between kilowatt-hours (thousands) used and the number of rooms in a private single-family residence. A random sample of 10 homes yielded the following.

Number of rooms | 12 | 9 | 14 | 6 | 10 | 8 | 10 | 10 | 5 | 7 |
Kilowatt-Hours (thousands) | 9 | 7 | 10 | 5 | 8 | 6 | 8 | 10 | 4 | 7 |

a. Identify the independent and dependent variable.
b. Compute the coefficient of correlation and explain.
c. Compute the coefficient of determination and explain.
d. Test whether there is a positive correlation between both variables. Use α=0.05.
e. Determine the regression equation (use the least squares method).
f. Determine the value of kilowatt-hours used if number of rooms is 11.
g. Can you use the model in (f.) to predict the kilowatt-hours if number of......

Words: 1184 - Pages: 5
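Parts (c) to (f) of question 1 in the tutorial above can be worked through in a few lines. This sketch assumes the first row of numbers is x and the second is y, which the flattened table in the source does not state explicitly, and the function name `fit_simple_ols` is illustrative.

```python
def fit_simple_ols(x, y):
    """Least-squares line y = b0 + b1*x, plus r^2 and r."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope
    b0 = y_bar - b1 * x_bar             # intercept
    r = sxy / (sxx * syy) ** 0.5        # sample correlation coefficient
    return b0, b1, r * r, r

# Assumed reading of the question 1 table: first row x, second row y.
x = [3, 12, 6, 20, 14]
y = [55, 40, 55, 10, 15]

b0, b1, r2, r = fit_simple_ols(x, y)
prediction = b0 + b1 * 10               # part (d): predicted y at x = 10

print(f"y-hat = {b0:.1f} + ({b1:.1f})x")        # y-hat = 68.0 + (-3.0)x
print(f"prediction at x=10: {prediction:.1f}")  # 38.0
print(f"r^2 = {r2:.4f}, r = {r:.4f}")           # r^2 = 0.8757, r = -0.9358
```

Under that reading, about 88% of the variation in y is explained by x, and the negative r matches the downward slope of the fitted line.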

...MULTIPLE REGRESSION

After completing this chapter, you should be able to:
• understand model building using multiple regression analysis
• apply multiple regression analysis to business decision-making situations
• analyze and interpret the computer output for a multiple regression model
• test the significance of the independent variables in a multiple regression model
• use variable transformations to model nonlinear relationships
• recognize potential problems in multiple regression analysis and take the steps to correct the problems
• incorporate qualitative variables into the regression model by using dummy variables

Multiple Regression Assumptions
• The errors are normally distributed
• The mean of the errors is zero
• Errors have a constant variance
• The model errors are independent

Model Specification
• Decide what you want to do and select the dependent variable
• Determine the potential independent variables for your model
• Gather sample data (observations) for all variables

The Correlation Matrix
• Correlation between the dependent variable and selected independent variables can be found using Excel: Tools / Data Analysis… / Correlation
• Can check for statistical significance of correlation with a t test

Example
A distributor of frozen dessert pies wants to evaluate factors thought to influence demand
Dependent variable: Pie sales (units per......

Words: 1561 - Pages: 7
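Outside Excel, the correlation matrix and the t test mentioned in the chapter outline above can be sketched with NumPy. The three series below are hypothetical numbers invented for illustration, not the pie-sales data from the chapter, and `correlation_t_stat` is an illustrative helper name.

```python
import numpy as np

def correlation_t_stat(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * np.sqrt((n - 2) / (1.0 - r ** 2))

# Hypothetical observations (8 periods): sales, price, advertising.
sales = np.array([420, 380, 350, 400, 310, 330, 450, 390], dtype=float)
price = np.array([5.0, 6.0, 7.0, 5.5, 8.0, 7.5, 4.5, 6.0])
advertising = np.array([3.0, 3.5, 2.5, 4.0, 2.0, 3.0, 4.5, 3.5])

# Each column is one variable, so rowvar=False.
data = np.column_stack([sales, price, advertising])
corr = np.corrcoef(data, rowvar=False)
print(np.round(corr, 3))

# t statistic for the sales/price correlation.
n = len(sales)
t_sales_price = correlation_t_stat(corr[0, 1], n)
print(round(float(t_sales_price), 3))
```

The t statistic would then be compared with the critical value of the t distribution with n - 2 degrees of freedom at the chosen significance level.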

...Acts 430 Regression Analysis

In this project, we are required to forecast the number of houses sold in the United States by building a regression model in SAS. The dependent variable is HSN1F. We treat three series as independent variables: the 30-year conventional mortgage rate, real imports of goods and services, and the money stock, since each can be seen as a factor affecting the US housing market. Intuitively, we expect the 30-year conventional mortgage rate to be a significant factor with a negative relationship to the number of houses sold. When the mortgage rate increases, people pay relatively more to buy a house, which tends to reduce house sales. By contrast, a lower interest rate would stimulate the market. We believe that real imports of goods and services are another factor that moves house sales up and down. When a country imports a large amount of goods and services, it pays out a lot of money to other countries. In other words, people have less money, and house sales decrease. Conversely, lower imports of goods and services point to rising house sales. Imports therefore also have a negative relationship with the number of houses sold. Lastly, we take the money stock as our third factor. We expect it to have a positive relationship with the number of...

Words: 723 - Pages: 3

...Regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps one understand how the typical value of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed.

Local Government Engineering Department (LGED) is a public sector organization under the Ministry of Local Government, Rural Development & Cooperatives. The prime mandate of LGED is to plan, develop and maintain local level rural, urban and small scale water resources infrastructure throughout the country. Here, I take LGED as the organization and, for one project covering eight districts, treat “available fund” as the independent variable and “development (length of development of road in km)” as the dependent variable. The values of the variables are:

Districts | Fund, X (lakh tk) | Development, Y (km) |
Panchagar | 450 | 10 |
Thakurgaon | 310 | 6.8 |
Dinajpur | 1500 | 32 |
Nilphamari | 1160 | 24.5 |
Rangpur | 1450 | 31 |
Kurigram | 450 | 9 |
Lalmonirhat | 950 | 16 |
Gaibandha | 1550 | 33 |

For the two variables “available fund” and “development”, the regression equation can be given as:

Y = a + bX

Where,
Y = Development
X = Fund
b = rate of change of development
a...

Words: 365 - Pages: 2

...Introduction
Simple linear regression is a model with a single regressor x that has a straight-line relationship with a response y. This simple linear regression model is

y = β0 + β1x + ε

where the intercept β0 and the slope β1 are unknown constants and ε is a random error component.

Testing Significance of Regression:

H0: β1 = 0, H1: β1 ≠ 0

The hypotheses relate to the significance of regression. Failing to reject H0: β1 = 0 implies that there is no linear relationship between x and y. On the other hand, if H0: β1 = 0 is rejected, it implies that x is of value in explaining the variability in y. The following equation is the fundamental analysis-of-variance identity for a regression model:

SST = SSR + SSRes

Analysis of variance (ANOVA) is a collection of statistical models and associated procedures, developed by R. A. Fisher, used to analyze the differences between group means (such as "variation" among and between groups). In the ANOVA setting, the observed variance in a particular variable is partitioned into components attributable to different sources of variation. The P value, or calculated probability, is the probability, assuming the null hypothesis (H0) of a study question is true, of obtaining a result at least as extreme as the one observed. VIF (the variance inflation factor) for each term in the model measures the combined effect of the dependences among the regressors on the variance of the term. Practical experience indicates that if any of...

Words: 483 - Pages: 2
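A minimal sketch of how the VIF described above is usually computed: regress each regressor on all the others and take 1 / (1 - R²) from that auxiliary regression. The three regressors below are hypothetical values chosen so that the first two are nearly collinear, and `variance_inflation_factors` is an illustrative name rather than a library function.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the rest."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])   # auxiliary regression with intercept
        coefs, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coefs
        ss_res = residuals @ residuals
        ss_tot = np.sum((target - target.mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r2))    # undefined if a column is perfectly collinear
    return np.array(vifs)

# Hypothetical regressors: x2 is x1 plus small perturbations, x3 is unrelated.
x1 = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
x2 = x1 + np.array([0.1, -0.2, 0.15, -0.1, 0.2, -0.15, 0.05, -0.05])
x3 = np.array([5, 1, 4, 2, 8, 3, 7, 6], dtype=float)

vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(np.round(vifs, 1))
```

A VIF of 1 means a regressor is uncorrelated with the others; the nearly duplicated pair x1 and x2 produces much larger values than x3.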

...A) Estimated regression equation – First Order: y = β0 + β1x1 + β2x2 + ε

Output of 1st Model

Regression Statistics
Multiple R | 0.763064634 |
R Square | 0.582267636 | SSR/SST |
Adjusted R Square | 0.512645575 |
Standard Error | 547.737482 |
Observations | 15 |

ANOVA
 | df | SS | MS | F | Significance F |
Regression | 2 | 5018231.543 | 2509115.772 | 8.363263464 | 0.005313599 |
Residual | 12 | 3600196.19 | 300016.3492 | | |
Total | 14 | 8618427.733 | | | |

 | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
Intercept | -20.35201243 | 652.7453202 | -0.031179101 | 0.975639286 | -1442.561891 | 1401.857866 |
Age (x1) | 13.35044655 | 7.671676501 | 1.740225432 | 0.107375657 | -3.364700634 | 30.06559374 |
Hours (x2) | 243.7144645 | 63.51173661 | 3.837313819 | 0.002363965 | 105.334278 | 382.0946511 |

B) Equation: ŷ = -20.3520124320994 + 13.3504465516772 x1 + 243.714464532425 x2

C) Interpretation of β
β̂1 = 13.35044655: if the number of hours worked (x2) is held fixed, we estimate that for every one-year increase in age (x1), mean annual earnings increase by 13.35044655.
β̂2 = 243.7144645: if age (x1) is held fixed, we estimate that for every one-hour increase in work (x2), the mean of......

Words: 714 - Pages: 3

...Project Title: A STATISTICAL ANALYSIS OF NBA PLAYER SALARIES USING A MULTIPLE REGRESSION.

ABSTRACT
Basketball is one of the most popular sports in the world, and the National Basketball Association (NBA) is the most popular basketball league in the world. The NBA is based in the United States of America and consists of 30 teams. The NBA is so popular that the NBA finals are the 2nd most watched televised event in the U.S. after the NFL (National Football League) Super Bowl. Sometimes when we think about NBA players and the enormous amount of money they are making, we become a little jealous. It is well known that some star players make so much money or are over-paid and yet can hardly form a sentence. The greatest challenge for the board of the NBA has been how to harmonize the salaries. Because of this, various people have tried to come up with different solutions. Some argue that height, weight and physical strength play a big role in a team winning, but this is not the case, as some short players have helped their teams win on several occasions. To solve this problem, a multiple regression analysis will be used to analyze the salary data. A relationship will be established between the salary and performance variables. The other challenge will be choosing which model parameters are significant enough to be included in the model that will be developed. This can be solved by arranging the factors affecting an NBA player salary in a decreasing order of......

Words: 1819 - Pages: 8

...A linear regression is worth a maximum of 5 points; a multiple regression is worth a maximum of 10 points and both are worth 13 points. You must analyze the regression(s) that you do. The better job that you do, for instance, checking the residuals and for multicollinearity, the more points you get. If you choose to do a linear regression then you must compare the list price to the square footage. You are to project the listing price of a home that has 2750 square feet. If you choose to do the multiple regression then you must compare the list price to all the variables in the data file: square footage, number of bedrooms, number of bathrooms, number of cars in the garage, and the age. You must project the cost of a house with 2650 square feet that is 10 years old, with 3 bedrooms and 3 bathrooms, and with a garage that holds 3 cars.

Linear Regression Scatter Plot

StatTools Report
Analysis: | Scatterplot |
Performed By: | |
Date: | Tuesday, January 24, 2012 |
Updating: | Live |
...

Words: 692 - Pages: 3

...Significance of Regression Analysis In statistics, regression analysis includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables — that is, the average value of the dependent variable when the independent variables are held fixed. Less commonly, the focus is on a quantile, or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function, which can be described by a probability distribution. Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships. In restricted......

Words: 784 - Pages: 4

...REYEM AFFAIR
Regression Case
Quantitative Methods II

To
Prof. Arnab Basu
On October 21, 2011

By GROUP NO. 5
Bharati vishal (11110)
akshay ram (11110)
dhanashree vinayak shirodkar (11110)
amol devnath kumbhare (11110)
ajusal sugathan (11110)
arun prabu (11110)
ghule nilesh vishnu (11110)
mudavath swetha (11110)
Raja Simon J (1111052)
sagar behera (11110)
shreya sethi (11110)
swati murarka (11110)

Indian Institute Of Management, Bangalore

Table of Contents
1. Executive Summary
2. Understanding of the Problem
3. Model Description
   Model 1
      Prediction interval Vs Confidence Interval
      Step wise Regression: A closer look
      Test of Model: Analysis of Results
   Model 2
      Test of Model: Analysis of Results
   Other Models
4. Conclusions and Recommendations
5. Appendix
   1. Variables Entered/Removed
   2. Model Summary
   3. ANOVA
   4. Coefficients
   5. Residual Statistics

Executive Summary
Reyem Affiar has recently found the below described condominium in Mid-Cambridge that he wants to purchase.

Street Address : 236 Ellery Street
Last Price : $169000
Area & Area Code : M/9
Bed : 2
Bath : 1
Rooms : 5
Interior : 1040
Condo : $175
Tax : $1121
RC :......

Words: 8503 - Pages: 35