School of Mathematics and Physics

SOR Level 2 Modules

 

  • SOR2002 Statistical Inference (1st semester)

Pre-requisite: SOR1020 & SOR1021

Introduction
The course builds upon the probability theory in SOR1001 and the statistical inference methods in SOR1002 to provide a second-level account of the principles and most important methods of estimation and hypothesis testing.

An important element of this module will be a weekly practical data analysis lab using the SAS software package. SAS is probably the leading statistical package used in industry. The lab sessions, approximately 2 hours long, will give students an opportunity to put the theory from lectures into practice. During the lab sessions students will be guided through practical tutorials on reading, accessing and editing data in SAS, descriptive statistical analyses, cross tabulations, graphs, charts, frequency tables, χ² tests, t-tests and analysis of variance. Some of the lab time will also be allocated to project work.

The practical work in the lab sessions culminates in a group project. During the semester each student will work as a team member on a specific statistical investigation using the methods described in lectures and practised in the SAS lab sessions. This provides a valuable insight into the applications of statistics in industry, allowing students to gain the skills required for such a working environment. Near the completion of the group project, each group will be required to give an oral presentation (approximately 5 minutes per student) on the results of its analysis. This provides an opportunity for the group to receive feedback on its progress and helps consolidate its findings before submitting a final written report on the investigation.

Contents

    • Statistical Investigations: Understanding the problem. Collecting the data. Initial data analysis. Definitive analysis: modelling. Conclusions.
    • Initial Data Analysis: Data structure. Processing data: coding, input, screening, editing and modifying. Data quality.
    • Preliminary analysis: measures of location, dispersion; tables; graphs.
    • Sample Diagnostics: Testing for independence - non-parametric tests, serial correlations. Testing for normality - skewness and kurtosis, goodness-of-fit tests, probability plotting. Identification of outliers. Transformations.
    • Point Estimation of Parameters: Definitions of estimate, estimator, sampling distribution. Unbiasedness. Relative efficiency. Bias. Mean squared error.
    • Sufficiency: Fisher-Neyman factorization theorem. Regular exponential class of distributions.
    • Maximum Likelihood: Likelihood function. Calculation of MLE. Log relative likelihood function. Asymptotic properties of MLE. Applications.
    • Least Squares Estimation and Linear Regression: Standard linear model: matrix notation. Properties of LSE. Weighted least squares. Fitting a straight line. Multiple regression. Goodness-of-fit: residuals. Hypothesis tests and confidence intervals using the t-distribution.
    • Experimental Design and Comparative Studies: Principles of design - experimental unit, treatment, replication, randomization; factorial design. Analysis of variance. Completely randomized design. Randomized block design. Dichotomous treatment/risk and outcome studies. Sampling schemes - cross-sectional; longitudinal-cohort, case-control study.
    • Measures of association: rates, relative risk, odds ratio.
    • Significance Tests and Hypothesis Testing: Neyman-Pearson approach - critical region, Type I and Type II errors, significance level, power function. Best critical region. Generalized likelihood ratio test.
    • Computer intensive methods: randomization tests; Monte Carlo sampling.
    • Confidence Intervals: Construction - pivotal quantity; MLE procedure. Confidence region. Prediction interval.
    • Bayesian Methods: Prior and posterior distributions. Conjugate families. Point estimates, confidence regions, hypothesis testing. Prediction. Improper and non-informative priors.

Assessment

Exam 70%   Report 20%   Presentation 10%

 

  • SOR2003 Methods of Operational Research (2nd semester)

Pre-requisite: SOR1020 & SOR1021

Introduction
This course applies mathematical analysis to a series of problems which occur in business and industry. The analysis can be more far-reaching if we use a deterministic model, but a degree of uncertainty (e.g. about future events) is often an important feature of the situation, in which case a stochastic model has to be used. The statistical knowledge assumed is that contained in SOR1001. Although novel ways of setting out the work may be used in some topics, the mathematical techniques required on this course are no more advanced than simple calculus and algebra, and most practical problems require only arithmetic and the use of tables.

The aim of the course is to teach a range of simple techniques illustrating the application of mathematics and probability theory to the problems of business and industry. Apart from the first two chapters, each chapter is a distinct and separate topic. Some topics (e.g. Forecasting) involve lengthy calculation, and students are taught how to use a spreadsheet for the computation. Students who do not have access to a spreadsheet on a personal computer can use the Open Access Areas. Specific instructions on the use of the Excel spreadsheet are given on the course and there is a practical session in an Open Access Area.

Homework is an essential part of the learning process, but there is no continuous assessment.

Emphasis is placed on choosing the correct model for the circumstances and on presenting answers in a form intelligible to management. If a question is posed in words then the final answer should be in words and not left in algebra or in a table. The answer should be given to a number of significant figures consistent with the accuracy of the original data, or rounded to an integer if that is appropriate. The practical problems associated with obtaining data are also discussed.

Contents

    • Deterministic inventory including quantity discounts, common cycle production, constrained inventory and the use of Lagrange multipliers.
    • Stochastic inventory models including service levels.
    • Simple and adaptive forecasting.
    • The use of spreadsheets and their application to forecasting and equipment replacement.
    • The replacement of deteriorating equipment and the replacement of equipment liable to sudden failure.
    • Acceptance sampling by attribute and variable.
    • Network planning including PERT, speeding up, the use of LP, Gantt charts and resource smoothing.
    • Decision analysis including utility curves, decision trees and Bayesian statistics.

Assessment


Exam 70%   Report 20%   Presentation 10%

 

  • SOR2004 Linear Models (2nd semester)

Co-requisite: SOR2002. Make sure you are enrolled for SOR2002 in the 1st semester.

Introduction
The aim of this module is to cover linear models encompassing multiple linear regression and analysis of variance (ANOVA). These models are the workhorses of statistical data analysis and are found in virtually all branches of the sciences as well as in the industrial and financial sectors.

Multiple linear regression is concerned with modelling a measured response as a function of explanatory variables. For example, a pharmaceutical company might use a regression model to relate the effectiveness of a new cancer drug to the patient's age, gender, weight, diet, tumour size, etc. ANOVA is concerned with the analysis of data from designed experiments. A materials manufacturer, for example, may wish to analyse the results from an experiment to compare the heat-resisting properties of four different polymers.

Regression and ANOVA will initially be developed using a classic least squares approach, and later the correspondence between least squares and the method of maximum likelihood will be examined. After a thorough development of linear models, the groundwork will have been laid to allow an extension to the broader class of Generalized Linear Models (GLMs). These permit regression models to be applied to situations where the recorded response is not normally distributed. One famous example of the use of GLMs was the analysis of O-ring failures on the space shuttle Challenger.
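
The correspondence between least squares and maximum likelihood can be sketched briefly. The notation here (response vector y, design matrix X, parameter vector β, error variance σ²) is an assumption made for illustration and is not taken from the module notes. It also assumes independent, normally distributed errors.

```latex
\begin{align*}
y &= X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I_n), \\
\ell(\beta, \sigma^2) &= -\frac{n}{2}\log(2\pi\sigma^2)
  - \frac{1}{2\sigma^2}\,(y - X\beta)^{\top}(y - X\beta), \\
\hat\beta_{\mathrm{ML}} &= \arg\max_{\beta}\, \ell(\beta, \sigma^2)
  = \arg\min_{\beta}\, (y - X\beta)^{\top}(y - X\beta)
  = \hat\beta_{\mathrm{LS}}
  = (X^{\top}X)^{-1}X^{\top}y.
\end{align*}
```

For any fixed σ², only the second term of the log-likelihood depends on β, so maximizing the likelihood over β is the same as minimizing the residual sum of squares. The closed-form expression on the right holds only when X has full column rank (the non-singular case).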

An important element of this module will be a weekly practical data analysis class using the SAS software package. SAS is probably the leading statistical package used in industry. These classes, lasting up to three hours, will introduce the student to elementary data entry in SAS, elementary matrix manipulation using the SAS Interactive Matrix Language (IML) and analysis of data using linear and generalized linear models. Each week the student will complete a data analysis task using SAS and is required to submit a report the following week.

Contents

    • Multiple linear regression: ordinary least squares, model selection and diagnostics, weighted least squares.
    • Analysis of variance: non-singular and singular cases; extra sum of squares principle, analysis of residuals, generalized inverse solution, estimable functions, testable hypotheses.
    • Experimental designs: completely randomized, randomized block, factorial, contrasts, analysis of covariance.
    • Generalised linear model: maximum likelihood and least squares, exponential family, Poisson and logistic models, model selection for GLM.

Assessment

Exam 70%   Report 20%   Presentation 10%