SOR Level 3 Modules
Prerequisites: None
Lecturers: Prof J Kohanoff, Dr M Gruening
Introduction
Operational Research is the application of quantitative analysis to problems outside of the physical sciences and in particular to the problems of business, industry and administration. However, all the techniques can be used outside of Operational Research and emphasis is given to formulation of problems and recognition of the appropriate technique. Blackboard examples are given for each technique and the homework problems are drawn from a wide field. Most techniques are in the form of algorithms and for hand calculations the problems have to be fairly simple, but indications are given how computers could be applied to bigger problems in the project and presentation components.
The aim of this course is to develop competence in two of the most important mathematical techniques used in Operational Research, in formulating problems and expressing answers clearly.
This course covers the two main generalpurpose mathematical techniques used in Operational Research together with some specialized applications. These are all optimisation techniques, although they also give insight into the problems and the interpretation of answers is stressed. Calculus cannot be used because either (i) the objective function and the constraints cannot be expressed simply, or (ii) the constraints dominate the problem and the optimal solution will be at the edge of the feasible region, or (iii) the variables are not continuous (e.g., they are integer).
Some students may have met some of the techniques in courses at other levels, but no previous knowledge is assumed, and more mathematical rigour and understanding is required than at those levels. However, the mathematical knowledge assumed does not extend beyond that in Level 1. The main knowledge assumed is elementary linear algebra (bases, linear dependence of vectors, matrix notation, partitioning of matrices). No knowledge of economics is required but the economic interpretation of some results is explored. Apart from a passing reference to stochastic Dynamic Programming this course is purely deterministic, and no knowledge of statistics is required.
Contents
The scope of Operational Research, formulating a problem from a verbal description.
Dynamic programming: formulation, principle of optimality, value iteration, applications including equipment replacement, allocation, production planning and optimal routes. Special algorithms for optimal routes.
Linear programming: formulation, theory, Primal Simplex Method, interpretation of the final tableau, Revised Simplex Method, duality theory including economic interpretation, Dual Simplex Method, Postoptimal Analysis, Transportation and Assignment problems. A wide variety of practical problems and applications is discussed.
Assessment
Exam 70% Project 20% Presentation 10%
 SOR3004 Linear Models (^{1st} semester) NOT TO BE TAKEN AFTER SOR2004
Prerequisite: SOR2002. Make sure you are enrolled for this module in the 1st semester.
Lecturer: Dr H Mitchell
Introduction
The aim of this module is to cover linear models encompassing multiple linear regression and analysis of variance (ANOVA). These models are the workhorses of statistical data analysis and are found in virtually all branches of the sciences as well as in the industrial and financial sectors.
Multiple linear regression is concerned with modelling a measured response as a function of explanatory variables. For example, a pharmaceutical company might use a a regression model to relate the effectiveness of a new cancer drug to the patients age, gender, weight, diet, tumour size, etc. ANOVA is concerned with the analysis of data from designed experiments. A materials manufacturer for example, may wish to analyse the results from an experiment to compare the heat resisting properties of four different polymers.
Regression and ANOVA will be initially developed using a classic least squares approach and later the correspondence between least squares and the method of maximum likelihood will be examined. After a thorough development of linear models the groundwork will have been laid to allow an extension to the broader class of Genealized Linear Models (GLM). These permit regression models to be applied to situations where the recorded response is not normally distributed. One famous example of the use of GLM was the analysis of Oring failures on the space shuttle Challenger.
An important element of this module will be a weekly practical data analysis class using the SAS software package. SAS is probably the leading statistical package used in industry. These classes, lasting up to three hours, will introduce the student to elementary data entry in SAS, elementary matrix manipulation using the SAS Interactive Matrix Language (IML) and analysis of data using linear and generalized linear models. Each week the student will complete a data analysis task using SAS and is required to submit a report the following week.
Contents
 Multiple linear regression: ordinary least squares, model selection and diagnostics, weighted least squares.
 Analysis of variance: Nonsingular and singular cases; extra sum of squares principle, analysis of residuals, generalized inverse solution, estimable functions, testable hypotheses.
 Experimental designs: completely randomized, randomized block, factorial, contrasts, analysis of covariance.
 Generalised linear model: maximum likelihood and least squares, exponential family, Poisson and logistic models, model selection for GLM.
Assessment
Exam 70% Report 20% Presentation 10%

SOR3008 Statistical Data Mining (2nd semester)
Prerequisite: SOR2004 or currently enrolled in SOR3004 in semester 1.
Lecturer: Dr R Rollins
Introduction
In the 1990's there was an explosive growth in both the generation and collection of data due mainly to the advancement of computing technology in processing and storage of data and the ease of scientific data collection. As a result, overwhelming mountains of data are being generated and stored. For example, in the business world large supermarket chains such as WalMart and Sainsbury's collect data amounting to millions of transactions per day. In the US all healthcare transactions are stored on computers yielding terabyte databases which are constantly being analysed by insurance companies. There are huge scientific databases as well. Examples include the human genome database project and NASA's Earth Observatory System. This has brought about a need for vital techniques for the modelling and analysis of these large quantities of data: data mining.
Data Mining is the process of selection, exploration, and modelling of large quantities of data to discover previously unknown regularities or relations with the aim of obtaining clear and useful results for the owner of the database. The application of data mining includes many different areas, such as market research (customer preferences), medicine, epidemiology, risk analysis, fraud detection and more recently within bioinformatics for modelling DNA.
This module will focus on data mining techniques which have evolved from and are strongly based on statistical theory.
Contents
Introduction to Data Mining.
Exploratory data analysis: Principal Component Analysis; Multiple Imputation.
Cluster analysis: Hierarchical clustering; Partitioning algorithms.
Classification: Nearest neighbour algorithms; Classification trees; Naïve Bayes Classifier; Bayesian networks; Ordinal Regression; Multinomial Logit; Conditional Logit; Nested Logit;Techniques for comparing classifiers  including bagging and boosting in ensemble methods.
Prediction (continuous targets): Regression Trees; Random Forests; Neural Networks; Support Vector Machines.
Association Rule mining.
Assessment
Exam 60% Coursework (Continuous assessments) 40%

SOR3012 Stochastic Processes and Risk (2nd semester)
Prerequisite: SOR1020 and SOR1021
Lecturer: Dr A Munaro
Introduction
Uncertainty or risk is a natural part of life. Continually faced with choices, we are required to make decisions based on the possible consequences or outcomes. We often use the phrase ‘a calculated risk’ to describe how we analyse a problem and come up with our decision. Now the question is: how do we calculate risk  mathematically.
Statistical analysis of a random process helps determine whether there are any underlying patterns. If there are, we can use this to our advantage in predicting the future  or at least make an educated guess! Not surprisingly, this topic is primarily recognised for its career opportunities where ‘risk’ assessment and planning matter: for example the provision of power stations, telecoms, transport planning, the spread of infectious diseases (swine flu, for example). A major application of this kind of mathematics is the financial services sector: insurance and investment, credit risk, capital market trading etc. Many students that have taken this course are currently employed as highlypaid actuaries.
Aside from the educational and commercial value of this subject, it leads into very interesting and fun topics such as measure theory, martingales and potential theory, noise, fractals, etc.
Contents
Probability theory, random variables and simulating random events
Dealing with multiple random variables. The conditional expectation theorem.
Discrete time Markov chains including gamblers ruin. Simulating discrete time Markov chains
Continuous time Markov chains including the poisson process and the M/M/1 Queue. Simulating continuous time Markov chains.
Simulating nonMarkovian processes.
Assessment
Assignments 1, 2 and 3 (10% each), Project (10%), Exam (60%)