Researchers: Dr Kang Li, Dr Jian-Xun Peng and Dr Patrick Connally
Introduction
The main theme of this project has been to introduce a new modelling approach that integrates prior system knowledge, in the form of fundamental engineering functions known as ‘engineering genes’ or ‘eng-genes’ for short, into a single-hidden-layer neural structure.
Nonlinearity is an inherent property of complex systems in many areas, arising from the first-principle laws and empirical equations that govern system behaviour. Mathematical modelling based on such prior knowledge usually leads to a set of linear and nonlinear ordinary and partial differential equations, which can be difficult to solve or require extensive computing resources; in other cases only a very limited number of samples is available (e.g. biomedical data). This ‘white-box’ method is therefore not applicable to many real-life problems.
Alternatively, data-driven ‘black-box’ methods use much simpler, usually lumped, models to approximate the original systems, allowing the designer to adjust the balance between accuracy and computational complexity by altering the model type and model complexity. For black-box methods, the chosen model, such as multilayer perceptron, radial basis function, polynomial nonlinear autoregressive exogenous model (NARX) and indeed many other approximators, may bear little physical relation to the system and therefore the quality of the resulting model is heavily reliant on the identification data.
A ‘grey-box’ method is used where it is desirable to combine some of the advantages of the white-box and black-box approaches. Different grey-box approaches exist, depending on how, and how much, prior engineering knowledge can be used, but they can be categorised into two general classes. The first chooses a conventional black-box model and uses prior knowledge to shape the parameter identification or model structure selection. The alternative is to start with a model originating from the mathematical relations that describe the behaviour of the system. In the latter approach, physical modelling and system identification form two interacting paths; most methods assume that the model structure is known a priori, and the major modelling task is then to identify unknown parameters and unmodelled dynamics using black-box models. Both grey-box approaches have limitations when the engineering system is too complex for a simplified model to be derived, or when the physical knowledge about the process is incomplete.
Despite many advances in nonlinear system modelling and identification in the literature, the eng-genes method is the first to break down a complex nonlinear system into basic mathematical elements (fundamental nonlinear functions and operators), namely ‘engineering genes’ drawn from prior engineering knowledge, which are then composed and coded into an appropriate chromosome (model) representation associated with a specific structure (in this project, a generalised single-hidden-layer neural structure). The chromosome, or a group of chromosomes, then evolves to produce a model that best fits the system, with improved performance and transparency.
This team has made substantial progress in the development of the eng-genes concept and associated algorithms and software.
Methods and results
1) Eng-genes modelling concept
The name ‘eng-genes’ derives from the view that basic nonlinear functions act as the fundamental building blocks of system behaviour. These functions can be seen as analogous to the genes present in the cells of the body. The model type most closely based upon this concept is the artificial neural network, which attempts to mimic the operation of biological neurons through parallel combinations of simple mathematical structures. The eng-genes concept therefore seeks to enhance the transparency and accuracy of neural models by integrating a ‘gene’ of fundamental system knowledge into each artificial neuron in the structure. This work is inspired directly by Kolmogorov's superposition theorem, which laid the foundation for the approximation analysis of artificial neural networks. Conventional neural networks derive their approximation capacity from the superposition of simple functions, where the constituent functions are mostly homogeneous within the network and are chosen for ease of use and training. Their relevance to the system under study is thus rarely investigated. Kolmogorov's theorem, on the other hand, also suggests that the appropriate set of functions depends on the system to be modelled. Hence, the natural means of including the system-related nonlinearities fundamental to the eng-genes concept is to use them as activation functions within the hidden layer of the neural model, with the aim of improving model transparency and generalisation performance. Mathematically, an eng-genes neural network used to model a multi-input single-output (MISO) nonlinear system is a single-hidden-layer network whose activation functions are ‘eng-genes’, that is, fundamental functions identifiable from first-principle laws or empirical engineering equations.
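The structure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the project's own formulation: the library contents, the function name eng_genes_output and the weighted-sum input to each neuron are assumptions made for the example.

```python
import math

# Hypothetical "gene library": the entries are illustrative, not taken from the project.
GENE_LIBRARY = {
    "tanh": math.tanh,
    "exp": math.exp,
    "square": lambda z: z * z,
    "sin": math.sin,
}

def eng_genes_output(x, hidden, output_weights, output_bias=0.0):
    """Evaluate a single-hidden-layer network with heterogeneous activations.

    x              : list of inputs (MISO: several inputs, one output)
    hidden         : one (gene_name, input_weights, bias) tuple per hidden neuron
    output_weights : one linear output weight per hidden neuron
    """
    y = output_bias
    for (gene, w, b), v in zip(hidden, output_weights):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b  # weighted input to the neuron
        y += v * GENE_LIBRARY[gene](z)                # each neuron applies its own "gene"
    return y

# Two neurons with different activation functions.
hidden = [
    ("square", [1.0, 0.0], 0.0),
    ("exp", [0.0, -1.0], 0.0),
]
y = eng_genes_output([2.0, 0.0], hidden, [0.5, 3.0])
print(y)
```

The only difference from a conventional MLP in this sketch is that the activation function is chosen per neuron instead of being fixed for the whole hidden layer.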
For any neural network topology, the determination of the structure, in terms of the number of inputs and size of the hidden layer, is a necessary stage in the modelling process. However, the inclusion of system-derived nonlinearities as activation functions adds further complexity to the procedure. Firstly, there may in some cases be more potential activation functions available from the system knowledge than are feasible to include in a small, computationally efficient model, or some of the functions may not significantly affect the system's behaviour. In these cases, it is necessary to determine the most dominant and significant functions from the ‘pool’ or the ‘gene library’ collected from systems and processes of similar nature (governed by similar engineering principles and laws). Furthermore, these functions may be parametric, and hence the effectiveness of the neural model will also depend on selecting optimal parameter values. The resulting ‘eng-genes’ modelling procedure is illustrated in figure 1. The concept has been formally introduced in greater detail in publications.
2) Approximation and training analysis of Eng-genes model
This project initially concentrated on applying existing conventional modelling techniques to eng-genes type networks, in order both to investigate the approximation properties of the new networks in comparison with conventional neural nets and to determine the areas in which new techniques would be necessary. One major part of this research effort has been to create the eng-genes networks using evolutionary algorithms such as genetic algorithms and genetic programming, which can both determine the structure of the network, in terms of the type and number of hidden-layer neurons, and optimise the parameters associated with the activation functions and the network weights. These networks were compared to conventional MLP networks optimised using the same genetic algorithm, and the eng-genes networks were shown to provide better generalisation performance for the application studied.
The earlier work on this aspect also suggested that due to the stochastic sampling and computationally-intensive nature of genetic algorithms, advanced methods and associated software are needed to improve the optimisation speed, otherwise it would not be well suited for practical applications. One proposed means of dealing with this was to ‘split’ the eng-genes modelling process into two steps, whereby the activation functions and their parameters (the ‘structural’ portion) would be determined offline by genetic algorithms. The neural network weights and biases could then be optimised by conventional neural network training techniques, many of which have computationally-efficient online variants.
The question that then arises is how effective conventional neural network training algorithms are when applied to networks with non-conventional activation functions. This issue was first approached by means of a case study, in which heterogeneous neural networks that had previously been structurally optimised by genetic algorithms were trained from random starting points using the well-known Levenberg-Marquardt algorithm. While this work showed that eng-genes networks are capable of comparable or superior performance to conventional MLPs, a significant issue was raised. Specifically, it was found that in some cases the eng-genes networks were more susceptible to unstable performance under parallel operation than similar MLP networks. To investigate this phenomenon further, this team examined the effect of non-conventional activation functions on the performance of standard (in this case gradient-based) neural network training algorithms. The chosen approach was to model simple target systems with very small neural networks, such that the correct choice of activation functions and network parameters would produce an exact match between system and model. The simplicity of the problem allowed large numbers of networks to be trained for each system-model pair, resulting in meaningful statistical analysis of the difference between conventional and eng-genes-type neural networks. In addition, the small dimensionality of the problem allowed the solution space to be visualised, giving an intuitive insight into the issues involved. Examination of both the statistics and the plotted surfaces led to the stability difficulties being addressed with a novel scheme combining prediction-error-based training with simulation-error-based training^{4}.
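The distinction between prediction error and simulation error can be illustrated with a small Python sketch. The first-order system, the deliberately mismatched model and all function names are assumptions chosen for the example. The sketch only computes the two error measures and does not reproduce how the scheme in the text combines them.

```python
def one_step_predictions(model, y_meas, u):
    """Series-parallel operation: each prediction uses the measured previous output."""
    return [model(y_meas[k - 1], u[k - 1]) for k in range(1, len(y_meas))]

def simulate(model, y0, u, n):
    """Parallel operation: the model's own previous output is fed back."""
    y_sim = [y0]
    for k in range(1, n):
        y_sim.append(model(y_sim[k - 1], u[k - 1]))
    return y_sim[1:]

def mse(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)

# Hypothetical target system and a slightly mismatched model.
def system(y_prev, u_prev):
    return 0.9 * y_prev + 0.1 * u_prev

def model(y_prev, u_prev):
    return 0.95 * y_prev + 0.1 * u_prev

u = [1.0] * 50
y_meas = [0.0]
for k in range(1, 50):
    y_meas.append(system(y_meas[k - 1], u[k - 1]))

pred_err = mse(one_step_predictions(model, y_meas, u), y_meas[1:])
sim_err = mse(simulate(model, y_meas[0], u, len(y_meas)), y_meas[1:])
print(pred_err, sim_err)
```

In this example a small parameter mismatch gives a small one-step-ahead error, but the error accumulates when the model runs on its own outputs. This suggests why a network trained only on prediction error may behave poorly in parallel operation.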
3) Advanced algorithms and software for eng-genes model construction and training
Several new algorithms were derived and a standalone software package was developed.
Through the course of the research, it became clear that advanced analytic methods are desirable for practical implementation of ‘eng-genes’ modelling because of their computational efficiency. However, the particular demands of eng-genes modelling, which incorporates activation function selection, network structure determination and optimisation of both function and network parameters, proved beyond the capabilities of any existing algorithm or framework. The development of such an algorithm therefore became a priority. The biggest challenge was that the eng-genes model is a heterogeneous neural network, and activation functions extracted directly from prior engineering knowledge can be difficult to train using conventional methods. It was therefore found to be beneficial to view the eng-genes concept as a ‘superset’ capable of comprising several other types of nonlinear model, at least from a mathematical perspective. For instance, RBF neural networks can be viewed as a special case of the eng-genes model in which only one type of hidden node is used, but which possesses parameters inherent both to the nodes and to the network itself. Similarly, the creation of nonlinear autoregressive models with exogenous inputs can be viewed as a process analogous to the selection of nodes and determination of output weights in an eng-genes network, with the simplifying factor of pre-determined inherent parameters. Hence, rather than being approached as one very complex optimisation procedure, the desired algorithm could be built up by deriving suitable new techniques for each aspect of the process, with the added benefit that each would be novel and useful for a variety of applications and model types in its own right. The final eng-genes implementation could then be achieved by tightly integrating them into a single efficient scheme.
More formally, the task at hand became both the selection of the activation function for each neuron and the optimisation of the neuron's entire parameter set. The first step in this process was the introduction of a novel fast algorithm for the selection of terms (hidden nodes with pre-determined parameters) and the identification of linear parameters (output weights) in nonlinear models. When used in the creation of NARX models, this algorithm represents a significant improvement over the conventional Orthogonal Least Squares (OLS) algorithm in terms of both computational efficiency and numerical stability. As with all forward stepwise approaches of this type, the key to its efficiency is that the term selection is not exhaustive. For the same reason, however, the result is not guaranteed to be optimal. To address this issue, the forward term selection was supplemented by a backward refinement scheme. Once all initial model terms have been selected, each is analysed in terms of its significance to the final model. Insignificant terms are replaced, resulting in a model with improved performance and compactness. The term selection and subsequent refinement are both performed within a well-defined regression scheme, which facilitates their integration with further techniques into an overall structure and parameter optimisation scheme for nonlinear models in general and eng-genes networks in particular.
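The idea of forward term selection followed by backward refinement can be sketched as below. This is a plain greedy illustration that refits the model for every candidate, so it does not reproduce the fast algorithm described above or its efficiency. The function names, the dependence tolerance and the polynomial candidate pool are assumptions made for the example.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def residual_sse(columns, y):
    """Sum of squared errors of the least-squares fit of y on the given columns,
    computed by Gram-Schmidt orthogonalisation."""
    basis, r = [], list(y)
    for col in columns:
        q = list(col)
        for b in basis:
            c = dot(q, b) / dot(b, b)
            q = [qi - c * bi for qi, bi in zip(q, b)]
        if dot(q, q) < 1e-12:  # column is numerically dependent on earlier ones
            continue
        c = dot(r, q) / dot(q, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
        basis.append(q)
    return dot(r, r)

def forward_select(candidates, y, n_terms):
    """Greedy forward stage: add the term that most reduces the error."""
    selected = []
    for _ in range(n_terms):
        remaining = [i for i in range(len(candidates)) if i not in selected]
        best = min(remaining, key=lambda i: residual_sse(
            [candidates[j] for j in selected + [i]], y))
        selected.append(best)
    return selected

def backward_refine(candidates, y, selected):
    """Refinement stage: replace a selected term if another candidate fits better."""
    selected = list(selected)
    improved = True
    while improved:
        improved = False
        for pos in range(len(selected)):
            others = selected[:pos] + selected[pos + 1:]
            current = residual_sse([candidates[j] for j in selected], y)
            pool = [i for i in range(len(candidates)) if i not in others]
            best = min(pool, key=lambda i: residual_sse(
                [candidates[j] for j in others + [i]], y))
            best_sse = residual_sse([candidates[j] for j in others + [best]], y)
            if best_sse < current - 1e-12:  # accept only strict improvements
                selected[pos] = best
                improved = True
    return selected

# Hypothetical candidate pool: x, x^2 and x^3; the target uses x and x^3 only.
x = [k / 2 - 2 for k in range(9)]
candidates = [[xi for xi in x], [xi ** 2 for xi in x], [xi ** 3 for xi in x]]
y = [2 * xi + 0.5 * xi ** 3 for xi in x]
selected = forward_select(candidates, y, 2)
refined = backward_refine(candidates, y, selected)
print(sorted(refined))
```

The refinement loop accepts a replacement only when it strictly lowers the error, so it terminates after a finite number of swaps.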
While the issue of selecting the activation functions (or model terms) from a candidate pool was thus resolved, there remained the problem of determining their associated parameter sets. This was first addressed when a continuous forward algorithm for both the construction and optimisation of a general class of nonlinear models was introduced and, in this case, applied to RBF neural models. The network grows by adding one node at a time. At each stage of construction, the new node is created ‘from scratch’ and its parameters are optimised via the conjugate gradient method, with the linear output weight determined by substituting the least-squares solution into the cost function. It was then noticed that network performance could be further improved by introducing a hybrid forward algorithm, in which the initial nodes are selected from a small pool of candidates and the node parameters are then optimised using a second-order Newton method, leading to a more comprehensive network construction algorithm. The efficiency of the schemes was confirmed by computational complexity analysis, and simulation results show that they typically outperform alternative techniques in terms of both computation time and model accuracy. Moreover, again to improve the compactness of the model obtained from the forward network construction, a two-stage mixed discrete-continuous method was proposed to allow refinement of the network constructed in the forward phase^{9}, and a new Jacobian matrix has been introduced for more accurate training of generalised single-hidden-layer neural nets (including eng-genes networks) with significantly improved performance^{10}.
These core algorithms have since been extended and applied to various aspects of eng-genes neural modelling and, more broadly, to the construction of generalised single-hidden-layer networks, including input selection, the creation of eng-genes networks, an integrated framework for network construction, and real-time network construction. The application studies showed that the eng-genes networks considerably outperform conventional MLP networks when network sizes are kept small. It must be noted that smaller network sizes are preferred in any case, in order to maintain the transparency and interpretability of eng-genes networks.
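The node-by-node growth described above can be sketched for a one-dimensional Gaussian RBF network. The sketch makes several simplifying assumptions that are not taken from the project: a derivative-free pattern search stands in for the conjugate gradient step, each new node is fitted to the current residual without re-solving earlier output weights, and all names and tolerances are illustrative.

```python
import math

def rbf(x, centre, width):
    return [math.exp(-((xi - centre) / width) ** 2) for xi in x]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def reduced_cost(params, x, residual):
    """Cost of one new node after substituting its least-squares output weight."""
    centre, width = params
    phi = rbf(x, centre, width)
    energy = dot(phi, phi)
    if energy == 0.0:  # node has no effect on the data
        return dot(residual, residual)
    return dot(residual, residual) - dot(phi, residual) ** 2 / energy

def pattern_search(cost, start, step=0.5, tol=1e-4, max_iter=1000):
    """Derivative-free stand-in for the conjugate gradient step in the text."""
    best, best_cost = list(start), cost(start)
    for _ in range(max_iter):
        if step <= tol:
            break
        moved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                if i == 1 and trial[i] <= 0.05:  # keep the width positive
                    continue
                c = cost(trial)
                if c < best_cost:
                    best, best_cost, moved = trial, c, True
        if not moved:
            step /= 2
    return best

def grow_network(x, y, n_nodes):
    """Add one node at a time, optimising its centre and width from scratch."""
    nodes, residual = [], list(y)
    for _ in range(n_nodes):
        worst = max(range(len(x)), key=lambda k: abs(residual[k]))
        start = [x[worst], 1.0]  # start at the largest residual
        centre, width = pattern_search(
            lambda p: reduced_cost(p, x, residual), start)
        phi = rbf(x, centre, width)
        weight = dot(phi, residual) / dot(phi, phi)  # least-squares output weight
        residual = [r - weight * p for r, p in zip(residual, phi)]
        nodes.append((centre, width, weight))
    return nodes, dot(residual, residual)

# Hypothetical target: a single Gaussian bump, so one node should suffice.
x = [k / 10 - 3 for k in range(61)]
y = [2.0 * math.exp(-((xi - 1.0) / 0.5) ** 2) for xi in x]
nodes, sse = grow_network(x, y, 1)
print(nodes, sse)
```

Substituting the least-squares weight into the cost leaves only the nonlinear node parameters to be searched, which is the design choice the text describes for the continuous forward algorithm.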
4) Advanced genetic algorithm, eng-genes software and applications
In parallel with the development of the analytic framework for eng-genes neural modelling, this team has also devoted effort to creating the eng-genes networks using genetic algorithms, which can both determine the structure of the network, in terms of the type and number of hidden-layer neurons, and optimise the parameters associated with the activation functions and the network weights. Firstly, adaptive bounding techniques were introduced into the conventional genetic algorithm to enhance its search speed and capability. A significant amount of effort was then devoted to the development of user-friendly software to implement the eng-genes modelling procedure. From a practical standpoint, this software can be used for the creation and optimisation of various neural networks, including multilayer perceptron, radial basis function and, naturally, eng-genes networks, via genetic algorithms. In this software, each network in the population is encoded into a mixed-type chromosome. The first half of the chromosome contains integer values representing the network structure in terms of the type and number of hidden nodes. The second half contains floating-point values representing network weights and activation function parameters. The software provides flexibility in the effective use of available data, in that the user may select any data segment for modelling via a sliding window. Several options exist for visualisation of the optimisation process, allowing the user to
The software also provides functionality to transfer optimised networks to the Matlab environment. In addition, both the discrete and continuous forward selection algorithms have been implemented within the Matlab environment, allowing for a deterministic alternative to the genetic algorithm software for eng-genes network creation and optimisation. A sample screenshot of the software's user interface is shown in figure 2.
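The mixed-type chromosome described above, with an integer structure part followed by a floating-point parameter part, can be sketched as follows. The layout, the constants and the function names are hypothetical and are not taken from the software itself. In particular, the per-node parameter block and the "unused" slot code are assumptions made for the example.

```python
import random

GENE_TYPES = ["unused", "tanh", "gaussian", "exp"]  # hypothetical gene library
MAX_NODES = 4                                       # structure slots per chromosome
N_INPUTS = 2
FLOATS_PER_NODE = N_INPUTS + 2                      # input weights, bias, output weight

def random_chromosome(rng):
    """Integer structure part followed by floating-point parameter part."""
    structure = [rng.randrange(len(GENE_TYPES)) for _ in range(MAX_NODES)]
    params = [rng.uniform(-1.0, 1.0) for _ in range(MAX_NODES * FLOATS_PER_NODE)]
    return structure + params

def decode(chromosome):
    """Turn a chromosome into a list of hidden-node descriptions."""
    structure = chromosome[:MAX_NODES]
    params = chromosome[MAX_NODES:]
    nodes = []
    for slot, type_index in enumerate(structure):
        if type_index == 0:  # slot switched off: this node is not in the network
            continue
        block = params[slot * FLOATS_PER_NODE:(slot + 1) * FLOATS_PER_NODE]
        nodes.append({
            "type": GENE_TYPES[type_index],
            "input_weights": block[:N_INPUTS],
            "bias": block[N_INPUTS],
            "output_weight": block[N_INPUTS + 1],
        })
    return nodes

def mutate(chromosome, rng, rate=0.1):
    """Mutate each gene according to its type: resample integers, perturb floats."""
    child = list(chromosome)
    for i in range(len(child)):
        if rng.random() < rate:
            if i < MAX_NODES:
                child[i] = rng.randrange(len(GENE_TYPES))
            else:
                child[i] += rng.gauss(0.0, 0.1)
    return child

chrom = [1, 0, 3, 0] + [float(i) for i in range(MAX_NODES * FLOATS_PER_NODE)]
nodes = decode(chrom)
print([n["type"] for n in nodes])
```

Keeping the integer and floating-point genes in separate parts lets the genetic operators treat structure and parameters differently, as the mutate function does here.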
The publications produced in the course of the project demonstrate several potential application areas. Additionally, one stated aim of the investigation into the eng-genes concept was the establishment of a ‘library’ of potential activation functions. The publications describe numerous examples of such stock functions, as well as the process by which further such functions may be derived from the available system knowledge. The eng-genes concept and many derived algorithms have been applied to both simulated data and real plants, including estimation of the NO_{x} emission levels from several thermal power plants in the UK and Italy (figure 3), estimation of emissions in urban air in Belfast, and modelling of a pH neutralisation process from real data. The methods and algorithms have also been applied to systems biology (modelling of the signalling pathway), bioinformatics (cancer gene detection and classification), nonlinear system control using eng-genes networks, nonlinear system modelling using multiple neural nets, network-based identification and control, and rule selection for fuzzy modelling.
Acknowledgements
This work has been supported by the Engineering and Physical Sciences Research Council (EPSRC), which funded this project (GR/S85191/01), and by the European Social Fund.
Keynote speech
Invited lectures
Journal papers
Conference papers
Conference proceedings