The broad objective of this thesis is to apply and compare supervised learning techniques for predicting nitrogen oxide (NOx) emissions from the recovery boiler of a Kraft pulp mill. In doing so, we aim to identify the technique that is best suited, and best able to adapt, to the transient operating conditions of the boiler.

The Kraft process is an alkaline process for producing chemical pulp: cellulose fibers are dissociated from lignin by cooking the wood chips in a solution of sodium hydroxide (NaOH) and sodium sulfide (Na2S), called white liquor. The residual black liquor is concentrated and burned in a recovery furnace to yield an inorganic smelt of sodium carbonate (Na2CO3) and Na2S. The recovery boiler both regenerates the cooking chemicals and produces high pressure steam for the pulp mill, but it is also a major source of atmospheric pollutants in the mill. In particular, nitrogen oxide formation is very complex because several chemical and dynamic mechanisms are involved: thermal NOx, prompt NOx and fuel NOx.

There is an increasing demand in such industries for efficient data analysis tools, especially for pollutant monitoring and energy management. The literature refers mainly to numerical solutions, which require a complete description of the process and often assume stationary conditions as a working hypothesis. This is the case for the advanced data validation and reconciliation technique that we evaluate. It is based on thermodynamic models and on chemical and physical relationships among process parameters and equipment. It helps to highlight gaps in the available information about the process, but it failed to accurately model the operating points of the steam and fumes utilities.

Indeed, in a Kraft recovery boiler, the total nitrogen oxide emission depends on several operating factors and heterogeneous conditions, e.g. operating fuels (black liquor or heavy fuel), furnace load, droplet size, air system operation, retention time, biomass characteristics, ...

For such a complex problem, machine learning techniques may be used as alternative methods for engineering analysis and prediction. They involve algorithms that improve automatically through experience collected in historical databases. Among supervised learning techniques, we focus mainly on neural network methods (static and dynamic architectures) and additionally on tree-based methods (regression trees and random forests) and linear ones. For each method, we evaluate its ability to predict NOx emissions under varying conditions.

A random forest is a collection of decorrelated regression trees, each induced from a bootstrap sample of the training data. Its internal estimates are also used to measure variable importance, which allows us to rank the relevant variables for model input selection. Note that some additional a priori knowledge is needed to select the final set of inputs.
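To make the idea of a permutation-based importance measure concrete, the following is a minimal sketch and not the procedure used in the thesis. It assumes a generic fitted predictor (the hypothetical `predict` function stands in for a trained forest) and measures how much the mean squared error grows when one input column is shuffled. In a random forest, this kind of estimate is typically computed per tree on its out-of-bag samples; here it is computed once on a toy data set for brevity.

```python
import random

def mse(predict, rows, targets):
    """Mean squared error of `predict` over the given rows and targets."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(predict, rows, targets, seed=0):
    """Increase in MSE when each input column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(predict, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between column j and the target
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mse(predict, shuffled, targets) - baseline)
    return importances

# Toy data (assumption): the target depends on the first input only.
rows = [[float(i), float((7 * i) % 5)] for i in range(20)]
targets = [2.0 * r[0] for r in rows]

def predict(row):
    """Hypothetical stand-in for a fitted forest."""
    return 2.0 * row[0]

scores = permutation_importance(predict, rows, targets)
```

In this toy case the second input receives an importance of zero because the predictor never uses it, while the first input receives a positive score. Ranking inputs by such scores is one way to shortlist candidates before applying a priori process knowledge.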

Among static neural network structures, the multilayer perceptron is the most widely used, particularly the two-layer structure in which the input units are connected to the output layer through an intermediate hidden layer. The model of each neuron in the network includes a differentiable nonlinear activation function; such a network performs a static mapping between an input space and an output space.
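The static mapping performed by a two-layer perceptron can be sketched as follows. This is an illustrative example only: the tanh hidden activation, the single linear output unit, and all weight values are assumptions chosen for the sketch, not settings taken from the thesis.

```python
import math

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Two-layer perceptron: tanh hidden layer followed by a linear output unit."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Hypothetical weights for 2 inputs, 2 hidden units and 1 output.
w_hidden = [[0.5, -0.3], [0.1, 0.8]]
b_hidden = [0.0, 0.0]
w_out = [1.0, -1.0]
b_out = 0.2

y_zero = mlp_forward([0.0, 0.0], w_hidden, b_hidden, w_out, b_out)
y_unit = mlp_forward([1.0, 0.0], w_hidden, b_hidden, w_out, b_out)
```

The output depends only on the current input vector, which is what makes the mapping static: presenting the same input at two different times yields the same prediction.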

Within dynamic architectures, we distinguish those that have only feed-forward connections from those that have feedback (recurrent) connections. In this work, we focus mainly on the NARX network (Nonlinear AutoRegressive model with eXogenous inputs) and additionally on the Elman recurrent neural network. The latter incorporates an additional layer, called the context layer, whose nodes are one-step delay elements embedded in the local feedback paths. Nevertheless, Elman's approach has some drawbacks associated with its parameter learning scheme and its approximation of the temporal gradient.
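The role of the context layer can be illustrated with a short sketch. It assumes tanh hidden units, a linear output, no bias terms and hypothetical weights; the context is simply a copy of the hidden activations from the previous time step.

```python
import math

def elman_step(x, context, w_in, w_ctx, w_out):
    """One time step: the hidden state depends on the input and on the context."""
    hidden = [math.tanh(sum(a * xi for a, xi in zip(w_in[k], x)) +
                        sum(c * hj for c, hj in zip(w_ctx[k], context)))
              for k in range(len(w_in))]
    output = sum(w * h for w, h in zip(w_out, hidden))
    return output, hidden  # the hidden state becomes the next context

def elman_run(inputs, w_in, w_ctx, w_out):
    """Run the network over a sequence, starting from a zero context."""
    context = [0.0] * len(w_in)
    outputs = []
    for x in inputs:
        y, context = elman_step(x, context, w_in, w_ctx, w_out)
        outputs.append(y)
    return outputs

# Hypothetical weights: 1 input, 2 hidden units, 1 output.
w_in = [[1.0], [0.5]]
w_ctx = [[0.0, 0.5], [0.5, 0.0]]
w_out = [1.0, 1.0]

# A single input pulse followed by zeros.
outputs = elman_run([[1.0], [0.0], [0.0]], w_in, w_ctx, w_out)
```

Although the input is zero from the second step onward, the outputs remain nonzero because the context layer carries the earlier pulse forward, which is the memory effect described above.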

The NARX network, in particular, is used for input-output modeling of nonlinear dynamical systems. It is a recurrent model: the model inputs are applied to a tapped-delay-line memory of n units, and the outputs are fed back to the input layer through another line of m units. The total model order s=n+m is therefore a key parameter, and the method of Lipschitz numbers is a tool for estimating it. An advantage of the NARX network is that the standard backpropagation algorithm can be used to train it. Furthermore, to increase model robustness, we average the predictions of a set of individual neural predictors, which helps to reduce the prediction variance across trials.
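The two tapped delay lines and the order estimation step can be sketched together. This is a minimal illustration under stated assumptions, not the implementation used in the thesis: the regressor is taken as [y(t-1), ..., y(t-m), u(t-1), ..., u(t-n)], the Lipschitz index is taken as the geometric mean of the p largest Lipschitz quotients scaled by the square root of the regressor size, and the function names, the value of p and the toy first-order system are all hypothetical.

```python
import math
import random

def narx_regressors(u, y, n, m):
    """Rows [y(t-1..t-m), u(t-1..t-n)] paired with the targets y(t)."""
    start = max(n, m)
    rows, targets = [], []
    for t in range(start, len(y)):
        past_y = [y[t - k] for k in range(1, m + 1)]
        past_u = [u[t - k] for k in range(1, n + 1)]
        rows.append(past_y + past_u)
        targets.append(y[t])
    return rows, targets

def lipschitz_index(rows, targets, p):
    """Geometric mean of the p largest scaled Lipschitz quotients."""
    quotients = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])))
            diff = abs(targets[i] - targets[j])
            if dist > 0.0 and diff > 0.0:  # skip degenerate pairs
                quotients.append(diff / dist)
    largest = sorted(quotients, reverse=True)[:p]
    scale = math.sqrt(len(rows[0]))
    return math.exp(sum(math.log(scale * q) for q in largest) / len(largest))

# Toy system (assumption): y(t) = 0.5 * y(t-1) + u(t-1), so n = 1 and m = 1.
rng = random.Random(0)
u = [rng.uniform(-1.0, 1.0) for _ in range(60)]
y = [0.0]
for t in range(1, 60):
    y.append(0.5 * y[t - 1] + u[t - 1])

# Index with the input lag missing versus the index at the true order.
index_missing = lipschitz_index(*narx_regressors(u, y, n=0, m=1), p=10)
index_true = lipschitz_index(*narx_regressors(u, y, n=1, m=1), p=10)
```

When a needed lag is missing, nearby regressors can map to very different targets, so the index is large; once the regressor contains all relevant lags, the index drops and stops decreasing noticeably as further lags are added. The order at which this levelling off occurs is taken as the estimate.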

Even though generalization is evaluated on the worst possible configuration, we see that ensembles of NARX networks perform well in predicting NOx emissions during transient operations, and that Lipschitz numbers are very helpful for estimating system orders.

We illustrate the potential of a dynamic neural approach compared to the others in the nitrogen oxide prediction task. It is better suited to practical modeling needs and provides a representation of time and memory. It allows us to monitor NOx pollution and, possibly, to adjust control variables and perform diagnostics.

The thesis is divided into seven chapters covering several publications.

Chapter 1 is about the Kraft process and its recovery boiler. We start with a short description of the Kraft pulp mill. Then we describe the Kraft recovery boiler, some chemical reactions in the furnace, the steam production equipment and the atmospheric pollutants. Finally, we discuss nitrogen oxide formation in the furnace and the effects of several operating conditions on its production.

Chapter 2 is about data mining: what it is, what it is used for, and what the main modeling cultures are. This chapter deals with system identification, modeling approaches (white box, grey box, black box), some definitions related to learning and modeling, and finally some links between modeling and optimization techniques.

Chapter 3 starts with a review of the state of the art in numerical simulation of a Kraft recovery boiler. We then apply and evaluate a data validation scheme for modeling the steam and fumes utilities. Finally, we discuss the application of artificial intelligence techniques within the framework of a recovery boiler.

Chapter 4 aims at selecting model inputs, starting with a supervised selection approach based on random forests. We introduce some methodological insights into tree-based methods, from a simple regression tree to random forests. The internal estimates of random forests are used to measure the relative importance of each input variable in predicting a response, i.e. nitrogen oxide emission or high pressure steam production. Finally, we discuss some useful extra knowledge to take into account when selecting the final inputs.

Chapter 5 is about neural network modeling. We introduce the perceptron, the multilayer perceptron, and the associated backpropagation algorithm. We discuss static and dynamic architectures, especially the Elman recurrent neural network. Finally, we apply a multilayer perceptron and an Elman recurrent neural network to predict the high pressure steam flow rate from the Kraft recovery boiler.

Chapter 6 presents some insights into input-output modeling of nonlinear dynamical systems, especially with the NARX network. At the end, we explain the Lipschitz method that is applied to estimate system orders.

Chapter 7 summarizes comparison results for the supervised learning techniques applied to predict nitrogen oxide emissions from the recovery boiler. This comparison involves neural network techniques, tree-based methods and multiple linear regression.

Finally, some research perspectives are presented and some conclusions are drawn.