Artificial neural network modeling of p-cresol photodegradation

Background The complexity of the reactions and their kinetics is a current problem in photodegradation processes. Recently, artificial neural networks have been widely used to address such problems because they reliably and robustly capture non-linear relationships between variables in complex systems. In this study, an artificial neural network was applied to model p-cresol photodegradation. To optimize the network, the independent variables irradiation time, pH, photocatalyst amount and p-cresol concentration were used as input parameters, while photodegradation% was selected as the output. Photodegradation% values were obtained by performing the experimental design of the variables under UV irradiation. The network was trained with Quick propagation (QP) and three other algorithms. To determine the number of hidden-layer nodes for each model, the root mean squared error of the testing set was minimized. The resulting topologies of the algorithms were then compared by coefficient of determination and absolute average deviation. Results The comparison indicated that the Quick propagation algorithm had the minimum root mean squared error, 1.3995, the minimum absolute average deviation, 3.0478, and the maximum coefficient of determination, 0.9752, for the testing data set. The validation test of the QP-based artificial neural network gave a root mean squared error of 4.11, an absolute average deviation of 8.071 and a coefficient of determination of 0.97. Conclusion An artificial neural network based on the Quick propagation algorithm with topology 4-10-1 gave the best performance in this study.


Background
Environmental pollution on a global scale has drawn scientists' attention to the vital need for environmentally friendly, chemically clean processes. Phenolic compounds such as cresols are widely used in manufacturing products including cresol-based resins, herbicides, pharmaceuticals, surfactants and petrochemicals [1]. p-Cresol, with a water solubility above 21.5 g L⁻¹ at 25°C, has been listed as a persistent, toxic priority chemical, and quantitative structure-activity relationship (QSAR) analysis indicates that it poses a significant threat to the environment [2][3][4][5]. Recently, advanced oxidation processes (AOPs) have become practical technologies for removing persistent pollutants [6][7][8]. Among the various AOPs, heterogeneous photocatalysis with zinc oxide (ZnO) has shown high photodegradation efficiency because it can destroy a wide range of pollutants at ambient temperature and pressure without generating harmful by-products [5][9][10][11][12][13]. The main operational parameters of the photodegradation process, such as irradiation time, pH, photocatalyst amount and pollutant concentration, have been investigated under ultraviolet (UV) and visible-light irradiation [13][14][15][16][17]. However, the complex behavior of these parameters in the radiant energy balance, the spatial distribution of the absorbed radiation, mass transfer and the photochemical degradation mechanisms can lead to misinterpretation of results. In addition, the kinetics of photodegradation are difficult to determine [18]. These complexities are major challenges for traditional methods such as one-variable-at-a-time, in which one parameter is varied while the others are held constant. Because the variables are not completely independent of each other during the process, this approach may have adverse effects on the photodegradation yield [13,15]. 
On the other hand, multivariate methods such as response surface methodology (RSM) consider the effects of the variables simultaneously, which is a promising approach [16]. However, these methods consider only two variables at a time, which can be a major disadvantage for complex systems with more than two variables. Furthermore, they involve complicated statistical calculations such as model fitting and regression analysis [17,19]. More recently, artificial neural networks (ANNs) have been widely used for modeling chemical reaction processes [20][21][22][23]. These models are reliable and robust and are well suited to capturing non-linear relationships between variables in complex systems. Accordingly, ANNs have been used to study the photodegradation of many environmental organic pollutants, such as ethylenediaminetetraacetic acid [24], nitrogen oxides [25], nitrilotriacetic acid [26] and 2,4-dihydroxybenzoic acid [27], as well as the decolorization of C.I. Acid Blue 9 [28]. In these studies, ANNs simulated the behavior of the complex reaction systems using different training algorithms, such as Quick propagation (QP), Incremental backpropagation (IBP), Batch backpropagation (BBP) and the Levenberg-Marquardt (LM) algorithm [20]. Among them, backpropagation is a popular algorithm. With such models, the results of photocatalytic processes under different conditions can be estimated without resolving these complexities explicitly. In this work, a multilayer feed-forward neural network model was developed to predict the photodegradation% of p-cresol by ZnO under UV irradiation. To optimize the model, the algorithms were compared on the basis of minimum root mean squared error (RMSE), minimum percentage absolute average deviation (AAD) and maximum coefficient of determination (R²).

Experiment
Materials and methods p-Cresol (99.5%, Fluka), NaOH (99%, Merck), H₂SO₄ (95%-97%) and the other required chemicals were of reagent grade, obtained from Merck, and used without further purification. The ZnO (99%, Merck) had a surface area of 3.3 m²/g, measured by static BET using a Thermo Finnigan Sorptomatic 1990 series analyzer. The ZnO particle size, recorded on a Nanophox instrument, was 0.4-0.5 μm. The band gap, measured with a PerkinElmer Lambda 35 UV/Vis/NIR spectrometer, was 3.02 eV. In all photocatalytic experiments, one liter of a mixture containing known quantities of ZnO and p-cresol was irradiated for an appropriate time. Photocatalytic experiments were performed in batch (non-continuous) mode in a binary reactor fitted with a 6 W UV-A lamp [15]. The mixture was magnetically stirred (200 rpm) to keep the suspension evenly distributed throughout the reactor and to eliminate mass gradients. Air was blown into the reaction solution with an air pump at a flow rate of 150 L/h to strip the CO₂ produced, increase solution fluidization and supply oxygen. Cooled water flowing through the binary cylinder kept the temperature at around 25°C. At specific time intervals, samples were drawn from the bulk solution and filtered through a 0.45 μm polytetrafluoroethylene (PTFE) membrane. To compare the efficiency of the photocatalytic degradation of p-cresol, the filtrates were analyzed by UV-Visible spectrometry (Shimadzu, UV-1650pc) at the maximum absorption wavelength of p-cresol (277 nm). The small positive error of UV-Visible spectrometry relative to HPLC was neglected in this work (results not shown). The percentage of photocatalytic degradation of p-cresol was calculated using Equation (1).
$$\text{Photodegradation}\ (\%) = \frac{C_0 - C}{C_0} \times 100 \qquad (1)$$
where C₀ is the initial concentration of p-cresol and C is its concentration after irradiation. All photocatalytic degradation experiments were carried out in duplicate. Control experiments were first carried out at natural pH (7.5), under UV irradiation in the absence of the photocatalyst and in the dark with the photocatalyst. Only 6% of p-cresol was removed by UV photolysis and 7% by adsorption on the photocatalyst surface [29].
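Equation (1) is simple arithmetic; the short sketch below applies it to a pair of duplicate runs and averages them. The concentrations are hypothetical values, and averaging the duplicates is an assumption about how the replicate results were combined.

```python
def photodegradation_percent(c0: float, c: float) -> float:
    """Equation (1): percentage of p-cresol removed, from the initial (c0)
    and post-irradiation (c) concentrations expressed in the same units."""
    if c0 <= 0:
        raise ValueError("initial concentration must be positive")
    return (c0 - c) / c0 * 100.0


def mean_of_duplicates(values):
    """Average of replicate runs (assumed way of combining duplicates)."""
    return sum(values) / len(values)


# Hypothetical duplicate readings in mg/L; illustrative numbers, not the paper's data.
runs = [(50.0, 10.0), (50.0, 12.0)]
result = mean_of_duplicates([photodegradation_percent(c0, c) for c0, c in runs])
print(f"{result:.1f} %")
```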

Experimental design
The modeling of the photodegradation was carried out with NeuralPower software version 2.5, which has been used in several studies [30,31]. To design the experiments, irradiation time, pH, photocatalyst amount and p-cresol concentration were selected as independent variables (inputs), while photodegradation (%) was selected as the dependent variable (output). The designed experiments were performed in the laboratory to obtain the actual responses, and the experimental values were then used for ANN modeling. The data were randomly divided into training, testing and validation sets (Table 1) using the option available in the software. The training data were used to compute the network parameters and the testing data to ensure their robustness. The testing stage was also used to avoid overfitting by monitoring the control error [32]. The validation data were used to assess the predictive ability of the generated model [21]; they consisted of six additional experiments within the range of values used for ANN modeling and excluded from training and testing (Table 1).
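The random partition of the modeling data can be pictured as below. This is a minimal sketch assuming a seeded shuffle; `split_runs` is a hypothetical helper that does not reproduce NeuralPower's own splitting option, and the run IDs are placeholders for the experiments in Table 1. The six validation experiments are kept apart and never enter the split.

```python
import random


def split_runs(runs, n_test, seed=0):
    """Randomly partition the modeling runs into training and testing subsets.

    Hypothetical helper: it does not reproduce NeuralPower's own splitting
    option, it only illustrates a reproducible (seeded) random split.
    """
    if not 0 < n_test < len(runs):
        raise ValueError("n_test must leave at least one run in each subset")
    shuffled = list(runs)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder run IDs; the real runs are the designed experiments in Table 1.
modeling_runs = list(range(1, 11))
# Six extra experiments kept aside for validation and never passed to split_runs.
validation_runs = list(range(101, 107))

train, test = split_runs(modeling_runs, n_test=3, seed=42)
print(len(train), len(test))  # 7 3
```

Seeding the generator makes the split reproducible, which matters when topologies trained on the same split are compared against each other.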

The ANN description
Generally, ANNs are mathematical models consisting of connected units (neurons or nodes) arranged in layers. They are usually used to infer a function from observations of a particular process. The nodes are simple artificial neurons that mimic a biological neural network, and each interconnection between nodes carries an associated weight. The first (input) layer sends data via the weights to the nodes of the second (hidden) layer, and from there to the third (output) layer [33]. The multilayer perceptron is a class of networks consisting of multiple layers of computational units. The units are interconnected without directed cycles, as in a feed-forward neural network. A network may have more than one hidden layer, but a single hidden layer is commonly recommended. The number of hidden nodes is determined by trial and error, testing from 1 to n nodes. The interconnection type between layers is multilayer normal feed-forward [33]: every node of a given layer is connected to every node of the next layer. The input to each hidden or output node is calculated as a weighted sum of all the inputs received from the previous layer. In the hidden nodes, this weighted sum is transformed by a transfer function [34]. The outputs of the hidden nodes in turn act as inputs to the output nodes, where they undergo a similar or different transformation. In this work, the ANN was trained with four learning algorithms, QP, IBP, BBP and LM [20], using multilayer feed-forward connections. The log-sigmoid transfer function was used in both the hidden and output layers [35]. 
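The forward pass described above can be sketched for a 4-n-1 network with the log-sigmoid in both layers. The weights here are hypothetical random values for an untrained 4-10-1 network, and the inputs are assumed to be already scaled to [0, 1]; this only illustrates the data flow, not the trained model of this study.

```python
import math
import random


def logsig(z: float) -> float:
    """Log-sigmoid transfer function, bounded in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass through a fully connected 4-n-1 feed-forward network.

    w_hidden[j][i] is the weight from input i to hidden node j. Each hidden node
    applies logsig to its weighted sum plus bias; the single output node does the
    same with the hidden outputs.
    """
    hidden = [
        logsig(sum(w * xi for w, xi in zip(w_row, x)) + b)
        for w_row, b in zip(w_hidden, b_hidden)
    ]
    return logsig(sum(v * h for v, h in zip(w_out, hidden)) + b_out)


# Hypothetical random weights for a 4-10-1 topology (untrained).
rng = random.Random(1)
n_in, n_hidden = 4, 10
w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
b_out = rng.uniform(-1, 1)

x_scaled = [0.5, 0.3, 0.7, 0.2]  # time, pH, catalyst, p-cresol, scaled to [0, 1]
y_scaled = forward(x_scaled, w_hidden, b_hidden, w_out, b_out)
print(0.0 < y_scaled < 1.0)  # True
```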
Because the sigmoid function is bounded between 0 and 1, the input and output data were normalized to this range by the software's scaling. The scaled data are passed to the input layer and propagated through the hidden layer to the output layer of the network. Each node in the hidden or output layer first acts as a summing junction that combines the inputs from the previous layer using Equation (2) [24]:
$$y_j = \sum_{i} w_{ij}\,x_i + b_j \qquad (2)$$
where y_j is the net input to node j in the hidden or output layer, x_i are the inputs to node j (the outputs of the previous layer), w_ij is the weight representing the strength of the connection between node i and node j, the sum runs over the nodes i of the previous layer, and b_j is the bias associated with node j. The learning rate and momentum coefficient of the networks were set to the software's default values [33]. Network performance is reported as the RMSE (Equation 3), which is based on the differences between the actual and predicted values (Table 1).
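The normalization step can be illustrated with linear min-max scaling to [0, 1] and its inverse. This is an assumption: the exact scaling range and formula NeuralPower uses are not stated here, and the pH settings below are hypothetical.

```python
def minmax_scale(values, lo, hi):
    """Map values from [lo, hi] linearly onto [0, 1] (assumed min-max scaling)."""
    return [(v - lo) / (hi - lo) for v in values]


def minmax_unscale(scaled, lo, hi):
    """Inverse mapping, used to report network outputs in original units."""
    return [lo + s * (hi - lo) for s in scaled]


ph_values = [3.0, 6.0, 9.0, 11.0]  # hypothetical pH settings
ph_lo, ph_hi = 3.0, 11.0
scaled = minmax_scale(ph_values, ph_lo, ph_hi)
print(scaled)  # [0.0, 0.375, 0.75, 1.0]
restored = minmax_unscale(scaled, ph_lo, ph_hi)
```

The predicted output must be passed back through the inverse mapping before RMSE, R² or AAD are computed in photodegradation (%) units.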
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - y_{di}\right)^2} \qquad (3)$$
where n is the number of points, y_i is the predicted value and y_di is the actual value. The number of hidden nodes giving the minimum RMSE indicates the preferred network. In addition, the performance of the ANN models was assessed by the coefficient of determination (R²) and the percentage absolute average deviation (AAD) between the predicted and actual values, calculated with Equations (4) and (5):
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - y_{di}\right)^2}{\sum_{i=1}^{n}\left(y_{di} - y_m\right)^2} \qquad (4)$$
$$\mathrm{AAD}\ (\%) = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{y_i - y_{di}}{y_{di}}\right| \qquad (5)$$
where n is the number of points, y_i is the predicted value, y_di is the actual value and y_m is the average of the actual values. The appropriate topologies are therefore those with minimum RMSE and AAD and maximum R².
Figure 4 Scatter plot of ANN-predicted photodegradation (%) versus actual photodegradation (%) for the validation data.
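Equations (3) to (5) translate directly into code. The sketch below uses hypothetical predicted/actual pairs; y_m in R² is the mean of the actual values, as defined in the text.

```python
import math


def rmse(pred, actual):
    """Equation (3): root mean squared error."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / n)


def r_squared(pred, actual):
    """Equation (4): 1 - SS_res / SS_tot, with y_m the mean of the actual values."""
    y_m = sum(actual) / len(actual)
    ss_res = sum((p - a) ** 2 for p, a in zip(pred, actual))
    ss_tot = sum((a - y_m) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def aad_percent(pred, actual):
    """Equation (5): absolute average deviation in percent."""
    n = len(actual)
    return 100.0 / n * sum(abs((p - a) / a) for p, a in zip(pred, actual))


# Hypothetical predicted/actual photodegradation (%) values, not the paper's data.
actual = [40.0, 60.0, 80.0]
pred = [42.0, 57.0, 80.0]
print(rmse(pred, actual), r_squared(pred, actual), aad_percent(pred, actual))
```

AAD divides by each actual value, so it is undefined for a run with zero photodegradation; RMSE and R² have no such restriction.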

Results and discussion
The ANN model training In order to determine the optimum number of neurons in the hidden layer, a series of topologies was examined in which the number of neurons was varied from 1 to 20. Each topology was repeated 15 times to avoid random correlation due to the random initialization of the weights [36]. The RMSE was used as the error function, and R² and AAD were used as measures of the predictive ability of the network. The optimum topology was chosen on the basis of the minimum testing-set error; the selected topologies had the lowest RMSE for the training and testing sets. Figure 1 shows the performance of the network on the testing data versus the number of neurons in the hidden layer for the IBP, QP, BBP and LM algorithms. According to the RMSE, the network with 8 hidden neurons performed best with the IBP algorithm, 10 hidden neurons with QP, 18 hidden neurons with BBP and 11 hidden neurons with LM. The optimum topologies were therefore 4-8-1, 4-10-1, 4-18-1 and 4-11-1 for the IBP, QP, BBP and LM algorithms, respectively. Table 2 summarizes the RMSE, R² and AAD of the testing set for each algorithm. QP gave the lowest RMSE and AAD and the highest R² of all the algorithms, so the QP network with the 4-10-1 topology was more effective than the IBP, BBP and LM networks. Table 1 presents the predicted photodegradation values for the QP, IBP, BBP and LM algorithms. Figure 2 shows the predicted versus actual photodegradation values for the training set. As the comparison of the scatter plots shows, the QP model fitted the actual values well (R² = 0.9997). 
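The selection rule can be sketched as follows, reading "minimum RMSE over the repeated runs" as: for each hidden-layer size keep the best testing RMSE among its repeats, then pick the size with the lowest such value. The RMSE values are hypothetical, and the training routine that would produce them (one fresh random initialization per repeat) is omitted; averaging over repeats would be a reasonable alternative criterion.

```python
def select_hidden_nodes(test_rmse_by_nodes):
    """Pick the hidden-layer size whose best (minimum) testing RMSE over the
    repeated trainings is lowest.

    test_rmse_by_nodes maps a hidden-node count to the list of testing RMSEs
    from the repeated runs (15 per topology in this study).
    """
    best_per_size = {n: min(rmses) for n, rmses in test_rmse_by_nodes.items()}
    best_n = min(best_per_size, key=best_per_size.get)
    return best_n, best_per_size[best_n]


# Hypothetical testing RMSEs for three candidate sizes (3 repeats each, not 15).
results = {
    8: [2.9, 2.4, 3.1],
    10: [2.2, 1.9, 2.6],
    11: [2.5, 2.1, 2.8],
}
n_hidden, rmse_min = select_hidden_nodes(results)
print(f"topology 4-{n_hidden}-1, testing RMSE {rmse_min}")  # topology 4-10-1, testing RMSE 1.9
```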
Therefore, the QP network with the 4-10-1 topology (Figure 3) was selected as the most effective trained model for the photodegradation.
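Quick propagation is generally understood as Fahlman's Quickprop, which treats the error as a parabola in each weight and jumps toward its estimated minimum using the current and previous slopes. The source does not describe NeuralPower's internal implementation, so the following is only a simplified one-weight illustration of the textbook update Δw(t) = S(t)/(S(t−1) − S(t)) · Δw(t−1), with S = ∂E/∂w, a step-growth cap μ and a gradient-descent fallback. The learning rate and μ = 1.75 are assumed values, and Fahlman's additional gradient term is omitted.

```python
def quickprop_minimize(grad, w0, lr=0.1, mu=1.75, steps=20):
    """Minimize a 1-D error E(w) with a simplified Quickprop update.

    grad(w) returns the slope S = dE/dw. The first step, and any step where the
    parabolic estimate is unusable, is plain gradient descent.
    """
    w = w0
    prev_s = grad(w)
    prev_dw = -lr * prev_s            # first step: gradient descent
    w += prev_dw
    for _ in range(steps):
        s = grad(w)
        denom = prev_s - s
        if prev_dw == 0.0 or denom == 0.0:
            dw = -lr * s                      # fall back to gradient descent
        else:
            dw = s / denom * prev_dw          # jump to the parabola's minimum
            limit = mu * abs(prev_dw)
            dw = max(-limit, min(limit, dw))  # cap step growth at mu
        w += dw
        prev_s, prev_dw = s, dw
    return w


# E(w) = (w - 3)^2 has slope 2(w - 3) and its minimum at w = 3.
w_star = quickprop_minimize(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(round(w_star, 6))
```

On an exactly quadratic error the parabolic jump is exact, so the weight reaches the minimum in a few capped steps; on a real network each weight is updated this way using its own partial derivative.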

Model validation
The predictive ability of the generated model (4-10-1) was validated using data excluded from the training and testing sets (Table 1). Figure 4 plots the predicted versus the actual photodegradation obtained in the laboratory. The RMSE was 2.31, the R² was 0.98 and the AAD% was 4.4, indicating high predictive accuracy. Figure 5 shows the relative importance of the input parameters for the photodegradation, as computed by the model. In the selected ranges of the variables (Table 3), the order of importance was photocatalyst amount > pH > p-cresol concentration > irradiation time. The optimum values of the two most important factors, photocatalyst amount and pH, are presented in Figure 6.
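The text does not state how the relative importance values were computed. One common approach for a single-hidden-layer network is Garson's algorithm, which partitions the absolute connection weights among the inputs; the sketch below shows that technique as an assumption, not as the software's method, using hypothetical weights for a small 4-2-1 network rather than the trained 4-10-1 weights.

```python
def garson_importance(w_in_hidden, w_hidden_out):
    """Relative importance (%) of each input via Garson's algorithm.

    w_in_hidden[i][j]: weight from input i to hidden node j.
    w_hidden_out[j]:   weight from hidden node j to the single output.
    """
    n_in, n_hidden = len(w_in_hidden), len(w_hidden_out)
    shares = [0.0] * n_in
    for j in range(n_hidden):
        # Absolute input-hidden-output products for hidden node j.
        contrib = [abs(w_in_hidden[i][j] * w_hidden_out[j]) for i in range(n_in)]
        total = sum(contrib)
        for i in range(n_in):
            shares[i] += contrib[i] / total
    grand_total = sum(shares)
    return [100.0 * s / grand_total for s in shares]


# Hypothetical weights for a 4-2-1 net (not the paper's trained network).
inputs = ["catalyst", "pH", "p-cresol", "time"]
W = [[2.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.5, 0.5]]
v = [0.8, -1.2]
for name, pct in zip(inputs, garson_importance(W, v)):
    print(f"{name}: {pct:.1f} %")
```

With a single output node, the hidden-to-output weight cancels inside each hidden node's normalization, so the ranking depends only on the input-to-hidden weights; this is a known limitation of the method.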

Conclusion
The modeling of p-cresol photodegradation was carried out with an ANN. The photodegradation was performed in a ZnO suspension under UV irradiation. The model contained input, hidden and output layers; the inputs were irradiation time, pH, photocatalyst amount and p-cresol concentration, and the output was photodegradation %. To obtain the optimum model, the ANN was trained with the QP, IBP, BBP and LM algorithms. The minimum RMSE over repeated runs was used to determine the number of nodes in the hidden layer for each algorithm, giving the 4-10-1, 4-8-1, 4-18-1 and 4-11-1 topologies for QP, IBP, BBP and LM, respectively. To compare the optimized topologies, RMSE was used as the error function, and R² and AAD were used as indices of the network's predictive ability. The comparison indicated that QP had the minimum RMSE (1.3995) and AAD% (3.05) and the maximum R² (0.98) for the testing set. The QP validation gave R², RMSE and AAD% values of 0.97, 2.3 and 4.4, respectively. In conclusion, the QP-trained network gave the best performance and was selected as the process model.