Application of principal component analysis in the pollution assessment with heavy metals of vegetable food chain in the old mining areas

Background The aim of the paper is to assess, by principal component analysis (PCA), the heavy metal contamination of soil and of the vegetables widely consumed by people who live in areas contaminated by heavy metals (HMs) due to long-lasting mining activities. This chemometric technique allowed us to select the best model for determining the risk posed by HMs to the food chain as well as to people's health. Results Many PCA models were computed with different variables: heavy metal contents and some agrochemical parameters that characterize the soil samples from contaminated and uncontaminated areas, HMs contents of different types of vegetables grown and consumed in these areas, and the complex parameter target hazard quotient (THQ). Results were discussed in terms of principal component analysis. Conclusion Processing the data by PCA had two major benefits: firstly, it helped in optimizing the number and type of data that best render the HMs contamination of the soil and vegetables. Secondly, it was valuable for selecting the vegetable species that present the highest or lowest risk of a negative impact on the food chain and human health.


Background
Since ancient times, plant based food has played an important role in human nutrition, being a very important source of antioxidants, vitamins and minerals [1][2][3]. Modern nutrition requires greater consumption of vegetables and fruits, because of their role on the quality of life [1]. On the other hand, plant food, especially the one which is consumed without prior processing, such as raw vegetables, is the first link of the food chain through which macro and micro metals can go directly into the body. To ensure the functioning of various enzyme systems, the human body needs most micro metals, but some such as Cd, Pb, and Hg have toxic effects. Toxicology indicates that other heavy metals, such as Cu, Ni, Mn, Fe, Cr, V, Mo, can appear in the list of harmful metals, if their concentration exceeds certain limits [4,5].
For vegetables obtained in uncontaminated areas, the levels of these metals are low, generally below the permissible limits. The situation is quite different when these crops are obtained in geogenic or anthropogenic contaminated areas, such as the mining areas where these metals are exploited. Especially after the industrial revolution, the need for metals in modern society led to the development of metal mining. Because environmental pollution problems have generally been neglected over the centuries, accumulated residues in these important areas have led to increased pollution of soil and plants (both spontaneous and cultivated). Only in recent decades has society realized the negative effects that heavy metals contamination has on the environment and on human health. Nowadays, heavy metals have been acknowledged as factors of "Global Change" scenarios. Climate change may affect HMs bioavailability in soils and hence the entire food chain [6].
Romania is recognized as one of the European countries with polymetallic mining (Fe, Pb, Cu, Zn, Au, Ag) since antiquity [7]. Research conducted in Romania in recent decades has clearly shown the extent of the anthropogenic pollution of soil and plant food (both of these mining areas as well as of the large urban areas) with various heavy metals [8][9][10][11][12][13]. Such research, which calls attention to the increased contamination of soils and plants in areas contaminated by mining activities or in large conurbation areas, has been performed worldwide [14][15][16][17][18].
Together with soil, vegetable food constitutes the next link that can increase HMs accumulation in the food chain and make them a hazard for consumers. Recently, many studies on various plant foods have addressed this issue [19][20][21][22][23][24][25][26][27][28][29][30][31][32]. While Fe, Mn, Zn, Cu and Ni are essential for plant growth, other HMs (Cd, Pb, Cr, Hg, Ag) are toxic. In normal situations, plants' defences and homeostasis limit the accumulation of HMs through complex mechanisms [33]; in special situations, however, such as in contaminated areas, HMs accumulation in some vegetables can be high and dangerous for human health [19][30][31][32].
For proper growth and development of all animals, including humans, Fe, Mn, Zn and Cu can be considered trace minerals with a central role in many metabolic processes throughout the body. They are essential as catalysts in many enzyme and hormone systems which influence growth, bone development, feathering, enzyme structure and function. On the other hand, it has been proven that, in large amounts, these metals can cause oxidative stress in the animal body, which can on the one hand be beneficial, killing tumour cells, but on the other hand it may have a negative role inducing cancer by oxidative DNA lesion [34]. Moreover, science has proven the existence of structural interactions between heavy metals and functional peptides [35][36][37]. Another source of metals in animal-based food for humans is the addition of metals in forage, which can get into meat or other animal food products and hence in the human body, where it can influence human health in a positive or negative way [38][39][40].
The wide range of aspects regarding the presence of HMs in the food chain and their implications on human health requires further research in this field in a unitary approach on the environment which the humankind is part of. The old mining areas affected by anthropogenic contamination with heavy metals are particular areas in which the concentration of one or more heavy metals exceeds normal values in most soils and in some agricultural products used as plant food, such as vegetables and fruit, or even animal products (meat, eggs, and milk).
Given the variety and diversity of data, research in these areas involves not only modern analytical methods with high sensitivity, specificity and accuracy to obtain valid results on HMs content in soil and food, but also complex statistical methods that provide the big picture of these data. Multivariate statistical techniques are the right tool for viewing and analysing matrices of complex data [41]. PCA and cluster analysis (CA) are two unsupervised methods that allow us to deduce how certain variables (metal concentrations, other parameters of the soil or plants) that characterize objects (soil, plant) determine their association. Whereas CA is used for grouping samples on the basis of the original variables, PCA estimates the correlation structure of the variables by finding hypothetical new variables (principal components, PCs) that account for as much as possible of the variance (or correlation) in a multidimensional data set. These new variables are linear combinations of the original variables [42]. This method helps us to identify groups of variables (i.e. heavy metal concentrations or other soil or plant parameters) based on the loadings, and groups of samples (soil or vegetable species) based on the scores.
To understand the complex connection between soil or plant samples and heavy metal contents, the chemometric technique PCA was used. It is based on eigenanalysis of the covariance or correlation matrix. Each variable has a loading, which shows how well the variable is taken into account by the model components. Loadings reflect how much each variable contributes to the meaningful variation (or correlation) in the data and are used to interpret the relationships between variables. Each sample has a score along each model component, which shows the location of the sample in the model and can be used to detect sample patterns, groupings, similarities or differences [43,44]. In practice, the higher-numbered PC axes, which explain only a small proportion of the variance in the data, are ignored [45]. The importance of a variable in a PC model is indicated by the size of its residual variance. This is useful for variable selection: a variable with little explained variance may be removed without substantially changing the PC model. There is no restriction on the number of variables; the rule for multiple regression that the number of variables must be smaller than the number of objects does not apply to PCA. The closer the similarity between the objects, the fewer terms are needed in the expansion to achieve a given goodness of approximation [46]. In order to simplify plotting, PCA may be used to reduce the data set to only two variables (the first two components).
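The eigenanalysis behind PCA can be sketched in a few lines. The following is a minimal, dependency-free illustration and not the PAST implementation: the data matrix is hypothetical (rows are samples, columns are variables), and the eigenpairs of the correlation matrix are found by power iteration with deflation, a simple method chosen here only to avoid external libraries. The function names are our own.

```python
import math

def standardize(rows):
    """Center each column and divide it by its sample standard deviation."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def correlation_matrix(z):
    """Correlation matrix of standardized data (rows = samples)."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]

def top_eigenpairs(matrix, k, iterations=500):
    """First k (eigenvalue, eigenvector) pairs of a symmetric positive
    semi-definite matrix, by power iteration with deflation."""
    p = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        vec = [float(i + 1) for i in range(p)]  # arbitrary starting vector
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
        value = 0.0
        for _step in range(iterations):
            nxt = [sum(work[i][j] * vec[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(v * v for v in nxt))
            if norm < 1e-12:  # nothing left to extract
                value = 0.0
                break
            vec = [v / norm for v in nxt]
            value = norm
        pairs.append((value, vec))
        # Deflation: remove the component that was just found.
        work = [[work[i][j] - value * vec[i] * vec[j] for j in range(p)]
                for i in range(p)]
    return pairs

# Hypothetical data: 5 samples x 3 variables (not taken from the paper).
data = [[1.0, 2.0, 10.0], [2.0, 4.1, 9.0], [3.0, 6.2, 7.0],
        [4.0, 7.9, 8.0], [5.0, 10.1, 5.0]]

z = standardize(data)
pairs = top_eigenpairs(correlation_matrix(z), k=3)
eigenvalues = [value for value, _ in pairs]
explained = [value / sum(eigenvalues) for value in eigenvalues]

# Loadings: eigenvector entries scaled by sqrt(eigenvalue).
loadings = [[vec[i] * math.sqrt(value) for value, vec in pairs[:2]]
            for i in range(3)]
# Scores: standardized samples projected on the first two eigenvectors.
scores = [[sum(r[i] * vec[i] for i in range(3)) for _, vec in pairs[:2]]
          for r in z]
```

Because the correlation matrix has ones on its diagonal, the eigenvalues sum to the number of variables, so `explained` gives the share of the total variance carried by each component.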
The purpose of our work is to assess, by principal component analysis, the complex phenomenon of heavy metal pollution of the vegetable food chain in old mining areas. For this purpose, many PCA models were constructed and computed with different markers. Two types of markers were used as the main markers of pollution. The simple markers are the HMs concentrations in contaminated soils and the HMs concentrations in the vegetables consumed by the population in these areas. A complex marker, the Target Hazard Quotient (THQ), was also employed. This marker connects the metal concentrations in food with their toxicity, the quantity and quality of food consumption and the body mass of consumers [58]. This complex parameter is increasingly used in evaluating the potential health risk of HMs present in various foods [58][59][60][61][62].

Assessment of soil data by principal component analysis
Soil is the main link in the food chain. From soil, vegetables can take up both nutrients and toxic elements, such as heavy metals, directly by root adsorption and indirectly, through foliar absorption of contaminated soil particles. Extensive exploitation of metals by mining left traces in the soils of these areas (R and M), compared with the reference area (Ref) in which there were no mining activities. The levels of heavy metals found in soil samples from the three areas cultivated with different crops of vegetables are presented in Table 1. All these data were selected as variables for principal component analysis and used to compute the PCA1-soil model.
Compared with normal contents and alert values, in accordance with the Romanian legislation (Table 2), the most frequent and pronounced excess of normal values was observed for Mn followed by Zn, Cu and Pb. For both Ni and Cd, no excess of the legally admissible contents was found in contaminated areas or in the reference area.
To understand the association of soil samples from the three areas, depending on heavy metals content, principal component analysis was applied using PAST software, with the variables normalized using division by their standard deviations. The eigenvalues give a measure of the variance accounted for by the corresponding eigenvectors (components) [42]. Given the wide range of metal concentrations (from units to thousands), the data were standardized by logarithmic transformation. From the scree plot of eigenvalues of the PCA1-soil model (Figure 1) it can be seen that the first two PCs are enough to explain 94.2% of the variation. The concentrations of Cd, Pb and Zn were major contributors to PC1, while the Cu concentration was the major contributor to PC2. The two components separate well the two areas (R and M) with anthropogenic pollution caused by mining, both from each other and from the unpolluted reference (Ref) area (Figure 2). The Cu concentration is mainly responsible for separating the M area; this variable has the highest positive loading on PC2 (Figure 1). This area is well known for copper exploitation, and the concentration of copper in the investigated soils generally exceeds ATV values (Table 1). PC1 contributes most significantly to the separation of the R area, which is known for its polymetallic deposits, especially of Pb. Along with Pb, Cd and Zn also contribute significantly to PC1 (Figure 1). In the R area these metals have concentrations that exceed the ATV (for Pb and Zn) or come very close to it (for Cd). Fe, Mn and Ni are metals that, in geochemical terms, are common to the three areas and contribute little to the two PCs.
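The preprocessing and the reading of a scree plot can be illustrated with a short sketch. It assumes that the logarithmic transformation is applied first and the division by the standard deviation afterwards (the text does not state the order), and it uses hypothetical concentrations and eigenvalues rather than the values of the PCA1-soil model. The function names are our own.

```python
import math

def log_standardize(rows):
    """Log10-transform positive values, then scale each column to
    zero mean and unit sample standard deviation."""
    logged = [[math.log10(v) for v in row] for row in rows]
    n = len(logged)
    cols = list(zip(*logged))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in logged]

def cumulative_explained(eigenvalues):
    """Cumulative fraction of variance explained, largest eigenvalue first."""
    total = sum(eigenvalues)
    running, out = 0.0, []
    for ev in sorted(eigenvalues, reverse=True):
        running += ev
        out.append(running / total)
    return out

# Hypothetical concentrations (mg/kg): one column in the thousands,
# one close to unity. After scaling, both carry equal weight.
soil = [[1200.0, 0.8], [2500.0, 1.5], [900.0, 0.6], [4100.0, 2.9]]
scaled = log_standardize(soil)

# Hypothetical eigenvalues of a 4-variable model.
fractions = cumulative_explained([2.0, 1.0, 0.6, 0.4])
```

The second entry of `fractions` is the share of variance explained by the first two components, which is the quantity quoted in the text for each model.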
PCA can be seen as an ordination technique that constructs the theoretical variable that minimizes the total residual sum of squares after fitting a straight line to the data for each species. PCA does so by choosing the best values for the sites, i.e. the site scores. A positive score means that the concentration of variables increases along the PC axis; a negative score means that the concentration of variables decreases along the axis, and a score near 0 means that the concentration is poorly (linearly) related to the PC axis. The direction of the variable arrows indicates the direction in which the concentration of the corresponding species increases most, and the length of the arrows equals the rate of change in that direction. In the perpendicular direction, the fitted concentration is constant [45]. [...] influence on the reference sample group. Based on these considerations, the variables Fe, Mn and Ni can be eliminated from the model without affecting its quality. For the new PCA2-soil model (see Additional file 1), with only 4 variables (Cu, Zn, Pb and Cd), the two components explain 98.6% of the model variance, which means that the eliminated variables acted as noise and lowered the quality of the model. Thus, PCA made it possible to reduce the number of analyses (and hence the analytical cost) from 7 to 4 while still correctly characterizing and classifying the soil samples from the areas under research in terms of pollution with toxic or potentially toxic heavy metals.
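One way to make this kind of variable selection explicit is to compute, for each variable, the fraction of its variance captured by the retained components. For a PCA on the correlation matrix, with loadings scaled as correlations between variables and components, this fraction is the sum of the squared loadings. The sketch below uses hypothetical loadings and an arbitrary threshold of 0.5; neither comes from the PCA1-soil model, and the function names are our own.

```python
def explained_by_components(loadings):
    """Fraction of each standardized variable's variance captured by the
    retained components: the sum of its squared loadings."""
    return {name: sum(l * l for l in values)
            for name, values in loadings.items()}

def weakly_explained(loadings, threshold=0.5):
    """Names of variables whose explained fraction is below the threshold."""
    explained = explained_by_components(loadings)
    return sorted(name for name, frac in explained.items() if frac < threshold)

# Hypothetical loadings on (PC1, PC2), for illustration only.
example_loadings = {
    "Cu": (0.30, 0.90),
    "Pb": (0.95, -0.10),
    "Fe": (0.20, 0.15),
}

explained = explained_by_components(example_loadings)
candidates = weakly_explained(example_loadings)
```

Variables returned in `candidates` have a large residual variance in the two-component model and are therefore candidates for removal, subject to checking that the reduced model still separates the sample groups.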

Assessment of the vegetables-related data by principal component analysis
To characterize by PCA the heavy metal contamination of vegetables in the afore-mentioned areas according to location (contaminated or uncontaminated areas), a PCA1-plant model was first constructed, containing only the concentrations of HMs in vegetables (Table 3). The quality of this model was poor in classifying vegetable samples according to location (see Additional file 2). To improve the model, some agrochemical soil characteristics were used in addition to the analytical data on heavy metal concentrations in plants, because the translocation of metals into plants depends on the agrochemical characteristics of the soil, such as pH and clay content [10,12].
As with the soil data, the vegetable-related data were standardized by logarithmic transformation and processed with the same PAST software. The scree plot and PC loadings for this PCA2-plant model are presented in Figure 3. One can see that, in this model, the first two PCs explain only 84.4% of the variance. The graphical representation of the PCA2-plant model, the biplot of PC1 and PC2 (Figure 4), distinctly separates only the areas with anthropogenic pollution of the soil (R, M) from the unpolluted area (Ref). This model, built on the metal contents of plants and the two agrochemical parameters, cannot clearly differentiate between the two areas with anthropogenic pollution (R and M). Likewise, no other combination involving PC3 or PC4, which explain less of the variance, provides good differentiation between the three areas.
In the reference area, although slightly-acid pH is favourable for the mobility of heavy metals, high clay content of the soil reduces their translocation in plants. PC2 and its components are the factors that differentiate this area from the other two (M and R), which are characterized mainly by PC1 and its components (Figure 4).
Fe, Mn, Zn and Ni are the metals found in high concentrations in vegetables in the reference area. In the vegetables from the areas with anthropogenic contamination (R and M), Pb, Cu and Cd have the highest concentrations; they also present the highest concentrations in the soils from these areas. These metals are strongly absorbed from this low-clay soil, although its neutral pH is less favourable for the mobility of metals [10,12]. The similar agrochemical characteristics of the soils in the areas contaminated with heavy metals (pH, humus and clay), as well as the high heavy metal contents of these soils, account for the similarity in the translocation of metals into plants. The significant differences found between the Cu and Pb concentrations in the soils of the two contaminated areas (M and R) were not found in the vegetables grown in the two areas. This can be explained by the homeostasis of plants, which, by mechanisms that are specific to each species, limits excessive accumulation of heavy metals in their bodies [33]. The different accumulation of metals in vegetables can be observed from the results obtained by PCA sorting of vegetable species using the PCA3-plant model. This model was also computed only with the heavy metal contents of vegetables, and its PC1 and PC2 biplot is presented in Figure 5.
The first two PCs explain 85.6% of the model variance and provide a relatively good separation between some vegetable species.

Principal component analysis of THQ data
The Target Hazard Quotient (THQ) is a complex parameter used for estimating the potential health risks associated with long-term exposure to different pollutants, in our case heavy metals. Its calculation involves, besides the metal contents of vegetables, other parameters which refer to the metals' toxicity (oral reference doses, RfD), the duration and intensity of exposure (exposure frequency, exposure duration, average exposure time) and individual characteristics (average body weight) [58][59][60][61][62]. THQ thus provides a better picture in the health risk assessment of heavy metals than a simple parameter such as the content of metals in soils and vegetables [61]. It was developed by the Environmental Protection Agency (EPA) in the US to avoid underestimation of the risk and is calculated by the general formula (1) [58]:

\[ \mathrm{THQ} = \frac{EF \times FD \times DIM}{RfD \times W \times T} \qquad (1) \]

where EF is the exposure frequency; FD is the exposure duration; DIM is the daily metal ingestion (mg person⁻¹ day⁻¹); RfD is the oral reference dose (mg kg⁻¹ day⁻¹); W is the average body weight (kg) and T is the average exposure time for noncarcinogens (365 days year⁻¹ × number of exposure years).
A small value of the index (<1) shows reduced health hazard and a value between 1 and 5 represents a concern level for health hazard [58]. THQ parameters used in PCA -THQ model are presented in Table 4. They were computed from data presented in our previous work [61]. Figure 6 presents the results of PCA sorting of vegetables species, applied to THQ data for investigated metals and vegetables.
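As a worked illustration, the sketch below computes a THQ value under the assumption that formula (1) has the form EF × FD × DIM / (RfD × W × T), which is dimensionally consistent with the symbol definitions given above; other published formulations may include a unit-conversion factor. All input values are hypothetical and are not taken from Table 4, and the function names are our own.

```python
def target_hazard_quotient(ef, fd, dim, rfd, w, t):
    """THQ = (EF * FD * DIM) / (RfD * W * T), assumed form of formula (1).

    ef  : exposure frequency (days per year)
    fd  : exposure duration (years)
    dim : daily metal ingestion (mg per person per day)
    rfd : oral reference dose (mg per kg body weight per day)
    w   : average body weight (kg)
    t   : average exposure time (days)
    """
    return (ef * fd * dim) / (rfd * w * t)

def hazard_level(thq):
    """Interpretation bands quoted in the text; values above 5 are not
    described there and are only flagged."""
    if thq < 1:
        return "reduced"
    if thq <= 5:
        return "concern"
    return "above 5"

# Hypothetical inputs, for illustration only.
years = 70
thq = target_hazard_quotient(ef=365, fd=years, dim=0.035,
                             rfd=0.001, w=70, t=365 * years)
level = hazard_level(thq)
```

When the exposure is daily over the whole averaging period, EF × FD equals T and the expression reduces to DIM / (RfD × W), that is, the daily intake per kilogram of body weight divided by the reference dose.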
The first two PCs explain 84.6% of the model variance (Figure 6) and show good separation between some species of vegetables, better than the PCA3-plant model, which uses only the metal concentrations in vegetables as variables (Figure 5). Better grouping was obtained for parsley roots (R1, M1, Ref1), onion bulbs (R3, M3, Ref3), carrot leaves (R5, M5, Ref5) and cucumber (R8, M8, Ref8), which were well separated from the other plant species by PC1 and PC2. Of these vegetables, parsley root (R1, M1) is on the positive side of PC1 and PC2, which means that it is associated with the highest THQ values for all metals. In other words, it presents the greatest health risk, especially in the contaminated areas R and M. Onion bulbs (M3, Ref3) are on the opposite side, the negative side of the two PCs, which means that they have the lowest THQ values for metals and hence the lowest health risk, except in the contaminated R area. Carrot leaves (R5, M5, Ref5), located on the negative side of PC1 but on the positive side of PC2, are associated with higher THQ values of Fe, Zn and Mn, metals with low toxicity, and so with reduced health risk. Cucumbers (R8, M8), located on the positive side of PC1 and on the negative side of PC2, are associated with higher Cd and Pb concentrations and THQ values for these toxic metals, and so with increased health risk, especially in the contaminated R and M areas. The other plants, carrot roots (R2, M2, Ref2), parsley leaves (R4, M4, Ref4), cabbage (R6, M6, Ref6), lettuce (R7, M7, Ref7) and green beans (R9, M9, Ref9), are not satisfactorily separated, being merged in the centre of the axes. When grown in the contaminated R and M areas, these vegetables are associated with high THQ values, especially for the toxic metals Cd and Pb, which means that their consumption presents a major health risk. In the reference area, these plants are associated with low THQ values, so the health risk linked to their consumption is low.

Conclusions
The legacy of heavy metal pollution generated by industrial society puts pressure on human health all over the world. Finding a solution to this situation is a permanent task for researchers, which involves not only finding new and advanced analytical methods to identify the nature and quantity of contaminants, but also applying complex statistical methods that allow an overall assessment of the interaction of these contaminants in the food chain and of the health risk associated with their consumption by humans. In our study, the application of PCA for analysing these complex data provides: optimization of analytical procedures by selecting for analysis only the 4 variables (Cu, Zn, Cd and Pb) with maximum involvement in pollution assessment, and thus a reduction of the analytical costs; differentiation among contaminated areas and types of contamination, based on soil analyses, in this case highlighting the pollution with Pb in the R zone and with Cu in the M zone; selection of the vegetable species which present the highest (parsley roots, lettuce, beans and cucumbers) or lowest (onions, carrot leaves) health risk in contaminated areas; selection of the best markers for establishing the foods with the highest or lowest risk to human health in affected areas, in this case the complex THQ parameter, which permits a better classification of the vegetables whose heavy metal content poses a high risk of toxicity to human health; building and viewing models that make it easier to understand the complex phenomenon of environmental pollution and health risk.

Experimental
A detailed description of the location of the investigated sites, the preparation of soil and vegetable samples, the analysis of heavy metals and the quality control is presented in our previous work [61]. Briefly, the study areas are located in the South West of Romania, in Banat County (see Experimental site location in the Additional file 4). Soil and vegetable samples were collected from subsistence farms located in the two contaminated areas (R and M) and one reference area (Ref). The first contaminated area (R) is located around Ruschita village, which is the mining centre of Poiana Rusca Mountain. In this area, the soil has a gritty texture, the clay content is between 18-22% and the pH is near 7.8 (neutral) [63]. The second contaminated area (M) is located around Moldova Noua town, close to the Danube river. Soil from this area also has a gritty texture, a clay content between 18-20% and a pH near 7.6 (neutral) [63]. The reference area, Borlova village, near the town of Caransebes, is located in the Sebes river valley, at the foot of Muntele Mic Mountain, a non-polluted area with less industry. Soil from this area has a fine texture, a clay content between 28-32% and a pH near 6.6 (slightly acid) [63]. From each area (R, M and Ref), 9 average soil samples were collected, from 0 to 20 cm deep. Each average soil sample was prepared from 10 individual soil samples. HMs were extracted from soil samples by the wet procedure with a mixture of mineral acids (HCl, HNO3, 3:1 ratio) and from vegetables by plant ash digestion with 0.5N HNO3. The plant ash was obtained by burning plant samples for 8 h at 550°C in the furnace (Nabertherm B150, Lilienthal, Germany).
HMs were analysed in solution by flame atomic absorption spectrometry (FAAS) in the University Environmental Research Test Laboratory, using a flame atomic absorption spectrophotometer with high-resolution continuum source (Model ContrAA 300, Analytik Jena, Germany), set to the specific conditions for each metal and using appropriate drift blanks. NCS Certified Reference Materials DC 85104a and 85105a (China National Analysis Center for Iron & Steel) were analysed for quality assurance. Mean per cent recoveries were: Fe (92%), Mn (95%), Zn (102%), Cu (105%), Ni (99%), Pb (94%), Cd (105%). The variation coefficients were below 10%. Detection limits (μg/g) were determined by the calibration curve method: Fe (0.15), Mn (0.19), Zn (0.43), Cu (0.13), Ni (0.14), Cd (0.01), Pb (0.05) [61].
The levels of heavy metals for soil samples from these areas are presented in Table 1 and the data for average HMs contents in vegetables, compiled according to Harmanescu et al., 2011 [61], in Table 3.

Statistics
The data were statistically analysed using the statistical package PAST [42]. Metal concentrations were expressed as means and standard deviations, and the figures show the mean values. Statistical significance was computed using the paired-samples t-test, with a significance level of p.
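For readers unfamiliar with the paired-samples t-test, the statistic can be computed as the mean of the pairwise differences divided by its standard error. The sketch below is a minimal illustration using only the Python standard library, not the PAST routine; the measurements are hypothetical, the function name is our own, and the comparison against a critical value for the chosen significance level is left to the reader.

```python
import math
import statistics

def paired_t_statistic(first, second):
    """t statistic and degrees of freedom for a paired-samples t-test."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1

# Hypothetical paired measurements (e.g. the same metal in two areas).
area_a = [12.0, 15.0, 11.0]
area_b = [11.0, 13.0, 8.0]
t_value, dof = paired_t_statistic(area_a, area_b)
```

The sketch does not handle the degenerate case in which all differences are identical, where the standard deviation is zero and the statistic is undefined.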