  • Poster presentation
  • Open access

Comparison of some linear regression methods – available in R – for a QSPR problem

An important task in science and technology is modeling a property y by several variables x. In QSPR (quantitative structure-property relationships) the x-variables are often numerical molecular descriptors, and the y-variable is a chemical or physical property. Several efficient regression methods are available to find appropriate regression coefficients $b_1, b_2, \ldots, b_m$ and the intercept $b_0$ for a linear model

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_m x_m$$

with $\hat{y}$ for the predicted property and $m$ the number of x-variables.
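As a minimal illustration of the linear model described above, the following Python sketch computes predicted properties from an intercept and a coefficient vector. The function name `predict_linear` and the numbers are hypothetical and chosen only for illustration; they are not taken from the study.

```python
import numpy as np

def predict_linear(X, b0, b):
    """Predicted property for each row of X: intercept plus weighted sum of x-variables."""
    X = np.asarray(X, dtype=float)   # shape (n objects, m variables)
    b = np.asarray(b, dtype=float)   # shape (m,)
    return b0 + X @ b

# Two objects described by m = 2 x-variables (made-up values).
X = np.array([[1.0, 2.0],
              [0.5, -1.0]])
y_hat = predict_linear(X, b0=1.0, b=[2.0, 3.0])
print(y_hat)  # [ 9. -1.]
```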

Here, "efficient" means that a model can be generated for data with more variables than objects and for data with highly correlated variables, and that the complexity of the model is optimized for best prediction performance (not necessarily for best fit).
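To make the first two requirements concrete, here is a hedged sketch of ridge regression in closed form using NumPy. It is not the implementation in the R package "chemometrics"; the data are synthetic, and the function name `ridge_fit` and the penalty value `lam=1.0` are assumptions made for illustration. With more variables than objects the centered data matrix is rank deficient, so ordinary least squares has no unique solution, whereas the ridge penalty still yields one.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression on mean-centered data; returns (b0, b)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    m = X.shape[1]
    # The penalty lam * I makes the matrix invertible even when m > n.
    b = np.linalg.solve(Xc.T @ Xc + lam * np.eye(m), Xc.T @ yc)
    b0 = y_mean - x_mean @ b
    return b0, b

# Synthetic data (assumption): 20 objects, 50 variables, two of them nearly identical.
rng = np.random.default_rng(0)
n, m = 20, 50
X = rng.normal(size=(n, m))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=n)   # highly correlated pair
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=n)

rank = np.linalg.matrix_rank(X - X.mean(axis=0))  # smaller than m, so OLS is not unique
b0, b = ridge_fit(X, y, lam=1.0)
y_hat = b0 + X @ b
```

The penalty `lam` plays the role of the complexity parameter here: larger values shrink the coefficients more strongly, and its value would normally be chosen by cross validation rather than fixed in advance.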

The methods compared are PLS (partial least-squares) regression, robust PLS, PCR (principal component regression), ridge regression, and lasso regression, as implemented in the free software system R [1] by the package "chemometrics", which is described in [2]. The strategy "repeated double cross validation" [2] was applied to optimize the model complexity (for example, to find the optimum number of PLS components) and to estimate the prediction errors for new cases. The QSPR problem is modeling the gas chromatographic retention indices of 209 polycyclic aromatic compounds characterized by 467 molecular descriptors.
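The general idea of repeated double cross validation can be sketched as follows: an inner cross validation on each training set selects the model complexity, an outer loop predicts held-out objects that played no part in that selection, and the whole procedure is repeated with different random splits. The Python code below is a simplified sketch under stated assumptions, not the procedure of [2] or of the "chemometrics" package: it uses ridge regression with a hypothetical grid of penalties as the complexity parameter, synthetic data, and the standard deviation of the test-set residuals as the error summary. All function names and fold counts are illustrative.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression on mean-centered data; returns (b0, b)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    b = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return y_mean - x_mean @ b, b

def inner_cv_mse(X, y, lam, folds):
    """Mean squared prediction error of one penalty value over the inner folds."""
    sq = []
    for val in folds:
        mask = np.ones(len(y), dtype=bool)
        mask[val] = False
        b0, b = ridge_fit(X[mask], y[mask], lam)
        sq.append((y[val] - (b0 + X[val] @ b)) ** 2)
    return np.concatenate(sq).mean()

def repeated_double_cv(X, y, lambdas, n_rep=2, n_outer=4, n_inner=5, seed=0):
    """Returns the std. dev. of outer test residuals and the penalties chosen."""
    rng = np.random.default_rng(seed)
    n = len(y)
    residuals, chosen = [], []
    for _ in range(n_rep):                       # repetitions with new random splits
        outer_folds = np.array_split(rng.permutation(n), n_outer)
        for test_idx in outer_folds:             # outer loop: error estimation
            train = np.ones(n, dtype=bool)
            train[test_idx] = False
            Xtr, ytr = X[train], y[train]
            inner_folds = np.array_split(rng.permutation(len(ytr)), n_inner)
            # Inner loop: choose the complexity using training objects only.
            errs = [inner_cv_mse(Xtr, ytr, lam, inner_folds) for lam in lambdas]
            best = lambdas[int(np.argmin(errs))]
            b0, b = ridge_fit(Xtr, ytr, best)
            residuals.append(y[test_idx] - (b0 + X[test_idx] @ b))
            chosen.append(best)
    return np.concatenate(residuals).std(ddof=1), chosen

# Synthetic data (assumption): 40 objects, 60 variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 60))
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.2, size=40)
lambdas = [0.1, 1.0, 10.0]
sep, chosen = repeated_double_cv(X, y, lambdas)
```

Because the selected complexity may differ between outer folds and repetitions, the list `chosen` gives a rough picture of how stable the selection is, while `sep` summarizes the prediction error for objects not used in model building.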

References

  1. Software R: A language and environment for statistical computing, version 2.2.7. 2008, Vienna, Austria: R Development Core Team, Foundation for Statistical Computing, [http://www.r-project.org]

  2. Varmuza K, Filzmoser P: Introduction to multivariate statistical analysis in chemometrics. 2009, CRC Press, Boca Raton, FL, USA.


Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Varmuza, K., Filzmoser, P. Comparison of some linear regression methods – available in R – for a QSPR problem. Chemistry Central Journal 3 (Suppl 1), P37 (2009). https://doi.org/10.1186/1752-153X-3-S1-P37

  • DOI: https://doi.org/10.1186/1752-153X-3-S1-P37
