A combined 3D-QSAR and docking studies for the In-silico prediction of HIV-protease inhibitors

Background Tremendous research from last twenty years has been pursued to cure human life against HIV virus. A large number of HIV protease inhibitors are in clinical trials but still it is an interesting target for researchers due to the viral ability to get mutated. Mutated viral strains led the drug ineffective but still used to increase the life span of HIV patients. Results In the present work, 3D-QSAR and docking studies were performed on a series of Danuravir derivatives, the most potent HIV- protease inhibitor known so far. Combined study of 3D-QSAR was applied for Danuravir derivatives using ligand-based and receptor-based protocols and generated models were compared. The results were in good agreement with the experimental results. Additionally, docking analysis of most active 32 and least active 46 compounds into wild type and mutated protein structures further verified our results. The 3D-QSAR and docking results revealed that compound 32 bind efficiently to the wild and mutated protein whereas, sufficient interactions were lost in compound 46. Conclusion The combination of two computational techniques would helped to make a clear decision that compound 32 with well inhibitory activity bind more efficiently within the binding pocket even in case of mutant virus whereas compound 46 lost its interactions on mutation and marked as least active compound of the series. This is all due to the presence or absence of substituents on core structure, evaluated by 3D-QSAR studies. This set of information could be used to design highly potent drug candidates for both wild and mutated form of viruses.


Background
Human immunodeficiency virus (HIV) is a retrovirus that is peril to human health, responsible to cause AIDS, an immunodeficiency syndrome. The disease presents a serious health care challenge because each year it affects an increasing number of people across the globe [1]. To combat disease, several new drugs were approved by FDA which reduces the morbidity and mortality of HIV infection. These drugs are categorized as HIV-Reverse transcriptase (HIV-RT), HIV-Integrase (HIN-IN) & HIV-Protease inhibitors (HIV-PIs), the major targeted enzymes of HIV life cycle. HAART (highly active antiretroviral therapy) is the most promising anti-AIDS therapy including these inhibitors in combination. The major obstacle in the use of HAART therapy is resistance that virus develops [2]. The hyper-mutability of HIV, drug resistance and their side effects are the biggest challenge to develop an effective anti-AIDS therapy.
HIV-1 Protease is emerging as one of the major druggable target for the development of new chemotherapeutics. HIV protease inhibitors, restrain the viral maturation by preventing the formation of structural and functional proteins and form immature, non-infectious virus. However, it is highly prone to develop mutations, since it is a homodimer and a single mutation of gene causes double mutation of enzyme [3]. Structurally, HIV protease is a homodimer protein, containing 99 amino acids in each chain, with an active site located at the dimer interface [4]. The protein is composed of three regions; catalytic core (Asp25, Gly27, Ala28, Asp29 and Asp30), flap (Ile47, Gly48, Gly49, and Ile50) and the Cterminal region (Pro81, and Ile84). From literature, Asp25, Gly27, Ala28, Asp29 and Gly49 are known to be highly conserved residues to which a potent inhibitor may bind strongly. Mutations of HIV protease at Val32, Ile50 and Ile84 (hydrophobic residues, close to binding pocket) are responsible for the resistance to most FDA approved drugs due to loss of Vander Waal interactions [5]. Almost all FDA approved anti-AIDs drugs are resistant to I84V mutant virus and became ineffective against disease.
The failure of drug therapies against mutated virus protein encouraged the scientists to develop more potent, effective and stable second generation HIV-PIs, but still the HIV-PI therapies are associated with the serious problems that limit their significance and effectiveness [6]. In order to take a forward step for prediction and guidance of more effective drug, 3D-QSAR studies were conducted as primitive step in finding new inhibitors using a dataset of 102 (R)-hydroxyethylamino sulfonamides derivatives from literature [7].
3D-QSAR technique is subdivided into ligand-based and structure-based methods. Ligand-based approach is frequently applicable in the absence of experimentally resolved protein crystal structure whereas, structurebased method extract the protein bound ligand information for the generation of align model [8][9][10]. In the present work, both strategies were applied to generate the CoMFA and CoMSIA models and their comparison with reference to the most active moiety Darunavir (hydroxyethylamino sulfonamides derivatives). Extensive research is ongoing that used different scaffolds, methodology and algorithms for predicting better results.
Darunavir (DRV) is one of the most attracting targets as it is the most active molecule among eleven FDA approved drugs of present time [11]. The obtained models revealed the significance of stereoelectronic properties, hydrogen bonding characteristics and structure variations leading to changes in the interaction profile. The influences of grid distances, alignment methods and combination of charges were explored out of which the best model was selected. Additionally, molecular docking of compounds explored the binding affinity of highly active and least active compounds with its receptor by using GOLD docking suit [12]. The purpose of the study was to validate the experimental results obtained with Darunavir derivatives and to predict the compound that may developed into a more potent HIV inhibitor based on outcomes extracted from the current study.

Results and discussion
Protease active site is composed of catalytic triad having two C2 symmetrical monomeric units, Asp25 (25')-Thr26 (26')-Gly27 (27'). This triad is surrounded by amino-acids, classified into S1 (1') and S2 (2') sub-sites, which mostly include the hydrophobic amino-acids [13]. However, on ligand binding, Protease behaves as asymmetrical monomer [14]. Darunavir, an FDA approved drug has shown extensive hydrogen bonding with protease backbone, especially with S2 sub-site of protease, Figure 1 Core structure and dataset alignment. a) Core structure of danuravir derivatives with marked points used for alignment, b) Ligandbased alignment by using most active compound 32 as template, c) Structure-based alignment using cognate ligand of 3QOZ.pdb as reference.
moreover it also retained interaction with mutated protein [15].
In the present work, the additive model of Jorissen R.N. et.al., [7] was further subjected to 3D-QSAR using CoMFA & CoMSIA techniques and the generated contour maps were further validated by molecular docking.

Statistics of the ligand-based models
The reliability of CoMFA and CoMSIA models were highly dependent on the better alignment of molecules in a three dimensional space. The database alignment implemented in Sybyl7.3 [16] was used to align 102 compounds using most active compound 32 as a template. The core structure of compound 32 was chosen as a structural element for superimposition of all other compounds ( Figure 1a). The alignment is shown in Figure 1b and c. The statistical model of training and test tests (Tables 1 and 2) generated for the initial data set was depicted in Table 3. From the results, it can be deduced that lowering the grid space showed negative impact on the model. The default value of the grid space was selected as best and was used for further studies. To validate the model by external test set, activities of 24 compounds were predicted and the residual values for external and internal data sets were evaluated ( Table 2). The best model with convincing statistical results is shown in Table 3 and the residual value for the best model was found to be less than 1 in both training and test sets as mentioned in Tables 1 and 2. Furthermore CoMSIA was applied on the same dataset and the results are tabulated in Table 4.

Statistics of the receptor based models
In ligand-based approach, several combinations of charges and grid spacing were used. Among them, the model generated by using MMFF94 charges was retrieved as the best model with q 2 value of 0.74, standard error of prediction was 0.99 and the r 2 value of 0.96. The results are summarized in Table 4. For structurebased method, the bound conformation of Darunavir in the crystal structure of HIV protease (PDB: 3QOZ) [17,18] was used as a template to align the series of 102 compounds ( Figure 1c). As shown in Table 1, the structure-based QSAR method returned with the q 2 value of 0.682, r 2 of 0.938, F value of 178.46 and lower standard error of estimate and standard error of prediction with an average residual values of 0.077. While the r 2 value of the test set was 0.947. This statistical evaluation showed that the performance of the structurebased method was comparable to the ligand-based approach for CoMFA studies (Table 3). In CoMSIA, cross validated value of 0.664 and 0.751 was obtained for the structure-based and ligand-based methods, respectively. The CoMSIA analysis is tabulated in Table 4. Similarly, predictive r 2 value was 0.927 and 0.929 for structure-based and ligand-based methods, respectively.

Contour maps of CoMFA
CoMFA contours of different colors represented different fields i.e. steric (bulky favored-green whereas yellow is indicative of bulky disfavored area). Similarly, blue and red regions described electron donating and accepting groups would be favored or disfavored, respectively. Figure 2a and 2c displayed CoMFA generated steric and electrostatic contour maps for ligand-based and structure-based models, respectively. The most active compound 32 was superimposed on the steric and electrostatic contours maps for clear illustration.
The analysis of contour maps generated by ligand and structure-based methods showed that the electronegativity (red polyhedral) is favored at R 1 position in compound 32 where 3-phenyloxaolidin-2-one ring is present. While, the presence of prop-1-ene group at this position in compound 46 has a negative effect on the biological activity depicted in Figure 2b (ligand-based) and 2d (structure-based). Similarly, electropositivity (blue contours) is favored between benzene ring and nitrogen of 3-phenyloxaolidin-2-one in compound 32. The increase or decrease in electronegativity, represented by red contours at R 1 , indicated its effect on observed biological activities. If we compared compounds 28-31 with 7-14, it was found that they have huge difference in their inhibitory activity due to the difference in number of electronegative fluorine at R 1 position which buried near red isopleth. Even the compounds having propanone moiety at same position, more declined activity was observed. Second red polyhedral was observed near R 2 position, surrounded the isobutane moiety of compound 32, which demonstrated that the substitution of electronegative element at this position could further enhance the biological activity of the compound 32. At R 2 position, less bulky group would be favorable for biological activity, indicated by yellow polyhedral. Compounds 4, 6, 21 and 55 contained bulky group at this position and considered as less effective with inhibitory activity as compared to active. Similarly, comparison of compound 43 with template 32, it was revealed that replacement of 2-methyl thiophene with less bulky substituent at R 2 position would help to enhance its inhibitory activity. A large green polyhedral found near R 3 position indicating if replaced anisole moiety of compound 32 with more bulkier group would be beneficial for better activity.
Presence of methoxy phenyl at para position of compound 32, strongly favored the inhibitory activity as electronegative and bulky group is required at R 3 position.
Compounds which pose methoxy phenyl group at this position, showed activity not less than 8.38. While compounds 43 and 46 contained isoxazole group at this position, could be the reason of their reduced activity.

Contour maps of CoMSIA
The CoMSIA steric and electrostatic descriptors were found to be identical with the CoMFA generated models, which proved the consistency of the results. Moreover, the results of other three descriptors of CoMSIA also improved the drug prediction. The hydrogen bond donor and acceptor descriptors revealed the reason of higher activity of compound 32. At R 1 position of 32, the purple polyhedral is surrounded which showed that this is donor disfavored region. In compound 32, this donor disfavored region is supplemented by the presence of highly electronegative elements in 3phenyloxazolidin-2-one ring. At R 2 position hydrogen bond acceptor is disfavored (red polyhedral); at this position an alkyl chain is present in compound 32. At R 3 position a hydrogen bond acceptor is favored (magenta polyhedral), which is supplemented by the presence of methoxy group. In contrast, these properties are absent in least active compound 46 which possibly the reason of its lower activity.
The hydrophobic descriptor of CoMSIA is important to evaluate the hydrophobicity required to sustain the biological activity of any compound. At R 1 position, hydrophobicity is highly disfavored (white isopleth) whereas R 3 is hydrophobic favored (yellow contours) region. As shown in Figure 3c, compound 32 contained nitrogen containing hydrophilic moiety at R 1 position while this hydrophilic moiety is absent in compound 46 (Figure 3f ). In compound 32, the R 3 position is substituted with the phenyl-methoxy group while compound 46 contained hetero-atomic methyl-isoxazole moiety at R 3 position, showed that hydrophilic substitution at R 3 position would decrease the biological activity of compound 46. The CoMSIA contour maps of compound 32 and 46 with ligand-based and structure-based approaches are presented in Figures 3 and 4, respectively.

Docking results
To validate the 3D-QSAR results, docking simulation was performed and the most active compound 32 and least active compound 46 was evaluated for their binding interactions in the active site of protease and results were compared. Initially, the performance of docking software was tested by re-docking experiment. For this purpose, crystal structures of two proteins with their cognate ligands were retrieved from PDB and the cognate ligands were re-docked. The results are summarized in Table 5. The superimposed view of docked conformation and the reference ligand is presented in Figure 5a-b. Based on the re-docking results, GOLD was used for docking. The comparison of the scores attributed by two scoring functions as Gold-Score and Chem-Score also Where: q 2 : cross validated correlation coefficient, r 2 : non-cross validated correlation coefficient, r 2 pred : prediction of external test set for validation, F Fischer test values, C optimal number of Components, SEE Standard Error of Estimation, SEP Standard Error of Prediction, %1-5: percentage contribution of descriptors in the field, respectively, S Steric field, ES Electrostatic field, H Hydrophobic descriptor, D hydrogen bond Donor field, and A hydrogen bond Acceptor field.  showed the compound 32 to be more active than 46 in both wild and mutated proteins. However, Gold-score showed drastic difference between the scores of two compounds which can be assumed on this basis to more accurate than Chem-Score.
On the basis of docking analyses, it was revealed that compound with highest activity (32) ranked at top position as compared to least active compound 46. The docking scores were in correlation with 3D-QSAR and experimental results. The docked conformation of compound 32 in wild type (Figure 6a-b) and mutated proteins (Figure 6c-d) revealed that compound interacted with the binding pocket residues of targeted proteins through several favorable interactions including polar, hydrophobic, hydrogen bonding and the weak Van der Waal contacts.
The carbonyl oxygen of the core structure near R 1 position also mediated strong hydrogen bonding with the backbone amino group of Asp29' and Asp30'. Moreover, hydrophobic interactions were observed between Ile50 and core group of compound 32 and acetophenone with Arg8. Pro81' also mediated hydrophobic interaction with the methyl group of methoxybenzene present at R 3 position. Furthermore side-chains of S 1 ' residue Val82' mediated CH--π contact with the hydrophobic portion of the ligand at R 3 position. Val32' mediated CH 3 -π interactions with the core benzene of compound 32.
The observed docked conformation of compound 32 in the mutated protein (I84V) was flipped at~90°, showed in Figure 6c-d. Even with this orientation, the ligand was found to be interacting with several important residues including Gly27, Gly27', Asp25, Asp25', Asp29, Asp29', Ile50, GLy49' and Ile50'. In this case the Gly27 interacted with R 2 substitution and Gly27' with core structure of compound 32. A hydrogen bond was observed between the side chain oxygen of Asp25' and the hydroxyl of compound 32 (2.04Å). Furthermore Asp29' mediated a strong hydrogen bond with oxygen atom of R 1 3-phenyloxazolidin-2-one ring with the distance of 1.95Å. Moreover, the compound is stabilized by the hydrophobic interactions offered by Ile50, Gly49' and Ile50'. These interaction patterns of compound 32 with the wild type and mutated forms of protein suggested that the modification at R 2 position could increase the activity of compound. This hypothesis further confirms the results obtained by CoMFA.  The docked conformation of compound 46 in the wild type protein (Figure 6e-f ) revealed that it formed CH 3 -π interaction with side chain of Ile50 and Val82', however, core benzene of compound 46 also mediated aromatic interaction with Pro81'. On the other hand, Asp25 interacted with hydroxyl oxygen of core structure whereas Ile50' attracted towards oxygen of sulfonamide near R 3 substituent.
The terminal methoxy oxygen at R 1 mediated interactions with the wild type protein's amino group of Asp29 and Asp30 with the distance of 2.29Å and 1.7Å, respectively. The interactions of compound 46 with these residues were lost upon mutation (Figure 6g-h). The binding orientations of compound 32 and 46 ( Figure 6) revealed that compound 32 maintained its interactions with the active site residues in wild type as well as in mutated protein while compound 46 lost most of its binding interactions in mutated protein as shown in Table 6.

Conclusion
In the present work, comparison of ligand and structurebased 3D-QSAR using CoMFA and CoMSIA were derived for HIV-1 protease inhibitors. The statistics of both models were convincing and comparable. The model was significantly favored by internal and external predictions as well as visualization of contour maps. The effect of important structural characteristic of the potent inhibitor was predicted by  the generated model. From the predictions, it was evident that at R 1 position electronegativity is favored due to presence of Asp29 in its vicinity and hydrophobicity is disfavored which is relevant with the presence of methyloxazolidione ring in compound 32. Docking results also showed that terminal methoxy oxygen at R 1 mediated bidentate interactions with the amino group of Asp29 and Asp30 which was lost in compound 46. At R 2 position, bulkiness is disfavored whereas at R 3 ; hydrophobicity is favored which is evident by presence of methoxy phenyl in compound 32. The docking studies of most potent and least active inhibitors further verified the generated 3D-QSAR models and can be used as guidance for better drug development.

Dataset preparation
The dataset of 102 compounds was retrieved from literature reported by Jorissen R.N. et al., [7] and available in Additional file 1. 2D structures were drawn by Chem-Draw [19] and converted into 3D by MOE (Molecular Operating Environment) program [20]. The biological activities of all compounds were shown in Table 1 along with its negative logarithmic units, pIC 50 values. Stereochemistry and atom typing were confirmed for each compound. Three different charges i.e., GH, AM1BCC and MMFF94 were applied to the dataset and all three sets were subjected to the database alignment by using sybyl7.3 [16]. The database alignment is depicted in Figure 1. The core structure of most active compound 32 (pIC 50 = 12.10) was used as a template for alignment [21] in ligand-based QSAR. On the other hand, for structurebased QSAR, bound conformation of original compound was used as template for alignment.

CoMFA & CoMSIA 3D-QSAR models
The dataset of 102 compounds were segregated into training and test sets containing 78 and 24 compounds, respectively (Tables 1 and 2). Each set was constructed on basis of regular distribution of biological activities (Table 1). Comparative Molecular Field Analysis (CoMFA) and Comparative Molecular Similarity Indices Analysis (CoMSIA) with 2Å grid spacing, sp 3 carbon probe atom with a charge of +1 and VdW radius of 1.52Å was used to calculate steric and electrostatic field descriptors. In order to reduce noise and improve efficiency, column filtering of 2.0 kcal mol -1 was used [16]. A default cutoff of 30 kcal

84' CH…π
Note: Distances of important interactions are shown in Å, however, all interactions mentioned here having distances of less than 3Å.
mol -1 was used for field energy calculations. Subsequently partial least square (PLS) analysis was performed to obtain 3D-QSAR model. The optimal number of components was determined by leave-one-out procedure (Cross validation) to build the statistical significant regression model. The quality of the model was judged by cross-validated coefficient q 2 which should not be less than 0.5. The external predictivity was calculated by conventional correlation coefficient r 2 [22,23].

Molecular docking by GOLD
The dataset of 102 compounds was subjected to docking in order to validate the QSAR results via GOLD docking suit [12]. The emphasis was totally on most active and the least active compounds to evaluate their quality of interaction as HIV-1 protease inhibitors. For docking, wild type (PDB: 3EKV) [24], and mutated I84V (PDB: 3NU9) [25] proteins were retrieved from Protein Data Bank (PDB) [26] in order to check the consistency of ligand's interactions even if mutated viral attack is present.
The cognate ligand and water molecules were removed, and polar hydrogens were added. Software was validated by re-docking and root mean square deviation (RMSD) calculations shown in Table 5 and Figure 5. Default GOLD docking parameters were used with Gold-score and Chem-score as scoring and rescoring functions. For each ligand, ten docked poses were saved and analyzed.

Additional file
Additional file 1: Darunavir derivatives with all sibstitutions. Core structure of darunavir with positions marked for substitutions and structures of substituents at R 1 , R 2 and R 3 positions along with their experimental inhibitory activities.