We hope that you enjoyed the recent Primer on Lipophilicity and found reading it to be edifying and educational. Although it really is an honour to write pieces like that one for such clever, cultured readers, we do need to return to the style with which our loyal readers associate us. The article that today blunders into the crosshairs bills itself as a contemporary perspective on solubility and hydrophobicity, although we wonder if its authors truly get physical in drug discovery. The article, as you might guess from the title, explores relationships between solubility and logP or logD, and it is instructive to read what the authors have to say about their data-analytic philosophy:
“Data plots with lines of best fit and unity gave a representation of the data, albeit with a statistical analysis, which did not adequately convey the distribution of data because of the large numbers. The distribution of values was better conveyed through normalized bar graphs and box plots using binned hydrophobicity and/or solubility values, which better represent the distribution of data in a more visually amenable manner.”
To paraphrase: We couldn’t find what we wanted when we analysed the data, so we drew some pictures instead.
OK, this assessment may seem harsh and we do admit that plotting data is certainly a good thing, especially as a precursor to analysis. However, we have shown you previously that weak trends can be made to look a whole heap stronger by hiding or masking variation, and if you plot data enough you can end up seeing what you think should be there. Also, if you’ve got enough data then even the weakest trend becomes statistically significant, and we respectfully draw the attention of our readers to the tale (as opposed to the tail) of the 55% coin. When presenting trends, it’s really important to remember that a trend’s strength matters even more than its mere existence.
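To make the 55% coin concrete, here’s a minimal Python sketch (our illustration, not anything from the article). It assumes a coin that lands heads exactly 55% of the time in every sample and uses the normal approximation to the binomial to test it against a fair coin. The bias never changes, but the p-value collapses as the number of tosses grows.

```python
import math

def coin_z_test(p_heads: float, n_tosses: int) -> tuple[float, float]:
    """Two-sided z-test of H0: p = 0.5, assuming the observed fraction of
    heads is exactly p_heads (normal approximation to the binomial)."""
    se = math.sqrt(0.5 * 0.5 / n_tosses)          # standard error under H0
    z = (p_heads - 0.5) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail area
    return z, p_value

# Same 55% coin every time; only the number of tosses changes.
for n in (100, 1_000, 10_000, 100_000):
    z, p = coin_z_test(0.55, n)
    print(f"n={n:>7}  z={z:6.2f}  p={p:.2e}")
```

The effect size is identical in every row. Only the ‘significance’ moves, which is why a p-value on its own tells you next to nothing about how strong a trend is.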
So let’s get back to business. We’d like you to take a look at Figures 6a and 6b, which illustrate the relationships between aqueous solubility and two different calculated lipophilicities, namely logP and logDpH7.4, both predicted using the ACD software. Solubility is ‘quantified’ as a series of bars that indicate the relative proportions of compounds in poor, intermediate and good categories. Hopefully you’re still with us, but please speak up if not. The lipophilicity values have been sorted into bins and, as regular readers of the Crapshoot, you’ll be wondering why they don’t just plot the data instead of putting it into all these bins. When you look at Figures 6a and 6b you might think that the data is evenly distributed across the bins, but if you look at the fine print on top of each bar you’ll see that this is most definitely not the case. Furthermore, when you compare these numbers for corresponding bins in the two plots, you’ll see that the data is distributed differently across the bins in each plot. You’d never guess that from just looking at the plots, and it makes meaningful visual comparison of the two plots difficult.
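Here’s a rough Python sketch of why binning worries us, using synthetic data rather than anything from the article (the slope of 0.3, the noise level, the seed and the ten equal-count bins are all arbitrary choices of ours). A weak trend gives a modest correlation on the raw points, but the same points reduced to bin means line up almost perfectly, because averaging throws away the scatter within each bin.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def binned_means(xs, ys, n_bins):
    """Sort the points by x, split them into n_bins equal-count bins and
    return the mean x and mean y of each bin (roughly what a binned chart shows)."""
    pairs = sorted(zip(xs, ys))
    size = len(pairs) // n_bins
    bx, by = [], []
    for i in range(n_bins):
        chunk = pairs[i * size:(i + 1) * size]
        bx.append(sum(p[0] for p in chunk) / len(chunk))
        by.append(sum(p[1] for p in chunk) / len(chunk))
    return bx, by

# Synthetic data (NOT from the article): a weak linear trend buried in noise.
rng = random.Random(42)
xs = [rng.gauss(0.0, 1.0) for _ in range(5000)]
ys = [0.3 * x + rng.gauss(0.0, 1.0) for x in xs]

raw_r = pearson(xs, ys)
bx, by = binned_means(xs, ys, n_bins=10)
binned_r = pearson(bx, by)
print(f"raw r = {raw_r:.2f}, binned r = {binned_r:.2f}")
```

Equal-count bins are used here to keep the sketch simple; bins holding very different numbers of compounds, as in the article, add a further complication because each bar then rests on a different amount of data.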
So the authors would have us believe that ACD logDpH7.4 is a more effective predictor of aqueous solubility than ACD clogP. Let’s take a look at how they reach this conclusion. Basically, the ‘analysis’ consists of looking at the bar charts in Figures 6a and 6b and stating:
“The clearer stepped differentiation within the bands is apparent when log DpH7.4 rather than log P is used, which reflects the conisderable [sic] contribution of ionization to solubility.”
In other words, a beauty contest for charts.
However, we’re not quite done yet because we still need to take a look at the Solubility Forecast Index (SFI), although we have a nasty feeling that we’re not going to like it when we do. SFI is defined as the sum of clogDpH7.4 and the number of aromatic rings (#Ar), and the bar chart equivalent to Figures 6a and 6b is shown in Figure 9. We are going to take a much, much closer look at SFI in another Crapshoot, but for now let’s just see what the authors have to say about the bar charts:
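For readers who like their definitions executable, SFI as stated above amounts to a single addition. The function and argument names below are ours, and the clogD value is assumed to come from whatever calculator you use (ACD in the article).

```python
def solubility_forecast_index(clogd_ph74: float, n_aromatic_rings: int) -> float:
    """SFI as defined in the article: clogD at pH 7.4 plus the number of
    aromatic rings. Function and parameter names are hypothetical."""
    if n_aromatic_rings < 0:
        raise ValueError("number of aromatic rings cannot be negative")
    return clogd_ph74 + n_aromatic_rings

print(solubility_forecast_index(2.5, 3))  # 5.5
```

Written this way, it’s plain that the index counts one aromatic ring as equivalent to one log unit of clogD.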
“This graded bar graph (Figure 9) can be compared with that shown in Figure 6b to show an increase in resolution when considering binned SFI versus binned c log DpH7.4 alone.”
So we guess you’re all wondering what the difference is between “clearer stepped differentiation within the bands” and “an increase in resolution”. Please let us know if you find out because we’d love to know as well. We’d also like to know exactly how the authors define resolution, because to speak of an increase in resolution is to make a quantitative statement. We really don’t have an answer to this question so, as an instructive exercise, we suggest that our readers attempt to describe the relationship between Figure 6a and Figure 9. Bonus points will be awarded for answers presented in limerick format.
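If ‘resolution’ is to mean something quantitative, one option (our suggestion, not the authors’) is a rank-based association measure for ordered categories, such as the Goodman-Kruskal gamma, computed on the counts behind each bar chart. The two tables below are invented purely to show the mechanics; real counts could in principle be reconstructed from the bar proportions and the numbers printed on top of the bars.

```python
def goodman_kruskal_gamma(table):
    """Goodman-Kruskal gamma for a contingency table whose rows (descriptor
    bins, low to high) and columns (solubility categories, poor to good)
    are both ordered. gamma = (C - D) / (C + D), where C and D count the
    concordant and discordant pairs of compounds; tied pairs are ignored."""
    concordant = discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            # Pair every compound in cell (i, j) with those in higher rows.
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:
                        concordant += table[i][j] * table[k][m]
                    elif m < j:
                        discordant += table[i][j] * table[k][m]
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical counts: rows are lipophilicity bins (low to high),
# columns are solubility categories (poor, intermediate, good).
table_a = [[2, 3, 5],
           [4, 4, 2],
           [6, 3, 1]]
table_b = [[1, 2, 7],
           [4, 5, 1],
           [8, 2, 0]]
print(f"gamma A = {goodman_kruskal_gamma(table_a):.2f}")  # -0.52
print(f"gamma B = {goodman_kruskal_gamma(table_b):.2f}")  # -0.84
```

A more negative gamma means that compounds in higher-lipophilicity bins fall more consistently into poorer solubility categories. Even so, two gammas can only be compared meaningfully alongside some estimate of their uncertainty, and gamma, like the bar charts, discards the variation within each bin.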
Of course, the raison d'être of the Crapshoot is not just to seek the funny side of Drug Discovery but also to provide practical advice that will be seen as helpful and constructive. We advise the authors to seek the opinion of a professional statistician as to whether beauty contests for bar charts constitute a valid method for asserting that one parameter provides a quantitatively better description of solubility (or indeed any other property of interest) than another. We also believe that editors of journals greatly value feedback from those who occasionally read those journals, and so we offer the following advice: find out who reviewed the manuscript for you and make sure that they don't do any more.
Friday, January 27, 2012
Solubility forecast index awarded 5.7 for artistic expression...
Labels: categorical sin, data analysis, ddt, gsk, literature reviews, stamp collecting