Scaffold validates MS/MS-based protein identifications by analyzing tandem mass spectrometry data that has been processed by several search engines. It transforms the search engine scores into statistical probabilities that make protein identifications easier to validate. This works neatly if you believe Scaffold's statistical algorithms give correct probabilities. But do you really believe these computer algorithms? Should you? Or perhaps a better question is, "When should you trust the statistics?"

For Scaffold, the answer is "most of the time" and "sort of."

The "most of the time" part of the answer is because the assumptions (Table 1) underlying Scaffold have been validated on a wide variety of data sets. But you might reasonably ask, "How can I tell if these assumptions hold on my data sets?" If you do proteomics the same way that the lab which created a given program does proteomics, then these assumptions probably hold and the statistics will probably make sense. But if you are doing something different, the underlying assumptions may not hold, and this could mean wrong or missed identifications. How can you recognize when this is happening? What can you do about it? All software makes assumptions; as you will see below, Scaffold also gives you tools to check its assumptions.

Table 1. Assumptions underlying Scaffold's statistics:
- Probabilities displayed are estimates of true probabilities
- The data set has both correct and incorrect peptide spectrum matches
- The data set has enough spectra to fit curves to the histogram
- The data set has enough correct matches so that two distributions can be fit to the histogram
- Correct proteins will have peptides in the correct peptide distributions
- The best approximation for each peptide is learned from the distribution of all peptides
- Searching with several search engines will find more peptides
- When different search engines agree, the peptide identification is more likely valid
- Protein probability is accurate because peptide probabilities are
- The importance of multiple peptide hits depends upon their prevalence in the data set

The "sort of" part of the answer to "When should you trust the statistics?" is shorthand for "Beware: probabilities displayed are estimates." Don't be fooled by the significant digits displayed, since each estimated probability comes with error bars that you can't see. For example, you might think that a protein with a probability of 81% is more likely than one with a probability of 75%. Within the accuracy of MS/MS experiments and algorithms, these numbers are indistinguishable. Scaffold can't display error bars for each estimate; this would so clutter the results that they would be incomprehensible. What we can and will do is explore how you can get a feel for the accuracy of your data.
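Several of the assumptions in Table 1 concern fitting two distributions, one for correct and one for incorrect peptide spectrum matches, to the histogram of search-engine scores. The toy sketch below illustrates the idea with a two-component Gaussian mixture fit by expectation-maximization. This is only an illustration: the Gaussian shapes, the EM details, and the simulated scores are all assumptions made here, not Scaffold's actual model.

```python
# Sketch: fit two score distributions (incorrect vs. correct matches) to a
# histogram of peptide-spectrum-match scores with a 2-component Gaussian
# mixture via EM. Illustrative only; real tools use their own score
# transforms and distribution shapes.
import math
import random

def em_two_gaussians(scores, iters=200):
    mu = [min(scores), max(scores)]   # crude initialization
    sd = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each score
        resp = []
        for x in scores:
            d = [w[k] * math.exp(-(x - mu[k]) ** 2 / (2 * sd[k] ** 2)) / sd[k]
                 for k in (0, 1)]
            s = d[0] + d[1]
            resp.append((d[0] / s, d[1] / s))
        # M-step: re-estimate weights, means, standard deviations
        for k in (0, 1):
            rk = sum(r[k] for r in resp)
            w[k] = rk / len(scores)
            mu[k] = sum(r[k] * x for r, x in zip(resp, scores)) / rk
            sd[k] = math.sqrt(sum(r[k] * (x - mu[k]) ** 2
                                  for r, x in zip(resp, scores)) / rk) or 1e-6
    return w, mu, sd

# Simulated data: many low-scoring (incorrect) and fewer high-scoring
# (correct) matches -- the two overlapping humps in a score histogram.
random.seed(0)
scores = ([random.gauss(1.0, 0.5) for _ in range(400)] +
          [random.gauss(4.0, 0.7) for _ in range(100)])
w, mu, sd = em_two_gaussians(scores)
print(mu)  # means should land near 1.0 (incorrect) and 4.0 (correct)

def p_correct(x):
    """Posterior that a score came from the 'correct' component --
    this posterior is what a peptide probability estimates."""
    d = [w[k] * math.exp(-(x - mu[k]) ** 2 / (2 * sd[k] ** 2)) / sd[k]
         for k in (0, 1)]
    return d[1] / (d[0] + d[1])
```

This is also why Table 1 demands "enough spectra" and "enough correct matches": with too few points in either hump, the curve fit (and hence every probability derived from it) becomes unstable.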
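Table 1's claim that "protein probability is accurate because peptide probabilities are" reflects how peptide-level evidence is combined. A common combination rule, shown here purely as an illustration and not as Scaffold's exact algorithm, treats the peptides as independent evidence:

```python
# Illustrative sketch: if peptide i is correct with probability p_i, and
# peptides are treated as independent evidence, the protein is present
# unless every peptide match is wrong:
#     P(protein) = 1 - prod_i (1 - p_i)
# This is the standard ProteinProphet-style rule; real implementations
# also handle peptide weighting and peptides shared between proteins.

def protein_probability(peptide_probs):
    q = 1.0  # probability that every peptide match is wrong
    for p in peptide_probs:
        q *= 1.0 - p
    return 1.0 - q

# Two decent peptides already make a strong protein call:
print(protein_probability([0.9, 0.8]))  # ~0.98
# ...but a single mediocre peptide alone does not:
print(protein_probability([0.6]))       # ~0.6
```

Note how the formula inherits its accuracy entirely from the peptide probabilities: feed it peptide estimates that are off, and the protein probability is off in the same direction.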
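The 81% vs. 75% example can be made concrete. If you imagine each displayed probability as estimated from a modest number of spectra, a quick binomial confidence interval shows how wide the invisible error bars are. The sample size of 50 and the binomial model are assumptions chosen for illustration, not how Scaffold derives its estimates.

```python
# Sketch: approximate 95% confidence intervals for probabilities
# estimated from n observations, using the normal approximation to the
# binomial. n = 50 is an assumed, illustrative sample size.
import math

def approx_95ci(p, n):
    se = math.sqrt(p * (1 - p) / n)  # binomial standard error
    return (p - 1.96 * se, p + 1.96 * se)

for p in (0.81, 0.75):
    lo, hi = approx_95ci(p, 50)
    print(f"{p:.2f}: [{lo:.2f}, {hi:.2f}]")
# prints:
# 0.81: [0.70, 0.92]
# 0.75: [0.63, 0.87]
```

The two intervals overlap heavily, which is exactly why 81% and 75% are indistinguishable within the accuracy of the experiment.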