Protein quantitation by DIA/SWATH mass spectrometry relies on high-quality peptide MS/MS spectral libraries; however, building such libraries with deep proteome coverage can be time-consuming and expensive. To address this issue, various computational approaches for merging archived or external libraries have been created and evaluated, including efforts from our group [1]. Such approaches are appealing because they promise to expand the set of proteins that can be quantitated via DIA/SWATH, potentially at low cost given the in-silico nature of the process. However, when larger publicly available reference libraries are used for extension, the risk of introducing computational artefacts also increases, particularly when the datasets themselves are large.
Here we describe the ways in which SWATH quantitative datasets obtained using local libraries and larger extended libraries can differ, in the context of several proteomics datasets, including a recently published large plasma proteomics experiment containing samples from neonates, young children and adults [2]. We also describe a few simple principles for evaluating the library extension process itself, to ensure that proteins are reliably detected and their quantitation is consistent and reproducible. These steps are summarised in a recently described workflow [3]. Implicit in this workflow is a filtering of the set of proteins quantitated via the extension process, which can be applied as needed depending on individual project goals.
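One of the principles above, checking that a protein's quantitation is consistent between the local-library and extended-library analyses before accepting it, can be illustrated with a small sketch. This is not the published workflow [3]; the data, protein names, and correlation cutoff below are purely illustrative assumptions.

```python
# Minimal sketch: keep only proteins whose per-sample quantities agree
# between a local-library and an extended-library DIA/SWATH analysis.
# All values, protein IDs, and the cutoff are illustrative, not from [3].
import math
import random

random.seed(0)

def pearson(xs, ys):
    """Pearson correlation between two equal-length sample vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Synthetic per-protein intensities across 8 samples, quantified twice.
n_samples = 8
local, extended = {}, {}
for prot in ["P1", "P2", "P3"]:
    base = [random.uniform(1e5, 1e6) for _ in range(n_samples)]
    local[prot] = base
    if prot == "P3":
        # Simulate an extension artefact: quantities unrelated to the
        # local-library values.
        extended[prot] = [random.uniform(1e5, 1e6) for _ in range(n_samples)]
    else:
        # Small multiplicative noise: quantitation essentially agrees.
        extended[prot] = [v * random.uniform(0.95, 1.05) for v in base]

# Filter: retain proteins whose two quantitations correlate strongly.
R_CUTOFF = 0.9  # illustrative threshold
consistent = {
    prot for prot in local
    if pearson(local[prot], extended[prot]) >= R_CUTOFF
}
print(sorted(consistent))
```

In practice such a filter would operate on real quantitation matrices and might combine correlation with other criteria (detection confidence, coefficient of variation), tuned to the goals of the individual project.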