Query the NCBI databases with Perl

From BITS wiki

The easiest way is to let the NCBI-Ebot capture the steps you follow to reach your results (by clicking on the buttons and links present in the NCBI search pages). The NCBI-Ebot then generates a Perl script, which you can keep and adapt, or at least re-run as is to repeat exactly the same search.


Main advantages over repeating manual searches

  • You store your method in a self-contained script and can re-run it on a regular basis to obtain updated results (reproducibility).
  • By adapting the code, you can include it in your own pipeline and build a software solution that starts from the complex query and extends it further (flexibility).
  • The Perl code provided by NCBI is clear enough and is expected to remain stable over time unless NCBI drastically changes its platform (simplicity).


Example usage

Search for a gene symbol (SOD) in the Gene database, then link out to all SOD-related PubMed articles and perform a new search to limit this list to relevant papers. The final result is a list of articles, which you can retrieve as a reference list or as counts depending on your needs.


  • The two-step approach is more efficient than including the gene symbol ‘SOD’ directly in a standard PubMed query, because it avoids ambiguous matches caused by homonyms (some gene names are very common English words and can be matched outside the gene-name context).
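
As a rough illustration of what such a script does behind the scenes, the sketch below builds the three requests that correspond to the example, assuming the script relies on the NCBI E-utilities (ESearch, ELink, ESearch). It is not the code generated by the NCBI-Ebot. The helper eutil_url, the placeholder WebEnv value, the query keys 1 and 2, and the filter term "oxidative stress" are assumptions made for illustration only; a real run reads WebEnv and the query keys from each XML reply. The script only prints the URLs and does not contact NCBI.

```perl
#!/usr/bin/perl
# Sketch only: builds the E-utilities request URLs for the two-step search.
# It prints the URLs and does not send anything to NCBI.
use strict;
use warnings;

my $BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';

# Hypothetical helper: build one E-utilities URL from a tool name and an
# ordered list of key/value pairs.
sub eutil_url {
    my ($tool, @params) = @_;
    my @pairs;
    while (my ($key, $value) = splice(@params, 0, 2)) {
        # Percent-encode everything except unreserved URL characters.
        $value =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord($1))/ge;
        push @pairs, "$key=$value";
    }
    return $BASE . $tool . '.fcgi?' . join('&', @pairs);
}

# Placeholder (assumption): a real run copies WebEnv from the first XML reply.
my $web_env = 'WEBENV_FROM_PREVIOUS_REPLY';

# Step 1: search the Gene database and keep the hits on the history server.
my $gene_search = eutil_url('esearch',
    db => 'gene', term => 'SOD', usehistory => 'y');

# Step 2: follow the links from those Gene records to PubMed.
# Query key 1 is assumed to be the key returned by step 1.
my $gene_to_pubmed = eutil_url('elink',
    dbfrom => 'gene', db => 'pubmed', cmd => 'neighbor_history',
    query_key => 1, WebEnv => $web_env);

# Step 3: restrict the linked articles with a second PubMed search.
# Query key 2 and the filter term are assumptions for illustration.
my $pubmed_filter = eutil_url('esearch',
    db => 'pubmed', term => '#2 AND oxidative stress',
    usehistory => 'y', WebEnv => $web_env);

print "$_\n" for $gene_search, $gene_to_pubmed, $pubmed_filter;
```

Keeping the intermediate results on the NCBI history server (usehistory, WebEnv, query keys) means that each step works on the previous result set without the script having to pass long lists of identifiers around.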


The PDF below presents a series of screenshots describing the few intuitive steps involved in creating the Perl code (only the top part of the code is included in the appendix, as the remaining 1000+ lines are standard subroutines shared by all eBot scripts).


Link to the PDF file: BITS_NCBI-eBot-tutorial.pdf