Prevent Excel from altering your data

From BITS wiki
Jump to: navigation, search

Importing data into excel

You often get data from third party tools and need to pass through excel to add extra columns or simply clean the data a bit. This can lead to real problems if Excel 'decides' to change some of your data without telling you (and the nasty little pest will do without hesitations).

gene symbols and dates

Excel has the nasty habbit to translate some gene symbols to dates while importing from a text file. This is very nasty has it will often be overlooked when importing from a large file where only few rows will be affected.

The simple way to avoid this is to carefully check every page during Excel import and making sure that any column with ambiguous words will be treated as 'Text' rather than the default 'General'.

A list of ambiguous symbols for human was built and is presented below (download link)


ambiguous_symbols.png

The wrong way

The default format for the symbol column is 'General' an will lead to interpreting the symbols as they would be abbreviated dates.

"general format"
 
general-format.png
general-result.png

Mortasecca.png Warning: The gene symbols have become dates, using such file will make those genes unrecognized by downstream programs

The better way

Setting the column format to 'Text' forces Excel to leave the content as-is.

"text format"
 
text_format.png
text-result.png

number separators

Another often painful issue goes for decimal and thousand separators that might be set to a standard different from that of your country or for the next program reading the data. This can lead to tiny numbers turn into billions which can be a problem for things like 'p.values'.

The place to take care of separators is accessible by clicking on the 'Advanced...' button and set both parameters correctly.

"number separator"
general-format.png
=> click on Advanced...
decimal.png
thousands.png



[ Main_Page ]