Excel Worksheets
Previous Topic  Next Topic 

Stat/Transfer will read and write files from Excel.  It will read all versions and will write Version 2.1 files and files for Excel 97 and higher versions. Note that Excel 97 and higher versions support long strings (up to 32K) and more (up to 64K) records in a worksheet.

Excel 2007 has vastly increased limits over earlier versions. The maximum number of columns has been increased from 256 to 65,536 and the maximum number of rows has increased from 16,384 to over one million. Note however, this does not mean that Excel is an appropriate tool for very large datasets. You will find that the time to load very large files is agonizingly slow.

Excel 2013 has added a new “Strict Open XML” format.   Stat/Transfer also will read this format.

Standard extension: xls

Reading Excel Worksheet Files

Because worksheet files are in general not designed to hold statistical data, only worksheets in certain formats can be read.

Worksheets must be in worksheet database format or in certain modifications of this.  It is most straightforward to read worksheets in database format.

Database Format

Worksheet database files are structured worksheets where each row is a single case and each column contains a variable.  Data can consist of numbers (including serial date numbers), labels, or formulas.

The first non-blank row of a worksheet database file usually has strings in each column that give the names of the variables.  The data then begins in the next row. However, variable names may be in different rows or not present at all.

You have several options to specify what part of an input worksheet to read and how to read variable names.  An option allows you to read variable labels from the row after the variable names. 

Data Range

You can choose different ranges to be read in input worksheets by using the drop-down menu for Data Range in the Worksheets section of the Options dialog box or using the SET command, WKS-DATA-RANGE.

If you use Autosense (the default), Stat/Transfer will read to the first non-blank cell and use that as the upper left corner of the data range.  It will then read data until it encounters an entirely blank line.  This is the default behavior.  You can change the behavior with respect to blank rows by using the Blank Rows option.

Rather than have Stat/Transfer automatically sense the number of rows to be read, you can use the other options for Data Range to specify a range, either by giving a named range or by giving explicit coordinates.

When a range has been specified, Stat/Transfer's treatment of entirely blank rows will also be overridden.  They will be returned in your output data and, in addition, blank rows at the end of the worksheet, through the last row of the specified range, will also be returned.

Determining Variable Names

By default, Stat/Transfer will attempt to determine whether or not the data in the first non-blank row (or the first row of a specified range) are variable names or the first row of data.  It does so by looking for at least one column in which there is a string in the first row and a number in the second.  If this behavior is inappropriate for your worksheet (for example, if you have only string data), you can specify different options in the Field Name Row drop-down menu of the Worksheets section of the Options dialog box.  You can specify that variable names must be taken from the first non-blank row, that they be taken from a specific row or that the worksheet does not contain variable names, so that Stat/Transfer should assign them ('col1' through 'coln').

Determining the Data Types and Widths

After identifying the label row, Stat/Transfer will look at the entire column. If any cell contains a string, the entire column will be treated as string data (with numbers and dates converted to strings).  If the column is all numeric or blank, it will be treated as a numeric column.

The width of the column for each numeric variable and the format of the first non-blank data cell in that column are used, where possible, to set the default target, or output, types for the numeric variables.  If the format of the first data row has any decimal places (for example, F(2)), the target type will be 'float'.  On the other hand, if the cell format has no decimal places (for example, F(0)) the target types will be various flavors of integers which depend on the column width.  If the column width is less than three, the target type will be 'byte'.  If the column width is less than five, the target type will by 'int'.  Otherwise the target type will be 'long'.  Any date format in the first data row will set the target type to 'date'.

The maximum width of character variables is determined by examining the widths of all of the strings in a column.

Stat/Transfer is lenient in typing variables from worksheets.  If it is expecting a character variable and it encounters a number it will convert it to a string.

Combining Multiple Input Worksheet to a Single Output File

The option Concatenate Worksheet Pages, found in the Worksheets section of the Options dialog box allows you to combine worksheet pages into a single output file.  This option is appropriate if your worksheet contains many sheets that are identical in structure.  These can be then be combined into a single output file of any type.

For example, you may have a workbook that has 50 sheets, with one sheet for each state and the same variables on each sheet.  If you check this box, Stat/Transfer will effectively combine the sheets into one large input file, dropping the field names, if necessary, on the second and higher sheets. You will end up with the data from all of your worksheet pages in a single output file.

Writing Excel Worksheet Files

On output, Stat/Transfer will write variable labels in the first row of the worksheet.  Data values will be placed in the second and succeeding rows.

Column widths and formats will be determined by the variable information available.  Dates and character variables are straightforward.  For numerical data, information on the width and number of decimal places of variables, where available, is used to set the column widths and formats.

Missing Data

On input, blank cells and cells containing labels consisting of a single dot are read as missing.

If there is a string in your worksheet, such as 'NA', that you would like to have treated as a numeric missing value, you can specify it using the Numeric Missing Value option in the Worksheets section of the Options dialog box. 

When transferring data to worksheets from other formats, missing values will be written out to worksheets as blank cells.

Output Variable Types

The output variable type that results from each target variable type is given in the following table:

Target Type

Output Type






Numeric cell (formatted if information available)




Serial date number (with date format)


Time fraction (with date format if available)


Date number + time fraction

(with date/time format if available)