Crosstabulate data in a text file.
xtab.py is a Python module and command-line program that rearranges data from
a normalized format to a crosstabulated format.
A summary of its capability and usage is shown below. Full documentation
is available at http://xtab.osdn.io/.
Required and Optional Arguments
Copyright and License
You can install the program either manually or using Python's *pip* installation tool.
To use *pip*, run the following command in the console:
pip install xtab
Alternatively, the *xtab.py* file can be downloaded and placed wherever
You can use the xtab program to:
* Rearrange data exported from a database to better suit its
subsequent usage in statistical, modeling, graphics, or other
software, or for easier visual review and table preparation.
* Convert a single file (table) of data to a SQLite database.
* Check for multiple rows of data in a text file with the same
Required and Optional Arguments
The name of the input file from which to read data. This must be a text file,
with data in a normalized format. The first line of the file must contain
The name of the output file to create. The output file will be created as
a .csv file.
-r <column_name1> [column_name2 [...]]
One or more column names to use as row headers (space delimited). Unique
values of these columns will appear at the beginning of every output line.
-c <column_name1> [column_name2 [...]]
One or more column names to use as column headers in the output (space
delimited). A crosstab column (or columns) will be created for every unique
combination of values of these fields in the input.
-v <column_name1> [column_name2 [...]]
One or more column names with values to be used to fill the cells of the
cross-table. If n columns names are specified, then there will be n columns
in the output table for each of the column headers corresponding to values
of the -c argument. The column names specified with the -v argument will
be appended to the output column headers created from values of the -c
argument. There should be only one value of the -v column(s) for each
combination of the -r and -c columns; if there is more than one, a warning
will be printed and only the first value will appear in the output.
(That is, values are not combined in any way when there are multiple values
for each output cell.)
Prints output column headers in two rows. The first row contains values
of the columns specified by the -h argument, and the second row contains
the column names specified by the -v argument. If this is not specified,
output column headers are printed in one row, with elements joined by
underscores to facilitate parsing by other programs.
Use a temporary (sqlite) file instead of memory for intermediate storage.
-k Keep (i.e., do not delete) the sqlite file. Only useful with the "-f" option.
Unless the "-t" option is also used, the table name will be "src".
Name to use for the table in the intermediate sqlite database. Only useful
with the "-f" and "-k" options.
Log all error messages, to a file if the filename is specified or to the
console (stderr) if the filename is not specified.
Log the sequence of SQL commands used to extract data from the input file
to write the output file, including the result of each command.
Print a summary of the command-line arguments and exit.
* Multiple data values can be crosstabbed, in which case the output
will contain multiple sets of similar columns.
* Either one or two rows of headers can be produced in the output file.
One row is the default, and is most suitable when the output file will
be further processed by other software. Two rows facilitate readability
when the output contains multiple sets of similar columns.
* The xtab program does not carry out any summarization or
calculation on the data values, and therefore there should be
no more than one data value to be placed in each cell of the output
table. More than one value per cell is regarded as an error, and in
such cases only one of the multiple values will be put in the cell.
* Error messages can be logged to either the console or a file. If no
error logging option is specified, then if there are multiple values
to be put in a cell (the most likely data error), a single message
will be printed on the console indicating that at least one error of
this type has occurred. If an error logging option is specified,
then the SQL for all individual cases where there are multiple values
per cell will be logged.
* The SQL commands used to extract data from the input file for each
output table cell can be logged to a file.
* As an intermediate step in the crostabbing process, data are converted
to a SQLite table. By default, this table is created in memory.
However, it can optionally be created on disk, and preserved so that
it is available after the crosstabulation is completed.
* There are no inherent limits to the number of rows or columns in the
input or output files. (So the output may exceed the limits of some
* Input and output file names, and column names in the input file that
are to be used for row headings, column headings, and cell values are
all required as command-line arguments. If any required arguments are
missing, an exception will be raised, whatever the error logging option.
Copyright and License
Copyright (c) 2008, R.Dreas Nielsen
This program is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option) any later
version. This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
details. The GNU General Public License is available at