Statistics Window
The Statistics Window shows statistics for the values in each
of the table's columns.
You can display it using the Column Statistics
button when the chosen table is selected in the
Control Window's Table List.
The calculated values are displayed in a
JTable widget. Each row of this grid corresponds to one
column in the main table, and each column of the grid shows one
statistical quantity calculated on some or all of
the values in the table column corresponding to that grid row.
The following columns are shown by default:
-
Name
- The name of the column in the main table represented by this grid row.
-
Mean
- The mean value of the good cells. For boolean columns, this is
the proportion of good cells which are True.
-
S.D.
- The standard deviation of the good cells.
-
Minimum
- The minimum value. For numeric columns this is the smallest
numeric value. For other columns, if an ordering can reasonably be defined
on them, the 'smallest' value may be shown. For instance, string
columns will show the entry which would come first alphabetically.
-
Maximum
- As Minimum, but shows the largest value.
-
Good cells
- The number of non-blank cells.
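As a rough illustration of how the default quantities relate to each other, the following sketch computes them for a single numeric or boolean column. It is not the application's own code: the function name column_stats is hypothetical, blank cells are represented by None or NaN, and the standard deviation is assumed to be the population form (dividing by the number of good cells), which this page does not specify.

```python
import math


def column_stats(cells):
    """Summarise one numeric or boolean column.

    Blank cells are represented here by None or NaN (an assumption of
    this sketch); only the remaining "good" cells enter the statistics.
    """
    good = [v for v in cells
            if v is not None and not (isinstance(v, float) and math.isnan(v))]
    n_good = len(good)
    n_bad = len(cells) - n_good
    if n_good == 0:
        return {"mean": None, "sd": None, "min": None, "max": None,
                "sum": None, "good": 0, "bad": n_bad}
    # Booleans count as 1.0 (True) and 0.0 (False), so the sum is the
    # number of True cells and the mean is the proportion that are True.
    nums = [float(v) for v in good]
    total = sum(nums)
    mean = total / n_good
    # Population variance (divide by n_good): an assumption, see lead-in.
    variance = sum((x - mean) ** 2 for x in nums) / n_good
    return {"mean": mean, "sd": math.sqrt(variance),
            "min": min(good), "max": max(good),
            "sum": total, "good": n_good, "bad": n_bad}
```

For example, column_stats([1.0, 2.0, None, 3.0]) reports three good cells, one bad cell and a mean of 2.0.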
Several additional items of statistical information are also calculated,
but the columns displaying these are hidden by default to avoid clutter.
You can reveal these by using the Display menu:
-
Index
- The index of the column in the table, i.e. the order in which
it is displayed.
-
$ID
- The unique identifier label for the column in the main table.
-
Sum
- The sum of all the values in the column. For boolean columns this
is a count of the number of True values in the column.
-
Variance
- The variance of the good cells.
-
Row of min.
- The index of the row in the main table at which the minimum value
occurred.
-
Row of max.
- The index of the row in the main table at which the maximum value
occurred.
-
Bad cells
- The number of blank cells; the sum of
this value and the Good cells value will be the same for each
column.
-
Cardinality
- The number of distinct values which appear in the column.
This is shown only if the column contains a small number of
distinct values.
If the number of distinct values is large (currently >50) or is
a large proportion of the number of non-bad values (currently >75%) then
no value is shown.
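The Cardinality rule can be sketched as follows. This is a minimal illustration rather than the application's implementation: the function and constant names are hypothetical, blank cells are represented by None, and showing nothing for a column with no good cells is an assumption. The two thresholds are the current values quoted above.

```python
MAX_DISTINCT = 50      # more distinct values than this: nothing shown
MAX_FRACTION = 0.75    # more than this fraction of good cells: nothing shown


def cardinality(cells):
    """Return the number of distinct non-blank values, or None if the
    column does not qualify for a cardinality to be shown."""
    good = [v for v in cells if v is not None]
    if not good:
        # Assumption of this sketch: an all-blank column shows no value.
        return None
    n_distinct = len(set(good))
    if n_distinct > MAX_DISTINCT or n_distinct > MAX_FRACTION * len(good):
        return None
    return n_distinct
```

A column holding only the strings "a" and "b" repeated many times would give 2, whereas a column in which nearly every value is different would give no value.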
The quantities displayed in this window are not necessarily those for
the entire table; they are those for a particular
Row Subset.
At the bottom of the window is the Subset For Calculations
selector, which allows you
to choose which subset you want the calculations to be done for.
By clicking on this you can calculate the statistics for different
subsets.
When the window is first opened, or when it is invoked from a menu
or the toolbar in the Control Window,
the subset will correspond to the current row subset.
The toolbar contains the following extra button:
-
Recalculate
- Once statistics have been calculated for a given subset they are
cached and are not normally calculated again.
Use this button if you want to force a recalculation because the
data may have changed.
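The caching behaviour described for Recalculate can be pictured with a sketch like the one below. It assumes a hypothetical StatsCache class keyed by subset name and a caller-supplied compute function; none of these names come from the application itself.

```python
class StatsCache:
    """Keep one set of results per row subset until told to recompute."""

    def __init__(self, compute):
        self._compute = compute   # function mapping rows -> statistics
        self._cache = {}          # subset name -> cached statistics

    def get(self, subset_name, rows):
        # Results are calculated the first time a subset is requested
        # and served from the cache on later requests.
        if subset_name not in self._cache:
            self._cache[subset_name] = self._compute(rows)
        return self._cache[subset_name]

    def recalculate(self, subset_name, rows):
        # Equivalent of the Recalculate button: discard the cached
        # result for this subset and compute it afresh.
        self._cache.pop(subset_name, None)
        return self.get(subset_name, rows)
```

With this arrangement, a result cached before the data changed stays as it was until recalculate is called for that subset.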
For a large table the calculations may take a short while. While they are
being performed you can interact with the window as normal,
but a progress bar is shown at the bottom of the window.
If you initiate a new calculation (by pushing the Recalculate button or
selecting a new subset) or close the window during a calculation,
the superseded calculation will be stopped.
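One common way to arrange for a superseded calculation to stop is cooperative cancellation, sketched below. This is an illustration under stated assumptions, not a description of how the application does it: StatsRunner and cancellable_sum are hypothetical names, and the sketch does not start any threads itself, although in a real window the work would run in the background so that you can keep interacting with it.

```python
import threading


class StatsRunner:
    """Track the current calculation and cancel it when superseded."""

    def __init__(self):
        self._current = None   # cancellation flag of the running job

    def start(self):
        # Starting a new calculation (new subset or Recalculate)
        # cancels whichever calculation was previously in progress.
        if self._current is not None:
            self._current.set()
        self._current = threading.Event()
        return self._current

    def close(self):
        # Closing the window also cancels the running calculation.
        if self._current is not None:
            self._current.set()
            self._current = None


def cancellable_sum(values, cancelled, chunk=1000):
    """Sum values chunk by chunk, giving up (returning None) as soon as
    the cancelled flag is found to be set."""
    total = 0.0
    for start in range(0, len(values), chunk):
        if cancelled.is_set():
            return None
        total += sum(values[start:start + chunk])
    return total
```

Checking the flag between chunks keeps the check cheap for large tables while still letting a superseded job stop promptly.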