STATISTICS command

Syntax: STATISTICS x { s1\keyword { s2\keyword ... }}
STATISTICS\PEARSON x y { rcof prob }
STATISTICS\MOMENTS w x n { sout }
Qualifiers: \MESSAGES, \WEIGHTS, \MOMENTS, \PEARSON
Defaults: \MESSAGES, \-WEIGHTS
Examples: STATISTICS X
STATISTICS\-MESS X XMED\MEDIAN XMEAN\MEAN
STATISTICS\WEIGHTS W X XVAR\VARIANCE XSUM\SUM
STATISTICS\MOMENTS Y X 3 M3

The STATISTICS command calculates various statistics for the input variable x, which can be a vector or a matrix. Specific statistics are chosen with qualifier keywords which are appended to the output parameters with the backslash, \. All vectors must be the same size.

Table 1 below shows the parameter qualifier keywords and corresponding output values for extrema. Table 2 shows the parameter qualifier keywords and corresponding output values for central measures. Table 3 shows the parameter qualifier keywords and corresponding output values for dispersion and skewness.

Keyword     Output Value
\MAX        maximum value of x
\IMAX       index of the maximum if x is a vector;
            row index of the maximum if x is a matrix
\JMAX       column index of the maximum if x is a matrix
\MIN        minimum value of x
\IMIN       index of the minimum if x is a vector;
            row index of the minimum if x is a matrix
\JMIN       column index of the minimum if x is a matrix
Table 1: Extrema keywords

Keyword     Output Value
\SUM        arithmetic sum (unweighted)
\MEAN       arithmetic mean
\GMEAN      geometric mean
\MEDIAN     median value
\RMS        root-mean-square
Table 2: Central measure keywords

Keyword     Output Value
\VARIANCE   variance
\SDEV       standard deviation
\ADEV       average deviation
\KURTOSIS   kurtosis
\SKEWNESS   skewness
Table 3: Dispersion and skewness keywords

Informational messages

The default is to display all the calculated statistics. If the \-MESSAGES command qualifier is used, and if at least one output scalar is entered, then the statistics values will not be displayed.

Weights

Syntax: STATISTICS\WEIGHTS w x { s1\keyword { s2\keyword ... }}

You must use the \WEIGHTS qualifier to indicate that a weight vector is present. Weights cannot be applied to matrix data.

A weighting factor, w[i] ≥ 0, could be the frequency, the probability, the mass, the reliability, or some other multiplier. The lengths of w and x must be equal.

Definitions

Suppose that x is a vector with N elements.

If a weight vector, w, is entered, remember to use the \WEIGHTS command qualifier. The length of w is assumed to also be N. If no weights are entered, let $w_i$ default to 1, for i = 1,2,...,N. Define the total weight:

$$W = w_1 + w_2 + \cdots + w_N$$

Sum

The sum is defined by $x_1 + x_2 + \cdots + x_N$

Mean value

The mean value, M, is defined by

$$M = \frac{1}{W}\left[w_1 x_1 + w_2 x_2 + \cdots + w_N x_N\right]$$
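The mean definition can be sketched in Python as follows. This is only an illustration of the formula, not the command's implementation; the function name weighted_mean is hypothetical.

```python
def weighted_mean(x, w=None):
    """Return M = (1/W) * sum(w_i * x_i); weights default to 1."""
    if w is None:
        w = [1.0] * len(x)
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    total_weight = sum(w)  # W in the text
    return sum(wi * xi for wi, xi in zip(w, x)) / total_weight


x = [1.2, 2.1, 3.2, 4.5, 5, 6, 7]
print(weighted_mean(x))                    # unweighted mean of the example vector
print(weighted_mean([1.0, 3.0], [3, 1]))   # first value counted three times
```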

Geometric mean

The geometric mean, $G_x$, is defined if each $x_i > 0$ by:

$$G_x = \exp\left\{\frac{1}{W}\left[w_1\log(x_1) + w_2\log(x_2) + \cdots + w_N\log(x_N)\right]\right\}$$
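A minimal Python sketch of the geometric mean, read as the exponential of the weighted mean of the logarithms. The name geometric_mean is hypothetical, and the sketch assumes every element of x is strictly positive, since log(0) is undefined.

```python
import math


def geometric_mean(x, w=None):
    """Return exp((1/W) * sum(w_i * log(x_i))); weights default to 1."""
    if w is None:
        w = [1.0] * len(x)
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    if any(xi <= 0 for xi in x):
        raise ValueError("every x_i must be positive")
    total_weight = sum(w)  # W in the text
    log_sum = sum(wi * math.log(xi) for wi, xi in zip(w, x))
    return math.exp(log_sum / total_weight)


print(geometric_mean([1.0, 4.0]))
print(geometric_mean([2.0, 8.0], [3, 1]))
```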

Median

The median is the element of x which has equal numbers of values above it and below it. If N is even, there is no single central element, and the median is the average of the two central values.
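The odd and even cases can be illustrated with a short Python sketch. It covers only the unweighted case, since the text does not define a weighted median; the function name is hypothetical.

```python
def median(x):
    """Return the middle value of x, or the average of the two middle values."""
    if not x:
        raise ValueError("x must not be empty")
    s = sorted(x)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                    # odd N: the central element
    return 0.5 * (s[mid - 1] + s[mid])   # even N: average the two central values


print(median([1.2, 2.1, 3.2, 4.5, 5, 6, 7]))
print(median([4, 1, 3, 2]))
```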

Root-mean-square

The root-mean-square, RMS, is defined by

$$\mathrm{RMS} = \sqrt{\frac{1}{W}\left[w_1 x_1^2 + w_2 x_2^2 + \cdots + w_N x_N^2\right]}$$

Variance

The variance, μ, is defined by

$$\mu = \frac{N}{W(N-1)}\left[w_1(x_1-M)^2 + w_2(x_2-M)^2 + \cdots + w_N(x_N-M)^2\right]$$

Standard deviation

The standard deviation, σ, is defined by σ = sqrt(μ)
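The variance and standard deviation definitions can be sketched in Python as below. The function names are hypothetical. Note that with unit weights W = N, so the prefactor N/(W(N-1)) reduces to the familiar 1/(N-1).

```python
import math


def weighted_variance(x, w=None):
    """Return [N / (W (N-1))] * sum(w_i * (x_i - M)^2); weights default to 1."""
    n = len(x)
    if n < 2:
        raise ValueError("at least two values are required")
    if w is None:
        w = [1.0] * n
    if len(w) != n:
        raise ValueError("w and x must have the same length")
    total_weight = sum(w)  # W in the text
    mean = sum(wi * xi for wi, xi in zip(w, x)) / total_weight
    squares = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x))
    return n / (total_weight * (n - 1)) * squares


def standard_deviation(x, w=None):
    """Return the square root of the variance."""
    return math.sqrt(weighted_variance(x, w))


print(weighted_variance([1, 2, 3, 4]))
print(standard_deviation([1, 2, 3, 4]))
```

Multiplying every weight by the same constant leaves the result unchanged, because the constant cancels between W and the weighted sum of squares.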

Average deviation

The average deviation, or mean deviation, δ, is defined by

$$\delta = \frac{1}{W}\left[w_1|x_1-M| + w_2|x_2-M| + \cdots + w_N|x_N-M|\right]$$

Skewness

The skewness, or third moment, skew, is a nondimensional quantity that characterizes the degree of asymmetry of a distribution around its mean. The skewness is a pure number that characterizes only the shape of the distribution, and is defined by

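$$\mathrm{skew} = \frac{1}{W}\left\{w_1\left[\frac{x_1-M}{\sigma}\right]^3 + w_2\left[\frac{x_2-M}{\sigma}\right]^3 + \cdots + w_N\left[\frac{x_N-M}{\sigma}\right]^3\right\}$$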

A positive value of skewness signifies a distribution with an asymmetric tail extending out towards more positive x; a negative value signifies a distribution whose tail extends out towards more negative x.
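The sign convention can be checked with a small Python sketch of the skewness definition. It assumes σ is the standard deviation defined earlier in this section; the function name is hypothetical.

```python
import math


def skewness(x, w=None):
    """Return (1/W) * sum(w_i * ((x_i - M) / sigma)^3); weights default to 1."""
    n = len(x)
    if n < 2:
        raise ValueError("at least two values are required")
    if w is None:
        w = [1.0] * n
    total_weight = sum(w)  # W in the text
    mean = sum(wi * xi for wi, xi in zip(w, x)) / total_weight
    squares = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x))
    sigma = math.sqrt(n / (total_weight * (n - 1)) * squares)
    if sigma == 0:
        raise ValueError("skewness is undefined when all values are equal")
    cubes = sum(wi * ((xi - mean) / sigma) ** 3 for wi, xi in zip(w, x))
    return cubes / total_weight


print(skewness([1, 1, 1, 10]))    # tail towards more positive x
print(skewness([-10, 1, 1, 1]))   # tail towards more negative x
```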

Kurtosis

The kurtosis, kurt, is a nondimensional quantity which measures the peakedness or flatness of a distribution relative to a normal distribution. A distribution with positive kurtosis is termed leptokurtic; a distribution with negative kurtosis is termed platykurtic. An in-between distribution is termed mesokurtic. The kurtosis is defined by

$$\mathrm{kurt} = \frac{1}{W}\left\{w_1\left[\frac{x_1-M}{\sigma}\right]^4 + w_2\left[\frac{x_2-M}{\sigma}\right]^4 + \cdots + w_N\left[\frac{x_N-M}{\sigma}\right]^4\right\} - 3$$

where the -3 term makes the value zero for a normal distribution.
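A Python sketch of the kurtosis follows. It assumes the same 1/W normalization as the skewness definition and the standard deviation σ defined earlier in this section; the function name is hypothetical.

```python
import math


def kurtosis(x, w=None):
    """Return (1/W) * sum(w_i * ((x_i - M) / sigma)^4) - 3; weights default to 1."""
    n = len(x)
    if n < 2:
        raise ValueError("at least two values are required")
    if w is None:
        w = [1.0] * n
    total_weight = sum(w)  # W in the text
    mean = sum(wi * xi for wi, xi in zip(w, x)) / total_weight
    squares = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x))
    sigma = math.sqrt(n / (total_weight * (n - 1)) * squares)
    if sigma == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    fourths = sum(wi * ((xi - mean) / sigma) ** 4 for wi, xi in zip(w, x))
    return fourths / total_weight - 3.0


# Evenly spread values are flatter than a normal distribution,
# so the result is negative (platykurtic).
print(kurtosis([1, 2, 3, 4, 5]))
```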

Moments

Syntax: STATISTICS\MOMENTS w x n { s }

If the \MOMENTS command qualifier is used, the nth moment of vector x, with weight w, is calculated and optionally stored in output scalar s. The moment number, n, can be any integer > 0.

$$s = \frac{1}{W}\left[w_1 x_1^n + w_2 x_2^n + \cdots + w_N x_N^n\right]$$
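The moment formula can be sketched in Python as below. The argument order mirrors the command syntax (w, x, n); the function name moment is hypothetical.

```python
def moment(w, x, n):
    """Return (1/W) * sum(w_i * x_i^n) for a positive integer n."""
    if not isinstance(n, int) or n <= 0:
        raise ValueError("n must be an integer greater than 0")
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    total_weight = sum(w)  # W in the text
    return sum(wi * xi ** n for wi, xi in zip(w, x)) / total_weight


print(moment([1, 1, 1], [1, 2, 3], 2))   # second moment with unit weights
print(moment([1, 3], [2.0, 4.0], 1))     # first moment equals the weighted mean
```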

Linear correlation coefficient

Syntax: STATISTICS\PEARSON x y { r p }

Pearson's r, or the linear correlation coefficient, is widely used as a measure of association between variables that are continuous. For pairs of quantities $(x_i, y_i)$, for i = 1,2,...,N, the linear correlation coefficient r is given by the formula:

$$r = \frac{\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i}(x_i-\bar{x})^2}\,\sqrt{\sum_{i}(y_i-\bar{y})^2}}$$

where $\bar{x}$ is the mean of x, and $\bar{y}$ is the mean of y.

The value of r lies between -1 and +1, inclusive. It takes on a value of +1 when the data points lie on a straight line with positive slope, so that x and y increase together. The value +1 holds independent of the magnitude of this slope. If the data points lie on a straight line with negative slope, so that y decreases as x increases, then r has the value -1. A value of r near zero indicates that the variables x and y are uncorrelated.
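These properties can be checked with a short Python sketch that uses the standard formula for Pearson's r. It computes r only, not the significance level; the function name pearson_r is hypothetical.

```python
import math


def pearson_r(x, y):
    """Return the linear correlation coefficient of the pairs (x_i, y_i)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when x or y is constant")
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4]
print(pearson_r(x, [2, 4, 6, 8]))          # straight line, positive slope
print(pearson_r(x, [0.1, 0.2, 0.3, 0.4]))  # shallower positive slope
print(pearson_r(x, [8, 6, 4, 2]))          # straight line, negative slope
```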

r is a way of summarizing the strength of a correlation which is known to be significant, but it is a poor statistic for deciding whether an observed correlation is statistically significant, and/or whether one observed correlation is significantly stronger than another. The reason is that r is ignorant of the individual distributions of x and y, so there is no universal way to compute its distribution in the case of the null hypothesis.

The STATISTICS\PEARSON command returns Pearson's r in the scalar variable r. It also returns scalar p, the significance level at which the null hypothesis of zero correlation is disproved. A small value of p indicates a significant correlation.

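The value of p is computed as

$$p = I_{\nu/(\nu+t^2)}\left(\frac{\nu}{2}, \frac{1}{2}\right)$$

where I is the incomplete Beta function, $\nu = N-2$ is the number of degrees of freedom, and t is defined by:

$$t = r\sqrt{\frac{\nu}{1-r^2}}$$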

Examples

Suppose you have a vector X=[1.2;2.1;3.2;4.5;5;6;7]. Entering STATISTICS X produces the following display:

If you want to use the values for the maximum, minimum and mean of X, enter:

STATISTICS X XMEAN\MEAN XMIN\MIN XMAX\MAX

and you will have the scalars: XMAX=7, XMIN=1.2, and XMEAN=4.142857

If you also want the index values for the maximum and the minimum of X, enter:

STATISTICS X XMEAN\MEAN XMIN\MIN XMAX\MAX IMX\IMAX IMN\IMIN

and you will also have scalars: IMX=7 and IMN=1.
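The scalar values quoted in these examples can be cross-checked outside the program with a few lines of Python. This sketch assumes 1-based indexing, as implied by IMX=7 for a seven-element vector; the variable names simply mirror the scalars above.

```python
x = [1.2, 2.1, 3.2, 4.5, 5, 6, 7]

xmax = max(x)
xmin = min(x)
xmean = sum(x) / len(x)

# Python lists are 0-based, so add 1 to match the 1-based indices above.
imx = x.index(xmax) + 1
imn = x.index(xmin) + 1

print(f"XMAX={xmax} XMIN={xmin} XMEAN={xmean:.6f} IMX={imx} IMN={imn}")
```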