Additional statistics methods for arrays

MathLib provides additional methods for Collection and SequenceableCollection.

Here's a pseudo-normal distribution, which we'll analyse in the following:
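The plain-Python sketch below builds a hypothetical stand-in for such a distribution. It is not MathLib or SuperCollider code, and the function name pseudo_normal is illustrative. Each value is the mean of three independent uniform draws from [-1, 1]. By the central limit theorem, averaging even a few uniform variables gives a roughly bell-shaped result.

```python
import random

def pseudo_normal(n=1000, seed=None):
    """Return n values in [-1, 1] with a bell-shaped (pseudo-normal) distribution.

    Hypothetical helper: each value is the mean of three uniform draws from [-1, 1].
    """
    rng = random.Random(seed)
    return [sum(rng.uniform(-1.0, 1.0) for _ in range(3)) / 3.0 for _ in range(n)]

# A reproducible sample of 1000 values
data = pseudo_normal(1000, seed=1)
```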

Various measures to characterise the distribution
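For reference, here is a plain-Python sketch of common sample-based measures. It is not the MathLib implementation. It computes the mean, the median, the sample variance and standard deviation (dividing by n - 1), and the adjusted Fisher-Pearson sample skewness. The helper name describe is hypothetical, and MathLib may use other formulas, particularly for skewness.

```python
import math
import statistics

def describe(data):
    """Sample-based summary measures of a list of numbers.

    Hypothetical helper; needs at least 3 values that are not all equal.
    """
    n = len(data)
    mean = statistics.fmean(data)
    var = statistics.variance(data)  # sample variance: divides by n - 1
    # Central moments (divided by n), used for the skewness estimate
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    g1 = m3 / m2 ** 1.5
    # Adjusted Fisher-Pearson coefficient: corrects g1 for the sample size
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    return {
        "mean": mean,
        "median": statistics.median(data),
        "variance": var,
        "stddev": math.sqrt(var),
        "skewness": skew,
    }
```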

Measures such as variance and skewness are commonly calculated on the assumption that the data is a sample drawn from a larger population that we wish to learn about; the measures above apply in this case. When the data represents the entire population, the following alternatives are used instead:
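The difference is easiest to see with the variance. A sample estimate divides the sum of squared deviations by n - 1 (Bessel's correction). A population measure divides by n. The Python sketch below illustrates this. The function name and the population flag are hypothetical and are not MathLib's API.

```python
def variance(data, population=False):
    """Variance of a list of numbers.

    population=False: sample estimate, dividing by n - 1 (Bessel's correction).
    population=True: the data is the whole population, so divide by n.
    """
    n = len(data)
    mean = sum(data) / n
    ss = sum((x - mean) ** 2 for x in data)  # sum of squared deviations
    return ss / (n if population else n - 1)
```

For example, the values 1 to 5 have a sum of squared deviations of 10. This gives a sample variance of 2.5 and a population variance of 2.0.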

percentile

Finds the requested percentile(s) of the distribution, specified as float values from 0 to 1; e.g. for the 90th percentile use 0.9.
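Percentiles have several competing definitions. The Python sketch below uses one common convention: linear interpolation between the closest ranks of the sorted data. MathLib's percentile may use a different convention. The function name is illustrative. It accepts either a single float or a list of floats, mirroring the "percentile(s)" behaviour described above.

```python
import math

def percentile(data, p):
    """Percentile(s) of data by linear interpolation between closest ranks.

    p is a float in [0, 1] (e.g. 0.9 for the 90th percentile) or a list of such floats.
    """
    if isinstance(p, (list, tuple)):
        return [percentile(data, q) for q in p]
    s = sorted(data)
    pos = p * (len(s) - 1)          # fractional index into the sorted data
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)    # stay inside the list when p == 1
    frac = pos - lo
    return s[lo] + frac * (s[hi] - s[lo])
```

For example, percentile([1, 2, 3, 4, 5], 0.9) gives about 4.6, which lies 60% of the way from 4 to 5.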

Histogramming

histo partitions the distribution into a set of equal-width bins (default 100 bins):
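The Python sketch below shows what equal-width binning involves. It is not MathLib's code. The name histo mirrors the method, but the argument names, and the choice of the data minimum and maximum as the default range, are assumptions. Values outside [lo, hi] are ignored. A value equal to hi is counted in the last bin.

```python
def histo(data, nbins=100, lo=None, hi=None):
    """Count values in nbins equal-width bins spanning [lo, hi]."""
    lo = min(data) if lo is None else lo
    hi = max(data) if hi is None else hi
    if hi <= lo:
        raise ValueError("need hi > lo")
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for x in data:
        if x < lo or x > hi:
            continue  # ignore values outside the range
        i = int((x - lo) / width)
        counts[min(i, nbins - 1)] += 1  # x == hi belongs to the last bin
    return counts
```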

histoBands gives you the corresponding bin positions. It takes the same arguments as histo, plus an argument 'center' that selects which point of each bin is returned: the left edge (0.0), the center (0.5, the default), the right edge (1.0), or anything in between. This can be useful for labelling the bins in a plot.
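A Python sketch of the bin positions follows. It assumes nbins equal-width bins between the data minimum and maximum, or an explicit lo and hi. The function name histo_bands is hypothetical. Bin i covers [lo + i * width, lo + (i + 1) * width], so center interpolates linearly across that interval.

```python
def histo_bands(data, nbins=100, lo=None, hi=None, center=0.5):
    """Positions of nbins equal-width bins spanning [lo, hi].

    center=0.0 gives each bin's left edge, 1.0 its right edge, 0.5 its midpoint.
    """
    lo = min(data) if lo is None else lo
    hi = max(data) if hi is None else hi
    width = (hi - lo) / nbins
    return [lo + (i + center) * width for i in range(nbins)]
```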

The weighted mean and variance functions can be used to estimate the mean and variance if all you have is histogram-like data:
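With histogram-like data, each bin position is treated as a value that occurs as often as its count. The Python sketch below illustrates this. The function names are hypothetical. Its variance divides by the total weight, which is a population-style normalisation. MathLib's weighted variance may normalise differently. The result is only an estimate, because every value in a bin is treated as if it lay at the bin's position.

```python
def weighted_mean(values, weights):
    """Mean of values where each value occurs with the given weight (e.g. a bin count)."""
    total = sum(weights)
    return sum(w * x for x, w in zip(values, weights)) / total

def weighted_variance(values, weights):
    """Weighted variance about the weighted mean, normalised by the total weight."""
    mu = weighted_mean(values, weights)
    total = sum(weights)
    return sum(w * (x - mu) ** 2 for x, w in zip(values, weights)) / total

# Histogram-like data: bin centres and the number of values in each bin
centres = [1.0, 2.0, 3.0]
counts = [1, 2, 1]
```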

Statistical measures of association

Pearson correlation
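Pearson's product-moment correlation is the covariance of two variables divided by the product of their standard deviations. It ranges from -1 (perfect negative linear relationship) through 0 to 1 (perfect positive linear relationship). The Python sketch below shows the computation. The function name is illustrative and is not MathLib's API.

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-deviation
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```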

Kendall's W statistic

A non-parametric statistic (Kendall's coefficient of concordance) that measures the agreement between separate raters' rankings of a common set of objects.

The input should be an array of arrays, one per rater, each of the same size and containing integer rankings. The output ranges from 0 (no inter-rater agreement) to 1 (perfect inter-rater agreement). Rankings may run from 0 to N-1 or from 1 to N; this does not affect the statistic. The example used in Kendall's original paper (W value should be around 0.16):
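As a general illustration, here is a Python sketch of the standard formula using hypothetical toy rankings, not Kendall's data. For m raters and n objects, let S be the sum of squared deviations of each object's total rank from the mean total. Then W = 12 S / (m^2 (n^3 - n)). This sketch applies no correction for tied ranks, and MathLib may handle ties differently. Shifting every rank by the same offset shifts every total equally, so S is unchanged. This is why 0-based and 1-based rankings give the same W.

```python
def kendall_w(rankings):
    """Kendall's coefficient of concordance W (no correction for tied ranks).

    rankings: list of m lists, one per rater, each ranking the same n objects.
    """
    m = len(rankings)
    n = len(rankings[0])
    # Total rank given to each object across all raters
    totals = [sum(r[i] for r in rankings) for i in range(n)]
    mean_total = sum(totals) / n
    # Sum of squared deviations of the rank totals from their mean
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))
```

For example, two raters giving exactly opposite rankings, [1, 2, 3] and [3, 2, 1], give W = 0, while any number of identical rankings give W = 1.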

Principal Component Analysis

The pc1 method finds the first principal component of a multidimensional data distribution. It doesn't calculate the full PCA; instead it finds only the first PC via expectation-maximisation. The data must already be centred (mean removed) and suitably scaled (for example, when the dimensions are measured in different units). The termination threshold can be set via an argument to pc1.
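Below is a Python sketch of EM-style estimation of the first principal component for centred data. It is not MathLib's code, and the names pc1, threshold, max_iter and seed are assumptions. The E-step projects each point onto the current direction. The M-step finds the direction that best reconstructs the data from those projections in the least-squares sense. With a unit-length direction, one EM step amounts to multiplying by the data's scatter matrix. The iteration therefore converges to the leading eigenvector, as in power iteration.

```python
import math
import random

def pc1(data, threshold=1e-9, max_iter=1000, seed=0):
    """Estimate the first principal component of centred data via EM.

    data: list of points (lists of floats), assumed already mean-centred.
    Returns a unit-length direction vector (its sign is arbitrary).
    """
    dim = len(data[0])
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in w))
    w = [c / norm for c in w]
    for _ in range(max_iter):
        # E-step: project each point onto the current unit direction
        z = [sum(xi * wi for xi, wi in zip(x, w)) for x in data]
        zz = sum(zi * zi for zi in z)
        # M-step: least-squares direction reconstructing the data from z
        new_w = [sum(zi * x[k] for zi, x in zip(z, data)) / zz for k in range(dim)]
        norm = math.sqrt(sum(c * c for c in new_w))
        new_w = [c / norm for c in new_w]
        # Stop once the direction no longer changes (up to sign)
        change = 1.0 - abs(sum(a * b for a, b in zip(w, new_w)))
        w = new_w
        if change < threshold:
            break
    return w
```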

Autocorrelation
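Autocorrelation measures how strongly a sequence correlates with a copy of itself shifted by a lag k. The Python sketch below uses one common normalisation. It divides each lagged sum by the total sum of squared deviations, so lag 0 gives exactly 1. MathLib's autocorrelation may normalise differently or return a different range of lags, and the function name is illustrative.

```python
def autocorr(data, max_lag=None):
    """Normalised autocorrelation r_k for lags k = 0 .. max_lag.

    r_k = sum_t (x_t - mean)(x_{t+k} - mean) / sum_t (x_t - mean)^2, so r_0 = 1.
    """
    n = len(data)
    if max_lag is None:
        max_lag = n - 1
    mean = sum(data) / n
    d = [x - mean for x in data]
    denom = sum(v * v for v in d)
    return [sum(d[t] * d[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]
```

For an alternating sequence such as [1, -1, 1, -1], the result alternates in sign: 1.0, -0.75, 0.5, -0.25.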

Fitting

linearFit offers fitting via simple linear regression (least squares).
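For simple linear regression, the least-squares slope is the co-deviation of x and y divided by the squared deviation of x. The fitted line passes through the point of means. The Python sketch below shows this. The function name and the (slope, intercept) return format are assumptions, not MathLib's API.

```python
def linear_fit(xs, ys):
    """Least-squares fit of y = slope * x + intercept; returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx  # the line passes through (mean x, mean y)
    return slope, intercept
```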

theilSenFit offers a robust linear regression by taking the median of the slopes between all pairs of data points.
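The Python sketch below implements the Theil-Sen estimator. It is not MathLib's implementation, and the function name is illustrative. The slope is the median of all pairwise slopes. Pairs with equal x are skipped because their slope is undefined. For the intercept, the sketch uses one common choice, the median of the residuals y - slope * x. MathLib may compute the intercept differently.

```python
import statistics
from itertools import combinations

def theil_sen_fit(xs, ys):
    """Robust line fit: slope = median of pairwise slopes; returns (slope, intercept)."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
              if x2 != x1]  # pairs with equal x have no defined slope
    slope = statistics.median(slopes)
    # One common intercept choice: the median of the residuals y - slope * x
    intercept = statistics.median([y - slope * x for x, y in zip(xs, ys)])
    return slope, intercept
```

Consider xs = [0, 1, 2, 3, 4] and ys = [1, 3, 5, 7, 100], where the last point is an outlier. This sketch returns slope 2.0 and intercept 1.0, recovering y = 2x + 1. A least-squares fit would be pulled towards the outlier.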