x/simplestatistics@v7.7.5

simple statistics for node & browser javascript
import * as simplestatistics from "https://deno.land/x/simplestatistics@v7.7.5/index.js";

Classes

bayesian

Bayesian Classifier

BayesianClassifier

Bayesian Classifier

perceptron

This is a single-layer Perceptron Classifier that takes arrays of numbers and predicts whether they should be classified as either 0 or 1 (negative or positive examples).

PerceptronModel

This is a single-layer Perceptron Classifier that takes arrays of numbers and predicts whether they should be classified as either 0 or 1 (negative or positive examples).

Variables

chiSquaredDistributionTable

Percentage Points of the χ2 (Chi-Squared) Distribution

epsilon

We use ε, epsilon, as a stopping criterion when we want to iterate until we're "close enough". Epsilon is a very small number: for simple statistics, that number is 0.0001

standardNormalTable

A standard normal table, also called the unit normal table or Z table, is a mathematical table for the values of Φ (phi), which are the values of the cumulative distribution function of the normal distribution. It is used to find the probability that a statistic is observed below, above, or between values on the standard normal distribution, and by extension, any normal distribution.

Functions

addToMean

When adding a new value to a list, one does not necessarily have to recompute the mean of the list in linear time. One can instead use this function to compute the new mean by providing the current mean, the number of elements in the list that produced it, and the new value to add.
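
As a rough illustration of the constant-time update described above, here is a minimal sketch. The name addToMeanSketch is hypothetical and this is not the library's source; it only shows the arithmetic of folding one new value into an existing mean.

```javascript
// Hypothetical helper (not the library's code): update a mean in O(1).
// mean: current mean, n: number of values that produced it,
// newValue: the value being added to the list.
function addToMeanSketch(mean, n, newValue) {
    // new mean = old mean + (new value - old mean) / (n + 1)
    return mean + (newValue - mean) / (n + 1);
}

// The list [2, 4, 6] has mean 4 and length 3; adding 8 gives [2, 4, 6, 8].
console.log(addToMeanSketch(4, 3, 8)); // 5
```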

approxEqual

Approximate equality.

average

The mean, also known as average, is the sum of all values over the number of values. This is a measure of central tendency: a method of finding a typical or central value of a set of numbers.

averageSimple

The mean, also known as average, is the sum of all values over the number of values. This is a measure of central tendency: a method of finding a typical or central value of a set of numbers.

bernoulliDistribution

The Bernoulli distribution is the discrete probability distribution of a random variable which takes value 1 with success probability p and value 0 with failure probability q = 1 - p. It can be used, for example, to represent the toss of a coin, where "1" is defined to mean "heads" and "0" is defined to mean "tails" (or vice versa). It is a special case of a Binomial Distribution where n = 1.

binomialDistribution

The Binomial Distribution is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments (trials), each of which yields success with a given probability. Such a success/failure experiment is also called a Bernoulli experiment or Bernoulli trial; when trials = 1, the Binomial Distribution is a Bernoulli Distribution.

bisect

Bisection method is a root-finding method that repeatedly bisects an interval to find the root.
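
The interval-halving idea can be sketched as follows. This is an illustrative version under assumed parameters (a function, two end points with opposite signs, an iteration cap, and a tolerance); the name bisectSketch and its argument order are assumptions, not the library's signature.

```javascript
// Hypothetical sketch of the bisection method (not the library's code).
// f must have opposite signs at lo and hi so that a root lies between them.
function bisectSketch(f, lo, hi, maxIterations, tolerance) {
    if (Math.sign(f(lo)) === Math.sign(f(hi))) {
        throw new Error("f(lo) and f(hi) must have opposite signs");
    }
    for (let i = 0; i < maxIterations; i++) {
        const mid = (lo + hi) / 2;
        const value = f(mid);
        // Stop on an exact root or once the interval is small enough.
        if (value === 0 || (hi - lo) / 2 < tolerance) {
            return mid;
        }
        // Keep the half of the interval in which the sign change occurs.
        if (Math.sign(value) === Math.sign(f(lo))) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    throw new Error("maximum number of iterations exceeded");
}

// Approximates the square root of 2 as the positive root of x^2 - 2.
console.log(bisectSketch((x) => x * x - 2, 0, 2, 100, 1e-9));
```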

chiSquaredGoodnessOfFit

The χ2 (Chi-Squared) Goodness-of-Fit Test uses a measure of goodness of fit which is the sum of differences between observed and expected outcome frequencies (that is, counts of observations), each squared and divided by the number of observations expected given the hypothesized distribution. The resulting χ2 statistic, chiSquared, can be compared to the chi-squared distribution to determine the goodness of fit. In order to determine the degrees of freedom of the chi-squared distribution, one takes the total number of observed frequencies and subtracts the number of estimated parameters. The test statistic follows, approximately, a chi-square distribution with (k − c) degrees of freedom where k is the number of non-empty cells and c is the number of estimated parameters for the distribution.
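
The statistic itself is straightforward to compute once observed and expected counts are available. The sketch below covers only that step, assuming both arrays are already aligned cell by cell; it does not estimate parameters or look up critical values, and chiSquaredStatisticSketch is a hypothetical name.

```javascript
// Hypothetical sketch (not the library's code): the chi-squared statistic
// is the sum over cells of (observed - expected)^2 / expected.
function chiSquaredStatisticSketch(observed, expected) {
    if (observed.length !== expected.length) {
        throw new Error("observed and expected must have the same length");
    }
    let chiSquared = 0;
    for (let i = 0; i < observed.length; i++) {
        const difference = observed[i] - expected[i];
        chiSquared += (difference * difference) / expected[i];
    }
    return chiSquared;
}

// Three cells with 20 expected observations each.
console.log(chiSquaredStatisticSketch([10, 20, 30], [20, 20, 20])); // 10
```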

chunk

Split an array into chunks of a specified size. This function has the same behavior as PHP's array_chunk function, and thus will insert smaller-sized chunks at the end if the input size is not divisible by the chunk size.
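
To make the trailing-chunk behavior concrete, here is a minimal sketch. The name chunkSketch is hypothetical and the code is an illustration of the described behavior, not the library's implementation.

```javascript
// Hypothetical sketch (not the library's code): split an array into
// consecutive chunks; the last chunk may be shorter than chunkSize.
function chunkSketch(values, chunkSize) {
    if (!Number.isInteger(chunkSize) || chunkSize < 1) {
        throw new Error("chunk size must be a positive integer");
    }
    const output = [];
    for (let start = 0; start < values.length; start += chunkSize) {
        output.push(values.slice(start, start + chunkSize));
    }
    return output;
}

console.log(chunkSketch([1, 2, 3, 4, 5], 2)); // [[1, 2], [3, 4], [5]]
```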

ckmeans

Ckmeans clustering is an improvement on heuristic-based clustering approaches like Jenks. The algorithm was developed by Haizhou Wang and Mingzhou Song as a dynamic programming approach to the problem of clustering numeric data into groups with the least within-group sum-of-squared-deviations.

coefficientOfVariation

The coefficient of variation is the ratio of the standard deviation to the mean. See https://en.wikipedia.org/wiki/Coefficient_of_variation

combinations

Implementation of combinations. Combinations are unique subsets of a collection: in this case, k elements taken from a collection at a time. https://en.wikipedia.org/wiki/Combination

combinationsReplacement

Implementation of combinations with replacement. Combinations are unique subsets of a collection: in this case, k elements taken from a collection at a time. 'With replacement' means that a given element can be chosen multiple times. Unlike permutations, order doesn't matter for combinations.

combineMeans

When combining two lists of values for which one already knows the means, one does not necessarily have to recompute the mean of the combined lists in linear time. One can instead use this function to compute the combined mean by providing the mean & number of values of the first list and the mean & number of values of the second list.

combineVariances

When combining two lists of values for which one already knows the variances, one does not necessarily have to recompute the variance of the combined lists in linear time. One can instead use this function to compute the combined variance by providing the variance, mean & number of values of the first list and the variance, mean & number of values of the second list.
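
The two combination rules can be sketched together. The names combineMeansSketch and combineVariancesSketch and their argument order are assumptions made for illustration, and the variances are taken to be population variances (sum of squared deviations divided by the number of values).

```javascript
// Hypothetical sketches (not the library's code).
// Combined mean: weight each mean by the size of its list.
function combineMeansSketch(mean1, n1, mean2, n2) {
    return (mean1 * n1 + mean2 * n2) / (n1 + n2);
}

// Combined population variance: each list contributes its own variance
// plus the squared distance of its mean from the combined mean.
function combineVariancesSketch(variance1, mean1, n1, variance2, mean2, n2) {
    const newMean = combineMeansSketch(mean1, n1, mean2, n2);
    return (
        (n1 * (variance1 + (mean1 - newMean) ** 2) +
            n2 * (variance2 + (mean2 - newMean) ** 2)) /
        (n1 + n2)
    );
}

// [1, 2, 3] has mean 2 and variance 2/3; [4, 5] has mean 4.5 and variance 0.25.
// Together they form [1, 2, 3, 4, 5], with mean 3 and variance 2.
console.log(combineMeansSketch(2, 3, 4.5, 2));
console.log(combineVariancesSketch(2 / 3, 2, 3, 0.25, 4.5, 2));
```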

cumulativeStdLogisticProbability

Logistic Cumulative Distribution Function

cumulativeStdNormalProbability

Cumulative Standard Normal Probability

equalIntervalBreaks

Given an array of data, this will find the extent of the data and return an array of breaks that can be used to categorize the data into a number of classes. The returned array will always be 1 longer than the number of classes because it includes the minimum value.
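
A minimal sketch of how such breaks could be derived is shown below. The name equalIntervalBreaksSketch is hypothetical and the code only illustrates why the result has one more entry than the number of classes.

```javascript
// Hypothetical sketch (not the library's code): evenly spaced breaks
// between the minimum and maximum of the data.
function equalIntervalBreaksSketch(values, nClasses) {
    const lowest = Math.min(...values);
    const highest = Math.max(...values);
    const width = (highest - lowest) / nClasses;
    // Start with the minimum, which is why the result is nClasses + 1 long.
    const breaks = [lowest];
    for (let i = 1; i < nClasses; i++) {
        breaks.push(lowest + i * width);
    }
    // Push the maximum directly to avoid accumulated rounding error.
    breaks.push(highest);
    return breaks;
}

console.log(equalIntervalBreaksSketch([1, 2, 3, 4, 5, 6], 4)); // [1, 2.25, 3.5, 4.75, 6]
```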

erf

Gaussian error function

errorFunction

Gaussian error function

extent

This computes the minimum & maximum number in an array.

extentSorted

The extent is the lowest & highest number in the array. With a sorted array, the first element in the array is always the lowest while the last element is always the largest, so this calculation can be done in one step, or constant time.

factorial

A Factorial, usually written n!, is the product of all positive integers less than or equal to n. Often factorial is implemented recursively, but this iterative approach is significantly faster and simpler.

gamma

Compute the gamma function of a value using Nemes' approximation. The gamma of n is equivalent to (n-1)!, but unlike the factorial function, gamma is defined for all real n except zero and negative integers (where NaN is returned). Note, the gamma function is also well-defined for complex numbers, though this implementation currently does not handle complex numbers as input values. Nemes' approximation is defined here as Theorem 2.2. Negative values use Euler's reflection formula for computation.

gammaln

Compute the logarithm of the gamma function of a value using Lanczos' approximation. This function takes as input any real-value n greater than 0. This function is useful for values of n too large for the normal gamma function (n > 165). The code is based on Lanczos' Gamma approximation, defined here.

geometricMean

The Geometric Mean is a mean function that is more useful for numbers in different ranges.

harmonicMean

The Harmonic Mean is a mean function typically used to find the average of rates. This mean is calculated by taking the reciprocal of the arithmetic mean of the reciprocals of the input numbers.
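
The "reciprocal of the mean of reciprocals" definition can be sketched in a few lines. The name harmonicMeanSketch is hypothetical, and the sketch assumes all inputs are positive numbers.

```javascript
// Hypothetical sketch (not the library's code): n divided by the sum of
// reciprocals, which equals the reciprocal of the mean of the reciprocals.
function harmonicMeanSketch(values) {
    if (values.length === 0) {
        throw new Error("harmonic mean requires at least one value");
    }
    let reciprocalSum = 0;
    for (const value of values) {
        if (value <= 0) {
            throw new Error("harmonic mean requires positive values");
        }
        reciprocalSum += 1 / value;
    }
    return values.length / reciprocalSum;
}

// Driving the same distance at 40 and then at 60 gives an average
// speed of about 48, not the arithmetic mean of 50.
console.log(harmonicMeanSketch([40, 60]));
```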

interquartileRange

The Interquartile range is a measure of statistical dispersion, or how scattered, spread, or concentrated a distribution is. It's computed as the difference between the third quartile and first quartile.

inverseErrorFunction

The Inverse Gaussian error function returns a numerical approximation to the value that would have caused errorFunction() to return x.

iqr

The Interquartile range is a measure of statistical dispersion, or how scattered, spread, or concentrated a distribution is. It's computed as the difference between the third quartile and first quartile.

kde

Kernel density estimation is a useful tool for, among other things, estimating the shape of the underlying probability distribution from a sample.

kernelDensityEstimation

Kernel density estimation is a useful tool for, among other things, estimating the shape of the underlying probability distribution from a sample.

kMeansCluster

Perform k-means clustering.

linearRegression

Simple linear regression is a simple way to find a fitted line between a set of coordinates. This algorithm finds the slope and y-intercept of a regression line using the least sum of squares.

linearRegressionLine

Given the output of linearRegression: an object with m and b values indicating slope and intercept, respectively, generate a line function that translates x values into y values.
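
The relationship between the fit and the generated line function can be sketched as below. The sketch assumes the input is an array of [x, y] pairs and uses hypothetical names (linearRegressionSketch, linearRegressionLineSketch); only the m and b property names come from the description above.

```javascript
// Hypothetical sketch (not the library's code): ordinary least squares
// for a single predictor, returning slope m and intercept b.
function linearRegressionSketch(points) {
    const n = points.length;
    let sumX = 0;
    let sumY = 0;
    let sumXX = 0;
    let sumXY = 0;
    for (const [x, y] of points) {
        sumX += x;
        sumY += y;
        sumXX += x * x;
        sumXY += x * y;
    }
    const m = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
    const b = sumY / n - (m * sumX) / n;
    return { m, b };
}

// Turn the { m, b } object into a function from x to predicted y.
function linearRegressionLineSketch({ m, b }) {
    return (x) => b + m * x;
}

const fit = linearRegressionSketch([[0, 1], [1, 3], [2, 5]]);
const line = linearRegressionLineSketch(fit);
console.log(fit, line(3)); // { m: 2, b: 1 } 7
```

The sketch does not guard against fewer than two distinct x values, in which case the slope is undefined.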

logAverage

The log average is an equivalent way of computing the geometric mean of an array suitable for large or small products.
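
The reason this form suits very large or very small products is that it sums logarithms instead of multiplying the values directly. A minimal sketch, with the hypothetical name logAverageSketch and assuming strictly positive inputs:

```javascript
// Hypothetical sketch (not the library's code): geometric mean computed
// as exp(mean(log(x))), which avoids overflow or underflow of the product.
function logAverageSketch(values) {
    if (values.length === 0) {
        throw new Error("log average requires at least one value");
    }
    let logSum = 0;
    for (const value of values) {
        if (value <= 0) {
            throw new Error("log average requires positive values");
        }
        logSum += Math.log(value);
    }
    return Math.exp(logSum / values.length);
}

// The geometric mean of 1, 10, and 100 is approximately 10.
console.log(logAverageSketch([1, 10, 100]));

// A direct product of these values would overflow to Infinity.
console.log(logAverageSketch([1e200, 1e200, 1e200]));
```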

logit

The Logit is the inverse of cumulativeStdLogisticProbability, and is also known as the logistic quantile function.

mad

The Median Absolute Deviation is a robust measure of statistical dispersion. It is more resilient to outliers than the standard deviation.

max

This computes the maximum number in an array.

maxSorted

The maximum is the highest number in the array. With a sorted array, the last element in the array is always the largest, so this calculation can be done in one step, or constant time.

mean

The mean, also known as average, is the sum of all values over the number of values. This is a measure of central tendency: a method of finding a typical or central value of a set of numbers.

meanSimple

The mean, also known as average, is the sum of all values over the number of values. This is a measure of central tendency: a method of finding a typical or central value of a set of numbers.

median

The median is the middle number of a list. This is often a good indicator of 'the middle' when there are outliers that skew the mean() value. This is a measure of central tendency: a method of finding a typical or central value of a set of numbers.

medianAbsoluteDeviation

The Median Absolute Deviation is a robust measure of statistical dispersion. It is more resilient to outliers than the standard deviation.
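
As an illustration, the median absolute deviation is the median of the absolute distances from the median. The helper names below (medianSketch, madSketch) are hypothetical and the code is a sketch rather than the library's implementation.

```javascript
// Hypothetical helper: median of a list, copying before sorting so the
// input is left untouched.
function medianSketch(values) {
    const sorted = values.slice().sort((a, b) => a - b);
    const middle = Math.floor(sorted.length / 2);
    return sorted.length % 2 === 1
        ? sorted[middle]
        : (sorted[middle - 1] + sorted[middle]) / 2;
}

// Hypothetical sketch (not the library's code): median of the absolute
// deviations from the median.
function madSketch(values) {
    const center = medianSketch(values);
    return medianSketch(values.map((value) => Math.abs(value - center)));
}

// Median is 2; absolute deviations are [1, 1, 0, 0, 2, 4, 7], whose median is 1.
console.log(madSketch([1, 1, 2, 2, 4, 6, 9])); // 1
```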

medianSorted

The median is the middle number of a list. This is often a good indicator of 'the middle' when there are outliers that skew the mean() value. This is a measure of central tendency: a method of finding a typical or central value of a set of numbers.

min

The min is the lowest number in the array. This runs in O(n), linear time, with respect to the length of the array.

minSorted

The minimum is the lowest number in the array. With a sorted array, the first element in the array is always the smallest, so this calculation can be done in one step, or constant time.

mode

The mode is the number that appears in a list the highest number of times. There can be multiple modes in a list: in the event of a tie, this algorithm will return the most recently seen mode.

modeFast

The mode is the number that appears in a list the highest number of times. There can be multiple modes in a list: in the event of a tie, this algorithm will return the most recently seen mode.

modeSorted

The mode is the number that appears in a list the highest number of times. There can be multiple modes in a list: in the event of a tie, this algorithm will return the most recently seen mode.

numericSort

Sort an array of numbers by their numeric value, ensuring that the array is not changed in place.

permutationsHeap

Implementation of Heap's Algorithm for generating permutations.

permutationTest

Conducts a permutation test to determine if two data sets are significantly different from each other, using the difference of means between the groups as the test statistic. The function allows for the following hypotheses:

  • two_tail = Null hypothesis: the two distributions are equal.
  • greater = Null hypothesis: observations from sampleX tend to be smaller than those from sampleY.
  • less = Null hypothesis: observations from sampleX tend to be greater than those from sampleY.

poissonDistribution

The Poisson Distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event.

probit

The Probit is the inverse of cumulativeStdNormalProbability(), and is also known as the normal quantile function.

product

The product of an array is the result of multiplying all numbers together, starting with one, the multiplicative identity.

quantile

This is a population quantile, since this library assumes that the entire dataset is known. It is an implementation of the Quantiles of a Population algorithm from Wikipedia.

quantileRank

This function returns the quantile in which one would find the given value in the given array. It will copy and sort your array before each run, so if you know your array is already sorted, you should use quantileRankSorted instead.

quantileRankSorted

This function returns the quantile in which one would find the given value in the given array. With a sorted array, leveraging binary search, we can find this information in logarithmic time.

quantileSorted

This is the internal implementation of quantiles: when you know that the order is sorted, you don't need to re-sort it, and the computations are faster.

quickselect

Rearrange items in arr so that all items in [left, k] range are the smallest. The k-th element will have the (k - left + 1)-th smallest value in [left, right].

relativeError

Relative error.

rms

The Root Mean Square (RMS) is a mean function used as a measure of the magnitude of a set of numbers, regardless of their sign. This is the square root of the mean of the squares of the input numbers. This runs in O(n), linear time, with respect to the length of the array.

rootMeanSquare

The Root Mean Square (RMS) is a mean function used as a measure of the magnitude of a set of numbers, regardless of their sign. This is the square root of the mean of the squares of the input numbers. This runs in O(n), linear time, with respect to the length of the array.
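
The definition translates directly into a single pass over the input. This is a minimal sketch with the hypothetical name rootMeanSquareSketch, not the library's source.

```javascript
// Hypothetical sketch (not the library's code): square root of the mean
// of the squares, computed in one O(n) pass.
function rootMeanSquareSketch(values) {
    if (values.length === 0) {
        throw new Error("root mean square requires at least one value");
    }
    let sumOfSquares = 0;
    for (const value of values) {
        sumOfSquares += value * value;
    }
    return Math.sqrt(sumOfSquares / values.length);
}

// Signs are ignored: the mean of these values is 0, but their RMS is 1.
console.log(rootMeanSquareSketch([-1, 1, -1, 1])); // 1
```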

rSquared

The R Squared value of data compared with a function f is one minus the ratio of the sum of the squared differences between the prediction and the actual value to the sum of the squared differences between the actual value and the mean.

sample

Create a simple random sample from a given array of n elements.

sampleCorrelation

The correlation is a measure of how correlated two datasets are, between -1 and 1.

sampleCovariance

Sample covariance of two datasets: how much do the two datasets move together? x and y are two datasets, represented as arrays of numbers.

sampleKurtosis

Kurtosis is a measure of the heaviness of a distribution's tails relative to its variance. The kurtosis value can be positive or negative, or even undefined.

sampleRankCorrelation

The rank correlation is a measure of the strength of the monotonic relationship between two arrays.

sampleSkewness

Skewness is a measure of the extent to which a probability distribution of a real-valued random variable "leans" to one side of the mean. The skewness value can be positive or negative, or even undefined.

sampleStandardDeviation

The sample standard deviation is the square root of the sample variance.

sampleVariance

The sample variance is based on the sum of squared deviations from the mean. It is distinguished from the variance by the usage of Bessel's Correction: instead of dividing the sum of squared deviations by the length of the input, it is divided by the length minus one. This corrects the bias in estimating a value from a sample rather than from the full population.
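
The effect of Bessel's Correction is easiest to see by computing both variants on the same data. The function names below are hypothetical, and the code is a sketch of the two divisors rather than the library's implementation.

```javascript
// Hypothetical helper: sum of squared deviations from the mean.
function sumSquaredDeviationsSketch(values) {
    const mean = values.reduce((total, value) => total + value, 0) / values.length;
    return values.reduce((total, value) => total + (value - mean) ** 2, 0);
}

// Population variance: divide by the number of values.
function varianceSketch(values) {
    return sumSquaredDeviationsSketch(values) / values.length;
}

// Sample variance: divide by the number of values minus one.
function sampleVarianceSketch(values) {
    if (values.length < 2) {
        throw new Error("sample variance requires at least two values");
    }
    return sumSquaredDeviationsSketch(values) / (values.length - 1);
}

const data = [2, 4, 4, 4, 5, 5, 7, 9]; // mean 5, sum of squared deviations 32
console.log(varianceSketch(data)); // 32 / 8
console.log(sampleVarianceSketch(data)); // 32 / 7
```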

sampleWithReplacement

Sampling with replacement is a type of sampling that allows the same item to be picked out of a population more than once.

shuffle

A Fisher-Yates shuffle is a fast way to create a random permutation of a finite set. This is a wrapper around shuffleInPlace that adds the guarantee that it will not modify its input.

shuffleInPlace

A Fisher-Yates shuffle in-place - which means that it will change the order of the original array by reference.
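
The difference between the in-place and copying variants can be sketched as follows. The names and the optional randomSource parameter are assumptions made for illustration, not the library's signatures.

```javascript
// Hypothetical sketch (not the library's code): Fisher-Yates shuffle that
// reorders the given array itself.
function shuffleInPlaceSketch(values, randomSource = Math.random) {
    // Walk from the end, swapping each position with a random earlier
    // (or identical) position.
    for (let i = values.length - 1; i > 0; i--) {
        const j = Math.floor(randomSource() * (i + 1));
        const temporary = values[i];
        values[i] = values[j];
        values[j] = temporary;
    }
    return values;
}

// Copying variant: shuffle a copy so the caller's array is not modified.
function shuffleSketch(values, randomSource = Math.random) {
    return shuffleInPlaceSketch(values.slice(), randomSource);
}

const deck = [1, 2, 3, 4, 5];
const shuffledDeck = shuffleSketch(deck);
console.log(deck); // still [1, 2, 3, 4, 5]
console.log(shuffledDeck); // the same five numbers in a random order
```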

sign

Sign is a function that extracts the sign of a real number.

silhouette

Calculate the silhouette values for clustered data.

silhouetteMetric

Calculate the silhouette metric for a set of N-dimensional points arranged in groups. The metric is the largest individual silhouette value for the data.

standardDeviation

The standard deviation is the square root of the variance. This is also known as the population standard deviation. It's useful for measuring the amount of variation or dispersion in a set of values.

subtractFromMean

When removing a value from a list, one does not necessarily have to recompute the mean of the list in linear time. One can instead use this function to compute the new mean by providing the current mean, the number of elements in the list that produced it, and the value to remove.

sum

Our default sum is the Kahan-Babuska algorithm. This method is an improvement over the classical Kahan summation algorithm. It aims at computing the sum of a list of numbers while correcting for floating-point errors. Traditionally, sums are calculated as many successive additions, each one with its own floating-point roundoff. These losses in precision add up as the number of numbers increases. This alternative algorithm is more accurate than the simple way of calculating sums by simple addition.
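
A compensated sum of this kind can be sketched as below. The name kahanBabuskaSumSketch is hypothetical and the code follows the commonly described Kahan-Babuska (Neumaier) variant; it is an illustration, not a copy of the library's source.

```javascript
// Hypothetical sketch (not the library's code): compensated summation that
// tracks the low-order bits lost in each addition.
function kahanBabuskaSumSketch(values) {
    let sum = 0;
    let correction = 0;
    for (const value of values) {
        const transition = sum + value;
        if (Math.abs(sum) >= Math.abs(value)) {
            // Low-order digits of value were lost in the addition.
            correction += sum - transition + value;
        } else {
            // Low-order digits of sum were lost in the addition.
            correction += value - transition + sum;
        }
        sum = transition;
    }
    // Apply the accumulated correction once at the end.
    return sum + correction;
}

const tricky = [1, 1e100, 1, -1e100];
// Plain left-to-right addition loses both ones and yields 0.
console.log(tricky.reduce((total, value) => total + value, 0)); // 0
console.log(kahanBabuskaSumSketch(tricky)); // 2
```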

sumNthPowerDeviations

The sum of deviations to the Nth power. When n=2 it's the sum of squared deviations. When n=3 it's the sum of cubed deviations.

sumSimple

The simple sum of an array is the result of adding all numbers together, starting from zero.

tTest

This computes a one-sample t-test, comparing the mean of a sample to a known value, x.

tTestTwoSample

This computes a two-sample t-test. It tests whether "mean(X)-mean(Y) = difference" (in the most common case, difference == 0, to test whether two samples are likely to be taken from populations with the same mean value), with no prior knowledge of the standard deviations of the two samples other than the fact that they are equal.

uniqueCountSorted

For a sorted input, counting the number of unique values is possible in linear time and constant memory. This is a simple implementation of the algorithm.

variance

The variance is the sum of squared deviations from the mean, divided by the number of values.

wilcoxonRankSum

This function calculates the Wilcoxon rank sum statistic for the first sample with respect to the second. The Wilcoxon rank sum test is a non-parametric alternative to the t-test which is equivalent to the Mann-Whitney U test. The statistic is calculated by pooling all the observations together, ranking them, and then summing the ranks associated with one of the samples. If this rank sum is sufficiently large or small we reject the hypothesis that the two samples come from the same distribution in favor of the alternative that one is shifted with respect to the other.
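
The pooling and ranking steps described above can be sketched as follows. The name wilcoxonRankSumSketch is hypothetical, and giving tied observations the average of the ranks they span is an assumption of this sketch rather than something stated in the description.

```javascript
// Hypothetical sketch (not the library's code): rank sum of sampleX after
// pooling both samples. Tied values share the average of their ranks.
function wilcoxonRankSumSketch(sampleX, sampleY) {
    const pooled = [
        ...sampleX.map((value) => ({ value, fromX: true })),
        ...sampleY.map((value) => ({ value, fromX: false })),
    ].sort((a, b) => a.value - b.value);

    let rankSum = 0;
    let start = 0;
    while (start < pooled.length) {
        // Find the end of the run of tied values beginning at start.
        let end = start;
        while (end + 1 < pooled.length && pooled[end + 1].value === pooled[start].value) {
            end++;
        }
        // Ranks are 1-based, so positions start..end have ranks start+1..end+1.
        const averageRank = (start + end) / 2 + 1;
        for (let k = start; k <= end; k++) {
            if (pooled[k].fromX) {
                rankSum += averageRank;
            }
        }
        start = end + 1;
    }
    return rankSum;
}

// Pooled order is 1, 2, 3, 4, 5, 6; sampleX holds ranks 1, 4, and 5.
console.log(wilcoxonRankSumSketch([1, 4, 5], [2, 3, 6])); // 10
```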

zScore

The Z-Score, or Standard Score.

Simple Statistics

A JavaScript implementation of descriptive, regression, and inference statistics.


Implemented in literate JavaScript with no dependencies, designed to work in all modern browsers (including IE) as well as in node.js.

Installation

  • I'm using Node.js, Webpack, Browserify, Rollup, or another module bundler, and install packages from npm.
    • First, install the simple-statistics module, using npm install simple-statistics, then include the code with require or import:
    • I use the require function to use modules in my project. (most likely)
      • When you use require, you have the freedom to assign the module to any variable name you want, but you need to specify the module's name exactly: in this case, 'simple-statistics'. The require method returns an object with all of the module's methods attached to it.
        var ss = require('simple-statistics')
    • I use import to use modules in my project. I'm probably using Babel, @std/esm, Webpack, or Rollup.
      • Import all functions under the ss object:
        import * as ss from 'simple-statistics'
        Include a specific named export:
        import {min} from 'simple-statistics'
        Simple statistics has only named exports for ES6.
  • I'm using Deno.
  • I'm not using a module bundler. I'm writing a web page, and want to include simple-statistics using a script tag.
    • I want to support all browsers
      • When you use simple-statistics from a script tag, you don't get to choose the variable name it is assigned to: simple-statistics will always become available globally as the variable ss. You can reassign this variable to another name if you want to, but doing so is optional.
        <script src='https://unpkg.com/simple-statistics@7.7.3/dist/simple-statistics.min.js'>
        </script>
    • I want to use ES6 modules in a browser and I'm willing to only support new browsers to do it
      • This module works great with the ?module query parameter of unpkg. If you specify type='module' in your script tag, you'll be able to import simple-statistics directly - through index.js and with true ES6 import syntax and behavior.
        <script type='module'>
        import {min} from "https://unpkg.com/simple-statistics@7.7.3/index.js?module"
        console.log(min([1, 2, 3]))
        </script>
        This feature is still experimental in unpkg and very bleeding-edge.