x/simplestatistics@v7.7.5

simple statistics for node & browser javascript
``import * as simplestatistics from "https://deno.land/x/simplestatistics@v7.7.5/index.js";``

## Classes

• `bayesian`
• `BayesianClassifier`
• `perceptron`: This is a single-layer Perceptron Classifier that takes arrays of numbers and predicts whether they should be classified as either 0 or 1 (negative or positive examples).
• `PerceptronModel`: This is a single-layer Perceptron Classifier that takes arrays of numbers and predicts whether they should be classified as either 0 or 1 (negative or positive examples).

## Variables

• `chiSquaredDistributionTable`: Percentage Points of the χ² (Chi-Squared) Distribution.
• `epsilon`: We use `ε`, epsilon, as a stopping criterion when we want to iterate until we're "close enough". Epsilon is a very small number: for simple statistics, that number is 0.0001.
• `standardNormalTable`: A standard normal table, also called the unit normal table or Z table, is a mathematical table for the values of Φ (phi), which are the values of the cumulative distribution function of the normal distribution. It is used to find the probability that a statistic is observed below, above, or between values on the standard normal distribution, and by extension, any normal distribution.
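To make the role of `epsilon` as a stopping criterion concrete, here is a minimal sketch of an iteration that stops once it is "close enough". The names `EPSILON` and `bisectSketch` are hypothetical and chosen for illustration; this is not the library's `bisect` implementation, only the same general idea under the assumption that the tolerance is 0.0001.

```javascript
// Hypothetical constant mirroring the documented value of `epsilon`.
const EPSILON = 0.0001;

// Illustrative bisection root finder (an assumption for this example,
// not the library's `bisect`). `f` must change sign between lo and hi.
function bisectSketch(f, lo, hi, tolerance = EPSILON, maxIterations = 100) {
  if (Math.sign(f(lo)) === Math.sign(f(hi))) {
    throw new Error("f(lo) and f(hi) must have opposite signs");
  }
  for (let i = 0; i < maxIterations; i++) {
    const mid = (lo + hi) / 2;
    const value = f(mid);
    // Stop when we hit the root exactly or the half-interval is below tolerance.
    if (value === 0 || (hi - lo) / 2 < tolerance) {
      return mid;
    }
    // Keep the half of the interval in which the sign change occurs.
    if (Math.sign(value) === Math.sign(f(lo))) {
      lo = mid;
    } else {
      hi = mid;
    }
  }
  throw new Error("maximum number of iterations exceeded");
}

// Approximate the square root of 2 as the root of x^2 - 2 on [0, 2].
const root = bisectSketch((x) => x * x - 2, 0, 2);
console.log(root);
```

A smaller tolerance gives a more precise answer at the cost of more iterations; each iteration halves the remaining interval.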

## Functions

• `addToMean`: When adding a new value to a list, one does not necessarily have to recompute the mean of the list in linear time. One can instead use this function to compute the new mean by providing the current mean, the number of elements in the list that produced it, and the new value to add.
• `approxEqual`: Approximate equality.
• `average`: The mean, also known as average, is the sum of all values over the number of values. This is a measure of central tendency: a method of finding a typical or central value of a set of numbers.
• `averageSimple`: The mean, also known as average, is the sum of all values over the number of values. This is a measure of central tendency: a method of finding a typical or central value of a set of numbers.
• `bernoulliDistribution`: The Bernoulli distribution is the discrete probability distribution of a random variable which takes value 1 with success probability `p` and value 0 with failure probability `q` = 1 - `p`. It can be used, for example, to represent the toss of a coin, where "1" is defined to mean "heads" and "0" is defined to mean "tails" (or vice versa). It is a special case of a Binomial Distribution where `n` = 1.
• `binomialDistribution`: The Binomial Distribution is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability `probability`. Such a success/failure experiment is also called a Bernoulli experiment or Bernoulli trial; when trials = 1, the Binomial Distribution is a Bernoulli Distribution.
• `bisect`: The bisection method is a root-finding method that repeatedly bisects an interval to find the root.
• `chiSquaredGoodnessOfFit`: The χ² (Chi-Squared) Goodness-of-Fit Test uses a measure of goodness of fit which is the sum of differences between observed and expected outcome frequencies (that is, counts of observations), each squared and divided by the number of observations expected given the hypothesized distribution. The resulting χ² statistic, `chiSquared`, can be compared to the chi-squared distribution to determine the goodness of fit. In order to determine the degrees of freedom of the chi-squared distribution, one takes the total number of observed frequencies and subtracts the number of estimated parameters. The test statistic follows, approximately, a chi-squared distribution with (k − c) degrees of freedom, where `k` is the number of non-empty cells and `c` is the number of estimated parameters for the distribution.
• `chunk`: Split an array into chunks of a specified size. This function has the same behavior as PHP's array_chunk function, and thus will insert smaller-sized chunks at the end if the input size is not divisible by the chunk size.
• `ckmeans`: Ckmeans clustering is an improvement on heuristic-based clustering approaches like Jenks. The algorithm was developed by Haizhou Wang and Mingzhou Song as a dynamic programming approach to the problem of clustering numeric data into groups with the least within-group sum-of-squared-deviations.
• `coefficientOfVariation`: The coefficient of variation is the ratio of the standard deviation to the mean. See https://en.wikipedia.org/wiki/Coefficient_of_variation
• `combinations`: Implementation of combinations. Combinations are unique subsets of a collection - in this case, k elements from a collection at a time. https://en.wikipedia.org/wiki/Combination
• `combinationsReplacement`: Implementation of combinations with replacement. Combinations are unique subsets of a collection - in this case, k elements from a collection at a time. 'With replacement' means that a given element can be chosen multiple times. Unlike permutation, order doesn't matter for combinations.
• `combineMeans`: When combining two lists of values for which one already knows the means, one does not necessarily have to recompute the mean of the combined lists in linear time. One can instead use this function to compute the combined mean by providing the mean & number of values of the first list and the mean & number of values of the second list.
• `combineVariances`: When combining two lists of values for which one already knows the variances, one does not necessarily have to recompute the variance of the combined lists in linear time. One can instead use this function to compute the combined variance by providing the variance, mean & number of values of the first list and the variance, mean & number of values of the second list.
• `cumulativeStdLogisticProbability`
• `cumulativeStdNormalProbability`
• `equalIntervalBreaks`: Given an array of values, this will find the extent of the values and return an array of breaks that can be used to categorize the values into a number of classes. The returned array will always be 1 longer than the number of classes because it includes the minimum value.
• `erf`
• `errorFunction`
• `extent`: This computes the minimum & maximum number in an array.
• `extentSorted`: The extent is the lowest & highest number in the array. With a sorted array, the first element in the array is always the lowest while the last element is always the largest, so this calculation can be done in one step, or constant time.
• `factorial`: A Factorial, usually written n!, is the product of all positive integers less than or equal to n. Often factorial is implemented recursively, but this iterative approach is significantly faster and simpler.
• `gamma`: Compute the gamma function of a value using Nemes' approximation. The gamma of n is equivalent to (n-1)!, but unlike the factorial function, gamma is defined for all real n except zero and negative integers (where NaN is returned). Note, the gamma function is also well-defined for complex numbers, though this implementation currently does not handle complex numbers as input values. Nemes' approximation is defined here as Theorem 2.2. Negative values use Euler's reflection formula for computation.
• `gammaln`: Compute the logarithm of the gamma function of a value using Lanczos' approximation. This function takes as input any real-valued n greater than 0. This function is useful for values of n too large for the normal gamma function (n > 165). The code is based on Lanczos' Gamma approximation, defined here.
• `geometricMean`: The Geometric Mean is a mean function that is more useful for numbers in different ranges.
• `harmonicMean`: The Harmonic Mean is a mean function typically used to find the average of rates. This mean is calculated by taking the reciprocal of the arithmetic mean of the reciprocals of the input numbers.
• `interquartileRange`: The Interquartile range is a measure of statistical dispersion, or how scattered, spread, or concentrated a distribution is. It's computed as the difference between the third quartile and first quartile.
• `inverseErrorFunction`: The Inverse Gaussian error function returns a numerical approximation to the value that would have caused `errorFunction()` to return x.
• `iqr`: The Interquartile range is a measure of statistical dispersion, or how scattered, spread, or concentrated a distribution is. It's computed as the difference between the third quartile and first quartile.
• `kde`: Kernel density estimation is a useful tool for, among other things, estimating the shape of the underlying probability distribution from a sample.
• `kernelDensityEstimation`: Kernel density estimation is a useful tool for, among other things, estimating the shape of the underlying probability distribution from a sample.
• `kMeansCluster`: Perform k-means clustering.
• `linearRegression`: Simple linear regression is a simple way to find a fitted line between a set of coordinates. This algorithm finds the slope and y-intercept of a regression line using the least sum of squares.
• `linearRegressionLine`: Given the output of `linearRegression`: an object with `m` and `b` values indicating slope and intercept, respectively, generate a line function that translates x values into y values.
• `logAverage`: The log average is an equivalent way of computing the geometric mean of an array suitable for large or small products.
• `logit`: The Logit is the inverse of `cumulativeStdLogisticProbability`, and is also known as the logistic quantile function.
• `mad`: The Median Absolute Deviation is a robust measure of statistical dispersion. It is more resilient to outliers than the standard deviation.
• `max`: This computes the maximum number in an array.
• `maxSorted`: The maximum is the highest number in the array. With a sorted array, the last element in the array is always the largest, so this calculation can be done in one step, or constant time.
• `mean`: The mean, also known as average, is the sum of all values over the number of values. This is a measure of central tendency: a method of finding a typical or central value of a set of numbers.
• `meanSimple`: The mean, also known as average, is the sum of all values over the number of values. This is a measure of central tendency: a method of finding a typical or central value of a set of numbers.
• `median`: The median is the middle number of a list. This is often a good indicator of 'the middle' when there are outliers that skew the `mean()` value. This is a measure of central tendency: a method of finding a typical or central value of a set of numbers.
• `medianAbsoluteDeviation`: The Median Absolute Deviation is a robust measure of statistical dispersion. It is more resilient to outliers than the standard deviation.
• `medianSorted`: The median is the middle number of a list. This is often a good indicator of 'the middle' when there are outliers that skew the `mean()` value. This is a measure of central tendency: a method of finding a typical or central value of a set of numbers.
• `min`: The min is the lowest number in the array. This runs in `O(n)`, linear time, with respect to the length of the array.
• `minSorted`: The minimum is the lowest number in the array. With a sorted array, the first element in the array is always the smallest, so this calculation can be done in one step, or constant time.
• `mode`: The mode is the number that appears in a list the highest number of times. There can be multiple modes in a list: in the event of a tie, this algorithm will return the most recently seen mode.
• `modeFast`: The mode is the number that appears in a list the highest number of times. There can be multiple modes in a list: in the event of a tie, this algorithm will return the most recently seen mode.
• `modeSorted`: The mode is the number that appears in a list the highest number of times. There can be multiple modes in a list: in the event of a tie, this algorithm will return the most recently seen mode.
• `numericSort`: Sort an array of numbers by their numeric value, ensuring that the array is not changed in place.
• `permutationsHeap`: Implementation of Heap's Algorithm for generating permutations.
• `permutationTest`: Conducts a permutation test to determine if two data sets are significantly different from each other, using the difference of means between the groups as the test statistic. The function allows for the following hypotheses: `two_tail` (null hypothesis: the two distributions are equal), `greater` (null hypothesis: observations from sampleX tend to be smaller than those from sampleY), and `less` (null hypothesis: observations from sampleX tend to be greater than those from sampleY). Learn more about one-tail vs two-tail tests.
• `poissonDistribution`: The Poisson Distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event.
• `probit`: The Probit is the inverse of `cumulativeStdNormalProbability()`, and is also known as the normal quantile function.
• `product`: The product of an array is the result of multiplying all numbers together, starting with one, the multiplicative identity.
• `quantile`: The quantile: this is a population quantile, since we assume to know the entire dataset in this library. This is an implementation of the Quantiles of a Population algorithm from Wikipedia.
• `quantileRank`: This function returns the quantile in which one would find the given value in the given array. It will copy and sort your array before each run, so if you know your array is already sorted, you should use `quantileRankSorted` instead.
• `quantileRankSorted`: This function returns the quantile in which one would find the given value in the given array. With a sorted array, leveraging binary search, we can find this information in logarithmic time.
• `quantileSorted`: This is the internal implementation of quantiles: when you know that the input is sorted, you don't need to re-sort it, and the computations are faster.
• `quickselect`: Rearrange items in `arr` so that all items in `[left, k]` range are the smallest. The `k`-th element will have the `(k - left + 1)`-th smallest value in `[left, right]`.
• `relativeError`: Relative error.
• `rms`: The Root Mean Square (RMS) is a mean function used as a measure of the magnitude of a set of numbers, regardless of their sign. This is the square root of the mean of the squares of the input numbers. This runs in `O(n)`, linear time, with respect to the length of the array.
• `rootMeanSquare`: The Root Mean Square (RMS) is a mean function used as a measure of the magnitude of a set of numbers, regardless of their sign. This is the square root of the mean of the squares of the input numbers. This runs in `O(n)`, linear time, with respect to the length of the array.
• `rSquared`: The R Squared value of data compared with a function `f` is the sum of the squared differences between the prediction and the actual value.
• `sample`: Create a simple random sample from a given array of `n` elements.
• `sampleCorrelation`: The correlation is a measure of how correlated two datasets are, between -1 and 1.
• `sampleCovariance`: Sample covariance of two datasets: how much do the two datasets move together? x and y are two datasets, represented as arrays of numbers.
• `sampleKurtosis`: Kurtosis is a measure of the heaviness of a distribution's tails relative to its variance. The kurtosis value can be positive or negative, or even undefined.
• `sampleRankCorrelation`: The rank correlation is a measure of the strength of the monotonic relationship between two arrays.
• `sampleSkewness`: Skewness is a measure of the extent to which a probability distribution of a real-valued random variable "leans" to one side of the mean. The skewness value can be positive or negative, or even undefined.
• `sampleStandardDeviation`: The sample standard deviation is the square root of the sample variance.
• `sampleVariance`: The sample variance is based on the sum of squared deviations from the mean. It is distinguished from the variance by the use of Bessel's Correction: instead of dividing the sum of squared deviations by the length of the input, it is divided by the length minus one. This corrects the bias that arises when estimating a population value from a sample.
• `sampleWithReplacement`: Sampling with replacement is a type of sampling that allows the same item to be picked out of a population more than once.
• `shuffle`: A Fisher-Yates shuffle is a fast way to create a random permutation of a finite set. This is a wrapper around `shuffleInPlace` that adds the guarantee that it will not modify its input.
• `shuffleInPlace`: A Fisher-Yates shuffle in-place, which means that it will change the order of the original array by reference.
• `sign`: Sign is a function that extracts the sign of a real number.
• `silhouette`: Calculate the silhouette values for clustered data.
• `silhouetteMetric`: Calculate the silhouette metric for a set of N-dimensional points arranged in groups. The metric is the largest individual silhouette value for the data.
• `standardDeviation`: The standard deviation is the square root of the variance. This is also known as the population standard deviation. It's useful for measuring the amount of variation or dispersion in a set of values.
• `subtractFromMean`: When removing a value from a list, one does not necessarily have to recompute the mean of the list in linear time. One can instead use this function to compute the new mean by providing the current mean, the number of elements in the list that produced it, and the value to remove.
• `sum`: Our default sum is the Kahan-Babuska algorithm. This method is an improvement over the classical Kahan summation algorithm. It aims at computing the sum of a list of numbers while correcting for floating-point errors. Traditionally, sums are calculated as many successive additions, each one with its own floating-point roundoff. These losses in precision add up as the number of numbers increases. This alternative algorithm is more accurate than calculating sums by simple addition.
• `sumNthPowerDeviations`: The sum of deviations to the Nth power. When n=2 it's the sum of squared deviations. When n=3 it's the sum of cubed deviations.
• `sumSimple`: The simple sum of an array is the result of adding all numbers together, starting from zero.
• `tTest`: This computes a one-sample t-test, comparing the mean of a sample to a known value, x.
• `tTestTwoSample`: This computes a two-sample t-test. It tests whether "mean(X)-mean(Y) = difference" (in the most common case, `difference == 0`, which tests whether two samples are likely to be taken from populations with the same mean value), with no prior knowledge of the standard deviations of both samples other than the fact that they have the same standard deviation.
• `uniqueCountSorted`: For a sorted input, counting the number of unique values is possible in linear time and constant memory. This is a simple implementation of the algorithm.
• `variance`: The variance is the mean of the squared deviations from the mean.
• `wilcoxonRankSum`: This function calculates the Wilcoxon rank sum statistic for the first sample with respect to the second. The Wilcoxon rank sum test is a non-parametric alternative to the t-test which is equivalent to the Mann-Whitney U test. The statistic is calculated by pooling all the observations together, ranking them, and then summing the ranks associated with one of the samples. If this rank sum is sufficiently large or small we reject the hypothesis that the two samples come from the same distribution in favor of the alternative that one is shifted with respect to the other.
• `zScore`
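The descriptions of `addToMean`, `subtractFromMean`, and `combineMeans` all rely on the same observation: a mean together with its element count is enough to recover the underlying sum, so the mean can be updated in constant time. The sketch below illustrates that arithmetic. The function names ending in `Sketch` are hypothetical and the bodies are assumptions written for illustration, not the library's source.

```javascript
// Illustrative constant-time mean updates (hypothetical helpers,
// not the library's exports).

// New mean after appending `newValue` to a list of `n` values with mean `mean`.
function addToMeanSketch(mean, n, newValue) {
  return mean + (newValue - mean) / (n + 1);
}

// New mean after removing `value` from a list of `n` values with mean `mean`.
function subtractFromMeanSketch(mean, n, value) {
  return (mean * n - value) / (n - 1);
}

// Mean of two lists combined, given each list's mean and size.
function combineMeansSketch(mean1, n1, mean2, n2) {
  return (mean1 * n1 + mean2 * n2) / (n1 + n2);
}

// The list [2, 4, 6] has mean 4 and 3 elements.
const withEight = addToMeanSketch(4, 3, 8); // mean of [2, 4, 6, 8]
const withoutEight = subtractFromMeanSketch(withEight, 4, 8); // back to [2, 4, 6]
const combined = combineMeansSketch(4, 3, 10, 1); // mean of [2, 4, 6] and [10]
console.log(withEight, withoutEight, combined);
```

None of these helpers touches the original list, which is why they avoid the linear-time pass a full recomputation would need.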

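The `sum` entry describes compensated summation in prose. As a rough illustration, the following sketch implements the Kahan-Babuska idea in the form usually attributed to Neumaier: alongside the running sum it keeps a correction term that accumulates the low-order bits lost in each addition. The name `kahanBabuskaSumSketch` is hypothetical, and this is an assumption about the general technique rather than a copy of the library's code.

```javascript
// Illustrative compensated summation (Kahan-Babuska, Neumaier's variant).
// Hypothetical helper, not the library's `sum`.
function kahanBabuskaSumSketch(values) {
  if (values.length === 0) {
    return 0;
  }
  let sum = values[0];
  let correction = 0; // running total of the rounding errors
  for (let i = 1; i < values.length; i++) {
    const value = values[i];
    const transition = sum + value;
    // Recover the digits lost in `sum + value`, using whichever
    // operand has the larger magnitude as the reference.
    if (Math.abs(sum) >= Math.abs(value)) {
      correction += (sum - transition) + value;
    } else {
      correction += (value - transition) + sum;
    }
    sum = transition;
  }
  return sum + correction;
}

// Plain left-to-right addition, for comparison.
function naiveSumSketch(values) {
  return values.reduce((total, value) => total + value, 0);
}

const tricky = [1, 1e100, 1, -1e100];
console.log(naiveSumSketch(tricky)); // the two 1s are lost to rounding
console.log(kahanBabuskaSumSketch(tricky)); // the two 1s are recovered
```

The extra branch costs a few operations per element, which is the trade-off against the plain addition that `sumSimple` describes.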
# Simple Statistics

A JavaScript implementation of descriptive, regression, and inference statistics.

Implemented in literate JavaScript with no dependencies, designed to work in all modern browsers (including IE) as well as in node.js.

## Installation

• I'm using Node.js, Webpack, Browserify, Rollup, or another module bundler, and install packages from npm.
• First, install the `simple-statistics` module, using `npm install simple-statistics`, then include the code with require or import:
• I use the `require` function to use modules in my project. (most likely)
• When you use `require`, you have the freedom to assign the module to any variable name you want, but you need to specify the module's name exactly: in this case, 'simple-statistics'. The `require` method returns an object with all of the module's methods attached to it.
`var ss = require('simple-statistics')`
• I use `import` to use modules in my project. I'm probably using Babel, `@std/esm`, Webpack, or Rollup.
• Import all functions under the ss object:
`import * as ss from 'simple-statistics'`
Include a specific named export:
`import {min} from 'simple-statistics'`
Simple statistics has only named exports for ES6.
• I'm using Deno.
• I'm not using a module bundler. I'm writing a web page, and want to include simple-statistics using a script tag.
• I want to support all browsers
• When you use simple-statistics from a script tag, you don't get to choose the variable name it is assigned to: simple-statistics will always become available globally as the variable `ss`. You can reassign this variable to another name if you want to, but doing so is optional.
```
<script src='https://unpkg.com/simple-statistics@7.7.3/dist/simple-statistics.min.js'>
</script>
```
• I want to use ES6 modules in a browser and I'm willing to only support new browsers to do it
• This module works great with the `?module` query parameter of unpkg. If you specify `type='module'` in your script tag, you'll be able to import simple-statistics directly - through `index.js` and with true ES6 import syntax and behavior.
```
<script type='module'>
import {min} from "https://unpkg.com/simple-statistics@7.7.3/index.js?module"
console.log(min([1, 2, 3]))
</script>
```
This feature is still experimental in unpkg and very bleeding-edge.