/index.js | simplestatistics@v7.7.5

import * as simplestatistics from "https://deno.land/x/simplestatistics@v7.7.5/index.js";

Classes

c bayesian	Bayesian Classifier
c BayesianClassifier	Bayesian Classifier
c perceptron	This is a single-layer Perceptron Classifier that takes arrays of numbers and predicts whether they should be classified as either 0 or 1 (negative or positive examples).
c PerceptronModel	This is a single-layer Perceptron Classifier that takes arrays of numbers and predicts whether they should be classified as either 0 or 1 (negative or positive examples).
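
As an illustration of the single-layer perceptron described above, here is a minimal, self-contained sketch of the classic perceptron learning rule. The class name `SketchPerceptron` and its `train`/`predict` methods are hypothetical names chosen for this example; they are not taken from the library and may differ from the actual `PerceptronModel` API.

```javascript
// Minimal single-layer perceptron sketch. The class and method names are
// hypothetical and are not claimed to mirror the library's PerceptronModel.
class SketchPerceptron {
  constructor() {
    this.weights = [];
    this.bias = 0;
  }

  // Predict 1 if the weighted sum plus bias is positive, otherwise 0.
  predict(features) {
    let score = this.bias;
    for (let i = 0; i < features.length; i++) {
      score += (this.weights[i] || 0) * features[i];
    }
    return score > 0 ? 1 : 0;
  }

  // Perceptron learning rule: shift weights and bias by the prediction error.
  train(features, label) {
    const error = label - this.predict(features);
    if (error !== 0) {
      for (let i = 0; i < features.length; i++) {
        this.weights[i] = (this.weights[i] || 0) + error * features[i];
      }
      this.bias += error;
    }
    return this;
  }
}

// Learn logical AND, which is linearly separable.
const andExamples = [
  [[0, 0], 0],
  [[0, 1], 0],
  [[1, 0], 0],
  [[1, 1], 1],
];
const model = new SketchPerceptron();
for (let epoch = 0; epoch < 20; epoch++) {
  for (const [features, label] of andExamples) {
    model.train(features, label);
  }
}
```

After these passes over the data, `model.predict` should classify all four AND inputs correctly, since a perceptron converges on linearly separable data.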

Variables

v chiSquaredDistributionTable	Percentage Points of the χ2 (Chi-Squared) Distribution
v epsilon	We use `ε`, epsilon, as a stopping criterion when we want to iterate until we're "close enough". Epsilon is a very small number: for simple statistics, that number is 0.0001
v standardNormalTable	A standard normal table, also called the unit normal table or Z table, is a mathematical table for the values of Φ (phi), which are the values of the cumulative distribution function of the normal distribution. It is used to find the probability that a statistic is observed below, above, or between values on the standard normal distribution, and by extension, any normal distribution.
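
To show how a small tolerance such as `epsilon` can act as a stopping criterion, the sketch below runs a bisection search until the bracketing interval is "close enough". The helper name `bisectSketch`, its parameter order, and the `EPSILON` constant are assumptions made for this example, not the library's own signatures.

```javascript
// Assumed tolerance, matching the 0.0001 value quoted for `epsilon` above.
const EPSILON = 0.0001;

// Hypothetical helper: find a root of `f` in [lo, hi] by bisection,
// stopping once half the interval width drops below the tolerance.
function bisectSketch(f, lo, hi, tolerance = EPSILON, maxIterations = 100) {
  if (f(lo) * f(hi) > 0) {
    throw new Error("f(lo) and f(hi) must have opposite signs");
  }
  for (let i = 0; i < maxIterations; i++) {
    const mid = (lo + hi) / 2;
    const value = f(mid);
    if (value === 0 || (hi - lo) / 2 < tolerance) {
      return mid;
    }
    // Keep the half-interval whose endpoints still bracket the root.
    if (Math.sign(value) === Math.sign(f(lo))) {
      lo = mid;
    } else {
      hi = mid;
    }
  }
  throw new Error("maximum number of iterations exceeded");
}

// Approximate the square root of 2 as the positive root of x^2 - 2.
const root = bisectSketch((x) => x * x - 2, 0, 2);
```

The returned value is within the tolerance of the true root; a smaller tolerance costs more iterations, since each step only halves the interval.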

Functions

f addToMean	When adding a new value to a list, one does not necessarily have to recompute the mean of the list in linear time. One can instead use this function to compute the new mean by providing the current mean, the number of elements in the list that produced it, and the new value to add.
f approxEqual	Approximate equality.
f average	The mean, also known as average, is the sum of all values over the number of values. This is a measure of central tendency: a method of finding a typical or central value of a set of numbers.
f averageSimple	The mean, also known as average, is the sum of all values over the number of values. This is a measure of central tendency: a method of finding a typical or central value of a set of numbers.
f bernoulliDistribution	The Bernoulli distribution is the discrete probability distribution of a random variable which takes value 1 with success probability `p` and value 0 with failure probability `q` = 1 - `p`. It can be used, for example, to represent the toss of a coin, where "1" is defined to mean "heads" and "0" is defined to mean "tails" (or vice versa). It is a special case of a Binomial Distribution where `n` = 1.
f binomialDistribution	The Binomial Distribution is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability `probability`. Such a success/failure experiment is also called a Bernoulli experiment or Bernoulli trial; when trials = 1, the Binomial Distribution is a Bernoulli Distribution.
f bisect	Bisection method is a root-finding method that repeatedly bisects an interval to find the root.
f chiSquaredGoodnessOfFit	The χ2 (Chi-Squared) Goodness-of-Fit Test uses a measure of goodness of fit which is the sum of differences between observed and expected outcome frequencies (that is, counts of observations), each squared and divided by the number of observations expected given the hypothesized distribution. The resulting χ2 statistic, `chiSquared`, can be compared to the chi-squared distribution to determine the goodness of fit. In order to determine the degrees of freedom of the chi-squared distribution, one takes the total number of observed frequencies and subtracts the number of estimated parameters. The test statistic follows, approximately, a chi-square distribution with (k − c) degrees of freedom where `k` is the number of non-empty cells and `c` is the number of estimated parameters for the distribution.
f chunk	Split an array into chunks of a specified size. This function has the same behavior as PHP's array_chunk function, and thus will insert smaller-sized chunks at the end if the input size is not divisible by the chunk size.
f ckmeans	Ckmeans clustering is an improvement on heuristic-based clustering approaches like Jenks. The algorithm was developed by Haizhou Wang and Mingzhou Song as a dynamic programming approach to the problem of clustering numeric data into groups with the least within-group sum-of-squared-deviations.
f coefficientOfVariation	The coefficient of variation is the ratio of the standard deviation to the mean. See https://en.wikipedia.org/wiki/Coefficient_of_variation
f combinations	Implementation of combinations. Combinations are unique subsets of a collection: in this case, k elements from a collection at a time. https://en.wikipedia.org/wiki/Combination
f combinationsReplacement	Implementation of combinations with replacement. Combinations are unique subsets of a collection: in this case, k elements from a collection at a time. 'With replacement' means that a given element can be chosen multiple times. Unlike permutations, order doesn't matter for combinations.
f combineMeans	When combining two lists of values for which one already knows the means, one does not necessarily have to recompute the mean of the combined lists in linear time. One can instead use this function to compute the combined mean by providing the mean & number of values of the first list and the mean & number of values of the second list.
f combineVariances	When combining two lists of values for which one already knows the variances, one does not necessarily have to recompute the variance of the combined lists in linear time. One can instead use this function to compute the combined variance by providing the variance, mean & number of values of the first list and the variance, mean & number of values of the second list.
f cumulativeStdLogisticProbability	Logistic Cumulative Distribution Function
f cumulativeStdNormalProbability	Cumulative Standard Normal Probability
f equalIntervalBreaks	Given an array of values, this will find the extent of the values and return an array of breaks that can be used to categorize the values into a number of classes. The returned array will always be 1 longer than the number of classes because it includes the minimum value.
f erf	Gaussian error function
f errorFunction	Gaussian error function
f extent	This computes the minimum & maximum number in an array.
f extentSorted	The extent is the lowest & highest number in the array. With a sorted array, the first element in the array is always the lowest while the last element is always the largest, so this calculation can be done in one step, or constant time.
f factorial	A Factorial, usually written n!, is the product of all positive integers less than or equal to n. Often factorial is implemented recursively, but this iterative approach is significantly faster and simpler.
f gamma	Compute the gamma function of a value using Nemes' approximation. The gamma of n is equivalent to (n-1)!, but unlike the factorial function, gamma is defined for all real n except zero and negative integers (where NaN is returned). Note, the gamma function is also well-defined for complex numbers, though this implementation currently does not handle complex numbers as input values. Nemes' approximation is defined here as Theorem 2.2. Negative values use Euler's reflection formula for computation.
f gammaln	Compute the logarithm of the gamma function of a value using Lanczos' approximation. This function takes as input any real-value n greater than 0. This function is useful for values of n too large for the normal gamma function (n > 165). The code is based on Lanczos' Gamma approximation, defined here.
f geometricMean	The Geometric Mean is a mean function that is more useful for numbers in different ranges.
f harmonicMean	The Harmonic Mean is a mean function typically used to find the average of rates. This mean is calculated by taking the reciprocal of the arithmetic mean of the reciprocals of the input numbers.
f interquartileRange	The Interquartile range is a measure of statistical dispersion, or how scattered, spread, or concentrated a distribution is. It's computed as the difference between the third quartile and first quartile.
f inverseErrorFunction	The Inverse Gaussian error function returns a numerical approximation to the value that would have caused `errorFunction()` to return x.
f iqr	The Interquartile range is a measure of statistical dispersion, or how scattered, spread, or concentrated a distribution is. It's computed as the difference between the third quartile and first quartile.
f kde	Kernel density estimation is a useful tool for, among other things, estimating the shape of the underlying probability distribution from a sample.
f kernelDensityEstimation	Kernel density estimation is a useful tool for, among other things, estimating the shape of the underlying probability distribution from a sample.
f kMeansCluster	Perform k-means clustering.
f linearRegression	Simple linear regression is a simple way to find a fitted line between a set of coordinates. This algorithm finds the slope and y-intercept of a regression line using the least sum of squares.
f linearRegressionLine	Given the output of `linearRegression`: an object with `m` and `b` values indicating slope and intercept, respectively, generate a line function that translates x values into y values.
f logAverage	The log average is an equivalent way of computing the geometric mean of an array suitable for large or small products.
f logit	The Logit is the inverse of cumulativeStdLogisticProbability, and is also known as the logistic quantile function.
f mad	The Median Absolute Deviation is a robust measure of statistical dispersion. It is more resilient to outliers than the standard deviation.
f max	This computes the maximum number in an array.
f maxSorted	The maximum is the highest number in the array. With a sorted array, the last element in the array is always the largest, so this calculation can be done in one step, or constant time.
f mean	The mean, also known as average, is the sum of all values over the number of values. This is a measure of central tendency: a method of finding a typical or central value of a set of numbers.
f meanSimple	The mean, also known as average, is the sum of all values over the number of values. This is a measure of central tendency: a method of finding a typical or central value of a set of numbers.
f median	The median is the middle number of a list. This is often a good indicator of 'the middle' when there are outliers that skew the `mean()` value. This is a measure of central tendency: a method of finding a typical or central value of a set of numbers.
f medianAbsoluteDeviation	The Median Absolute Deviation is a robust measure of statistical dispersion. It is more resilient to outliers than the standard deviation.
f medianSorted	The median is the middle number of a list. This is often a good indicator of 'the middle' when there are outliers that skew the `mean()` value. This is a measure of central tendency: a method of finding a typical or central value of a set of numbers.
f min	The min is the lowest number in the array. This runs in `O(n)`, linear time, with respect to the length of the array.
f minSorted	The minimum is the lowest number in the array. With a sorted array, the first element in the array is always the smallest, so this calculation can be done in one step, or constant time.
f mode	The mode is the number that appears in a list the highest number of times. There can be multiple modes in a list: in the event of a tie, this algorithm will return the most recently seen mode.
f modeFast	The mode is the number that appears in a list the highest number of times. There can be multiple modes in a list: in the event of a tie, this algorithm will return the most recently seen mode.
f modeSorted	The mode is the number that appears in a list the highest number of times. There can be multiple modes in a list: in the event of a tie, this algorithm will return the most recently seen mode.
f numericSort	Sort an array of numbers by their numeric value, ensuring that the array is not changed in place.
f permutationsHeap	Implementation of Heap's Algorithm for generating permutations.
f permutationTest	Conducts a permutation test to determine if two data sets are significantly different from each other, using the difference of means between the groups as the test statistic. The function allows for the following hypotheses: `two_tail` (null hypothesis: the two distributions are equal); `greater` (null hypothesis: observations from sampleX tend to be smaller than those from sampleY); `less` (null hypothesis: observations from sampleX tend to be greater than those from sampleY).
f poissonDistribution	The Poisson Distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event.
f probit	The Probit is the inverse of cumulativeStdNormalProbability(), and is also known as the normal quantile function.
f product	The product of an array is the result of multiplying all numbers together, starting with one, the multiplicative identity.
f quantile	The quantile: this is a population quantile, since we assume to know the entire dataset in this library. This is an implementation of the Quantiles of a Population algorithm from Wikipedia.
f quantileRank	This function returns the quantile in which one would find the given value in the given array. It will copy and sort your array before each run, so if you know your array is already sorted, you should use `quantileRankSorted` instead.
f quantileRankSorted	This function returns the quantile in which one would find the given value in the given array. With a sorted array, leveraging binary search, we can find this information in logarithmic time.
f quantileSorted	This is the internal implementation of quantiles: when you know that the order is sorted, you don't need to re-sort it, and the computations are faster.
f quickselect	Rearrange items in `arr` so that all items in `[left, k]` range are the smallest. The `k`-th element will have the `(k - left + 1)`-th smallest value in `[left, right]`.
f relativeError	Relative error.
f rms	The Root Mean Square (RMS) is a mean function used as a measure of the magnitude of a set of numbers, regardless of their sign. This is the square root of the mean of the squares of the input numbers. This runs in `O(n)`, linear time, with respect to the length of the array.
f rootMeanSquare	The Root Mean Square (RMS) is a mean function used as a measure of the magnitude of a set of numbers, regardless of their sign. This is the square root of the mean of the squares of the input numbers. This runs in `O(n)`, linear time, with respect to the length of the array.
f rSquared	The R Squared value of data compared with a function `f` is one minus the ratio of the sum of the squared differences between the prediction and the actual value to the sum of the squared differences between the actual value and the mean.
f sample	Create a simple random sample from a given array of `n` elements.
f sampleCorrelation	The correlation is a measure of how correlated two datasets are, between -1 and 1.
f sampleCovariance	Sample covariance of two datasets: how much do the two datasets move together? x and y are two datasets, represented as arrays of numbers.
f sampleKurtosis	Kurtosis is a measure of the heaviness of a distribution's tails relative to its variance. The kurtosis value can be positive or negative, or even undefined.
f sampleRankCorrelation	The rank correlation is a measure of the strength of the monotonic relationship between two arrays.
f sampleSkewness	Skewness is a measure of the extent to which a probability distribution of a real-valued random variable "leans" to one side of the mean. The skewness value can be positive or negative, or even undefined.
f sampleStandardDeviation	The sample standard deviation is the square root of the sample variance.
f sampleVariance	The sample variance is the sum of squared deviations from the mean, scaled using Bessel's Correction: instead of dividing the sum of squared deviations by the length of the input, as the variance does, it is divided by the length minus one. This corrects the bias in estimating a value from a set that you don't know in full.
f sampleWithReplacement	Sampling with replacement is a type of sampling that allows the same item to be picked out of a population more than once.
f shuffle	A Fisher-Yates shuffle is a fast way to create a random permutation of a finite set. This is a wrapper around `shuffleInPlace` that adds the guarantee that it will not modify its input.
f shuffleInPlace	A Fisher-Yates shuffle in-place - which means that it will change the order of the original array by reference.
f sign	Sign is a function that extracts the sign of a real number.
f silhouette	Calculate the silhouette values for clustered data.
f silhouetteMetric	Calculate the silhouette metric for a set of N-dimensional points arranged in groups. The metric is the largest individual silhouette value for the data.
f standardDeviation	The standard deviation is the square root of the variance. This is also known as the population standard deviation. It's useful for measuring the amount of variation or dispersion in a set of values.
f subtractFromMean	When removing a value from a list, one does not necessarily have to recompute the mean of the list in linear time. One can instead use this function to compute the new mean by providing the current mean, the number of elements in the list that produced it, and the value to remove.
f sum	Our default sum is the Kahan-Babuska algorithm. This method is an improvement over the classical Kahan summation algorithm. It aims at computing the sum of a list of numbers while correcting for floating-point errors. Traditionally, sums are calculated as many successive additions, each one with its own floating-point roundoff. These losses in precision add up as the number of numbers increases. This alternative algorithm is more accurate than the simple way of calculating sums by simple addition.
f sumNthPowerDeviations	The sum of deviations to the Nth power. When n=2 it's the sum of squared deviations. When n=3 it's the sum of cubed deviations.
f sumSimple	The simple sum of an array is the result of adding all numbers together, starting from zero.
f tTest	Computes a one-sample t-test, comparing the mean of a sample to a known value, x.
f tTestTwoSample	Computes a two-sample t-test. It tests whether "mean(X)-mean(Y) = difference" (in the most common case, `difference == 0`, to test whether two samples are likely to be taken from populations with the same mean value), with no prior knowledge of the standard deviations of the two samples other than the fact that they have the same standard deviation.
f uniqueCountSorted	For a sorted input, counting the number of unique values is possible in linear time and constant memory. This is a simple implementation of the algorithm.
f variance	The variance is the sum of squared deviations from the mean, divided by the number of values.
f wilcoxonRankSum	This function calculates the Wilcoxon rank sum statistic for the first sample with respect to the second. The Wilcoxon rank sum test is a non-parametric alternative to the t-test which is equivalent to the Mann-Whitney U test. The statistic is calculated by pooling all the observations together, ranking them, and then summing the ranks associated with one of the samples. If this rank sum is sufficiently large or small we reject the hypothesis that the two samples come from the same distribution in favor of the alternative that one is shifted with respect to the other.
f zScore	The Z-Score, or Standard Score.
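
The constant-time updates described for `addToMean`, `subtractFromMean`, `combineMeans`, and `combineVariances` follow from simple algebra on sums. The sketch below writes those formulas out as standalone helpers. The `Sketch`-suffixed names and parameter orders are assumptions for illustration and are not claimed to match the library's signatures; the variance formula assumes the population variance (dividing by n).

```javascript
// Hypothetical helpers illustrating the constant-time update formulas.
// Names and argument orders are assumptions, not the library's API.

// New mean after appending `newValue` to a list of `n` values.
function addToMeanSketch(mean, n, newValue) {
  return mean + (newValue - mean) / (n + 1);
}

// New mean after removing `value` from a list of `n` values (n > 1).
function subtractFromMeanSketch(mean, n, value) {
  return (mean * n - value) / (n - 1);
}

// Mean of two lists concatenated, given each list's mean and length.
function combineMeansSketch(mean1, n1, mean2, n2) {
  return (mean1 * n1 + mean2 * n2) / (n1 + n2);
}

// Population variance of two lists concatenated, given each list's
// variance, mean, and length.
function combineVariancesSketch(var1, mean1, n1, var2, mean2, n2) {
  const combinedMean = combineMeansSketch(mean1, n1, mean2, n2);
  const part1 = n1 * (var1 + (mean1 - combinedMean) ** 2);
  const part2 = n2 * (var2 + (mean2 - combinedMean) ** 2);
  return (part1 + part2) / (n1 + n2);
}

// [2, 4, 6] has mean 4; appending 8 gives [2, 4, 6, 8] with mean 5.
const grownMean = addToMeanSketch(4, 3, 8);
// Removing 8 again restores the mean of [2, 4, 6].
const shrunkMean = subtractFromMeanSketch(grownMean, 4, 8);
// [2, 4] (mean 3) combined with [6, 8, 10] (mean 8) has mean 6.
const mergedMean = combineMeansSketch(3, 2, 8, 3);
// [2, 4] (variance 1) combined with [6, 10] (variance 4) is
// [2, 4, 6, 10], whose population variance is 8.75.
const mergedVariance = combineVariancesSketch(1, 3, 2, 4, 8, 2);
```

Each helper does a fixed amount of arithmetic regardless of list length, which is the point of keeping running means and variances instead of recomputing them from the raw values.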