Statistical Functions

The following statistical functions are provided which operate on a range or a list of several cells. Note that lists can include arrays. There are examples of the statistical functions in the PipeDream Examples archive.

AVG

AVG(list)

Returns the arithmetic mean value of list.

Only numeric elements are evaluated so that blank cells do not upset averages.

BETA

BETA(a, b)

Returns the value of the Beta function for arguments a, b.

One definition of the Beta function is

beta(a, b) = gamma(a) × gamma(b) ÷ gamma(a + b)

This special function is actually evaluated by PipeDream using the formula

beta(a, b) = exp( gammaln(a) + gammaln(b) - gammaln(a + b) )

COMBIN

COMBIN(n, k)

This function yields the number of combinations of n unlike things taken k at a time.

The formula used is:

nCk = n! ÷ ((n - k)! × k!)

For very large n this is an approximation achieved using logarithms.

COUNT

COUNT(list)

Counts the number of numeric elements in list.

GAMMALN

GAMMALN(number)

This special function yields the natural logarithm of the gamma function of number.

GRAND

GRAND(m, s)

Returns a random number with gaussian distribution, mean m and standard deviation s.

If m is not given, PipeDream assumes 0; if s is not given, PipeDream assumes 1.

GROWTH

GROWTH(LOGEST_data, x_data)

Takes the result of the LOGEST function (LOGEST_data) and applies it to x_data to return estimated y_data.

LINEST

LINEST(known_y's, known_x's)

Calculates the formula of the straight line which best fits your data (known_y's and known_x's) and returns an array.

Simple linear regression

The result is an array of 2 values giving {c,m} for the straight line fit

y = mx + c

where the first array element, c, is the y-intercept value (i.e. where x=0) of the best-fitting straight line and the second array element, m, is the gradient of that line.

An example is given below. LINEST is used to calculate the gradient and intercept of the best-fitting straight line through the data points. INDEX is used to extract separate elements of the result array.

A B C D
1 2.00 4.00 -2.00 C1=linest(B1B5,A1A5)
2 3.00 7.00 3.00 C2=index(C1,2,1)
3 5.00 13.00
4 7.00 19.00
5  10.00  28.00

To verify the results, a scatter chart of the data series can be drawn and a line of best fit may be specified to be drawn on the chart.

Multiple linear regression

The result is an array of m+1 values such that

y = a0 + Σi=m(ai·xi)

where the xi are the independent x variables.

An example is given below. It is assumed that the value of a house is related to the garden size, number of bedrooms and distance from the city centre. Values for eight houses are known and these are used to calculate (in A10) an array of the coefficients. Expanding the array into the cells A12 to C12 shows the resulting formula to be

Value = 40688 + 62*Garden + 3021*Bedrooms - 956*Distance

This is then used to calculate a value for a house with a garden of 600 square metres, 4 bedrooms and 2 miles from the city centre. D14 and D15 show two ways of calculating the result.

A B C D E
1 Garden m2 Bedrooms Distance Value
2 1600 4 4 150000
3 450 3 67000
4 50 2 0.5 55000
5 0 2 1 45000
6 650 4 3 100000
7 200 1 3.5 58000
8 250 4 55000
9 620 3 2 75000
10 40688 A10=linest(D2D9,A2C9)
11 2 A11=set_value(A12C12,index(A10,2,1,3,1))
12 62 3021 -956
13
14 600 4 2 87949 D14=trend(A10,A14C14)
15 87949 D15=sum(A12C12*A14C14)+A10

LISTCOUNT

LISTCOUNT(array)

Returns a two-column array with the first column being a sorted list of unique entries from the first column of array, and the second column giving the number of occurrences of each entry.

If array has two columns the second column will be treated as a corresponding set of numbers to be added in to the total.

Example:

You have a list of customer names in column A and in column C you have a list of sales figures for individual sales to those customers, then

SET_VALUE(D1E20,LISTCOUNT({A1A100,C1C100}))

would write out 20 rows of customers names and total sales figures in D1E20. {A1A100,C1C100} makes a two column array of 100 rows.

You could use SORT() and FLIP() to sort the result of LISTCOUNT() into order before listing it out.

LOGEST

LOGEST(known_y's, known_x's)

Calculates an exponential curve which fits your data and returns an array which describes the curve.

The equation for the curve is:

y = b · (mx)

or

y = b · Πi=m(mixi)

where the xi are the independent x variables.

This function is simply a logarithmic version of LINEST.

MAX

MAX(list)

Returns the maximum value in list.

MEDIAN

MEDIAN(array)

Returns the median value in the (not necessarily sorted) range.

If there are an even number of values in the range, the mean of the two middle values is returned.

MIN

MIN(list)

Returns the minimum value in list.

PERMUT

PERMUT(n, k)

This function yields the number of permutations of n unlike things taken k at a time.

The formula used is:

nPk = n! ÷ (n - k)!

For very large n this is an approximation achieved using logarithms.

RAND

RAND

Returns a random number in the range 0 to 1 from a sequence which is the same each time PipeDream starts.

RAND(0) generates a random number in the range 0 to 1 using a random seed.

RAND(number)

Generates a random number in the range 0 to 1 using number as the initial seed.

RANK

RANK(array {, spearflag})

Takes an array and ranks the values it contains, producing as an output an array with two columns { rank, entries }, where

Rank is the rank of each value (from highest to lowest).

Entries contains the number of entries of that value in the input array.

If spearflag is present and is non-zero, equally ranked entries are adjusted so that the output is suitable for passing to the SPEARMAN() rank correlation function.

SPEARMAN

SPEARMAN(array2)

Returns the Spearman's rank correlation coefficient for the two arrays. The input list)

Returns the standard deviation of the values in list.

This is the square root of the variance. See VAR for the equation used.

STDP

STDP(list)

Returns the population standard deviation of the values in list.

This is the square root of the population variance. See VARP for the equation used.

TREND

TREND(LINEST_data, x_data)

Takes the result of the LINEST function (LINEST_data) and applies it to x_data to return estimated y_data.

VAR

VAR(list)

Returns the variance of the values in list.

The equation used is:

(n·(Σ(x2)) - (Σx)2) ÷ n·(n-1)

where x is each value and n is the number of values.

VARP

VARP(list)

Returns the population variance of the values in list.

The equation used is:

(n·(Σ(x2)) - (Σx)2) ÷ n2

where x is each value and n is the number of values.