Statistical Functions

The following statistical functions are provided by PipeDream. These functions operate on a range of cells or a list of several items. Note that a list can include a mix of individual values, cell references, ranges of cells and arrays. There are examples of the statistical functions in the PipeDream Examples.

AVG

Syntax:

AVG(list)

Returns the arithmetic mean value of the list.

Only numeric elements are evaluated so that blank cells do not upset averages.

BETA

Syntax:

BETA(a, b)

Returns the value of the Beta function for arguments a and b.

One definition of the Beta function is

Β(a, b)  =  Γ(a) · Γ(b) ÷ Γ(a + b)

This special function is actually evaluated by PipeDream using the formula

Β(a, b)  =  exp( gammaln(a) + gammaln(b) - gammaln(a + b) )

BIN

Syntax:

BIN(data:Array|Range, bins:Array|Range)

The bins array (or range of cells) defines a set of bins into which the values from the data array (or range of cells) is counted.

For example, if bins is {20,40,60,80}, the returned array gives the number of elements of data that have:

The returned array will therefore always have one more element than the bins array (or range of cells). Neither data nor bins need be sorted.

One common use of this function is to organise data for making a histogram.

Example:

With data in A1A100, and bin categories in B1B5, the following formula would provide the basis for a bar chart:

SET_VALUE(C1C6, BIN(A1A100, B1B5))

You would then make a bar chart from the range B1C6, using column B as labels.

Revisions:

This function was added in 4.50/39.

COMBIN

Syntax:

COMBIN(n, k)

This function yields the number of combinations of n unlike things taken k at a time.

The formula used is:

nCk  =  n!  ÷  ( (n - k)! · k! )

For very large n this is an approximation achieved using logarithms.

COUNT

Syntax:

COUNT(list)

Counts the number of numeric elements in the list.

COUNTA

Syntax:

COUNTA(list)

Returns the number of all the values in the list regardless of their type.

Revisions:

This function was added in 4.54.

GAMMALN

Syntax:

GAMMALN(number)

This special function yields the natural logarithm of the gamma function of number.

GRAND

Syntax:

GRAND{(m {, s)}}

Returns a random number with gaussian distribution, mean m and standard deviation s.

If m is not given, PipeDream assumes 0 (zero).

If s is not given, PipeDream assumes 1 (one).

GROWTH

Syntax:

GROWTH(LOGEST_data, x_data)

Takes the result of the LOGEST function (LOGEST_data) and applies it to x_data to return estimated y_data.

LINEST

Syntax:

LINEST(known_y’s, known_x’s)

Calculates the formula of the straight line which best fits your data (known_y’s and known_x’s) and returns an array.

Simple linear regression

The result is an array of 2 values giving {c,m} for the straight line fit

y  =  m x + c

where the first array element, c, is the y-intercept value (i.e. where x=0) of the best-fitting straight line and the second array element, m, is the gradient of that line.

Example:

LINEST is used to calculate the gradient and intercept of the best-fitting straight line through the data points. INDEX is used to extract separate elements of the result array.

A B C D
1 2.00 4.00 -2.00 C1=linest(B1B5,A1A5)
2 3.00 7.00 3.00 C2=index(C1,2,1)
3 5.00 13.00
4 7.00 19.00
5  10.00  28.00

To verify the results, a scatter chart of the data series can be drawn and a line of best fit may be specified to be drawn on the chart.

Multiple linear regression

The result is an array of m+1 values, { a0, a1 … am } such that

y  =  a0 + Σi=m (ai · xi)

where the xi are the independent x variables.

Example:

It is assumed that the value of a house is related to the garden size, number of bedrooms and distance from the city centre. Values for eight houses are known and these are used to calculate (in A10) an array of the coefficients.

A B C D E
1 Garden m2 Bedrooms Distance Value
2 1600 4 4 150000
3 450 3 67000
4 50 2 0.5 55000
5 0 2 1 45000
6 650 4 3 100000
7 200 1 3.5 58000
8 250 4 55000
9 620 3 2 75000
10 40688 A10=linest(D2D9,A2C9)
11 2 A11=set_value(A12C12,index(A10,2,1,3,1))
12 62 3021 -956
13
14 600 4 2 87949 D14=trend(A10,A14C14)
15 87949 D15=sum(A12C12*A14C14)+A10

Expanding the array into the cells A12 to C12 shows the resulting formula to be

Value = 40688 + 62*Garden + 3021*Bedrooms - 956*Distance

This is then used to calculate a value for a house with a garden of 600 square metres, 4 bedrooms and 2 miles from the city centre. D14 and D15 show two ways of calculating the result.

LISTCOUNT

Syntax:

LISTCOUNT(array_or_range)

Returns a two-column array with the first column being a sorted list of unique entries from the first column of array_or_range, and the second column giving the number of occurrences of each entry.

If array_or_range has two columns then the second column will be treated as a corresponding set of numbers to be added in to the total.

Example:

You have a list of customer names in column A and in column C you have a corresponding list of sales figures for individual sales to those customers, then

SET_VALUE(D1E20,LISTCOUNT({A1A100,C1C100}))

would write out 20 rows of customers names and total sales figures in D1E20. {A1A100,C1C100} makes a two column array of 100 rows.

You could use SORT() and FLIP() to sort the result of LISTCOUNT() into order before listing it out.

Revisions:

This function was added in 4.52/06.

LOGEST

Syntax:

LOGEST(known_y’s, known_x’s)

Calculates an exponential curve which fits your data and returns an array which describes the curve.

The equation for the curve is:

y = b · (mx)

or

y  =  a0 · Πi=m (aixi)

where the xi are the independent x variables.

This function is simply a logarithmic version of LINEST.

MAX

Syntax:

MAX(list)

Returns the maximum value in the list.

MEDIAN

Syntax:

MEDIAN(array_or_range)

Returns the median value in the array (or range or cells).

If there are an even number of values in the array, the mean of the two middle values is returned.

Revisions:

This function was added in 4.50/39.

MIN

Syntax:

MIN(list)

Returns the minimum value in the list.

PERMUT

Syntax:

PERMUT(n, k)

This function yields the number of permutations of n unlike things taken k at a time.

The formula used is:

nPk  =  n!  ÷  (n - k)!

For very large n this is an approximation achieved using logarithms.

RAND

Syntax:

RAND

Returns a random number in the range 0.0 to 1.0 from a sequence which is the same each time PipeDream starts.

Syntax:

RAND(number)

Generates a random number in the range 0.0 to 1.0 using number as the initial seed.

RAND(0) generates a random number in the range 0.0 to 1.0 using a random seed.

RANK

Syntax:

RANK(array_or_range {, spearflag})

Ranks the values contained in array_or_range, producing as an output an array with two columns { rankentries }, where

Rank is the rank of each value (from highest to lowest).

Entries contains the number of entries of that value in the input array.

If spearflag is present and is non-zero, equally ranked entries are adjusted so that the output is suitable for passing to the SPEARMAN() rank correlation function.

SPEARMAN

Syntax:

SPEARMAN(array_or_range1, array_or_range2)

Returns the Spearman’s rank correlation coefficient for the two arrays (or ranges of cells).

The input data can be taken from the RANK() function.

The equation used is:

rs = 1 - ( 6 · Σi=n di2  ÷  (n · (n2 - 1)) )

STD

Syntax:

STD(list)

Returns the sample standard deviation of the values in the list.

This is the square root of the sample variance.

The equation used is:

{ ( n · (Σi=n (xi2)) - (Σi=n xi)2 )  ÷  (n · (n - 1)) }

where xi is each value and n is the number of values in the list.

STDP

Syntax:

STDP(list)

Returns the population standard deviation of the values in the list.

This is the square root of the population variance.

The equation used is:

{ ( n · (Σi=n (xi2)) - (Σi=n xi)2 )  ÷  n2 }

where xi is each value and n is the number of values in the list.

TREND

Syntax:

TREND(LINEST_data, x_data)

Takes the result of the LINEST function (LINEST_data) and applies it to x_data to return estimated y_data.

VAR

Syntax:

VAR(list)

Returns the sample variance of the values in the list.

The equation used is:

( n · (Σi=n (xi2)) - (Σi=n xi)2 )  ÷  (n · (n - 1))

where xi is each value and n is the number of values in the list.

VARP

Syntax:

VARP(list)

Returns the population variance of the values in the list.

The equation used is:

( n · (Σi=n (xi2)) - (Σi=n xi)2 )  ÷  n2

where xi is each value and n is the number of values in the list.