parallel_variance

parallel_variance(mean_a, count_a, var_a, mean_b, count_b, var_b)

Compute the variance of the combined data from the summary statistics (mean, count, and variance) of two partitions, without revisiting the individual elements.

See “Parallel Algorithm” in https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance

Parameters
  • mean_a – the mean of partition a

  • count_a – the number of elements in partition a

  • var_a – the variance of partition a

  • mean_b – the mean of partition b

  • count_b – the number of elements in partition b

  • var_b – the variance of partition b

Returns

the variance of the data obtained by combining the two partitions
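
Example

The following is a minimal sketch of the pairwise update described in the referenced "Parallel Algorithm" section, not necessarily this function's actual implementation. The name `combine_variance` and the `ddof` parameter are hypothetical and are not part of the documented signature. The documentation above does not state whether the variances are population or sample variances, so the sketch assumes population variance by default (`ddof=0`) and accepts `ddof=1` for sample variance.

```python
import statistics


def combine_variance(mean_a, count_a, var_a, mean_b, count_b, var_b, ddof=0):
    """Hypothetical sketch: variance of two partitions combined.

    ddof=0 treats the inputs as population variances (an assumption);
    ddof=1 treats them as sample variances.
    """
    count = count_a + count_b
    delta = mean_b - mean_a
    # Recover each partition's sum of squared deviations (M2) from its variance.
    m2_a = var_a * (count_a - ddof)
    m2_b = var_b * (count_b - ddof)
    # The cross term accounts for the gap between the two partition means.
    m2 = m2_a + m2_b + delta ** 2 * count_a * count_b / count
    return m2 / (count - ddof)


# Usage: compare against the variance computed directly on the combined data.
a = [1, 2, 3, 4]
b = [10, 20, 30]
combined = combine_variance(
    statistics.fmean(a), len(a), statistics.pvariance(a),
    statistics.fmean(b), len(b), statistics.pvariance(b),
)
direct = statistics.pvariance(a + b)
```

In this sketch `combined` and `direct` agree up to floating-point rounding. Working from sums of squared deviations rather than raw sums of squares follows the approach in the referenced article, which it recommends for numerical stability.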