Class OneWayAnova


  • public class OneWayAnova
    extends Object
    Implements one-way ANOVA (analysis of variance) statistics.

    Tests for differences among the means of two or more categories of univariate data (for example, the body mass index of accountants, lawyers, doctors and computer programmers). When exactly two categories are given, the test is equivalent to a two-sample t-test with pooled (equal) variance; see TTest.
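
    The two-category equivalence can be checked numerically: the F statistic equals the square of the pooled-variance t statistic. The following is a minimal sketch in plain Java; the class AnovaVersusTSketch and its methods are hypothetical names used for illustration only and are not part of Hipparchus.

```java
/** Hypothetical illustration, not part of Hipparchus: two-category ANOVA versus pooled t-test. */
final class AnovaVersusTSketch {

    /** Arithmetic mean of the values. */
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Sum of squared deviations of the values from the given center. */
    static double sumSquares(double[] values, double center) {
        double ss = 0.0;
        for (double v : values) {
            ss += (v - center) * (v - center);
        }
        return ss;
    }

    /** Two-sample t statistic computed with pooled (equal) variance. */
    static double pooledT(double[] a, double[] b) {
        double meanA = mean(a);
        double meanB = mean(b);
        double pooledVariance =
            (sumSquares(a, meanA) + sumSquares(b, meanB)) / (a.length + b.length - 2);
        return (meanA - meanB) / Math.sqrt(pooledVariance * (1.0 / a.length + 1.0 / b.length));
    }

    /** One-way ANOVA F statistic for exactly two categories (between-group df = 1). */
    static double twoCategoryF(double[] a, double[] b) {
        double meanA = mean(a);
        double meanB = mean(b);
        double grandMean = (meanA * a.length + meanB * b.length) / (a.length + b.length);
        double ssbg = a.length * (meanA - grandMean) * (meanA - grandMean)
                    + b.length * (meanB - grandMean) * (meanB - grandMean);
        double mswg = (sumSquares(a, meanA) + sumSquares(b, meanB)) / (a.length + b.length - 2);
        return ssbg / mswg; // msbg = ssbg / 1
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3};
        double[] b = {2, 3, 4};
        double t = pooledT(a, b);
        System.out.println("t^2 = " + (t * t) + ", F = " + twoCategoryF(a, b));
    }
}
```

    Because the F statistic is the square of t, the F test is inherently two-sided, so the comparison is with the two-sided t-test.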

    Uses the Hipparchus F distribution implementation to compute p-values.

    This implementation is based on a description at http://faculty.vassar.edu/lowry/ch13pt1.html

     Abbreviations: bg = between groups,
                    wg = within groups,
                    ss = sum of squared deviations
     
    • Constructor Detail

      • OneWayAnova

        public OneWayAnova()
        Empty constructor.

        This constructor is not strictly necessary, but it prevents spurious javadoc warnings with JDK 18 and later.

        Since:
        3.0
    • Method Detail

      • anovaFValue

        public double anovaFValue(Collection<double[]> categoryData)
                           throws MathIllegalArgumentException,
                                  NullArgumentException
        Computes the ANOVA F-value for a collection of double[] arrays.

        Preconditions:

        • The categoryData Collection must contain double[] arrays.
        • There must be at least two double[] arrays in the categoryData collection and each of these arrays must contain at least two values.

        This implementation computes the F statistic using the definitional formula

           F = msbg/mswg

        where

          msbg = between group mean square
          mswg = within group mean square

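        are as defined in the description referenced in the class documentation (http://faculty.vassar.edu/lowry/ch13pt1.html).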
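
        The definitional formula can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the library's source: the class AnovaFSketch is a hypothetical name, and IllegalArgumentException stands in for the Hipparchus exception types. The mean squares are taken as msbg = ssbg / (k - 1) and mswg = sswg / (N - k), for k categories and N observations in total.

```java
import java.util.Arrays;
import java.util.Collection;

/** Hypothetical helper, not part of Hipparchus: definitional one-way ANOVA F statistic. */
final class AnovaFSketch {

    static double fValue(Collection<double[]> categoryData) {
        if (categoryData.size() < 2) {
            throw new IllegalArgumentException("at least two categories are required");
        }

        // First pass: total number of observations and grand mean.
        int n = 0;
        double total = 0.0;
        for (double[] group : categoryData) {
            if (group.length < 2) {
                throw new IllegalArgumentException("each category needs at least two values");
            }
            n += group.length;
            for (double x : group) {
                total += x;
            }
        }
        double grandMean = total / n;

        // Second pass: between-group (bg) and within-group (wg) sums of squares.
        double ssbg = 0.0;
        double sswg = 0.0;
        for (double[] group : categoryData) {
            double sum = 0.0;
            for (double x : group) {
                sum += x;
            }
            double mean = sum / group.length;
            ssbg += group.length * (mean - grandMean) * (mean - grandMean);
            for (double x : group) {
                sswg += (x - mean) * (x - mean);
            }
        }

        int dfbg = categoryData.size() - 1; // k - 1
        int dfwg = n - categoryData.size(); // N - k
        return (ssbg / dfbg) / (sswg / dfwg); // F = msbg / mswg
    }

    public static void main(String[] args) {
        Collection<double[]> data = Arrays.asList(
            new double[] {1, 2, 3},
            new double[] {2, 3, 4},
            new double[] {5, 6, 7});
        System.out.println("F = " + fValue(data));
    }
}
```

        For the sample data in main, ssbg = 26 and sswg = 6 with 2 and 6 degrees of freedom, so F = 13 / 1 = 13, up to floating-point rounding.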

        Parameters:
        categoryData - Collection of double[] arrays each containing data for one category
        Returns:
        the F-value
        Throws:
        NullArgumentException - if categoryData is null
        MathIllegalArgumentException - if categoryData contains fewer than two arrays, or if any contained double[] array has fewer than two values
      • anovaPValue

        public double anovaPValue(Collection<double[]> categoryData)
                           throws MathIllegalArgumentException,
                                  NullArgumentException,
                                  MathIllegalStateException
        Computes the ANOVA p-value for a collection of double[] arrays.

        Preconditions:

        • The categoryData Collection must contain double[] arrays.
        • There must be at least two double[] arrays in the categoryData collection and each of these arrays must contain at least two values.

        This implementation uses the Hipparchus F distribution implementation to compute the p-value, using the formula

           p = 1 - cumulativeProbability(F)

        where F is the F-value and cumulativeProbability is the cumulative distribution function of the Hipparchus F distribution implementation.
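
        The upper-tail formula can be illustrated without the library in one special case. With exactly three categories the between-group degrees of freedom equal 2, and the F distribution's cumulative probability has the closed form 1 - (dfwg / (dfwg + 2F))^(dfwg / 2). The sketch below relies on that special case only; the class AnovaPSketch and the parameter name dfwg are hypothetical, and the general case needs a full F distribution implementation such as the one this class uses.

```java
/** Hypothetical illustration, not part of Hipparchus: p-value when there are exactly three categories. */
final class AnovaPSketch {

    /**
     * Computes p = 1 - cumulativeProbability(F) for an F distribution with
     * 2 numerator and dfwg denominator degrees of freedom.
     * Valid only for numerator degrees of freedom equal to 2.
     */
    static double pValueThreeCategories(double f, int dfwg) {
        if (f < 0.0 || dfwg < 1) {
            throw new IllegalArgumentException("f must be >= 0 and dfwg must be >= 1");
        }
        double cumulativeProbability = 1.0 - Math.pow(dfwg / (dfwg + 2.0 * f), dfwg / 2.0);
        return 1.0 - cumulativeProbability;
    }

    public static void main(String[] args) {
        // F = 13 with dfwg = 6, e.g. three categories of three values each.
        System.out.println("p = " + pValueThreeCategories(13.0, 6));
    }
}
```

        For F = 13 and dfwg = 6 the closed form gives p = (6 / 32)^3 = 27 / 4096, roughly 0.0066.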

        Parameters:
        categoryData - Collection of double[] arrays each containing data for one category
        Returns:
        the p-value
        Throws:
        NullArgumentException - if categoryData is null
        MathIllegalArgumentException - if categoryData contains fewer than two arrays, or if any contained double[] array has fewer than two values
        MathIllegalStateException - if the p-value cannot be computed due to a convergence error
        MathIllegalStateException - if the maximum number of iterations is exceeded
      • anovaTest

        public boolean anovaTest(Collection<double[]> categoryData,
                                 double alpha)
                          throws MathIllegalArgumentException,
                                 NullArgumentException,
                                 MathIllegalStateException
        Performs an ANOVA test, evaluating the null hypothesis that there is no difference among the means of the data categories.

        Preconditions:

        • The categoryData Collection must contain double[] arrays.
        • There must be at least two double[] arrays in the categoryData collection and each of these arrays must contain at least two values.
        • alpha must be strictly greater than 0 and less than or equal to 0.5.

        This implementation uses the Hipparchus F distribution implementation to compute the p-value, using the formula

           p = 1 - cumulativeProbability(F)

        where F is the F-value and cumulativeProbability is the cumulative distribution function of the Hipparchus F distribution implementation.

        Returns true if and only if the computed p-value is strictly less than alpha.
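
        The decision rule and the constraint on alpha can be sketched as below. This is an illustration only: the class AnovaDecisionSketch and the method rejectNull are hypothetical names, the p-value is assumed to have been computed already, and IllegalArgumentException stands in for MathIllegalArgumentException.

```java
/** Hypothetical illustration, not part of Hipparchus: the anovaTest decision rule. */
final class AnovaDecisionSketch {

    /** Returns true when the null hypothesis of equal category means is rejected at level alpha. */
    static boolean rejectNull(double pValue, double alpha) {
        // alpha must lie in the half-open interval (0, 0.5]; the negated form also rejects NaN.
        if (!(alpha > 0.0 && alpha <= 0.5)) {
            throw new IllegalArgumentException("alpha must be in (0, 0.5]: " + alpha);
        }
        // Strict comparison: a p-value exactly equal to alpha does not reject.
        return pValue < alpha;
    }

    public static void main(String[] args) {
        System.out.println(rejectNull(27.0 / 4096.0, 0.05)); // small p-value
        System.out.println(rejectNull(0.20, 0.05));          // large p-value
    }
}
```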

        Parameters:
        categoryData - Collection of double[] arrays each containing data for one category
        alpha - significance level of the test
        Returns:
        true if the null hypothesis can be rejected with confidence 1 - alpha
        Throws:
        NullArgumentException - if categoryData is null
        MathIllegalArgumentException - if categoryData contains fewer than two arrays, or if any contained double[] array has fewer than two values
        MathIllegalArgumentException - if alpha is not in the range (0, 0.5]
        MathIllegalStateException - if the p-value cannot be computed due to a convergence error
        MathIllegalStateException - if the maximum number of iterations is exceeded