Class MannWhitneyUTest

java.lang.Object
org.hipparchus.stat.inference.MannWhitneyUTest

public class MannWhitneyUTest extends Object
An implementation of the Mann-Whitney U test.

The definitions and computing formulas used in this implementation follow those in the article Mann-Whitney U Test.

In general, results correspond to (and have been tested against) the R wilcox.test function, with exact meaning the same thing in both APIs and correct uniformly TRUE in this implementation. For example, wilcox.test(x, y, alternative = "two.sided", mu = 0, paired = FALSE, exact = FALSE, correct = TRUE) will return the same p-value as mannWhitneyUTest(x, y, false). The minimum of the W values returned by R for wilcox.test(x, y, ...) and wilcox.test(y, x, ...) should equal mannWhitneyU(x, y).

  • Constructor Details

    • MannWhitneyUTest

      public MannWhitneyUTest()
      Creates a test instance in which NaNs are left in place and ties are assigned the average of the applicable ranks.
    • MannWhitneyUTest

      public MannWhitneyUTest(NaNStrategy nanStrategy, TiesStrategy tiesStrategy)
      Create a test instance using the given strategies for NaN's and ties.
      Parameters:
      nanStrategy - specifies the strategy that should be used for Double.NaN's
      tiesStrategy - specifies the strategy that should be used for ties
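
      To illustrate what "ties get the average of applicable ranks" means, here is a minimal, self-contained sketch of average ranking. The class AverageRankSketch and its method are hypothetical names used only for illustration; they are not part of this library, and the sketch assumes the input contains no NaN values.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration class; not part of the library.
final class AverageRankSketch {

    /** Returns 1-based ranks; tied values share the mean of the ranks they span. Assumes no NaNs. */
    static double[] averageRanks(double[] data) {
        int n = data.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Sort the indices by the value each one points to.
        Arrays.sort(order, Comparator.comparingDouble(i -> data[i]));

        double[] ranks = new double[n];
        int start = 0;
        while (start < n) {
            int end = start;
            // Extend the run while the next sorted value equals the current one.
            while (end + 1 < n && data[order[end + 1]] == data[order[start]]) {
                end++;
            }
            // Sorted positions start..end hold ranks start+1..end+1; take their mean.
            double average = ((start + 1) + (end + 1)) / 2.0;
            for (int k = start; k <= end; k++) {
                ranks[order[k]] = average;
            }
            start = end + 1;
        }
        return ranks;
    }
}
```

      For the input {10, 20, 20, 30}, the two tied values occupy ranks 2 and 3, so each receives 2.5.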
  • Method Details

    • mannWhitneyU

      public double mannWhitneyU(double[] x, double[] y) throws MathIllegalArgumentException, NullArgumentException
      Computes the Mann-Whitney U statistic comparing means for two independent samples possibly of different lengths.

      This statistic can be used to perform a Mann-Whitney U test evaluating the null hypothesis that the two independent samples have equal mean.

      Let Xi denote the i'th individual of the first sample and Yj the j'th individual in the second sample. Note that the samples can have different lengths.

      Preconditions:

      • All observations in the two samples are independent.
      • The observations are at least ordinal (continuous are also ordinal).
      Parameters:
      x - the first sample
      y - the second sample
      Returns:
      Mann-Whitney U statistic (minimum of Ux and Uy)
      Throws:
      NullArgumentException - if x or y are null.
      MathIllegalArgumentException - if x or y are zero-length.
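
      The statistic described above can be sketched by direct pair counting: Ux is the number of pairs (Xi, Yj) with Xi > Yj, with each tie counted as one half, and Uy is the product of the sample sizes minus Ux. This is a definition-level illustration only. The class UStatisticSketch is a hypothetical name, and the library may well compute the same value from rank sums rather than by this quadratic loop.

```java
// Hypothetical illustration class; not part of the library.
final class UStatisticSketch {

    /** Returns min(Ux, Uy), counting each tied pair as 0.5. Assumes no NaNs. */
    static double minU(double[] x, double[] y) {
        if (x.length == 0 || y.length == 0) {
            throw new IllegalArgumentException("samples must be non-empty");
        }
        double ux = 0.0;
        for (double xi : x) {
            for (double yj : y) {
                if (xi > yj) {
                    ux += 1.0;   // x observation beats y observation
                } else if (xi == yj) {
                    ux += 0.5;   // a tie contributes half to each side
                }
            }
        }
        // Ux + Uy always equals the number of (x, y) pairs.
        double uy = (double) x.length * y.length - ux;
        return Math.min(ux, uy);
    }
}
```

      With x = {1, 4, 6} and y = {2, 3, 5}, Ux is 5 and Uy is 4, so the sketch returns 4.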
    • mannWhitneyUTest

      public double mannWhitneyUTest(double[] x, double[] y) throws MathIllegalArgumentException, NullArgumentException
      Returns the asymptotic observed significance level, or p-value, associated with a Mann-Whitney U Test comparing means for two independent samples.

      Let Xi denote the i'th individual of the first sample and Yj the j'th individual in the second sample.

      Preconditions:

      • All observations in the two samples are independent.
      • The observations are at least ordinal.

      If there are no ties in the data and both samples are small (less than or equal to 50 values in the combined dataset), an exact test is performed; otherwise the test uses the normal approximation (with continuity correction).

      If the combined dataset contains ties, the variance used in the normal approximation is bias-adjusted using the formula in the reference above.

      Parameters:
      x - the first sample
      y - the second sample
      Returns:
      approximate 2-sided p-value
      Throws:
      NullArgumentException - if x or y are null.
      MathIllegalArgumentException - if x or y are zero-length.
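
      As a rough sketch of the normal approximation mentioned above, the following computes a z-score from U using a continuity correction of 0.5 and the commonly cited tie-corrected variance, n1*n2/12 * ((n + 1) - sum(t^3 - t) / (n * (n - 1))), where t runs over the sizes of the tie groups and n = n1 + n2. That this matches the library's internal formula term for term is an assumption, and the class NormalApproxSketch is a hypothetical name. The two-sided p-value would then be twice the standard normal CDF at z, capped at 1; the CDF itself is left out because the JDK does not provide one.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of the library.
final class NormalApproxSketch {

    /** z-score for u = min(Ux, Uy), with continuity correction and tie-adjusted variance. */
    static double zScore(double u, double[] x, double[] y) {
        int n1 = x.length;
        int n2 = y.length;
        int n = n1 + n2;

        // Pool and sort both samples so that tied values become adjacent.
        double[] all = new double[n];
        System.arraycopy(x, 0, all, 0, n1);
        System.arraycopy(y, 0, all, n1, n2);
        Arrays.sort(all);

        // Accumulate t^3 - t over every group of t equal values (zero when t == 1).
        double tieTerm = 0.0;
        int i = 0;
        while (i < n) {
            int j = i;
            while (j + 1 < n && all[j + 1] == all[i]) {
                j++;
            }
            double t = j - i + 1;
            tieTerm += t * t * t - t;
            i = j + 1;
        }

        double n1n2 = (double) n1 * n2;
        double mean = n1n2 / 2.0;
        double variance = n1n2 / 12.0 * ((n + 1) - tieTerm / ((double) n * (n - 1)));
        // u <= mean by construction, so adding 0.5 moves the statistic toward the mean.
        return (u - mean + 0.5) / Math.sqrt(variance);
    }
}
```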
    • mannWhitneyUTest

      public double mannWhitneyUTest(double[] x, double[] y, boolean exact) throws MathIllegalArgumentException, NullArgumentException
      Returns the observed significance level, or p-value, associated with a Mann-Whitney U Test comparing means for two independent samples, computed exactly or asymptotically according to the exact argument.

      Let Xi denote the i'th individual of the first sample and Yj the j'th individual in the second sample.

      Preconditions:

      • All observations in the two samples are independent.
      • The observations are at least ordinal.

      If exact is true, the p-value reported is exact, computed using the exact distribution of the U statistic. The computation in this case requires storage on the order of the product of the two sample sizes, so this should not be used for large samples.

      If exact is false, the normal approximation is used to estimate the p-value.

      If the combined dataset contains ties and exact is true, MathIllegalArgumentException is thrown. If exact is false and ties are present, the variance used to compute the approximate p-value in the normal approximation is bias-adjusted using the formula in the reference above.

      Parameters:
      x - the first sample
      y - the second sample
      exact - true means compute the p-value exactly, false means use the normal approximation
      Returns:
      exact 2-sided p-value if exact is true, approximate 2-sided p-value otherwise
      Throws:
      NullArgumentException - if x or y are null.
      MathIllegalArgumentException - if x or y are zero-length or if exact is true and ties are present in the data
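
      To give a feel for the exact computation and its storage cost, the sketch below counts, for tie-free samples of sizes n and m, how many of the equally likely orderings give each value of U. It builds the coefficients of the Gaussian binomial polynomial in a single array of length n*m + 1. This is one standard way to obtain the null distribution and is not claimed to be the algorithm used by this class. The class ExactUDistributionSketch is a hypothetical name, and the long counters will overflow for large samples.

```java
// Hypothetical illustration class; not part of the library.
final class ExactUDistributionSketch {

    /** Exact two-sided p-value for observed u = min(Ux, Uy), sample sizes n and m, no ties. */
    static double exactTwoSidedP(int u, int n, int m) {
        int nm = n * m;
        // count[d] ends up as the number of orderings with U == d.
        long[] count = new long[nm + 1];
        count[0] = 1;
        for (int i = 1; i <= n; i++) {
            int k = m + i;
            // Multiply the polynomial by (1 - q^k); descending order reads old values.
            for (int d = nm; d >= k; d--) {
                count[d] -= count[d - k];
            }
            // Divide by (1 - q^i); ascending order reads already updated values.
            for (int d = i; d <= nm; d++) {
                count[d] += count[d - i];
            }
        }
        long total = 0;
        long tail = 0;
        for (int d = 0; d <= nm; d++) {
            total += count[d];
            if (d <= u) {
                tail += count[d];
            }
        }
        // The null distribution is symmetric, so double the lower tail and cap at 1.
        return Math.min(1.0, 2.0 * tail / total);
    }
}
```

      For n = m = 3 and an observed U of 0, exactly one of the 20 orderings is that extreme, so the sketch yields 2 * 1/20 = 0.1.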