org.hipparchus.stat.descriptive.AbstractUnivariateStatistic

org.hipparchus.stat.descriptive.rank.Percentile

All Implemented Interfaces: Serializable, UnivariateStatistic, MathArrays.Function

public class Percentile extends AbstractUnivariateStatistic implements Serializable

Provides percentile computation.

There are several commonly used methods for estimating percentiles (a.k.a. quantiles) based on sample data. For large samples, the different methods agree closely, but when sample sizes are small, different methods will give significantly different results. The algorithm implemented here works as follows:

Let n be the length of the (sorted) array and 0 < p <= 100 be the desired percentile.
- If n = 1, return the unique array element (regardless of the value of p); otherwise:
- Compute the estimated percentile position pos = p * (n + 1) / 100 and the difference d between pos and floor(pos) (i.e. the fractional part of pos).
- If pos < 1, return the smallest element in the array.
- Else if pos >= n, return the largest element in the array.
- Else let lower be the element at position floor(pos) in the array (positions counted from 1, so position 1 holds the smallest element) and let upper be the next element in the array. Return lower + d * (upper - lower).
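
The steps above can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions: it fully sorts a copy of the data with Arrays.sort (Percentile itself uses selection), ignores NaN handling, and the class name LegacyPercentileSketch and method estimate are hypothetical, not part of Hipparchus.

```java
import java.util.Arrays;

/** Hypothetical sketch of the legacy estimate described above (not Hipparchus code). */
final class LegacyPercentileSketch {

    static double estimate(double[] values, double p) {
        if (!(p > 0 && p <= 100)) {
            throw new IllegalArgumentException("p must be in (0, 100]: " + p);
        }
        int n = values.length;
        if (n == 0) {
            return Double.NaN;                 // empty input
        }
        double[] sorted = Arrays.copyOf(values, n);
        Arrays.sort(sorted);                   // full sort, for clarity only
        if (n == 1) {
            return sorted[0];                  // single element, any p
        }
        double pos = p * (n + 1) / 100;        // estimated 1-based position
        double fpos = Math.floor(pos);
        double d = pos - fpos;                 // fractional part of pos
        if (pos < 1) {
            return sorted[0];                  // smallest element
        }
        if (pos >= n) {
            return sorted[n - 1];              // largest element
        }
        double lower = sorted[(int) fpos - 1]; // element at 1-based position floor(pos)
        double upper = sorted[(int) fpos];     // the next element
        return lower + d * (upper - lower);
    }

    public static void main(String[] args) {
        double[] data = {5, 1, 4, 2, 3};
        System.out.println(estimate(data, 25)); // 1.5 (pos = 1.5)
        System.out.println(estimate(data, 50)); // 3.0 (pos = 3.0)
        System.out.println(estimate(data, 10)); // 1.0 (pos = 0.6 < 1)
    }
}
```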

To compute percentiles, the data must be at least partially ordered. Input arrays are copied and recursively partitioned using an ordering definition. The ordering used by Arrays.sort(double[]) is the one determined by Double.compareTo(Double). This ordering makes Double.NaN larger than any other value (including Double.POSITIVE_INFINITY). Therefore, for example, if NaNs are retained in the data (rather than dropped, as the default NaNStrategy.REMOVED does), the median (50th percentile) of {0, 1, 2, 3, 4, Double.NaN} evaluates to 2.5.
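
The ordering described above can be checked with the standard library alone. This small demo uses only java.util.Arrays and Double.compare, not Percentile; the class name NaNOrderingDemo is hypothetical. Applying the legacy steps to the sorted array {0, 1, 2, 3, 4, NaN} gives pos = 3.5, so the estimate interpolates between the 3rd and 4th elements (2 and 3), which is where the 2.5 comes from.

```java
import java.util.Arrays;

/** Hypothetical demo of the Double.compareTo ordering used when sorting doubles. */
final class NaNOrderingDemo {

    static double[] sortedCopy(double[] values) {
        double[] copy = Arrays.copyOf(values, values.length);
        Arrays.sort(copy); // NaN is placed after +Infinity
        return copy;
    }

    public static void main(String[] args) {
        double[] data = {Double.NaN, 3, 0, Double.POSITIVE_INFINITY, 1};
        System.out.println(Arrays.toString(sortedCopy(data)));
        // [0.0, 1.0, 3.0, Infinity, NaN]
        System.out.println(Double.compare(Double.NaN, Double.POSITIVE_INFINITY)); // 1
    }
}
```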

Since percentile estimation usually involves interpolation between array elements, arrays containing NaN or infinite values will often yield NaN or infinite results.
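
To see why, the interpolation step lower + d * (upper - lower) from the algorithm description can be evaluated directly on non-finite neighbours. This is plain IEEE 754 arithmetic in a hypothetical helper; whether Percentile special-cases any of these inputs is not stated here. Note that even d = 0 yields NaN next to an infinite neighbour, because 0.0 * Infinity is NaN.

```java
/** Hypothetical demo of the interpolation formula with non-finite inputs. */
final class InterpolationDemo {

    static double interpolate(double lower, double upper, double d) {
        return lower + d * (upper - lower);
    }

    public static void main(String[] args) {
        double inf = Double.POSITIVE_INFINITY;
        System.out.println(interpolate(2.0, inf, 0.5));        // Infinity
        System.out.println(interpolate(2.0, inf, 0.0));        // NaN: 0.0 * Infinity is NaN
        System.out.println(interpolate(-inf, inf, 0.5));       // NaN: -Infinity + Infinity is NaN
        System.out.println(interpolate(2.0, Double.NaN, 0.5)); // NaN
    }
}
```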

Further, to support the different estimation types (such as R1 and R2) described on the Wikipedia Quantile page, a type-specific NaN handling strategy is used so that results closely match those typically produced by popular tools such as R (types R1 to R9) and Excel (R7).
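
For comparison with the legacy steps, here is a sketch of the R7 estimate using the formula from the Wikipedia Quantile article: with q = p / 100 and h = (n - 1) * q + 1, interpolate between the sorted elements at 1-based positions floor(h) and floor(h) + 1. This is an illustration under that assumption, not the source of the corresponding Percentile.EstimationType constant, and the class R7Sketch is hypothetical. On {1, 2, 3, 4, 5} it returns 2.0 for p = 25, whereas the legacy steps give 1.5, a small-sample divergence of the kind the class description mentions.

```java
import java.util.Arrays;

/** Hypothetical sketch of the R7 quantile estimate (Wikipedia "Quantile", type 7). */
final class R7Sketch {

    static double estimate(double[] values, double p) {
        if (!(p > 0 && p <= 100)) {
            throw new IllegalArgumentException("p must be in (0, 100]: " + p);
        }
        int n = values.length;
        if (n == 0) {
            return Double.NaN;
        }
        double[] x = Arrays.copyOf(values, n);
        Arrays.sort(x);
        double h = (n - 1) * (p / 100) + 1; // 1-based fractional position
        int lo = (int) Math.floor(h);       // 1 <= lo <= n because 0 < p <= 100
        if (lo >= n) {
            return x[n - 1];                // p = 100, or n = 1: the maximum
        }
        double frac = h - lo;
        return x[lo - 1] + frac * (x[lo] - x[lo - 1]);
    }

    public static void main(String[] args) {
        double[] data = {5, 1, 4, 2, 3};
        System.out.println(estimate(data, 25)); // 2.0
        System.out.println(estimate(data, 50)); // 3.0
    }
}
```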

Percentile uses selection rather than complete sorting, and caches the state of the selection algorithm between calls to the various evaluate methods. This greatly improves efficiency, both for single and for multiple percentile computations. To maximize performance when multiple percentiles are computed from the same data, users should set the data array once, using either evaluate(double[], double) or setData(double[]), and thereafter call evaluate(double) with just the percentile.
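
To illustrate selection without a full sort, the sketch below stores one copy of the data and answers k-th-smallest queries with a textbook quickselect (Lomuto partitioning, middle-element pivot, Double.compare ordering). This is a swapped-in technique, not Hipparchus's KthSelector, which additionally caches pivots and by default uses a median-of-3 pivot; the class StoredQuickSelect is hypothetical. Each query leaves the stored copy partially ordered for later queries, which mirrors the reuse described above, although without a pivot cache this sketch gives no performance guarantee.

```java
import java.util.Arrays;

/** Hypothetical sketch: k-th smallest queries by quickselect on one stored copy. */
final class StoredQuickSelect {

    private final double[] work;

    StoredQuickSelect(double[] values) {
        work = Arrays.copyOf(values, values.length); // stored copy, as with setData
    }

    /** Returns the k-th smallest element (0-based) under Double.compare ordering. */
    double select(int k) {
        if (k < 0 || k >= work.length) {
            throw new IndexOutOfBoundsException("k = " + k);
        }
        int lo = 0;
        int hi = work.length - 1;
        while (lo < hi) {                  // invariant: lo <= k <= hi
            int p = partition(lo, hi, lo + (hi - lo) / 2);
            if (p == k) {
                return work[k];
            }
            if (k < p) {
                hi = p - 1;
            } else {
                lo = p + 1;
            }
        }
        return work[lo];
    }

    /** Lomuto partition of work[lo..hi]; returns the pivot's final index. */
    private int partition(int lo, int hi, int pivotIndex) {
        double pivot = work[pivotIndex];
        swap(pivotIndex, hi);              // park the pivot at the end
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (Double.compare(work[i], pivot) < 0) {
                swap(i, store++);          // smaller values go to the left
            }
        }
        swap(store, hi);                   // pivot lands in its sorted position
        return store;
    }

    private void swap(int i, int j) {
        double t = work[i];
        work[i] = work[j];
        work[j] = t;
    }

    public static void main(String[] args) {
        StoredQuickSelect s = new StoredQuickSelect(new double[] {9, 2, 7, 4, 5, 1, 8});
        System.out.println(s.select(3)); // 5.0, the median of 7 values
        System.out.println(s.select(0)); // 1.0, reusing the partially ordered copy
    }
}
```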

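Note that this implementation is not synchronized. If multiple threads access an instance of this class concurrently, and at least one of the threads modifies its state (for example through setData(double[]) or setQuantile(double)), access must be synchronized externally. Since the evaluate methods may also update the cached selection state, this applies to evaluation calls on stored data as well.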

See Also:

Serialized Form

Nested Class Summary

Nested Classes

Modifier and Type | Class | Description
static enum | Percentile.EstimationType | An enum of the percentile estimation strategies described in the Wikipedia article on quantiles; the enum constant names match the type names used there.

Constructor Summary

Constructors

Modifier | Constructor | Description
public | Percentile() | Constructs a Percentile with default settings.
public | Percentile(double quantile) | Constructs a Percentile with the specified quantile value and the following defaults: estimation type Percentile.EstimationType.LEGACY, NaN strategy NaNStrategy.REMOVED, and a KthSelector.
protected | Percentile(double quantile, Percentile.EstimationType estimationType, NaNStrategy nanStrategy, KthSelector kthSelector) | Constructs a Percentile with the specified quantile value, Percentile.EstimationType, NaNStrategy and KthSelector.
public | Percentile(Percentile original) | Copy constructor: creates a new Percentile identical to the original.

Method Summary

Modifier and Type | Method | Description
Percentile | copy() | Returns a copy of the statistic with the same internal state.
double | evaluate(double p) | Returns the result of evaluating the statistic over the stored data.
double | evaluate(double[] values, double p) | Returns an estimate of the pth percentile of the values in the values array.
double | evaluate(double[] values, int start, int length) | Returns an estimate of the percentile, determined by the quantile property, of the designated values in the values array.
double | evaluate(double[] values, int begin, int length, double p) | Returns an estimate of the pth percentile of the values in the values array, starting with the element at (0-based) position begin and including length values.
Percentile.EstimationType | getEstimationType() | Get the estimation type used for computation.
KthSelector | getKthSelector() | Get the KthSelector used for computation.
NaNStrategy | getNaNStrategy() | Get the NaN handling strategy used for computation.
PivotingStrategy | getPivotingStrategy() | Get the PivotingStrategy used in the KthSelector for computation.
double | getQuantile() | Returns the value of the quantile field (determines which percentile is computed when evaluate() is called with no quantile argument).
protected double[] | getWorkArray(double[] values, int begin, int length) | Get the work array to operate on.
void | setData(double[] values) | Set the data array.
void | setData(double[] values, int begin, int length) | Set the data array.
void | setQuantile(double p) | Sets the value of the quantile field (determines which percentile is computed when evaluate() is called with no quantile argument).
Percentile | withEstimationType(Percentile.EstimationType newEstimationType) | Build a new instance similar to the current one except for the estimation type.
Percentile | withKthSelector(KthSelector newKthSelector) | Build a new instance similar to the current one except for the KthSelector instance.
Percentile | withNaNStrategy(NaNStrategy newNaNStrategy) | Build a new instance similar to the current one except for the NaN handling strategy.

Methods inherited from class org.hipparchus.stat.descriptive.AbstractUnivariateStatistic
evaluate, getData, getDataRef

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.hipparchus.stat.descriptive.UnivariateStatistic
evaluate

Constructor Details
- Percentile
  
  public Percentile()
  Constructs a Percentile with the following defaults.
  
  default quantile: 50.0, can be reset with setQuantile(double)
  
  default estimation type: Percentile.EstimationType.LEGACY, can be reset with withEstimationType(EstimationType)
  
  default NaN strategy: NaNStrategy.REMOVED, can be reset with withNaNStrategy(NaNStrategy)
  
  a KthSelector that makes use of PivotingStrategy.MEDIAN_OF_3, can be reset with withKthSelector(KthSelector)
- Percentile
  
  public Percentile(double quantile) throws MathIllegalArgumentException
  Constructs a Percentile with the specified quantile value and the following defaults:
  
  default estimation type: Percentile.EstimationType.LEGACY
  
  default NaN strategy: NaNStrategy.REMOVED
  
  default Kth selector: KthSelector
  Parameters:
  
  quantile - the quantile to be computed
  
  Throws:
  
  MathIllegalArgumentException - if quantile is not greater than 0 and less than or equal to 100
- Percentile
  
  public Percentile(Percentile original) throws NullArgumentException
  
  Copy constructor: creates a new Percentile identical to the original.
  
  Parameters:
  
  original - the Percentile instance to copy
  
  Throws:
  
  NullArgumentException - if original is null
- Percentile
  
  protected Percentile(double quantile, Percentile.EstimationType estimationType, NaNStrategy nanStrategy, KthSelector kthSelector) throws MathIllegalArgumentException
  
  Constructs a Percentile with the specific quantile value, Percentile.EstimationType, NaNStrategy and KthSelector.
  
  Parameters:
  
  quantile - the quantile to be computed
  
  estimationType - one of the percentile estimation types
  
  nanStrategy - the NaNStrategy used to handle NaNs
  
  kthSelector - a KthSelector to use for pivoting during search
  
  Throws:
  
  MathIllegalArgumentException - if quantile is not within (0, 100]
  
  NullArgumentException - if estimationType or nanStrategy is null
Method Details
- setData
  
  public void setData(double[] values)
  
  Set the data array.
  The stored value is a copy of the parameter array, not the array itself.
  Overrides:
  
  setData in class AbstractUnivariateStatistic
  
  Parameters:
  
  values - data array to store (may be null to remove stored data)
  
  See Also:
  
  AbstractUnivariateStatistic.evaluate()
- setData
  
  public void setData(double[] values, int begin, int length) throws MathIllegalArgumentException
  
  Set the data array. The input array is copied, not referenced.
  Overrides:
  
  setData in class AbstractUnivariateStatistic
  
  Parameters:
  
  values - data array to store
  
  begin - the index of the first element to include
  
  length - the number of elements to include
  
  Throws:
  
  MathIllegalArgumentException - if values is null or the indices are not valid
  
  See Also:
  
  AbstractUnivariateStatistic.evaluate()
- evaluate
  
  public double evaluate(double p) throws MathIllegalArgumentException
  
  Returns the result of evaluating the statistic over the stored data.
  The stored array is the one set by the most recent call to setData(double[]).
  
  Parameters:
  
  p - the percentile value to compute
  
  Returns:
  
  the value of the statistic applied to the stored data
  
  Throws:
  
  MathIllegalArgumentException - if p is not a valid quantile value (p must be greater than 0 and less than or equal to 100)
- evaluate
  
  public double evaluate(double[] values, int start, int length) throws MathIllegalArgumentException
  Returns an estimate of the percentile of the designated values in the values array.
  The percentile estimated is determined by the quantile property (see getQuantile() and setQuantile(double)).
  
  Returns Double.NaN if length = 0
  
  Returns (for any value of quantile) values[start] if length = 1
  
  Throws MathIllegalArgumentException if values is null, or start or length is invalid
  
  See Percentile for a description of the percentile estimation algorithm used.
  Specified by:
  
  evaluate in interface MathArrays.Function
  
  Specified by:
  
  evaluate in interface UnivariateStatistic
  
  Specified by:
  
  evaluate in class AbstractUnivariateStatistic
  
  Parameters:
  
  values - the input array
  
  start - index of the first array element to include
  
  length - the number of elements to include
  
  Returns:
  
  the percentile value
  
  Throws:
  
  MathIllegalArgumentException - if the parameters are not valid
- evaluate
  
  public double evaluate(double[] values, double p) throws MathIllegalArgumentException
  Returns an estimate of the pth percentile of the values in the values array.
  
  Returns Double.NaN if values has length 0
  
  Returns (for any value of p) values[0] if values has length 1
  
  Throws MathIllegalArgumentException if values is null or p is not a valid quantile value (p must be greater than 0 and less than or equal to 100)
  
  The default implementation delegates to evaluate(double[], int, int, double) in the natural way.
  Parameters:
  
  values - input array of values
  
  p - the percentile value to compute
  
  Returns:
  
  the percentile value or Double.NaN if the array is empty
  
  Throws:
  
  MathIllegalArgumentException - if values is null or p is invalid
- evaluate
  
  public double evaluate(double[] values, int begin, int length, double p) throws MathIllegalArgumentException
  Returns an estimate of the pth percentile of the values in the values array, starting with the element in (0-based) position begin in the array and including length values.
  Calls to this method do not modify the internal quantile state of this statistic.
  
  Returns Double.NaN if length = 0
  
  Returns (for any value of p) values[begin] if length = 1
  
  Throws MathIllegalArgumentException if values is null, begin or length is invalid, or p is not a valid quantile value (p must be greater than 0 and less than or equal to 100)
  
  See Percentile for a description of the percentile estimation algorithm used.
  Parameters:
  
  values - array of input values
  
  begin - the first (0-based) element to include in the computation
  
  length - the number of array elements to include
  
  p - the percentile to compute
  
  Returns:
  
  the percentile value
  
  Throws:
  
  MathIllegalArgumentException - if the parameters are not valid or the input array is null
- getQuantile
  
  public double getQuantile()
  
  Returns the value of the quantile field (determines what percentile is computed when evaluate() is called with no quantile argument).
  
  Returns:
  
  quantile set during construction or by setQuantile(double)
- setQuantile
  
  public void setQuantile(double p) throws MathIllegalArgumentException
  
  Sets the value of the quantile field (determines what percentile is computed when evaluate() is called with no quantile argument).
  
  Parameters:
  
  p - a value satisfying 0 < p <= 100
  
  Throws:
  
  MathIllegalArgumentException - if p is not greater than 0 and less than or equal to 100
- copy
  
  public Percentile copy()
  
  Returns a copy of the statistic with the same internal state.
  
  Specified by:
  
  copy in interface UnivariateStatistic
  
  Specified by:
  
  copy in class AbstractUnivariateStatistic
  
  Returns:
  
  a copy of the statistic
- getWorkArray
  
  protected double[] getWorkArray(double[] values, int begin, int length)
  
  Get the work array to operate on. Makes use of previously stored data if it exists; otherwise, checks for NaNs and copies the subset of the array defined by the begin and length parameters. The configured nanStrategy is used to retain, remove or replace any NaNs present before the resulting array is returned.
  
  Parameters:
  
  values - the array of numbers
  
  begin - index to start reading the array
  
  length - the length of array to be read from the begin index
  
  Returns:
  
  work array sliced from values in the range [begin,begin+length)
  
  Throws:
  
  MathIllegalArgumentException - if values or indices are invalid
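
The slicing and NaN handling described for getWorkArray can be sketched with the standard library. The enum Policy below only mirrors the constant names of NaNStrategy; the replacement values used for MINIMAL and MAXIMAL (negative and positive infinity), treating FIXED as "leave NaNs in place", and the exception types are assumptions of this sketch rather than quotations from the library. The class WorkArraySketch is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: slice [begin, begin + length) and apply a NaN policy. */
final class WorkArraySketch {

    /** Mirrors the NaNStrategy constant names; the behaviour here is assumed. */
    enum Policy { MINIMAL, MAXIMAL, REMOVED, FIXED, FAILED }

    static double[] workArray(double[] values, int begin, int length, Policy policy) {
        if (values == null) {
            throw new IllegalArgumentException("values is null");
        }
        if (begin < 0 || length < 0 || begin > values.length - length) {
            throw new IllegalArgumentException("invalid range: begin=" + begin + ", length=" + length);
        }
        double[] slice = Arrays.copyOfRange(values, begin, begin + length); // input untouched
        switch (policy) {
            case MINIMAL:   // NaN treated as the smallest value
                return replaceNaN(slice, Double.NEGATIVE_INFINITY);
            case MAXIMAL:   // NaN treated as the largest value
                return replaceNaN(slice, Double.POSITIVE_INFINITY);
            case REMOVED:   // NaN dropped from the data
                return Arrays.stream(slice).filter(v -> !Double.isNaN(v)).toArray();
            case FAILED:    // NaN is an error
                for (double v : slice) {
                    if (Double.isNaN(v)) {
                        throw new IllegalArgumentException("NaN not allowed");
                    }
                }
                return slice;
            default:        // FIXED: NaNs kept as they are
                return slice;
        }
    }

    private static double[] replaceNaN(double[] a, double replacement) {
        for (int i = 0; i < a.length; i++) {
            if (Double.isNaN(a[i])) {
                a[i] = replacement;
            }
        }
        return a;
    }
}
```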
- getEstimationType
  
  public Percentile.EstimationType getEstimationType()
  
  Get the estimation type used for computation.
  
  Returns:
  
  the estimationType set
- withEstimationType
  
  public Percentile withEstimationType(Percentile.EstimationType newEstimationType)
  Build a new instance similar to the current one except for the estimation type.
  This method is intended to be used as part of a fluent-type builder pattern. Finely tuned instances should be built as follows:
  Percentile customized = new Percentile(quantile).withEstimationType(estimationType).withNaNStrategy(nanStrategy).withKthSelector(kthSelector);
  
  If any of the withXxx methods is omitted, the default value for the corresponding customization parameter will be used.
  Parameters:
  
  newEstimationType - estimation type for the new instance
  
  Returns:
  
  a new instance, with changed estimation type
  
  Throws:
  
  NullArgumentException - when newEstimationType is null
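
The fluent withXxx pattern described above can be illustrated with a small immutable settings class. Everything in it is hypothetical: PercentileSettings, its fields, and the String placeholders standing in for the estimation type and NaN strategy. It only shows the pattern of returning a modified copy while leaving the original untouched and rejecting null; it uses Objects.requireNonNull (NullPointerException) where Percentile documents NullArgumentException, and it says nothing about how Percentile stores its state.

```java
import java.util.Objects;

/** Hypothetical immutable settings object using the fluent "with" pattern. */
final class PercentileSettings {

    final double quantile;
    final String estimationType; // placeholder for Percentile.EstimationType
    final String nanStrategy;    // placeholder for NaNStrategy

    PercentileSettings(double quantile) {
        this(quantile, "LEGACY", "REMOVED"); // defaults documented for Percentile
    }

    private PercentileSettings(double quantile, String estimationType, String nanStrategy) {
        if (!(quantile > 0 && quantile <= 100)) {
            throw new IllegalArgumentException("quantile must be in (0, 100]: " + quantile);
        }
        this.quantile = quantile;
        this.estimationType = Objects.requireNonNull(estimationType, "estimationType");
        this.nanStrategy = Objects.requireNonNull(nanStrategy, "nanStrategy");
    }

    /** Returns a new instance with the given estimation type; this one is unchanged. */
    PercentileSettings withEstimationType(String newEstimationType) {
        return new PercentileSettings(quantile, newEstimationType, nanStrategy);
    }

    /** Returns a new instance with the given NaN strategy; this one is unchanged. */
    PercentileSettings withNaNStrategy(String newNaNStrategy) {
        return new PercentileSettings(quantile, estimationType, newNaNStrategy);
    }

    public static void main(String[] args) {
        PercentileSettings base = new PercentileSettings(90.0);
        PercentileSettings tuned = base.withEstimationType("R7").withNaNStrategy("FIXED");
        System.out.println(base.estimationType + " " + tuned.estimationType); // LEGACY R7
    }
}
```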
- getNaNStrategy
  
  public NaNStrategy getNaNStrategy()
  
  Get the NaN handling strategy used for computation.
  
  Returns:
  
  NaN handling strategy set during construction
- withNaNStrategy
  
  public Percentile withNaNStrategy(NaNStrategy newNaNStrategy)
  Build a new instance similar to the current one except for the NaN handling strategy.
  This method is intended to be used as part of a fluent-type builder pattern. Finely tuned instances should be built as follows:
  Percentile customized = new Percentile(quantile).withEstimationType(estimationType).withNaNStrategy(nanStrategy).withKthSelector(kthSelector);
  
  If any of the withXxx methods is omitted, the default value for the corresponding customization parameter will be used.
  Parameters:
  
  newNaNStrategy - NaN strategy for the new instance
  
  Returns:
  
  a new instance, with changed NaN handling strategy
  
  Throws:
  
  NullArgumentException - when newNaNStrategy is null
- getKthSelector
  
  public KthSelector getKthSelector()
  
  Get the kthSelector used for computation.
  
  Returns:
  
  the kthSelector set
- getPivotingStrategy
  
  public PivotingStrategy getPivotingStrategy()
  
  Get the PivotingStrategy used in KthSelector for computation.
  
  Returns:
  
  the pivoting strategy set
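
The default selector is documented as using PivotingStrategy.MEDIAN_OF_3. A classic median-of-3 rule picks, among the first, middle and last elements of the range being partitioned, the index holding the middle value. The sketch below assumes that reading, uses an inclusive end index and Double.compare ordering, and is not the library's implementation; the class MedianOf3Sketch is hypothetical.

```java
/** Hypothetical sketch of a classic median-of-3 pivot choice over work[begin..end]. */
final class MedianOf3Sketch {

    /** Returns the index (begin, middle or end) holding the median of those three values. */
    static int pivotIndex(double[] work, int begin, int end) {
        int middle = begin + (end - begin) / 2;
        double a = work[begin];
        double b = work[middle];
        double c = work[end];
        if (Double.compare(a, b) < 0) {
            if (Double.compare(b, c) < 0) {
                return middle;                             // a < b < c
            }
            return Double.compare(a, c) < 0 ? end : begin; // a < b, c <= b
        }
        if (Double.compare(a, c) < 0) {
            return begin;                                  // b <= a < c
        }
        return Double.compare(b, c) < 0 ? end : middle;    // b <= a, c <= a
    }
}
```

Choosing the median of three samples guards against the worst-case pivot on already sorted or reverse-sorted input, which a fixed first-element pivot would hit.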
- withKthSelector
  
  public Percentile withKthSelector(KthSelector newKthSelector)
  Build a new instance similar to the current one except for the KthSelector instance.
  This method is intended to be used as part of a fluent-type builder pattern. Finely tuned instances should be built as follows:
  Percentile customized = new Percentile(quantile).withEstimationType(estimationType).withNaNStrategy(nanStrategy).withKthSelector(newKthSelector);
  
  If any of the withXxx methods is omitted, the default value for the corresponding customization parameter will be used.
  Parameters:
  
  newKthSelector - KthSelector for the new instance
  
  Returns:
  
  a new instance, with changed KthSelector
  
  Throws:
  
  NullArgumentException - when newKthSelector is null
