Class SimpleRegression
- java.lang.Object
- 
- org.hipparchus.stat.regression.SimpleRegression
 
- 
- All Implemented Interfaces:
- Serializable, UpdatingMultipleLinearRegression
 
 public class SimpleRegression extends Object implements Serializable, UpdatingMultipleLinearRegression
 Estimates an ordinary least squares regression model with one independent variable.
 y = intercept + slope * x
 Standard errors for intercept and slope are available as well as ANOVA, r-square and Pearson's r statistics.
 Observations (x,y pairs) can be added to the model one at a time or they can be provided in a 2-dimensional array. The observations are not stored in memory, so there is no limit to the number of observations that can be added to the model.
 Usage Notes:
- When there are fewer than two observations in the model, or when there is no variation in the x values (i.e. all x values are the same), all statistics return NaN. At least two observations with different x coordinates are required to estimate a bivariate regression model.
- Getters for the statistics always compute values based on the current set of observations; i.e., you can get statistics, then add more data and get updated statistics without using a new instance. There is no "compute" method that updates all statistics. Each of the getters performs the necessary computations to return the requested statistic.
- The intercept term may be suppressed by passing false to the SimpleRegression(boolean) constructor. When the hasIntercept property is false, the model is estimated without a constant term and getIntercept() returns 0.
 - See Also:
- Serialized Form
 
- 
- 
Constructor Summary
Constructor: Description
- SimpleRegression(): Create an empty SimpleRegression instance.
- SimpleRegression(boolean includeIntercept): Create a SimpleRegression instance, specifying whether or not to estimate an intercept.
 - 
Method Summary
Modifier and Type Method: Description
- void addData(double[][] data): Adds the observations represented by the elements in data.
- void addData(double x, double y): Adds the observation (x,y) to the regression data set.
- void addObservation(double[] x, double y): Adds one observation to the regression model.
- void addObservations(double[][] x, double[] y): Adds a series of observations to the regression model.
- void append(SimpleRegression reg): Appends data from another regression calculation to this one.
- void clear(): Clears all data from the model.
- double getIntercept(): Returns the intercept of the estimated regression line, if hasIntercept() is true; otherwise 0.
- double getInterceptStdErr(): Returns the standard error of the intercept estimate, usually denoted s(b0).
- double getMeanSquareError(): Returns the sum of squared errors divided by the degrees of freedom, usually abbreviated MSE.
- long getN(): Returns the number of observations that have been added to the model.
- double getR(): Returns Pearson's product moment correlation coefficient, usually denoted r.
- double getRegressionSumSquares(): Returns the sum of squared deviations of the predicted y values about their mean (which equals the mean of y).
- double getRSquare(): Returns the coefficient of determination, usually denoted r-square.
- double getSignificance(): Returns the significance level of the slope (equivalently, correlation).
- double getSlope(): Returns the slope of the estimated regression line.
- double getSlopeConfidenceInterval(): Returns the half-width of a 95% confidence interval for the slope estimate.
- double getSlopeConfidenceInterval(double alpha): Returns the half-width of a (100-100*alpha)% confidence interval for the slope estimate.
- double getSlopeStdErr(): Returns the standard error of the slope estimate, usually denoted s(b1).
- double getSumOfCrossProducts(): Returns the sum of cross products, xi*yi.
- double getSumSquaredErrors(): Returns the sum of squared errors (SSE) associated with the regression model.
- double getTotalSumSquares(): Returns the sum of squared deviations of the y values about their mean.
- double getXSumSquares(): Returns the sum of squared deviations of the x values about their mean.
- boolean hasIntercept(): Returns true if the model includes an intercept term.
- double predict(double x): Returns the "predicted" y value associated with the supplied x value, based on the data that has been added to the model when this method is invoked.
- RegressionResults regress(): Performs a regression on data present in buffers and outputs a RegressionResults object.
- RegressionResults regress(int[] variablesToInclude): Performs a regression on data present in buffers including only regressors indexed in variablesToInclude and outputs a RegressionResults object.
- void removeData(double[][] data): Removes observations represented by the elements in data.
- void removeData(double x, double y): Removes the observation (x,y) from the regression data set.
 
- 
- 
- 
Constructor Detail
 - 
SimpleRegression
public SimpleRegression()
Create an empty SimpleRegression instance.
 - 
SimpleRegression
public SimpleRegression(boolean includeIntercept)
Create a SimpleRegression instance, specifying whether or not to estimate an intercept.
Use false to estimate a model with no intercept. When the hasIntercept property is false, the model is estimated without a constant term and getIntercept() returns 0.
- Parameters:
- includeIntercept - whether or not to include an intercept term in the regression model
 
 
- 
 - 
Method Detail
 - 
addData
public void addData(double x, double y)
Adds the observation (x,y) to the regression data set.
Uses updating formulas for means and sums of squares defined in "Algorithms for Computing the Sample Variance: Analysis and Recommendations", Chan, T.F., Golub, G.H., and LeVeque, R.J. 1983, American Statistician, vol. 37, pp. 242-247, referenced in Weisberg, S. "Applied Linear Regression". 2nd Ed. 1985.
- Parameters:
- x - independent variable value
- y - dependent variable value
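The updating idea referenced above can be sketched in a few lines. This is an illustrative, self-contained accumulator, not the Hipparchus source: the class name RunningSums and its fields are hypothetical, and it assumes a model with an intercept. It keeps only the running means and centered sums, which is why no observation needs to be stored.

```java
/** Hypothetical one-pass accumulator for illustration; not the Hipparchus source. */
class RunningSums {
    long n;               // number of observations seen so far
    double xbar, ybar;    // running means of x and y
    double sxx, syy, sxy; // running sums of squared deviations and cross-deviations

    /** Folds one (x, y) pair into the running means and centered sums. */
    void add(double x, double y) {
        if (n == 0) {
            xbar = x;
            ybar = y;
        } else {
            double dx = x - xbar;
            double dy = y - ybar;
            double w = n / (n + 1.0); // weight applied to the new deviation
            sxx += dx * dx * w;
            syy += dy * dy * w;
            sxy += dx * dy * w;
            xbar += dx / (n + 1.0);
            ybar += dy / (n + 1.0);
        }
        n++;
    }

    /** Least squares slope; NaN until there is variation in x. */
    double slope() {
        return sxy / sxx;
    }

    /** Least squares intercept derived from the means and the slope. */
    double intercept() {
        return ybar - slope() * xbar;
    }

    public static void main(String[] args) {
        RunningSums s = new RunningSums();
        s.add(1, 3);
        s.add(2, 5);
        s.add(3, 7); // three points on the line y = 1 + 2x
        System.out.println(s.slope() + " " + s.intercept());
    }
}
```

Because each update touches a fixed number of fields, memory use stays constant regardless of how many observations are added.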
 
 - 
append
public void append(SimpleRegression reg)
Appends data from another regression calculation to this one.
The mean update formulae are based on a paper written by Philippe Pébay: Formulas for Robust, One-Pass Parallel Computation of Covariances and Arbitrary-Order Statistical Moments, 2008, Technical Report SAND2008-6212, Sandia National Laboratories.
- Parameters:
- reg - model to append data from
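A minimal sketch of how two summaries can be combined without revisiting the observations, using the standard pairwise merge formulas for means and centered sums. The class FitSummary and its members are hypothetical names introduced for illustration, and the sketch assumes a model with an intercept; it is not the Hipparchus implementation.

```java
/** Hypothetical summary of a data set for illustration; not the Hipparchus source. */
class FitSummary {
    long n;            // number of observations summarized
    double xbar, ybar; // means of x and y
    double sxx, sxy;   // centered sum of squares of x, centered cross-product sum

    /** Builds a summary with a plain two-pass computation. */
    static FitSummary of(double[] xs, double[] ys) {
        FitSummary s = new FitSummary();
        s.n = xs.length;
        for (int i = 0; i < xs.length; i++) {
            s.xbar += xs[i] / s.n;
            s.ybar += ys[i] / s.n;
        }
        for (int i = 0; i < xs.length; i++) {
            double dx = xs[i] - s.xbar;
            s.sxx += dx * dx;
            s.sxy += dx * (ys[i] - s.ybar);
        }
        return s;
    }

    /** Merges another summary into this one using only the stored statistics. */
    void merge(FitSummary other) {
        if (other.n == 0) {
            return;
        }
        if (n == 0) {
            n = other.n;
            xbar = other.xbar;
            ybar = other.ybar;
            sxx = other.sxx;
            sxy = other.sxy;
            return;
        }
        long total = n + other.n;
        double dx = other.xbar - xbar;
        double dy = other.ybar - ybar;
        double w = (double) n * other.n / total; // weight of the between-group term
        sxx += other.sxx + dx * dx * w;
        sxy += other.sxy + dx * dy * w;
        xbar += dx * other.n / total;
        ybar += dy * other.n / total;
        n = total;
    }

    /** Least squares slope of the merged data. */
    double slope() {
        return sxy / sxx;
    }
}
```

Merging this way lets separate partitions of a data set be summarized independently and combined afterwards.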
 
 - 
removeData
public void removeData(double x, double y)
Removes the observation (x,y) from the regression data set.
Mirrors the addData method. This method permits the use of SimpleRegression instances in streaming mode where the regression is applied to a sliding "window" of observations; however, the caller is responsible for maintaining the set of observations in the window. The method has no effect if there are no points of data (i.e. n=0).
- Parameters:
- x - independent variable value
- y - dependent variable value
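The sliding-window pattern described above might look like the following sketch. It is self-contained and does not call Hipparchus: the class SlidingWindowFit, its deque of retained observations, and the reverse-update step are assumptions made for illustration (a model with an intercept and a capacity of at least 1 are assumed). The point is that the caller, here the deque, remembers which observation to remove.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical fixed-size window over a stream of (x, y) pairs; not the Hipparchus source. */
class SlidingWindowFit {
    private final int capacity;                               // window size, assumed >= 1
    private final Deque<double[]> window = new ArrayDeque<>(); // caller-maintained observations
    private long n;
    private double xbar, ybar, sxx, sxy;

    SlidingWindowFit(int capacity) {
        this.capacity = capacity;
    }

    /** Adds a new pair, first evicting the oldest one when the window is full. */
    void push(double x, double y) {
        if (window.size() == capacity) {
            double[] old = window.removeFirst();
            remove(old[0], old[1]);
        }
        window.addLast(new double[] {x, y});
        add(x, y);
    }

    private void add(double x, double y) {
        if (n == 0) {
            xbar = x;
            ybar = y;
        } else {
            double dx = x - xbar;
            double dy = y - ybar;
            double w = n / (n + 1.0);
            sxx += dx * dx * w;
            sxy += dx * dy * w;
            xbar += dx / (n + 1.0);
            ybar += dy / (n + 1.0);
        }
        n++;
    }

    /** Reverses the effect of add for a pair that is currently in the window. */
    private void remove(double x, double y) {
        if (n == 0) {
            return; // nothing to remove
        }
        if (n == 1) {
            n = 0;
            xbar = ybar = sxx = sxy = 0.0;
            return;
        }
        double dx = x - xbar;
        double dy = y - ybar;
        double w = n / (n - 1.0);
        sxx -= dx * dx * w;
        sxy -= dx * dy * w;
        xbar -= dx / (n - 1.0);
        ybar -= dy / (n - 1.0);
        n--;
    }

    double slope() {
        return sxy / sxx;
    }

    double intercept() {
        return ybar - slope() * xbar;
    }
}
```

Removing a pair that was never added would corrupt the sums, which is why the window contents have to be tracked outside the regression object.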
 
 - 
addData
public void addData(double[][] data) throws MathIllegalArgumentException
Adds the observations represented by the elements in data.
(data[0][0],data[0][1]) will be the first observation, then (data[1][0],data[1][1]), etc.
This method does not replace data that has already been added. The observations represented by data are added to the existing dataset.
To replace all data, use clear() before adding the new data.
- Parameters:
- data - array of observations to be added
- Throws:
- MathIllegalArgumentException - if the length of data[i] is not greater than or equal to 2
 
 - 
addObservation
public void addObservation(double[] x, double y) throws MathIllegalArgumentException
Adds one observation to the regression model.
- Specified by:
- addObservation in interface UpdatingMultipleLinearRegression
- Parameters:
- x - the independent variables which form the design matrix
- y - the dependent or response variable
- Throws:
- MathIllegalArgumentException - if the length of x does not equal the number of independent variables in the model
 
 - 
addObservations
public void addObservations(double[][] x, double[] y) throws MathIllegalArgumentException
Adds a series of observations to the regression model. The lengths of x and y must be the same and x must be rectangular.
- Specified by:
- addObservations in interface UpdatingMultipleLinearRegression
- Parameters:
- x - a series of observations on the independent variables
- y - a series of observations on the dependent variable. The length of x and y must be the same.
- Throws:
- MathIllegalArgumentException - if x is not rectangular, does not match the length of y or does not contain sufficient data to estimate the model
 
 - 
removeData
public void removeData(double[][] data)
Removes observations represented by the elements in data.
If the array is larger than the current n, only the first n elements are processed. This method permits the use of SimpleRegression instances in streaming mode where the regression is applied to a sliding "window" of observations; however, the caller is responsible for maintaining the set of observations in the window.
To remove all data, use clear().
- Parameters:
- data - array of observations to be removed
 
 - 
clear
public void clear()
Clears all data from the model.
- Specified by:
- clear in interface UpdatingMultipleLinearRegression
 
 - 
getN
public long getN()
Returns the number of observations that have been added to the model.
- Specified by:
- getN in interface UpdatingMultipleLinearRegression
- Returns:
- n, the number of observations that have been added
 
 - 
predict
public double predict(double x)
Returns the "predicted" y value associated with the supplied x value, based on the data that has been added to the model when this method is invoked.
predict(x) = intercept + slope * x
Preconditions:
- At least two observations (with at least two different x values) must have been added before invoking this method. If this method is invoked before a model can be estimated, Double.NaN is returned.
- Parameters:
- x - input x value
- Returns:
- predicted y value
 
 - 
getIntercept
public double getIntercept()
Returns the intercept of the estimated regression line, if hasIntercept() is true; otherwise 0.
The least squares estimate of the intercept is computed using the normal equations. The intercept is sometimes denoted b0.
Preconditions:
- At least two observations (with at least two different x values) must have been added before invoking this method. If this method is invoked before a model can be estimated, Double.NaN is returned.
- Returns:
- the intercept of the regression line if the model includes an intercept; 0 otherwise
- See Also:
- SimpleRegression(boolean)
 
 - 
hasIntercept
public boolean hasIntercept()
Returns true if the model includes an intercept term.
- Specified by:
- hasIntercept in interface UpdatingMultipleLinearRegression
- Returns:
- true if the regression includes an intercept; false otherwise
- See Also:
- SimpleRegression(boolean)
 
 - 
getSlope
public double getSlope()
Returns the slope of the estimated regression line.
The least squares estimate of the slope is computed using the normal equations. The slope is sometimes denoted b1.
Preconditions:
- At least two observations (with at least two different x values) must have been added before invoking this method. If this method is invoked before a model can be estimated, Double.NaN is returned.
- Returns:
- the slope of the regression line
 
 - 
getSumSquaredErrors
public double getSumSquaredErrors()
Returns the sum of squared errors (SSE) associated with the regression model.
The sum is computed using the computational formula
SSE = SYY - (SXY * SXY / SXX)
where SYY is the sum of the squared deviations of the y values about their mean, SXX is similarly defined and SXY is the sum of the products of x and y mean deviations.
The sums are accumulated using the updating algorithm referenced in addData(double, double).
The return value is constrained to be non-negative, i.e., if due to rounding errors the computational formula returns a negative result, 0 is returned.
Preconditions:
- At least two observations (with at least two different x values) must have been added before invoking this method. If this method is invoked before a model can be estimated, Double.NaN is returned.
- Returns:
- sum of squared errors associated with the regression model
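The computational formula and the non-negativity constraint can be illustrated with a short sketch. The class SseSketch and its methods are hypothetical and compute the sums with a simple two-pass loop rather than the updating algorithm; only the final formula and the clamp to 0 follow the description above.

```java
/** Hypothetical illustration of SSE = SYY - SXY * SXY / SXX; not the Hipparchus source. */
class SseSketch {

    /** Applies the computational formula and clamps rounding-induced negatives to 0. */
    static double sse(double syy, double sxy, double sxx) {
        return Math.max(0.0, syy - sxy * sxy / sxx);
    }

    /** Computes SYY, SXY and SXX with a two-pass loop, then applies the formula. */
    static double sseFromData(double[] xs, double[] ys) {
        int n = xs.length;
        double xbar = 0.0;
        double ybar = 0.0;
        for (int i = 0; i < n; i++) {
            xbar += xs[i] / n;
            ybar += ys[i] / n;
        }
        double sxx = 0.0;
        double syy = 0.0;
        double sxy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = xs[i] - xbar;
            double dy = ys[i] - ybar;
            sxx += dx * dx;
            syy += dy * dy;
            sxy += dx * dy;
        }
        return sse(syy, sxy, sxx);
    }

    public static void main(String[] args) {
        // Three points that do not lie exactly on a line.
        System.out.println(sseFromData(new double[] {1, 2, 3}, new double[] {2, 3, 5}));
    }
}
```

For nearly collinear data the subtraction can produce a tiny negative number, which is the case the clamp is meant to absorb.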
 
 - 
getTotalSumSquares
public double getTotalSumSquares()
Returns the sum of squared deviations of the y values about their mean.
This is defined as SSTO here. If n < 2, this returns Double.NaN.
- Returns:
- sum of squared deviations of y values
 
 - 
getXSumSquares
public double getXSumSquares()
Returns the sum of squared deviations of the x values about their mean.
If n < 2, this returns Double.NaN.
- Returns:
- sum of squared deviations of x values
 
 - 
getSumOfCrossProducts
public double getSumOfCrossProducts()
Returns the sum of cross products, xi*yi.
- Returns:
- sum of cross products
 
 - 
getRegressionSumSquares
public double getRegressionSumSquares()
Returns the sum of squared deviations of the predicted y values about their mean (which equals the mean of y).
This is usually abbreviated SSR or SSM. It is defined as SSM here.
Preconditions:
- At least two observations (with at least two different x values) must have been added before invoking this method. If this method is invoked before a model can be estimated, Double.NaN is returned.
- Returns:
- sum of squared deviations of predicted y values
 
 - 
getMeanSquareError
public double getMeanSquareError()
Returns the sum of squared errors divided by the degrees of freedom, usually abbreviated MSE.
If there are fewer than three data pairs in the model, or if there is no variation in x, this returns Double.NaN.
- Returns:
- the mean square error (sum of squared errors divided by the degrees of freedom)
 
 - 
getR
public double getR()
Returns Pearson's product moment correlation coefficient, usually denoted r.
Preconditions:
- At least two observations (with at least two different x values) must have been added before invoking this method. If this method is invoked before a model can be estimated, Double.NaN is returned.
- Returns:
- Pearson's r
 
 - 
getRSquare
public double getRSquare()
Returns the coefficient of determination, usually denoted r-square.
Preconditions:
- At least two observations (with at least two different x values) must have been added before invoking this method. If this method is invoked before a model can be estimated, Double.NaN is returned.
- Returns:
- r-square
 
 - 
getInterceptStdErr
public double getInterceptStdErr()
Returns the standard error of the intercept estimate, usually denoted s(b0).
If there are fewer than three observations in the model, or if there is no variation in x, this returns Double.NaN. Additionally, Double.NaN is returned when the intercept is constrained to be zero.
- Returns:
- standard error associated with intercept estimate
 
 - 
getSlopeStdErr
public double getSlopeStdErr()
Returns the standard error of the slope estimate, usually denoted s(b1).
If there are fewer than three data pairs in the model, or if there is no variation in x, this returns Double.NaN.
- Returns:
- standard error associated with slope estimate
 
 - 
getSlopeConfidenceInterval
public double getSlopeConfidenceInterval() throws MathIllegalArgumentException
Returns the half-width of a 95% confidence interval for the slope estimate.
The 95% confidence interval is
(getSlope() - getSlopeConfidenceInterval(), getSlope() + getSlopeConfidenceInterval())
If there are fewer than three observations in the model, or if there is no variation in x, this returns Double.NaN.
Usage Note: The validity of this statistic depends on the assumption that the observations included in the model are drawn from a Bivariate Normal Distribution.
- Returns:
- half-width of 95% confidence interval for the slope estimate
- Throws:
- MathIllegalArgumentException - if the confidence interval cannot be computed.
 
 - 
getSlopeConfidenceInterval
public double getSlopeConfidenceInterval(double alpha) throws MathIllegalArgumentException
Returns the half-width of a (100-100*alpha)% confidence interval for the slope estimate.
The (100-100*alpha)% confidence interval is
(getSlope() - getSlopeConfidenceInterval(alpha), getSlope() + getSlopeConfidenceInterval(alpha))
To request, for example, a 99% confidence interval, use alpha = .01
Usage Note: The validity of this statistic depends on the assumption that the observations included in the model are drawn from a Bivariate Normal Distribution.
Preconditions:
- If there are fewer than three observations in the model, or if there is no variation in x, this returns Double.NaN.
- (0 < alpha < 1); otherwise a MathIllegalArgumentException is thrown.
- Parameters:
- alpha - the desired significance level
- Returns:
- half-width of the (100-100*alpha)% confidence interval for the slope estimate
- Throws:
- MathIllegalArgumentException - if the confidence interval cannot be computed.
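The page does not spell out how the half-width is obtained. As a sketch of the usual textbook construction, and assuming a model with an intercept so that the residual degrees of freedom are n - 2 (an assumption, not something stated here), the half-width h is a Student t quantile times the slope standard error s(b1):

```latex
h(\alpha) = t_{1-\alpha/2,\; n-2} \cdot s(b_1),
\qquad
\text{CI}_{1-\alpha} = \bigl(b_1 - h(\alpha),\; b_1 + h(\alpha)\bigr)
```

Here b1 corresponds to getSlope() and s(b1) to getSlopeStdErr(); smaller values of alpha give a larger quantile and therefore a wider interval.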
 
 - 
getSignificance
public double getSignificance()
Returns the significance level of the slope (equivalently, correlation).
Specifically, the returned value is the smallest alpha such that the slope confidence interval with significance level equal to alpha does not include 0. On regression output, this is often denoted Prob(|t| > 0).
Usage Note: The validity of this statistic depends on the assumption that the observations included in the model are drawn from a Bivariate Normal Distribution.
If there are fewer than three observations in the model, or if there is no variation in x, this returns Double.NaN.
- Returns:
- significance level for slope/correlation
- Throws:
- MathIllegalStateException - if the significance level cannot be computed.
 
 - 
regress
public RegressionResults regress() throws MathIllegalArgumentException
Performs a regression on data present in buffers and outputs a RegressionResults object.
If there are fewer than 3 observations in the model and hasIntercept is true, a MathIllegalArgumentException is thrown. If there is no intercept term, the model must contain at least 2 observations.
- Specified by:
- regress in interface UpdatingMultipleLinearRegression
- Returns:
- RegressionResults, which acts as a container of regression output
- Throws:
- MathIllegalArgumentException - if the model is not correctly specified
- MathIllegalArgumentException - if there is not sufficient data in the model to estimate the regression parameters
 
 - 
regress
public RegressionResults regress(int[] variablesToInclude) throws MathIllegalArgumentException
Performs a regression on data present in buffers including only regressors indexed in variablesToInclude and outputs a RegressionResults object.
- Specified by:
- regress in interface UpdatingMultipleLinearRegression
- Parameters:
- variablesToInclude - an array of indices of regressors to include
- Returns:
- RegressionResults, which acts as a container of regression output
- Throws:
- MathIllegalArgumentException - if the variablesToInclude array is null or zero length
- MathIllegalArgumentException - if a requested variable is not present in model
 
 
- 
 
-