Perform analysis of variance for between-subjects factors using anova.
Perform multivariate analysis of variance using manova.
Perform hypothesis tests on the coefficients using coeftest.
Perform repeated measures analysis of variance using ranova.
Test for sphericity (compound symmetry) with Mauchly's test using mauchly.
Compute summary statistics organized by group using grpstats.
Perform multiple comparisons of marginal means using multcompare.
Make predictions on new data with the fitted repeated measures model using predict.
Generate random data with the fitted repeated measures model at new design points using random.
For the properties and methods of this object, see the RepeatedMeasuresModel class page.
You can now use the new fitcsvm function to train an SVM classifier for one- or two-class learning. fitcsvm creates an object of the new class ClassificationSVM or existing class ClassificationPartitionedModel.
ClassificationSVM is a new class for accessing and performing operations on the training data. CompactClassificationSVM is a new class for storing configurations of trained models without storing training data. The syntax and methods resemble those in the existing ClassificationTree and CompactClassificationTree classes.
The new fitcsvm function and ClassificationSVM and CompactClassificationSVM classes include the functionality of the svmtrain and svmclassify functions. ClassificationSVM provides several benefits compared to the svmtrain and svmclassify functions:
The new functionality
Supports computation of soft classification scores
Supports fitting posterior probabilities
Has improved training speed, especially on big data with well-separated classes by providing shrinkage
Allows a warm restart by accepting an initial α value
Allows training to resume after the maximum number of iterations is exceeded
Supports robust learning in the presence of outliers
ClassificationSVM is built on the same framework as ClassificationTree, ClassificationDiscriminant, and ClassificationKNN, so you have a variety of options and methods, including:
For all methods and properties of the new objects, see the ClassificationSVM and CompactClassificationSVM class pages.
There are two new methods for the objects created using the evalclusters function:
addK adds additional number of clusters to be evaluated. This method applies to all classes of cluster evaluation (i.e., clustering.evaluation.GapEvaluation, clustering.evaluation.SilhouetteEvaluation, clustering.evaluation.CalinskiHarabaszEvaluation, and clustering.evaluation.DaviesBouldinEvaluation).
increaseB increases the number of reference data sets for gap criterion simulations. This method applies to the clustering.evaluation.GapEvaluation class.
The default value of the 'SearchMethod' name-value pair argument for clustering.evaluation.GapEvaluation objects is now always 'globalMaxSE'.
The default value of the 'SearchMethod' name-value pair argument for clustering.evaluation.GapEvaluation objects is now always 'globalMaxSE' and does not change depending on the value of the 'KList' name-value pair argument.
multcompare now returns the p-value of each pairwise comparison of group means. multcompare returns the p-value in the sixth column of its first output argument. The p-value is the overall significance level at which the individual comparison is borderline significant.
The first output argument of multcompare now has six columns, instead of five. The sixth column contains the p-value.
The following functions and methods now accept table inputs as alternative to dataset array inputs.
The following functions, methods, and model properties now return a table rather than a dataset array.
|Functions and Methods||Class|
|VariableInfo, ObservationInfo, Variables, Diagnostics, Residuals, Coefficients||LinearModel|
|VariableInfo, ObservationInfo, Variables, Diagnostics, Residuals, Fitted, Coefficients||GeneralizedLinearModel|
|VariableInfo, ObservationInfo, Variables, Diagnostics, Residuals, Coefficients||NonLinearModel|
|VariableInfo, ObservationInfo, Variables, Coefficients, ModelCriterion||LinearMixedModel|
*grpstats now matches the output with input type.
The functions and properties listed now return a table instead of a dataset array. You can convert them to dataset arrays using the table2dataset function.
The default value of the 'EmptyAction' name-value pair argument of the kmeans function is now 'singleton'.
To set the value of 'EmptyAction' to 'error', you must explicitly specify 'EmptyAction','error'.
The following are new functions for classification and regression trees, discriminant analysis, nearest neighbors, Naive Bayes classification, and Gaussian mixture models.
|templateTree||ClassificationTree.template or RegressionTree.template|
|Functionality||What Happens When You Use This Functionality?||Use This Instead||Compatibility Considerations|
|ClassificationDiscriminant.fit||Still runs||fitcdiscr||Replace instances of ClassificationDiscriminant.fit with fitcdiscr.|
|ClassificationKNN.fit||Still runs||fitcknn||Replace instances of ClassificationKNN.fit with fitcknn.|
|ClassificationTree.fit||Still runs||fitctree||Replace instances of ClassificationTree.fit with fitctree.|
|RegressionTree.fit||Still runs||fitrtree||Replace instances of RegressionTree.fit with fitrtree.|
|NaiveBayes.fit||Still runs||fitNaiveBayes||Replace instances of NaiveBayes.fit with fitNaiveBayes.|
|gmdistribution.fit||Still runs||fitgmdist||Replace instances of gmdistribution.fit with fitgmdist.|
|ClassificationDiscriminant.template||Still runs||templateDiscriminant||Replace instances of ClassificationDiscriminant.template with templateDiscriminant.|
|ClassificationKNN.template||Still runs||templateKNN||Replace instances of ClassificationKNN.template with templateKNN.|
|ClassificationTree.template or RegressionTree.template||Still runs||templateTree||Replace instances of ClassificationTree.template or RegressionTree.template with templateTree.|
|ClassificationDiscriminant.make||Still runs||makecdiscr||Replace instances of ClassificationDiscriminant.make with makecdiscr.|
LinearMixedModel is a new class for fitting linear mixed-effects (LME) models. Fit multi-level LME models or LME models with nested and/or crossed random effects using the fitlme or fitlmematrix function. You can:
Specify LME models using either the formula notation or via matrix input.
Fit LME models using maximum likelihood (ML) or restricted maximum likelihood (REML).
Specify a covariance pattern for the random effects.
Calculate estimates of best linear unbiased predictors (BLUPs) for random effects.
Perform custom joint hypothesis tests on fixed and random effects.
Compute confidence intervals on fixed effects, random effects, and covariance parameters.
Examine residuals, diagnostic plots, fitted values, and design matrices.
Compare two different models via theoretical or simulated likelihood ratio tests.
Make predictions on new data using the fitted LME model.
Generate random data using the fitted LME model at new design points.
For the properties and methods of this object, see the class page for LinearMixedModel.
Many probability distribution and descriptive statistics functions are now supported for code generation. For a full list of Statistics Toolbox functions that are supported by MATLAB® Coder™, see Statistics Toolbox Functions.
The new function evalclusters estimates the optimal number of clusters for various criterion values, and returns the clustering solution corresponding to the estimated optimal value.
You can provide clustering solutions, ask evalclusters to use one of the built-in clustering algorithms, 'kmeans', 'linkage', or 'gmdistribution', or provide a function handle.
The following criteria are available:
The Calinski-Harabasz (CH) index
The Silhouette index
The Gap statistic
The Davies-Bouldin (DB) index
mvregress now accepts an n-by-(p + 1) design matrix X, when the response Y is an n-by-d matrix with d > 1, where n is the number of observations, p is the number of predictor variables, d is the number of dimensions in the response, and X includes a column of ones for the intercept (constant) term.
Statistics Toolbox now provides upper tail probability calculations for cumulative distribution functions. You can compute the upper tail probabilities using a trailing 'upper' argument in the following functions:
cdf function for probability distribution objects, returned by pd = makedist(distname) or pd = fitdist(X,distname):
Y = cdf('name',X,A,'upper')
Y = cdf('name',X,A,B,'upper')
Y = cdf('name',X,A,B,C,'upper')
Distribution-specific cdf functions:
|Beta||p = betacdf(X,A,B,'upper')|
|Binomial||Y = binocdf(X,N,P,'upper')|
|Chi-square||p = chi2cdf(X,V,'upper')|
|Extreme Value||P = evcdf(X,mu,sigma,'upper') |
[P,PLO,PUP] = evcdf(X,mu,sigma,pcov,'upper')
|Exponential||P = expcdf(X,mu,'upper')|
[P,PLO,PUP] = expcdf(X,mu,pcov,'upper')
|F||P = fcdf(X,V1,V2,'upper')|
|Gamma||P = gamcdf(X,A,B,'upper') |
[P,PLO,PUP] = gamcdf(X,A,B,pcov,'upper')
|Geometric||Y = geocdf(X,P,'upper')|
|Generalized Extreme Value||P = gevcdf(X,k,sigma,mu,'upper')|
|Generalized Pareto||P = gpcdf(X,sigma,theta,'upper')|
|Hypergeometric||P = hygecdf(X,M,K,N,'upper')|
|Lognormal||P = logncdf(X,mu,sigma,'upper')|
[P,PLO,PUP] = logncdf(X,mu,sigma,pcov,'upper')
|Negative Binomial||Y = nbincdf(X,R,P,'upper')|
|Non-central F||P = ncfcdf(X,NU1,NU2,DELTA,'upper')|
|Non-central t||P = nctcdf(X,NU,DELTA,'upper')|
|Non-central Chi-square||P = ncx2cdf(X,V,DELTA,'upper')|
|Normal||P = normcdf(X,mu,sigma,'upper')|
[P,PLO,PUP] = normcdf(X,mu,sigma,pcov,'upper')
|Poisson||P = poisscdf(X,lambda,'upper')|
|t||P = tcdf(X,V,'upper')|
|Rayleigh||P = raylcdf(X,B,'upper')|
|Uniform Discrete||P = unidcdf(X,N,'upper')|
|Uniform Continuous||P = unidcdf(X,A,B,'upper')|
|Weibull||P = wblcdf(X,A,B,'upper')|
[P,PLO,PUP] = wblcdf(X,A,B,pcov,'upper')
The new function partialcorri computes linear partial correlation coefficients with internal adjustments. You can compute partial correlation between pairs of variables in Y and X, adjusting for the remaining variables in X, or between pairs of variables in Y and X, adjusting for the remaining variables in X, after first controlling both X and Y for the variables in Z.
You can also:
Specify whether to use Pearson or Spearman partial correlations.
Specify how to handle missing values.
Perform hypotheses test of zero correlation against a one-sided or two-sided alternative.
There are new functions for the fitting and stepwise algorithms of linear and generalized linear models, and the fitting algorithm of nonlinear models. The new functions are as follows.
|Functionality||What Happens When You Use This Functionality?||Use This Instead||Compatibility Considerations|
|LinearModel.fit||Still runs||fitlm||Replace instances of LinearModel.fit with fitlm|
|LinearModel.stepwise||Still runs||stepwiselm||Replace instances of LinearModel.stepwise with stepwiselm|
|GeneralizedLinearModel.fit||Still runs||fitglm||Replace instances of GeneralizedLinearModel.fit with fitglm|
|GeneralizedLinearModel.stepwise||Still runs||stepwiseglm||Replace instances of GeneralizedLinearModel.stepwise with stepwiseglm|
|NonLinearModel.fit||Still runs||fitnlm||Replace instances of NonLinearModel.fit with fitnlm|
Two new features handle missing data in principal component analysis:
The new function adtest performs the Anderson-Darling goodness-of-fit test. adtest can perform:
Simple test: Test against a specific distribution with parameters specified. You can test against any continuous univariate parametric distribution.
Composite test: Test against a specified distribution family (also called an omnibus test). You can test against the normal, exponential, extreme-value, lognormal, or weibull distribution families.
Improved efficiency of TreeBagger when used in parallel mode.
You can specify the number of surrogate splits saved in decision trees using the 'surrogate' name-value pair argument in the fit and template methods of the ClassificationTree and RegressionTree classes.
ClassificationTree.fit and ClassificationTree.template provide several heuristic methods for splitting on categorical predictors with many levels. Use the 'AlgorithmForCategorical' name-value pair argument to specify the algorithm to find the best split and the 'MaxCat' name-value pair argument to specify the maximum number of categories you allow.
The scatterhist function has these name-value pair arguments:
'Group' lets you specify a grouping variable and produces a grouped scatter plot.
'Kernel' lets you use grouped kernel density plots instead of overall histograms for the marginal distributions.
Additional options let you change colors, line properties, legends, and more.
These functions now accept additional error models and fixed or fit-dependent weights.
Additional functionality changes are:
disp (NonLinearModel method) shows only estimable coefficients, and shows NaN for inestimable coefficients.
Ftest (NonLinearModel method) automatically decides whether to compare the full model against an intercept-only model or zero.
NonLinearModel properties such as Diagnostics, Residuals, LogLikelihood, SSE, and SST account for weights and error models.
Parametric hypothesis test functions accept optional input arguments as name-value pair arguments.
|adtest||Anderson-Darling goodness-of-fit test|
|kstest||One-sample Kolmogorov-Smirnov test|
|kstest2||Two-sample Kolmogorov-Smirnov test|
|vartest||One-sample variance chi-square test|
|vartest2||Two-sample variance F-test|
|vartestn||Variance test across multiple groups|
New probability distribution objects provide the following new functionality:
Create a distribution without fitting to data using the new makedist function.
Assign directly to parameter values.
Create truncated distributions.
Create and operate on arrays of distribution objects.
Create custom distributions. To begin, use dfittool and select Edit > Define Custom Distributions. Use the provided template to define the 'Laplace' distribution, or modify it to create your own.
Compute and plot likelihood ratio confidence intervals and profile likelihood for fitted probability distributions.
Additional distributions in the probability distribution framework:
You can continue fitting distributions to data using the existing fitdist function.
The class names of probability distribution objects returned by fitdist are different than in earlier releases.
There are three new boosting algorithms for classification:
RUSBoost (boosting by random undersampling) for imbalanced data (data in which one class has many more observations than the other).
LPBoost (linear programming) and TotalBoost (totally corrective boosting) which self-terminate, can lead to a sparse ensemble, and can be used for multiclass boosting.
There is a new probability distribution object for the Burr Type XII distribution, a three-parameter family of continuous distributions on the real line. Use fitdist to fit this distribution to data. Use ProbDistUnivParam to specify the distribution parameters directly. Either function produces a distribution you can use to generate random samples or compute functions such as pdf and cdf.
You can now import data from a file directly into a dataset array using the MATLAB Import Tool.
The new pca function includes additional functionality for principal component analysis. Features of pca include:
Handling of NaN as missing data values.
Weighted principal component analysis with user-specified weights.
Choice of SVD or EIG algorithm for computing principal components.
Option to specify number of components to return.
Option to not center before computing principal components.
Statistics Toolbox now supports parallel execution for kmeans.
The dendrogram function has new options for reordering the nodes of hierarchical binary cluster trees:
The reorder option allows you to specify a permutation vector for the order of nodes in a dendrogram plot.
The checkcrossings option checks whether a requested permutation vector leads to crossing branches in a dendrogram plot.
The function optimalleaforder generates an optimal permutation of nodes.
You can add a vector of observation weights, or a handle to a function that returns a vector of observation weights, to these functions:
For an example of weighted fitting, see Weighted Nonlinear Regression.
Use either Weights or RobustWgtFun when performing weighted nonlinear regression.
The diagnostics in the Diagnostics dataset array for LinearModel objects are in a new order, and no longer appear in the Variables editor. The new order is:
To access the correct diagnostics, you should update any code that indexes the diagnostics dataset array columns by number.
|Functionality||What Happens When You Use This Functionality?||Use This Instead||Compatibility Considerations|
|princomp||Still runs||pca||Replace instances of princomp with pca|
Lets you fit models with both categorical and continuous predictor variables
Contains information about the quality of the fit, such as residuals and ANOVA tables
Lets you easily plot the fit
Allows for automatic or manual exclusion of unimportant variables
Enables robust fitting for reduced influence of outliers
Lets you specify quadratic and other models using a symbolic formula
Enables stepwise model selection
There are similar improvements for generalized linear and nonlinear modeling using the GeneralizedLinearModel and NonLinearModel classes. For details, see the class reference pages in the reference material, or Linear Regression, Stepwise Regression, Robust Regression — Reduce Outlier Effects, Generalized Linear Regression, or Nonlinear Regression in the User's Guide.
You can now edit, sort, plot, and select portions of dataset arrays from the MATLAB Variable Editor. For details, see Using Dataset Arrays in the User's Guide.
The lassoglm function regularizes generalized linear models. Use lassoglm to examine model alternatives and to constrain or remove redundant or unimportant variables in generalized linear regression. For details, see the function reference page, or Lasso Regularization of Generalized Linear Models in the User's Guide.
ClassificationKNN.fit creates a classification model that performs k-nearest neighbor classification. You can check the quality of the model with cross validation or resubstitution. For details, see the ClassificationKNN page in the reference material, or Classification Using Nearest Neighbors in the User's Guide.
ClassificationDiscriminant models now have two parameters, Gamma and Delta, for regularization and lowering the number of variables. Set Gamma to regularize the discriminant. Set Delta to eliminate variables. Use cvshrink to obtain optimal Gamma and Delta parameters by cross validation. For details, see the reference pages, or Regularize a Discriminant Analysis Classifier in the User's Guide.
The stepwisefit function now returns the fitted coefficient history in the history.B field.
The WgtFun and Robust options are currently accepted by all functions. To avoid potential future incompatibilities, update code that uses the WgtFun and Robust options to use the RobustWgtFun option.
The ClassificationTree predict method now chooses the class with minimal expected misclassification cost. Previously, it chose the class with maximal posterior probability. The new behavior is consistent with the cvLoss method. Furthermore, both ClassificationDiscriminant and ClassificationKNN predict using minimal expected misclassification cost. For details, see predict and loss.
If you use a nondefault cost matrix, some ClassificationTree classification predictions can differ from those in previous versions.
The fpdf function now accepts a wider range of parameter values, including Inf.
The lasso function incorporates both the lasso regularization algorithm and the elastic net regularization algorithm. Use lasso to remove redundant or unimportant variables in linear regression. The lassoPlot function helps you visualize lasso results, with a variety of coefficient trace plots and a cross-validation plot.
For details, see Lasso and Elastic Net.
You can now use the ClassificationDiscriminant and CompactClassificationDiscriminant classes for classification via discriminant analysis. The syntax and methods resemble those in the existing ClassificationTree and CompactClassificationTree classes. The ClassificationDiscriminant class includes the functionality of the classify function. ClassificationDiscriminant provides several benefits compared to the classify function:
After you fit a classifier, you can predict without refitting.
ClassificationDiscriminant is built on the same framework as ClassificationTree, so you have a variety of options and methods, including:
A choice of cost functions
ClassificationDiscriminant can fit several models, including linear, quadratic, and linear or quadratic with pseudoinverse.
For details, see Discriminant Analysis.
The rangesearch function finds all members of a data set that are within a specified distance of members of another data set. As with the knnsearch function, you can set a variety of distance metrics, or program your own. rangesearch has counterparts that are methods of the ExhaustiveSearcher and KDTreeSearcher classes.
The datasample function samples with or without replacement from a data set. It can also perform weighted sampling, with or without replacement.
The fracfactgen function now allows up to 52 factors, instead of the previous limit of 26 factors. Specify factors as case-sensitive strings, using 'a' through 'z' for the first 26 factors, and 'A' through 'Z' for the remaining factors.
fracfact now checks for an arbitrary level of interaction in confounding, instead of the previous limit of confounding up to products of two factors. Set the MaxInt name-value pair to the level of interaction you want. You can also set names for the factors using the FactorNames name-value pair.
The nlmefit function now returns the covariance matrix of the estimated coefficients as the covb field of the stats structure.
The signrank test now defines ties to be entries that differ by 2*eps or less. Previously, ties were entries that were identical to machine precision.
For R2011b, error and warning message identifiers have changed in Statistics Toolbox.
If you have scripts or functions that use message identifiers that changed, you must update the code to use the new identifiers. Typically, message identifiers are used to turn off specific warning messages, or in code that uses a try/catch statement and performs an action based on a specific error identifier.
For example, if you use the 'resubstitution' method, the 'stats:plsregress:InvalidMCReps' identifier has changed to 'stats:plsregress:InvalidResubMCReps'. If you use the 'resubstitution' method and your code checks for 'stats:plsregress:InvalidMCReps', you must update it to check for 'stats:plsregress:InvalidResubMCReps' instead.
To determine the identifier for a warning, run the following command just after you see the warning:
[MSG,MSGID] = lastwarn;
This command saves the message identifier to the variable MSGID.
To determine the identifier for an error, run the following command just after you see the error:
exception = MException.last; MSGID = exception.identifier;
The new fitensemble function constructs ensembles of decision trees. It provides:
Several popular boosting algorithms (AdaBoostM1, AdaBoostM2, GentleBoost, LogitBoost, and RobustBoost) for classification
Least-squares boosting (LSBoost) for regression
Most TreeBagger functionality for ensembles of bagged decision trees
For details, see Ensemble Methods.
The linkage and clusterdata functions have a new savememory option that can use less memory than before. With savememory set to 'on', the functions do not build a pairwise distance matrix, so use less memory and, depending on problem size, can use less time. You can use the savememory option when:
The linkage method is 'ward', 'centroid', or 'median'
The linkage distance metric is 'euclidean' (default)
The nlmefit and nlmefitsa functions now provide the conditional weighted residuals of the fit. Use this information to assess the quality of the model; see Example: Examining Residuals for Model Verification.
The statset Options structure now includes 'DerivStep', which enables you to set finite differences for gradient estimation.
MATLAB functions generated with the Distribution Fitting Tool now use the fitdist function to create fitted probability distribution objects. The generated functions return probability distribution objects as output arguments.
ncx2cdf is now faster and more accurate for large values of the noncentrality parameter.
If the two categories in a binomial regression model (such as logit or probit) are perfectly separated, the best-fitting model is degenerate with infinite coefficients. In this case, the glmfit function is likely to exceed its iteration limit. glmfit now tries to detect this perfect separation and display a diagnostic message.
mdscale now enforces that, in each column of the output Y, the value with the largest magnitude has a positive sign. This change makes results consistent across releases and platforms—small changes used to lead to sign reversals.
New filter algorithm, relieff, is based on nearest neighbors. The ReliefF algorithm accounts for correlations among predictors by computing the effect of every predictor on the class label (or true response for regression) locally and then integrates these local estimates over the entire predictor space.
nlmefit now supports the following error models:
You can specify an error model with both nlmefitsa and nlmefit.
The nlmefit bic calculation has changed. Now the degrees of freedom value is based on the number of groups rather than the number of observations. This conforms with the bic definition used by the nlmefitsa function.
Both nlmefit and nlmefitsa now store the estimated error parameters in the errorparm field of the output stats structure. The rmse field of the structure now contains the root mean squared residual for all error models; this value is computed on the log scale for the exponential model.
In the previous release, the rmse field was used by nlmefitsa for both mean squared residual and the estimated error parameter. Change your code, if necessary, to address the appropriate field in the stats structure.
As described in nlmefit Support for Error Models, and nlmefitsa changes, nlmefit now calculates different bic values than in previous releases.
The new surrogate splits feature in classregtree allows for better handling of missing values, more accurate estimation of variable importance, and calculation of the predictive measure of association between variables.
The distribution fitting GUI (dfittool) now allows you to export fits to the MATLAB workspace as probability distribution fit objects. For more information, see Modeling Data Using the Distribution Fitting Tool.
If you load a distribution fitting session that was created with previous versions of Statistics Toolbox, you cannot save an existing fit. Fit the distribution again to enable saving.
partialcorr now accepts a new syntax, RHO = partialcorr(X), which returns the sample linear partial correlation coefficients between pairs of variables in X, controlling for the remaining variables in X. For more information, see the function reference page.
quantile now accepts a new syntax, Y = quantile(X,N,...), which returns quantiles at the cumulative probabilities (1:N)/(N+1) where N is a scalar positive integer value.
scatterhist now accepts three parameter name/value pairs that control where and how the histogram plots appear. The new parameter names are NBins, Location, and Direction. For more information, see the function reference page.
bootci has a new output option which returns the bootstrapped statistic computed for each of the NBoot bootstrap replicate samples. For more information, see the function reference page.
New stochastic algorithm for fitting NLME models is more robust with respect to starting values, enables parameter transformations, and relaxes assumption of constant error variance. See nlmefitsa.
New functions for k-Nearest Neighbor (kNN) search efficiently to find the closest points to any query point. For information, see k-Nearest Neighbor Search and Radius Search.
A new option in the perfcurve function computes confidence intervals for classifier performance curves.
dataset.unstack converts a "tall" dataset array to an equivalent dataset array that is in "wide format", by "unstacking" a single variable in the tall dataset array into multiple variables in wide. dataset.stack reverses this manipulation by converting a "wide" dataset array to an equivalent dataset array that is in "tall format", by "stacking up" multiple variables in the wide dataset array into a single variable in tall.
An enhanced dataset.join method provides additional types of join operations:
join can now perform more complicated inner and outer join operations that allow a many-to-many correspondence between dataset arrays A and B, and allow unmatched observations in either A or B.
join can be of Type 'inner', 'leftouter', 'rightouter', 'fullouter', or 'outer' (which is a synonym for 'fullouter'). For an inner join, the dataset array, C, only contains observations corresponding to a combination of key values that occurred in both A and B. For a left (or right) outer join, C also contains observations corresponding to keys in A (or B) that did not match any in B (or A).
join can now return index vectors indicating the correspondence between observations in C and those in A and B.
join now supports using multiple keys.
join now supports an optional parameter for specifying missing key behavior rather than raising an error.
An enhanced dataset.export method now supports exporting directly to Microsoft® Excel® files.
The NaiveBayes classification object is suitable for data sets that contain many predictors or features.
It supports normal, kernel, multinomial, and multivariate multinomial distributions.
New perfcurve function provides graphical method to evaluate classification results.
Includes ROC (receiver operating characteristic) and other curves.
Provides a consistent interface for working with probability distributions.
Option to fit distributions by group.
The new confusionmat function tabulates misclassifications by comparing known and predicted classes of observations.
When reading external text files into a dataset array, dataset has a new 'TreatAsEmpty' parameter for specifying strings to be treated as empty.
In previous versions, dataset used eval to evaluate strings in external text files before writing them into a dataset array. As a result, strings such as '1/1/2008' were treated as numerical expressions with two divides. Now, dataset treats such expressions as strings, and writes a string variable into the dataset array whenever a column in the external file contains a string that does not represent a valid scalar value.
The cross-validation function, crossval, has new options for directly specifying loss functions for mean-squared error or misclassification rate, without having to provide a separate function M-file.
The procrustes function has new options for computing linear transformations without scale or reflection components.
The ksdensity function may give different answers for the case where there are censoring times beyond the last observed value. In this case, ksdensity tries to reduce the bias in its density estimate by folding kernel functions across a folding point so that they do not extend into the area that is completely censored. Two things have changed for this release:
In previous releases the folding point was the last observed value. In this release it is the first censoring time after the last observed value.
The folding procedure is applied not just when the 'function' parameter is 'pdf', but for all 'function' values.
The boxplot function has new options for handling multiple grouping variables and extreme outliers.
The lsline, gline, refline, and refcurve functions now work with scatter plots produced by the scatter function. In previous versions, these functions worked only with scatter plots produced by the plot function.
The following visualization functions now have custom data cursors, displaying information such as observation numbers, group numbers, and the values of related variables:
Changes to boxplot have altered a number of default behaviors:
Box labels are now drawn as text objects rather than tick labels. Any code that customizes the box labels by changing tick marks should now set the tick locations as well as the tick labels.
The function no longer returns a handles array with a fixed number handles, and the order and meaning of the handles now depends on which options are selected. To locate a handle of interest, search for its 'Tag' property using findobj. 'Tag' values for box plot components are listed on the boxplot reference page.
There are now valid handles for outliers, even when boxes have no outliers. In previous releases, the handles array returned by the function had NaN values in place of handles when boxes had no outliers. Now the 'xdata' and 'ydata' for outliers are NaN when there are no outliers.
For small groups, the 'notch' parameter sometimes produces notches that extend outside of the box. In previous releases, the notch was truncated to the extent of the box, which could produce a misleading display. A new value of 'markers' for this parameter avoids the display issue.
As a consequence, the anova1 function, which displays notched box plots for grouped data, may show notches that extend outside the boxes.
The statistics options structure created by statset now includes a Jacobian field to specify whether or not an objective function can return the Jacobian as a second output.
Bootstrap confidence intervals computed by bootci are now more accurate for lumpy data.
The formula for bootci confidence intervals of type 'bca' or 'cper' involves the proportion of bootstrap statistics less than the observed statistic. The formula now takes into account cases where there are many bootstrap statistics exactly equal to the observed statistic.
The new sobolset and haltonset functions produce quasi-random point sets for applications in Monte Carlo integration, space-filling experimental designs, and global optimization. Options allow you to skip, leap over, and scramble the points. The qrandstream function provides corresponding quasi-random number streams for intermittent sampling.
The normspec function now shades regions of a normal density curve that are either inside or outside specification limits.
The statistics options structure created by statset now includes fields for TolTypeFun and TolTypeX, to specify tolerances on objective functions and parameter values, respectively.
The new gmdistribution class represents Gaussian mixture distributions, where random points come from different multivariate normal distributions with certain probabilities. The gmdistribution constructor creates mixture models with specified means, covariances, and mixture proportions, or by fitting a mixture model with a specified number of components to data. Methods for the class include:
The cluster function for hierarchical clustering now accepts a vector of cutoff values, and returns a matrix of cluster assignments, with one column per cutoff value.
The kmeans function now returns a vector of cluster indices of length n, where n is the number of rows in the input data matrix X, even when X contains NaN values. In the past, rows of X with NaN values were ignored, and the vector of cluster indices was correspondingly reduced in size. Now the vector of cluster indices contains NaN values where rows have been ignored, consistent with other toolbox functions.
The kstest function now uses a more accurate method to calculate the p-value for a single-sample Kolmogorov-Smirnov test.
kstest now compares the computed p-value to the desired cutoff, rather than comparing the test statistic to a table of values. Results may differ from those in previous releases, especially for small samples in two-sided tests where an asymptotic formula was used in the past.
A new fitting function, copulafit, has been added to the family of functions that describe dependencies among variables using copulas. The function fits parametric copulas to data, providing a link between models of marginal distributions and models of data correlations.
A number of probability functions now have improved accuracy, especially for extreme parameter values. The functions are:
betainv — More accurate for probabilities in P near 1.
binocdf — More efficient and less likely to run out of memory for large values in X.
binopdf — More accurate when the probabilities in P are on the order of eps.
fcdf — More accurate when the parameter ratios V2./V1 are much less than the values in X.
ncx2cdf — More accurate in some extreme cases that previously returned 0.
poisscdf — More efficient and less likely to run out of memory for large values in X.
tcdf — More accurate when the squares of the values in X are much less than the parameters in V.
tinv — More accurate when the probabilities in P are very close to 0.5 and the outputs are very small in magnitude.
Function-style syntax for paretotails objects has been removed.
The changes to the probability functions listed above may lead to different, but more accurate, outputs than in previous releases.
In previous releases, syntax of the form obj(x) for a paretotails objects obj invoked the cdf method. This syntax now produces a warning. To evaluate the cumulative distribution function, use the syntax cdf(obj,x).
The new corrcov function converts a covariance matrix to the corresponding correlation matrix.
The mvregress function now supports an option to force the estimated covariance matrix to be diagonal.
In previous releases the mvregress function, when using the 'cwls' algorithm, estimated the covariance of coefficients COVB using the estimated, rather than the initial, covariance of the responses SIGMA. The initial SIGMA is now used, and COVB differs to a degree dependent on the difference between the initial and final estimates of SIGMA.
The boxplot function has a new 'compact' plot style suitable for displaying large numbers of groups.
New categorical and dataset arrays are available for organizing and processing statistical data.
Categorical arrays facilitate the use of nominal and ordinal categorical data.
Dataset arrays provide a natural way to encapsulate heterogeneous statistical data and metadata, so that it can be accessed and manipulated using familiar methods analogous to those for numerical matrices.
Categorical and dataset arrays are supported by a variety of new functions for manipulating the encapsulated data.
Categorical arrays are now accepted as input arguments in all Statistics Toolbox functions that make use of grouping variables.
Expanded options are available for linear hypothesis testing.
The new linhyptest function performs linear hypothesis tests on parameters such as regression coefficients. These tests have the form H*b = c for specified values of H and c, where b is a vector of unknown parameters.
The covb output from regstats and the SIGMA output from nlinfit are suitable for use as the covariance matrix input argument required by linhyptest. The following functions have been modified to return a covb output for use with linhyptest: coxphfit, glmfit, mnrfit, robustfit.
The new cholcov function computes a Cholesky-like decomposition of a covariance matrix, even if the matrix is not positive definite. Factors are useful in many of the same ways as Cholesky factors, such as imposing correlation on random number generators.
The classify function for discriminant analysis has been improved.
The function now computes the coefficients of the discriminant functions that define boundaries between classification regions.
The output of the function is now of the same type as the input grouping variable group.
The classify function now returns outputs of different type than it did in the past. If the input argument group is a logical vector, output is now converted to a logical vector. In the past, output was returned as a cell array of 0s and 1s. If group is numeric, the output is now converted to the same type. For example, if group is of type uint8, the output will be of type uint8.
New paretotails objects are available for modeling distributions with an empirical cdf or similar distribution in the center and generalized Pareto distributions in the tails.
The paretotails function converts a data sample to a paretotails object. The objects are useful for generating random samples from a distribution similar to the data, but with tail behavior that is less discrete than the empirical distribution.
Objects from the paretotails class are supported by a variety of new methods for working with the piecewise distribution.
The paretotails class provides function-like behavior, so that p(x) evaluates the cdf of p at values x.
The new mvregresslike function is a utility related to the mvregress function for fitting regression models to multivariate data with missing values. The new function computes the objective (log likelihood) function, and can also compute the estimated covariance matrix for the parameter estimates.
New classregtree objects are available for creating and analyzing classification and regression trees.
The classregtree function fits a classification or regression tree to training data. The objects are useful for predicting response values from new predictors.
Objects from the classregtree class are supported by a variety of new methods for accessing information about the tree.
The classregtree class provides function-like behavior, so that t(X) evaluates the tree t at predictor values in X.
Objects from the classregtree class are intended to be compatible with the structure arrays that were produced in previous versions by the classification and regression tree functions listed above. In particular, classregtree supports dot indexing of the form t.property to obtain properties of the object t. The class also provides function-like behavior through parenthesis indexing, so that t(x) uses the tree t to classify or compute fitted values for predictors x, rather than index into t as a structure array as it did in the past. As a result, cell arrays should now be used to aggregate classregtree objects.
The following demo has been updated:
Selecting a Sample Size — Modified to highlight the new sampsizepwr function
The following functions for hypothesis testing have been added or improved:
jbtest — Replaces the chi-square approximation of the test statistic, which is asymptotic, with a more accurate algorithm that interpolates p-values from a table of quantiles. A new option allows you to run Monte Carlo simulations to compute p-values outside of the table.
lillietest — Uses an improved version of Lilliefors' table of quantiles, covering a wider range of sample sizes and significance levels, with more accurate values. New options allow you to test for exponential and extreme value distributions, as well as normal distributions, and to run Monte Carlo simulations to compute p-values outside of the tables.
runstest — Adds a test for runs up and down to the existing test for runs above or below a specified value.
sampsizepwr — New function to compute the sample size necessary for a test to have a specified power. Options are available for choosing a variety of test types.
If the significance level for a test lies outside the range of tabulated values, [0.001, 0.5], then both jbtest and lillietest now return an error. In previous versions, jbtest returned an approximate p-value and lillietest returned an error outside a smaller range, [0.01, 0.2]. Error messages suggest using the new Monte Carlo option for computing values outside the range of tabulated values.
If the data sample for a test leads to a p-value outside the range of tabulated values, then both jbtest and lillietest now return, with a warning, either the smallest or largest tabulated value. In previous versions, jbtest returned an approximate p-value and lillietest returned NaN.
Support has been added for multinomial regression modeling of discrete multi-category response data, including multinomial logistic regression. The following new functions supplement the regression models in glmfit and glmval by providing for a wider range of response values:
The new mvregress function carries out multivariate regression on data with missing response values. An option allows you to specify how missing data is handled.
coxphfit — A new option allows you to specify the values at which the baseline hazard is computed.
The following new functions consolidate and expand upon existing functions for statistical process control:
capability — Computes a wider range of probabilities and capability indices than the capable function found in previous releases
controlchart — Displays a wider range of control charts than the ewmaplot, schart, and xbarplot functions found in previous releases
controlrules — Supplements the new controlchart function by providing for a wider range of control rules (Western Electric and Nelson)
gagerr — Performs a gage repeatability and reproducibility study on measurements grouped by operator and part
The capability function subsumes the capable function that appeared in previous versions of Statistics Toolbox software, and the controlchart function subsumes the functions ewmaplot, schart, and xbarplot. The older functions remain in the toolbox for backwards compatibility, but they are no longer documented or supported.
Support for nested and continuous factors has been added to the anovan function for N-way analysis of variance.
The following functions have been added to supplement the existing bootstrp function for bootstrap estimation:
The following demos have been added to the toolbox:
Bayesian Analysis for a Logistic Regression Model
Time Series Regression of Airline Passenger Data
The following demo has been updated to demonstrate new features:
Random Number Generation
The new fracfactgen function finds a set of fractional factorial design generators suitable for fitting a specified model.
The following functions for D-optimal designs have been enhanced:
cordexch, daugment, dcovary, rowexch — New options specify the range of values and the number of levels for each factor, exclude factor combinations, treat factors as categorical rather than continuous, control the number of iterations, and repeat the design generation process from random starting points
candexch — New options control the number of iterations and repeat the design generation process from random starting points
candgen — New options specify the range of values and the number of levels for each factor, and treat factors as categorical rather than continuous
x2fx — New option treats factors as categorical rather than continuous
The new dwtest function performs a Durbin-Watson test for autocorrelation in linear regression.
Two new functions have been added to compute multivariate cdfs. These supplement existing functions for pdfs and random number generators for the same distributions.
New functions have been added to the toolbox that allow you to use copulas to model correlated multivariate data and generate random numbers from multivariate distributions.
The following functions generate random numbers from nonstandard distributions using Markov Chain Monte Carlo methods:
The following demos have been added to the toolbox:
Curve Fitting and Distribution Fitting
Fitting a Univariate Distribution Using Cumulative Probabilities
Fitting an Orthogonal Regression Using Principal Components Analysis
Modelling Tail Data with the Generalized Pareto Distribution
Pitfalls in Fitting Nonlinear Models by Transforming to Linearity
Weighted Nonlinear Regression
The following demo has been updated:
Modelling Data with the Generalized Extreme Value Distribution
The new partialcorr function computes the correlation of one set of variables while controlling for a second set of variables.
The grpstats function now computes a wider variety of descriptive statistics for grouped data. Choices include the mean, standard error of the mean, number of elements, group name, standard deviation, variance, confidence interval for the mean, and confidence interval for new observations. The function also supports the computation of user-defined statistics.
The new chi2gof function tests if a sample comes from a specified distribution, against the alternative that it does not come from that distribution, using a chi-square test statistic.
Three functions have been added to test sample variances:
vartest — One-sample chi-square variance test. Tests if a sample comes from a normal distribution with specified variance, against the alternative that it comes from a normal distribution with a different variance.
vartest2 — Two-sample F-test for equal variances. Tests if two independent samples come from normal distributions with the same variance, against the alternative that they come from normal distributions with different variances.
vartestn — Bartlett multiple-sample test for equal variances. Tests if multiple samples come from normal distributions with the same variance, against the alternative that they come from normal distributions with different variances.
The new ansaribradley function tests if two independent samples come from the same distribution, against the alternative that they come from distributions that have the same median and shape but different variances.
The new runstest function tests if a sequence of values comes in random order, against the alternative that the ordering is not random.
Support has been added for two new distributions:
The Generalized Extreme Value distribution combines the Gumbel, Frechet, and Weibull distributions into a single distribution. It is used to model extreme values in data.
The following distribution functions have been added:
The Generalized Pareto distribution is used to model the tails of a data distribution.
The following distribution functions have been added:
The cophenet function now returns cophenetic distances as well as the cophenetic correlation coefficient.
|Release||Features or Changes with Compatibility Considerations|
|R2013a||Probability distribution enhancements|
|R2011b||Conversion of Error and Warning Message Identifiers|