lassoglm
Lasso or elastic net regularization for generalized linear
model regression
Syntax
B = lassoglm(X,Y)
[B,FitInfo] = lassoglm(X,Y)
[B,FitInfo] = lassoglm(X,Y,distr)
[B,FitInfo] = lassoglm(X,Y,distr,Name,Value)
Description
B = lassoglm(X,Y) returns penalized maximum-likelihood fitted coefficients for a generalized linear model of the response Y to the data matrix X. The values in Y are assumed to have a Gaussian probability distribution.
[B,FitInfo] = lassoglm(X,Y) also returns a structure containing information about the fits.
[B,FitInfo] = lassoglm(X,Y,distr) fits the model using the probability distribution type for Y as specified in distr.
[B,FitInfo] = lassoglm(X,Y,distr,Name,Value) fits regularized generalized linear regressions with additional options specified by one or more Name,Value pair arguments.
Input Arguments
X
Numeric matrix with n rows and p columns.
Each row represents one observation, and each column represents one
predictor (variable). 
Y 
When distr is not 'binomial', Y is a numeric vector or categorical array of length n, where n is the number of rows of X. Y(i) is the response to row i of X.
When distr is 'binomial', Y is one of the following:
Numeric vector of length n, where each entry represents success (1) or failure (0)
Logical vector of length n, where each entry represents success or failure
Categorical array of length n, where each entry represents success or failure
Two-column numeric matrix, where the first column contains the number of successes for each observation, and the second column contains the total number of trials
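The following sketch illustrates two of the accepted forms of Y for a 'binomial' fit. The data, the coefficient values, and the variable names (yLogical, successes, trials) are made up for illustration and are not part of the original example set.

```matlab
rng('default')                                  % for reproducibility
n = 200;
X = randn(n,5);                                 % synthetic predictors (illustrative)
prob = 1 ./ (1 + exp(-(X(:,1) - 0.5*X(:,3))));  % assumed true success probabilities

% Form 1: one success/failure outcome per row, stored as a logical vector
yLogical = rand(n,1) < prob;
B1 = lassoglm(X,yLogical,'binomial');

% Form 2: two-column matrix [number of successes, number of trials] per row
trials = 10*ones(n,1);
successes = binornd(trials,prob);
B2 = lassoglm(X,[successes trials],'binomial');

size(B1,1)                                      % one row of B per predictor
```

In both cases B has one row per column of X and one column per Lambda value.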

distr 
Distributional family for the nonsystematic variation in the
responses, a string. Choices:
'normal' 'binomial' 'poisson' 'gamma' 'inverse gaussian'
By default, lassoglm uses the canonical link function corresponding to distr. Specify another link function using the 'Link' name-value pair.

Name-Value Pair Arguments
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
'Alpha' 
Scalar value from 0 to 1 (excluding 0)
representing the weight of lasso (L^{1})
versus ridge (L^{2}) optimization. Alpha = 1 represents lasso regression,
and other values represent elastic net optimization. Alpha close
to 0 approaches ridge regression. See Definitions.
Default: 1 
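As a minimal sketch of how 'Alpha' is used, the following fits the same synthetic Poisson data once with the default lasso penalty and once with an elastic net penalty. The data mirror the example later on this page; the choice Alpha = 0.5 is arbitrary.

```matlab
rng('default')                                   % for reproducibility
X = randn(100,20);
mu = exp(X(:,[5 10 15])*[.4;.2;.3] + 1);
y = poissrnd(mu);

[Blasso,FitLasso] = lassoglm(X,y,'poisson');              % Alpha = 1 (lasso, default)
[Benet,FitEnet]   = lassoglm(X,y,'poisson','Alpha',0.5);  % elastic net

FitEnet.Alpha                                    % the value used is stored in FitInfo
```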
'CV' 
Method lassoglm uses to estimate deviance:
K, a positive integer: lassoglm uses K-fold cross validation.
cvp, a cvpartition object: lassoglm uses the cross-validation method expressed in cvp. You cannot use a 'leaveout' partition with lassoglm.
'resubstitution': lassoglm uses X and Y to fit the model and to estimate the deviance, without cross validation.
Default: 'resubstitution' 
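The two cross-validation forms of 'CV' can be sketched as follows, on illustrative synthetic data. Passing an integer and passing an equivalent cvpartition object both request K-fold cross validation; the fold assignments are random, so the two fits need not be identical.

```matlab
rng('default')                                   % for reproducibility
X = randn(100,20);
y = poissrnd(exp(X(:,[5 10 15])*[.4;.2;.3] + 1));

% Option 1: a positive integer K requests K-fold cross validation
[B1,Fit1] = lassoglm(X,y,'poisson','CV',5);

% Option 2: an explicit cvpartition object
cvp = cvpartition(size(X,1),'KFold',5);
[B2,Fit2] = lassoglm(X,y,'poisson','CV',cvp);

isfield(Fit1,'LambdaMinDeviance')                % cross validation adds extra fields
```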
'DFmax' 
Maximum number of nonzero coefficients in the model. lassoglm returns
results for Lambda values that satisfy this criterion.
Default: Inf 
'Lambda' 
Vector of nonnegative Lambda values. See Lasso.
If you do not supply Lambda, lassoglm estimates
the largest value of Lambda that gives a nonnull
model. In this case, LambdaRatio gives the ratio
of the smallest to the largest value of the sequence, and NumLambda gives
the length of the vector. If you supply Lambda, lassoglm ignores LambdaRatio and NumLambda.
Default: Geometric sequence of NumLambda values,
the largest just sufficient to produce B = 0 
'LambdaRatio' 
Positive scalar, the ratio of the smallest to the largest Lambda value
when you do not explicitly set Lambda.
If you set LambdaRatio = 0, lassoglm generates
a default sequence of Lambda values, and replaces
the smallest one with 0.
Default: 1e-4
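The following sketch contrasts the two ways of controlling the Lambda sequence: supplying it explicitly, or letting lassoglm build it from NumLambda and LambdaRatio. The specific Lambda values are arbitrary and chosen only for illustration.

```matlab
rng('default')                                   % for reproducibility
X = randn(100,20);
y = poissrnd(exp(X(:,[5 10 15])*[.4;.2;.3] + 1));

% Explicit sequence: LambdaRatio and NumLambda are ignored
lambda = [0.01 0.05 0.1 0.5];
[B1,Fit1] = lassoglm(X,y,'poisson','Lambda',lambda);

% Generated sequence: at most 25 values, smallest/largest ratio of 1e-3
[B2,Fit2] = lassoglm(X,y,'poisson','NumLambda',25,'LambdaRatio',1e-3);

Fit1.Lambda                                      % returned in ascending order
```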
'Link' 
Specify the mapping between the mean µ of the response and the linear predictor Xb.
Value  Description
'comploglog'  log(-log(1 - µ)) = Xb
'identity', default for the distribution 'normal'  µ = Xb
'log', default for the distribution 'poisson'  log(µ) = Xb
'logit', default for the distribution 'binomial'  log(µ/(1 - µ)) = Xb
'loglog'  log(-log(µ)) = Xb
'probit'  Φ^{-1}(µ) = Xb, where Φ is the normal (Gaussian) CDF function
'reciprocal', default for the distribution 'gamma'  µ^{-1} = Xb
p (a number), default for the distribution 'inverse gaussian' (with p = -2)  µ^{p} = Xb
Cell array of the form {FL FD FI}, containing three function handles, created using @, that define the link (FL), the derivative of the link (FD), and the inverse link (FI). Equivalently, a structure of function handles with field Link containing FL, field Derivative containing FD, and field Inverse containing FI.  User-specified link function (see Custom Link Function)
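A user-specified link can be sketched as below. As an illustration only, the three handles re-implement the built-in 'log' link, so the custom fit is expected to agree closely with the built-in one; the data and the single Lambda value are arbitrary choices.

```matlab
rng('default')                                   % for reproducibility
X = randn(100,20);
y = poissrnd(exp(X(:,[5 10 15])*[.4;.2;.3] + 1));

FL = @(mu) log(mu);                              % link: eta = log(mu)
FD = @(mu) 1 ./ mu;                              % derivative of the link
FI = @(eta) exp(eta);                            % inverse link: mu = exp(eta)
customLink = {FL FD FI};

Bcustom  = lassoglm(X,y,'poisson','Link',customLink,'Lambda',0.05);
Bbuiltin = lassoglm(X,y,'poisson','Link','log','Lambda',0.05);

maxDiff = max(abs(Bcustom - Bbuiltin))           % expected to be small
```

The derivative and inverse must be consistent with the link itself; a quick check such as FI(FL(0.3)) returning 0.3 helps catch mistakes before fitting.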

'MCReps' 
Positive integer, the number of Monte Carlo repetitions for
cross validation.
If CV is 'resubstitution' or
a cvpartition of type 'resubstitution', MCReps must
be 1. If CV is a cvpartition of
type 'holdout', MCReps must
be greater than 1.
Default: 1 
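As a sketch of the 'holdout' case described above, the following uses a holdout partition with several Monte Carlo repetitions. The 30% holdout fraction and the five repetitions are arbitrary illustrative values.

```matlab
rng('default')                                   % for reproducibility
X = randn(100,20);
y = poissrnd(exp(X(:,[5 10 15])*[.4;.2;.3] + 1));

cvp = cvpartition(size(X,1),'HoldOut',0.3);      % hold out 30% of the rows
[B,FitInfo] = lassoglm(X,y,'poisson','CV',cvp,'MCReps',5);

FitInfo.LambdaMinDeviance                        % Lambda with minimum estimated deviance
```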
'NumLambda' 
Positive integer, the number of Lambda values lassoglm uses
when you do not set Lambda. lassoglm can
return fewer than NumLambda fits if the deviance
of the fits drops below a threshold fraction of the null deviance
(deviance of the fit without any predictors X).
Default: 100 
'Offset' 
Numeric vector with the same number of rows as X. lassoglm uses Offset as
an additional predictor variable, but keeps its coefficient value
fixed at 1.0.
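A common use of 'Offset' is a Poisson rate model in which each observation has a different exposure. The following is a minimal sketch on made-up data; the variable exposure and the coefficient values are assumptions for illustration.

```matlab
rng('default')                                   % for reproducibility
n = 200;
X = randn(n,4);
exposure = 1 + 4*rand(n,1);                      % e.g., observation time per row
mu = exposure .* exp(0.5*X(:,2) - 1);            % expected count scales with exposure
y = poissrnd(mu);

% With the log link, log(exposure) enters the linear predictor with coefficient 1
[B,FitInfo] = lassoglm(X,y,'poisson','Offset',log(exposure),'CV',5);

B(:,FitInfo.Index1SE)                            % coefficients at the one-standard-error point
```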

'Options' 
Structure that specifies whether to cross validate in parallel, and specifies the random stream or streams. Create the Options structure with statset. Option fields:
UseParallel: Set to true to compute in parallel. Default is false.
UseSubstreams: Set to true to compute in parallel in a reproducible fashion. To compute reproducibly, set Streams to a type allowing substreams: 'mlfg6331_64' or 'mrg32k3a'. Default is false.
Streams: RandStream object or cell array consisting of one such object. If you do not specify Streams, lassoglm uses the default stream.
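The option fields above can be combined as in the following sketch, which requests reproducible parallel cross validation. It assumes that Parallel Computing Toolbox is installed and that a parallel pool is available; the data are illustrative.

```matlab
rng('default')                                   % for reproducibility of the data
X = randn(100,20);
y = poissrnd(exp(X(:,[5 10 15])*[.4;.2;.3] + 1));

s = RandStream('mlfg6331_64');                   % stream type that supports substreams
opts = statset('UseParallel',true, ...
               'UseSubstreams',true, ...
               'Streams',s);

% Assumes a parallel pool is available for the cross-validation folds
[B,FitInfo] = lassoglm(X,y,'poisson','CV',5,'Options',opts);
```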

'PredictorNames' 
Cell array of strings representing names of the predictor variables,
in the order in which they appear in X.
Default: {} 
'RelTol' 
Convergence threshold for the coordinate descent algorithm (see
Friedman, Tibshirani, and Hastie [3]).
The algorithm terminates when successive estimates of the coefficient
vector differ in the L^{2} norm
by a relative amount less than RelTol.
Default: 1e-4
'Standardize' 
Boolean value specifying whether lassoglm scales X before
fitting the models.
Default: true 
'Weights' 
Observation weights, a nonnegative vector of length n,
where n is the number of rows of X.
At least two values must be positive.
Default: 1/n * ones(n,1) 
Output Arguments
B 
Fitted coefficients, a p-by-L matrix, where p is the number of predictors (columns) in X, and L is the number of Lambda values.

FitInfo 
Structure containing information about the model fits.
Field in FitInfo  Description
Alpha  Value of Alpha parameter, a scalar.
Deviance  Deviance of the fitted model for each value of Lambda, a 1-by-L vector. If cross validation was performed, the values for Deviance represent the estimated expected deviance of the model applied to new data, as calculated by cross validation. Otherwise, Deviance is the deviance of the fitted model applied to the data used to perform the fit.
DF  Number of nonzero coefficients in B for each Lambda value, a 1-by-L vector.
Intercept  Intercept term β_{0} for each linear model, a 1-by-L vector.
Lambda  Lambda parameters in ascending order, a 1-by-L vector.
If you set the CV name-value pair to cross validate, the FitInfo structure contains additional fields.
Field in FitInfo  Description
IndexMinDeviance  Index of Lambda with value LambdaMinDeviance, a scalar.
Index1SE  Index of Lambda with value Lambda1SE, a scalar.
LambdaMinDeviance  Lambda value with minimum expected deviance, as calculated by cross validation, a scalar.
Lambda1SE  Largest Lambda such that Deviance is within one standard error of the minimum, a scalar.
SE  Standard error of Deviance for each Lambda, as calculated during cross validation, a 1-by-L vector.
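The fields of FitInfo are typically combined with B to select one model and predict with it. The following sketch picks the fit at Index1SE and passes the intercept and coefficients to glmval; the data and the new observations Xnew are made up for illustration.

```matlab
rng('default')                                   % for reproducibility
X = randn(100,20);
y = poissrnd(exp(X(:,[5 10 15])*[.4;.2;.3] + 1));
[B,FitInfo] = lassoglm(X,y,'poisson','CV',10);

idx = FitInfo.Index1SE;                          % column of B at Lambda1SE
coef = [FitInfo.Intercept(idx); B(:,idx)];       % glmval expects the intercept first

Xnew = randn(5,20);                              % hypothetical new observations
yhat = glmval(coef,Xnew,'log');                  % predicted Poisson means

FitInfo.DF(idx)                                  % number of nonzero coefficients in this fit
```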

Examples
Construct data from a Poisson model, and identify
the important predictors using lassoglm.
Create data with 20 predictors, and Poisson responses
using just three of the predictors, plus a constant.
rng('default') % for reproducibility
X = randn(100,20);
mu = exp(X(:,[5 10 15])*[.4;.2;.3] + 1);
y = poissrnd(mu);
Construct a cross-validated lasso regularization of a Poisson regression model of the data.
[B,FitInfo] = lassoglm(X,y,'poisson','CV',10);
Examine the cross-validation plot to see the effect of the Lambda regularization parameter.
lassoPlot(B,FitInfo,'plottype','CV');
The green circle and dashed line locate the Lambda with minimal cross-validation error. The blue circle and dashed line locate the point with minimal cross-validation error plus one standard deviation.
Find the nonzero model coefficients corresponding to the
two identified points.
minpts = find(B(:,FitInfo.IndexMinDeviance))
minpts =
3
5
6
10
11
15
16
min1pts = find(B(:,FitInfo.Index1SE))
min1pts =
5
10
15
The nonzero coefficients at the minimum-plus-one-standard-error point correspond exactly to the predictors (5, 10, and 15) used to create the data.
More About
Link Function
A link function f(μ)
maps a distribution with mean μ to a linear
model with data X and coefficient vector b using
the formula
f(μ) = Xb.
Find the formulas for the link functions in the Link name-value pair description. In the following table, "typical" means a link function that is typically used for the listed distribution, and braces mark the default.
Distributional Family  Link Function (typical, {default})
'normal'  {'identity'}
'binomial'  'comploglog', 'loglog', 'probit', {'logit'}
'poisson'  {'log'}
'gamma'  {'reciprocal'}
'inverse gaussian'  {-2}
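Lasso
For a nonnegative value of λ, lasso solves the problem
$$\min_{\beta_0,\beta}\left(\frac{1}{N}\,\mathrm{Deviance}(\beta_0,\beta)+\lambda\sum_{j=1}^{p}\left|\beta_j\right|\right)$$
where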
Deviance is the deviance of the model fit to the responses
using intercept β_{0} and
predictor coefficients β. The formula for
Deviance depends on the distr parameter you supply
to lassoglm. Minimizing the λ-penalized deviance is equivalent to maximizing the λ-penalized log likelihood.
N is the number of observations.
λ is a nonnegative regularization
parameter corresponding to one value of Lambda.
Parameters β_{0} and β are
scalar and pvector respectively.
As λ increases, the number of nonzero
components of β decreases.
The lasso problem involves the L^{1} norm
of β, as contrasted with the elastic net
algorithm.
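Elastic Net
For an α strictly between 0 and 1, and a nonnegative λ, elastic net solves the problem
$$\min_{\beta_0,\beta}\left(\frac{1}{N}\,\mathrm{Deviance}(\beta_0,\beta)+\lambda P_{\alpha}(\beta)\right)$$
where
$$P_{\alpha}(\beta)=\frac{1-\alpha}{2}\left\|\beta\right\|_2^2+\alpha\left\|\beta\right\|_1=\sum_{j=1}^{p}\left(\frac{1-\alpha}{2}\beta_j^2+\alpha\left|\beta_j\right|\right)$$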
Elastic net is the same as lasso when α = 1. For other values of α,
the penalty term P_{α}(β)
interpolates between the L^{1} norm
of β and the squared L^{2} norm
of β. As α shrinks
toward 0, elastic net approaches ridge regression.
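To make the interpolation concrete, the following sketch evaluates the elastic net penalty, taken here in its usual form ((1-α)/2 times the squared L^{2} norm plus α times the L^{1} norm), for a small coefficient vector. The handle P and the vector b are illustrative names, not part of lassoglm.

```matlab
% Elastic net penalty: (1-alpha)/2 * ||b||_2^2 + alpha * ||b||_1
P = @(b,alpha) (1-alpha)/2 * sum(b.^2) + alpha * sum(abs(b));

b = [1; -2; 0];            % illustrative coefficient vector

pLasso = P(b,1)            % alpha = 1: pure L1 norm, |1| + |-2| + |0| = 3
pEnet  = P(b,0.5)          % alpha = 0.5: 0.25*5 + 0.5*3 = 2.75
```

With alpha = 1 only the L^{1} term remains, which matches the lasso penalty; smaller values of alpha shift weight toward the squared L^{2} term.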
References
[1] Tibshirani, R. Regression Shrinkage
and Selection via the Lasso. Journal of the Royal Statistical
Society, Series B, Vol. 58, No. 1, pp. 267–288, 1996.
[2] Zou, H. and T. Hastie. Regularization
and Variable Selection via the Elastic Net. Journal of
the Royal Statistical Society, Series B, Vol. 67, No. 2, pp. 301–320,
2005.
[3] Friedman, J., R. Tibshirani, and T. Hastie. Regularization
Paths for Generalized Linear Models via Coordinate Descent. Journal
of Statistical Software, Vol. 33, No. 1, 2010. http://www.jstatsoft.org/v33/i01
[4] Hastie, T., R. Tibshirani, and J. Friedman. The
Elements of Statistical Learning, 2nd edition. Springer,
New York, 2008.
[5] Dobson, A. J. An Introduction to Generalized
Linear Models, 2nd edition. Chapman & Hall/CRC Press,
New York, 2002.
[6] McCullagh, P., and J. A. Nelder. Generalized
Linear Models, 2nd edition. Chapman & Hall/CRC Press,
New York, 1989.
[7] Collett, D. Modelling Binary Data, 2nd
edition. Chapman & Hall/CRC Press, New York, 2003.
See Also
glmfit  lasso  lassoPlot  ridge