Title: | Maximum Likelihood Estimation and Related Tools |
---|---|
Description: | Functions for Maximum Likelihood (ML) estimation, non-linear optimization, and related tools. It includes a unified way to call different optimizers, and classes and methods to handle the results from the Maximum Likelihood viewpoint. It also includes a number of convenience tools for testing and developing your own models. |
Authors: | Ott Toomet [aut, cre], Arne Henningsen [aut], Spencer Graves [ctb], Yves Croissant [ctb], David Hugh-Jones [ctb], Luca Scrucca [ctb] |
Maintainer: | Ott Toomet <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.5-2.1 |
Built: | 2024-11-20 02:49:15 UTC |
Source: | https://github.com/cran/maxLik |
This package contains a set of functions and tools for Maximum Likelihood (ML) estimation. The focus of the package is on non-linear optimization from the ML viewpoint, and it provides several convenience wrappers and tools, like BHHH algorithm, variance-covariance matrix and standard errors.
The maxLik package is a set of convenience tools and wrappers
focusing on
Maximum Likelihood (ML) analysis, but it also contains tools for
other optimization tasks.
The package includes a) wrappers for several
existing optimizers (implemented by optim
); b) original
optimizers, including Newton-Raphson and Stochastic Gradient Ascent;
and c) several convenience tools
to use these optimizers from the ML perspective. Examples are BHHH
optimization (maxBHHH
) and utilities that extract
standard errors from the estimates. Other highlights include a unified
interface for all included optimizers, tools to test user-provided analytic
derivatives, and constrained optimization.
Good starting points for learning the usage of maxLik are the
included vignettes “Introduction: what is maximum likelihood”,
“Maximum likelihood estimation with maxLik” and
“Stochastic Gradient Ascent in maxLik”. Another good
source is
Henningsen & Toomet (2011), an introductory paper to the package.
Use
vignette(package="maxLik")
to see the available vignettes, and
vignette("using-maxlik")
to read the usage vignette.
From the user's perspective, the
central function in the package is maxLik
. In its
simplest form it takes two arguments: the log-likelihood function, and
a vector of initial parameter values (see the example below).
It returns an object of class
‘maxLik’ with convenient methods such as
summary
,
coef
, and
stdEr
. It also supports a plethora of other arguments; for instance, one can supply an analytic gradient and Hessian, select the desired optimizer, and control the optimization in different ways.
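As a quick illustration (a sketch, not taken from the package documentation), the normal-distribution fit shown in the example at the end of this page can be re-run with a user-supplied analytic gradient and a different optimizer:

## fit a normal distribution with an analytic gradient and the BFGS optimizer
set.seed(123)
x <- rnorm(50, 1, 2)
llf <- function(param) sum(dnorm(x, mean=param[1], sd=param[2], log=TRUE))
gradlf <- function(param) {
  mu <- param[1]; sd <- param[2]
  c(mu = sum(x - mu)/sd^2, sigma = sum((x - mu)^2/sd^3 - 1/sd))
}
ml2 <- maxLik(llf, grad=gradlf, start=c(mu=0, sigma=1), method="BFGS")
summary(ml2)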
A useful utility function in the package is compareDerivatives, which allows one to compare analytic and numeric derivatives for debugging purposes.
Another useful function is condiNumber
for
analyzing multicollinearity problems in the estimated models.
In the interest of providing a unified user interface, all the optimizers are implemented as maximizers in this package. This includes the optim-based methods, such as maxBFGS, and maxSGA, the maximizer version of the popular Stochastic Gradient Descent.
Ott Toomet <[email protected]>, Arne Henningsen <[email protected]>, with contributions from Spencer Graves, Yves Croissant and David Hugh-Jones.
Maintainer: Ott Toomet <[email protected]>
Henningsen A, Toomet O (2011). “maxLik: A package for maximum likelihood estimation in R.” Computational Statistics, 26(3), 443-458. doi:10.1007/s00180-010-0217-1.
### estimate mean and variance of normal random vector ## create random numbers where mu=1, sd=2 set.seed(123) x <- rnorm(50, 1, 2 ) ## log likelihood function. ## Note: 'param' is a 2-vector c(mu, sd) llf <- function(param) { mu <- param[1] sd <- param[2] llValue <- dnorm(x, mean=mu, sd=sd, log=TRUE) sum(llValue) } ## Estimate it with mu=0, sd=1 as start values ml <- maxLik(llf, start = c(mu=0, sigma=1) ) print(summary(ml)) ## Estimates close to c(1,2) :-)
Return a logical vector, indicating which parameters were free under
maximization, as opposed to the fixed parameters that are treated as
constants. See argument “fixed” for maxNR
.
activePar(x, ...) ## Default S3 method: activePar(x, ...)
x |
object, created by a maximization routine, such as
|
... |
further arguments for methods |
Several optimization routines allow the user to fix some parameter values (or do it automatically in some cases). For gradient or Hessian based inference one has to know which parameters carry optimization-related information.
A logical vector, indicating whether the parameters were free to change during the optimization.
Ott Toomet
## a two-dimensional exponential hat f <- function(a) exp(-a[1]^2 - a[2]^2) ## maximize wrt. both parameters free <- maxNR(f, start=1:2) summary(free) # results should be close to (0,0) activePar(free) ## keep the first parameter constant cons <- maxNR(f, start=1:2, fixed=c(TRUE,FALSE)) summary(cons) # result should be around (1,0) activePar(cons)
These are methods for the maxLik related objects. See also the documentation for the corresponding generic functions
## S3 method for class 'maxLik' AIC(object, ..., k=2) ## S3 method for class 'maxim' coef(object, ...) ## S3 method for class 'maxLik' coef(object, ...) ## S3 method for class 'maxLik' stdEr(x, eigentol=1e-12, ...)
object |
a ‘maxLik’ object ( |
k |
numeric, the penalty per parameter to be used; the default ‘k = 2’ is the classical AIC. |
x |
a ‘maxLik’ object |
eigentol |
The standard errors are only calculated if the ratio of the smallest and largest eigenvalue of the Hessian matrix is less than “eigentol”. Otherwise the Hessian is treated as singular. |
... |
other arguments for methods |
calculates Akaike's Information Criterion (and other information criteria).
extracts the estimated parameters (model's coefficients).
extracts standard errors (using the Hessian matrix).
## estimate mean and variance of normal random vector set.seed(123) x <- rnorm(50, 1, 2) ## log likelihood function. ## Note: 'param' is a vector llf <- function( param ) { mu <- param[ 1 ] sigma <- param[ 2 ] return(sum(dnorm(x, mean=mu, sd=sigma, log=TRUE))) } ## Estimate it. Take standard normal as start values ml <- maxLik(llf, start = c(mu=0, sigma=1) ) coef(ml) stdEr(ml) AIC(ml)
Extracting an estimator for the ‘bread’ of the sandwich estimator,
see bread
.
## S3 method for class 'maxLik' bread( x, ... )
x |
an object of class |
... |
further arguments (currently ignored). |
Matrix, the inverse of the expectation of the second derivative (Hessian matrix) of the log-likelihood function with respect to the parameters. In the case of simple Maximum Likelihood, it is equal to the variance-covariance matrix of the parameters, multiplied by the number of observations.
The sandwich package is required for this function.
This method works only if the observation-specific gradient information
was available for the estimation. This is the case if the
observation-specific gradient was supplied (see the grad
argument for maxLik
), or the log-likelihood function
returns a vector of observation-specific values.
Arne Henningsen
## ML estimation of exponential duration model: t <- rexp(100, 2) loglik <- function(theta) log(theta) - theta*t ## Estimate with numeric gradient and hessian a <- maxLik(loglik, start=1 ) # Extract the "bread" library( sandwich ) bread( a ) all.equal( bread( a ), vcov( a ) * nObs( a ) )
This function compares analytic and numeric derivatives and prints related diagnostic information. It is intended for testing and debugging code for analytic derivatives for maximization algorithms.
compareDerivatives(f, grad, hess=NULL, t0, eps=1e-6, printLevel=1, print=printLevel > 0, max.rows=getOption("max.rows", 20), max.cols=getOption("max.cols", 7), ...)
f |
function to be differentiated. The parameter (vector) of interest must be the first argument. The function may return a vector, in that case the derivative will be a matrix. |
grad |
analytic gradient. This may be either a function,
returning the analytic gradient, or a numeric vector, the pre-computed
gradient. The function must use the same set of
parameters as |
hess |
function returning the analytic hessian. If present, hessian matrices are compared too. Only appropriate for scalar-valued functions. |
t0 |
numeric vector, parameter at which the derivatives are
compared. The derivative is taken with respect to this vector. both
|
eps |
numeric. Step size for numeric differentiation. Central derivative is used. |
printLevel |
numeric: a positive number prints summary of the comparison. 0 does not do any printing, only returns the comparison results (invisibly). |
print |
deprecated (for backward compatibility only). |
max.rows |
maximum number of matrix rows to be printed. |
max.cols |
maximum number of columns to be printed. |
... |
further arguments to |
Analytic derivatives (and Hessian) substantially improve the
estimation speed and reliability. However, these are
typically hard to program. This utility compares the programmed result
and the (internally calculated) numeric derivative.
For every component of f, it prints the parameter value, the analytic and numeric derivative, and their relative difference
rel.diff = (analytic - numeric)/(0.5*(abs(analytic) + abs(numeric))).
If analytic == 0 and numeric == 0, then rel.diff is also set to 0. If analytic derivatives are correct and the function is sufficiently smooth, expect the relative differences to be less than 1e-7.
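As a small illustration (not part of the original documentation), the relative-difference measure can be computed by hand; compareDerivatives applies it element-wise to the gradient (and, if supplied, the Hessian):

## hand-computed relative difference for two hypothetical gradient vectors
analytic <- c(0.30000000, -1.20000000)   # hypothetical analytic gradient
numeric  <- c(0.30000001, -1.20000004)   # hypothetical numeric gradient
relDiff  <- (analytic - numeric)/(0.5*(abs(analytic) + abs(numeric)))
relDiff   # for correct derivatives these should stay well below 1e-7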
A list with the following components:
t0 |
the input argument |
f.t0 |
f(t0) |
compareGrad |
a list with components |
maxRelDiffGrad |
max(abs(rel.diff)) |
If hess
is also provided, the following optional components
are also present:
compareHessian |
a list with components |
maxRelDiffHess |
max(abs(rel.diff)) for the Hessian |
Ott Toomet [email protected] and Spencer Graves
## A simple example with sin(x)' = cos(x) f <- function(x) c(sin=sin(x)) Dsin <- compareDerivatives(f, cos, t0=c(angle=1)) ## ## Example of normal log-likelihood. Two-parameter ## function. ## x <- rnorm(100, 1, 2) # generate rnorm x l <- function(b) sum(dnorm(x, mean=b[1], sd=b[2], log=TRUE)) gradl <- function(b) { c(mu=sum(x - b[1])/b[2]^2, sigma=sum((x - b[1])^2/b[2]^3 - 1/b[2])) } gradl. <- compareDerivatives(l, gradl, t0=c(mu=1,sigma=2)) ## ## An example with f returning a vector, t0 = a scalar ## trig <- function(x)c(sin=sin(x), cos=cos(x)) Dtrig <- function(x)c(sin=cos(x), cos=-sin(x)) Dtrig. <- compareDerivatives(trig, Dtrig, t0=1)
This function prints the condition number of a matrix while adding
columns one-by-one. This is useful for testing multicollinearity and
other numerical problems. It is a generic function with a default
method, and a method for maxLik
objects.
condiNumber(x, ...) ## Default S3 method: condiNumber(x, exact = FALSE, norm = FALSE, printLevel=print.level, print.level=1, digits = getOption( "digits" ), ... ) ## S3 method for class 'maxLik' condiNumber(x, ...)
x |
numeric matrix, condition numbers of which are to be printed |
exact |
logical, should condition numbers be exact or
approximations (see |
norm |
logical, whether the columns should be normalised to have unit norm |
printLevel |
numeric, positive value will output the numbers during the calculations. Useful for interactive work. |
print.level |
same as ‘printLevel’, for backward compatibility |
digits |
minimal number of significant digits to print
(only relevant if argument |
... |
Further arguments to |
Statistical models often fail because of a high correlation between the explanatory variables in the linear index (multicollinearity) or because the evaluated maximum of a non-linear model is virtually flat. In both cases, the (near) singularity of the related matrices may help to understand the problem.
condiNumber
inspects the matrices column-by-column and
indicates which variables lead to a jump in the condition
number (cause singularity).
If the matrix column name does not immediately indicate the
problem, one may run an OLS model by estimating this column
using all the previous columns as explanatory variables. Those
columns that explain almost all the variation in the current one will
have very high t-values.
Invisible vector of condition numbers by column. If the start values
for maxLik
are named, the condition numbers are named
accordingly.
Ott Toomet
Greene, W. (2012): Econometric Analysis, 7th edition, p. 130.
set.seed(0) ## generate a simple nearly multicollinear dataset x1 <- runif(100) x2 <- runif(100) x3 <- x1 + x2 + 0.000001*runif(100) # this is virtually equal to x1 + x2 x4 <- runif(100) y <- x1 + x2 + x3 + x4 + rnorm(100) m <- lm(y ~ -1 + x1 + x2 + x3 + x4) print(summary(m)) # note the outlandish estimates and standard errors # while R^2 is 0.88. This suggests multicollinearity condiNumber(model.matrix(m)) # note the value 'explodes' at x3 ## we may test the results further: print(summary(lm(x3 ~ -1 + x1 + x2))) # Note the extremely high t-values and R^2: x3 is (almost) completely # explained by x1 and x2
Wald confidence intervals for Maximum Likelihood Estimates
## S3 method for class 'maxLik' confint(object, parm, level=0.95, ...)
object |
object of class “maxLik” returned by |
parm |
the name of parameters to compute the confidence intervals. If omitted, confidence intervals for all parameters are computed. |
level |
the level of confidence interval |
... |
additional arguments to be passed to the other methods |
A matrix of lower and upper confidence interval limits (in the first and second column respectively). The matrix rows are labeled by the parameter names (if any) and columns by the corresponding distribution quantiles.
Luca Scrucca
confint
for the generic confint
function,
stdEr
for computing standard errors and
summary
for summary output that includes statistical
significance information.
## compute MLE parameters of normal random sample x <- rnorm(100) loglik <- function(theta) { dnorm(x, mean=theta[1], sd=theta[2], log=TRUE) } m <- maxLik(loglik, start=c(mu=0, sd=1)) summary(m) confint(m) confint(m, "mu", level=0.1)
Combine variable parameters with fixed parameters and pass to
fnFull
. Useful for optimizing over a subset of parameters
without writing a separate function. Values are combined by name if
available. Otherwise, xFull
is constructed by position (the
default).
fnSubset(x, fnFull, xFixed, xFull=c(x, xFixed), ...)
x |
Variable parameters to be passed to |
fnFull |
Function whose first argument has length = length(xFull). |
xFixed |
Parameter values to be combined with |
xFull |
Prototype initial argument for |
... |
Optional arguments passed to |
This function first confirms that length(x) + length(xFixed) == length(xFull). Next, if xFull has names, the components are matched by name (at least for xFixed); otherwise xFull = c(x, xFixed) is constructed by position (the default). Finally, fnFull(xFull, ...) is called.
value returned by fnFull
Spencer Graves
## ## Example with 'optim' ## fn <- function(x) (x[2]-2*x[1])^2 # note: true minimum is 0 on line 2*x[1] == x[2] fullEst <- optim(par=c(1,1), method="BFGS", fn=fn) fullEst$par # par = c(0.6, 1.2) at minimum (not convex) # Fix the last component to 4 est4 <- optim(par=1, fn=fnSubset, method="BFGS", fnFull=fn, xFixed=4) est4$par # now there is a unique minimun x[1] = 2 # Fix the first component fnSubset(x=1, fnFull=fn, xFixed=c(a=4), xFull=c(a=1, b=2)) # After substitution: xFull = c(a=4, b=1), # so fn = (1 - 2*4)^2 = (-7)^2 = 49 est4. <- optim(par=1, fn=fnSubset, method="BFGS", fnFull=fn, xFixed=c(a=4), xFull=c(a=1, b=2)) est4.$par # At optimum: xFull=c(a=4, b=8), # so fn = (8 - 2*4)^2 = 0 ## ## Example with 'maxLik' ## fn2max <- function(x) -(x[2]-2*x[1])^2 # -> need to have a maximum max4 <- maxLik(fnSubset, start=1, fnFull=fn2max, xFixed=4) summary(max4) # Similar result using fixed parameters in maxNR, called by maxLik max4. <- maxLik(fn2max, start=c(1, 4), fixed=2) summary(max4.)
Extract the gradients of the log-likelihood function evaluated
at each observation (‘Empirical Estimating Function’,
see estfun
).
## S3 method for class 'maxLik' estfun(x, ...) ## S3 method for class 'maxim' gradient(x, ...)
x |
an object inheriting from class |
... |
further arguments (currently ignored). |
gradient |
vector, objective function gradient at estimated maximum (or the last calculated value if the estimation did not converge.) |
estfun |
matrix, observation-wise log-likelihood gradients at the estimated parameter value evaluated at each observation. Observations in rows, parameters in columns. |
The sandwich package must be loaded in order to use estfun
.
estfun
only works if the observation-specific gradient information was available for the estimation. This is the case if the
observation-specific gradient was supplied (see the grad
argument for maxLik
), or the log-likelihood function
returns a vector of observation-specific values.
Arne Henningsen, Ott Toomet
## ML estimation of exponential duration model: t <- rexp(10, 2) loglik <- function(theta) log(theta) - theta*t ## Estimate with numeric gradient and hessian a <- maxLik(loglik, start=1 ) gradient(a) # Extract the gradients evaluated at each observation library( sandwich ) estfun( a ) ## Estimate with analytic gradient. ## Note: it returns a vector gradlik <- function(theta) 1/theta - t b <- maxLik(loglik, gradlik, start=1) gradient(a) estfun( b )
This function extracts the Hessian of the objective function at optimum. The Hessian information should be supplied by the underlying optimization algorithm, possibly by an approximation.
hessian(x, ...) ## Default S3 method: hessian(x, ...)
x |
an optimization result of class ‘maxim’ or ‘maxLik’ |
... |
other arguments for methods |
A numeric matrix, the Hessian of the model at the estimated parameter
values. If the maximum is flat, the Hessian
is singular. In that case you may want to invert only the
non-singular part of the matrix. You may also want to fix certain
parameters (see activePar
).
Ott Toomet
maxLik
, activePar
, condiNumber
# log-likelihood for normal density # a[1] - mean # a[2] - standard deviation ll <- function(a) sum(-log(a[2]) - (x - a[1])^2/(2*a[2]^2)) x <- rnorm(100) # sample from standard normal ml <- maxLik(ll, start=c(1,1)) # ignore eventual warnings "NaNs produced in: log(x)" summary(ml) # result should be close to c(0,1) hessian(ml) # How the Hessian looks like sqrt(-solve(hessian(ml))) # Note: standard deviations are on the diagonal # # Now run the same example while fixing a[2] = 1 mlf <- maxLik(ll, start=c(1,1), activePar=c(TRUE, FALSE)) summary(mlf) # first parameter close to 0, the second exactly 1.0 hessian(mlf) # Note that now NA-s are in place of passive # parameters. # now invert only the free parameter part of the Hessian sqrt(-solve(hessian(mlf)[activePar(mlf), activePar(mlf)])) # gives the standard deviation for the mean
Return the log likelihood value of objects of class maxLik
and summary.maxLik
.
## S3 method for class 'maxLik' logLik( object, ... ) ## S3 method for class 'summary.maxLik' logLik( object, ... )
object |
object of class |
... |
additional arguments to methods |
A scalar numeric, log likelihood of the estimated model. It has attribute “df”, number of free parameters.
Arne Henningsen, Ott Toomet
## ML estimation of exponential duration model: t <- rexp(100, 2) loglik <- function(theta) log(theta) - theta*t gradlik <- function(theta) 1/theta - t hesslik <- function(theta) -100/theta^2 ## Estimate with analytic gradient and hessian a <- maxLik(loglik, gradlik, hesslik, start=1) ## print log likelihood value logLik( a ) ## print log likelihood value of summary object b <- summary( a ) logLik( b )
These functions are wrappers for optim
, adding
constrained optimization and fixed parameters.
maxBFGS(fn, grad=NULL, hess=NULL, start, fixed=NULL, control=NULL, constraints=NULL, finalHessian=TRUE, parscale=rep(1, length=length(start)), ... ) maxCG(fn, grad=NULL, hess=NULL, start, fixed=NULL, control=NULL, constraints=NULL, finalHessian=TRUE, parscale=rep(1, length=length(start)), ...) maxSANN(fn, grad=NULL, hess=NULL, start, fixed=NULL, control=NULL, constraints=NULL, finalHessian=TRUE, parscale=rep(1, length=length(start)), ... ) maxNM(fn, grad=NULL, hess=NULL, start, fixed=NULL, control=NULL, constraints=NULL, finalHessian=TRUE, parscale=rep(1, length=length(start)), ...)
fn |
function to be maximised. Must have the parameter vector as
the first argument. In order to use numeric gradient
and BHHH method, |
grad |
gradient of |
hess |
Hessian of |
start |
initial values for the parameters. If start values are named, those names are also carried over to the results. |
fixed |
parameters to be treated as constants at their
|
control |
list of control parameters or a ‘MaxControl’ object. If it is a list, the default values are used for the parameters that are left unspecified by the user. These functions accept the following parameters:
|
constraints |
either |
finalHessian |
how (and if) to calculate the final Hessian. Either
|
parscale |
A vector of scaling values for the parameters.
Optimization is performed on 'par/parscale' and these should
be comparable in the sense that a unit change in any element
produces about a unit change in the scaled value. (see
|
... |
further arguments for |
In order to provide a consistent interface, all these functions also
accept arguments that other optimizers use. For instance,
maxNM
accepts the ‘grad’ argument despite being a
gradient-less method.
The ‘state’ (or ‘seed’) of R's random number generator is saved at the beginning of the maxSANN function and restored at the end, so this function does not affect the generation of random numbers elsewhere, even though the random seed is set to the argument random.seed and the ‘SANN’ algorithm uses random numbers.
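A small sketch (the SANN_randomSeed option is the one described under MaxControl) illustrating that a maxSANN call leaves the global random-number stream untouched, and how to make the SANN run itself stochastic:

## maxSANN saves and restores the RNG state
f <- function(x) -(x - 3)^2
set.seed(1); r1 <- runif(1)                       # reference draw
set.seed(1); resSANN <- maxSANN(f, start=0); r2 <- runif(1)
identical(r1, r2)   # TRUE: per the description above, the draw is unaffected
## to obtain a different SANN run each time, vary its own seed
resSANN2 <- maxSANN(f, start=0,
                    control=list(SANN_randomSeed=sample(1000, 1)))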
object of class "maxim". Data can be extracted through the following functions:
maxValue |
|
coef |
estimated parameter value. |
gradient |
vector, last calculated gradient value. Should be close to 0 in case of normal convergence. |
estfun |
matrix of gradients at parameter value |
hessian |
Hessian at the maximum (the last calculated value if not converged). |
returnCode |
integer. Success code, 0 is success (see
|
returnMessage |
a short message, describing the return code. |
activePar |
logical vector, which parameters are optimized over.
Contains only |
nIter |
number of iterations. Two-element integer vector giving the number of
calls to |
maximType |
character string, type of maximization. |
maxControl |
the optimization control parameters in the form of a
|
The following components can only be extracted directly (with $
):
constraints |
A list, describing the constrained optimization
(
|
Ott Toomet, Arne Henningsen
Nelder, J. A. & Mead, R. (1965): A Simplex Method for Function Minimization, The Computer Journal 7, 308-313
optim
, nlm
, maxNR
,
maxBHHH
, maxBFGSR
for a
maxNR
-based BFGS implementation.
# Maximum Likelihood estimation of Poissonian distribution n <- rpois(100, 3) loglik <- function(l) n*log(l) - l - lfactorial(n) # we use numeric gradient summary(maxBFGS(loglik, start=1)) # you would probably prefer mean(n) instead of that ;-) # Note also that maxLik is better suited for Maximum Likelihood ### ### Now an example of constrained optimization ### f <- function(theta) { x <- theta[1] y <- theta[2] exp(-(x^2 + y^2)) ## you may want to use exp(- theta %*% theta) instead } ## use constraints: x + y >= 1 A <- matrix(c(1, 1), 1, 2) B <- -1 res <- maxNM(f, start=c(1,1), constraints=list(ineqA=A, ineqB=B), control=list(printLevel=1)) print(summary(res))
"MaxControl"
This is the structure that holds the optimization control options.
The corresponding constructors take
the parameters, perform consistency checks, and return the
control structure. Alternatively, they overwrite the supplied parameters in an existing MaxControl structure. There is also
a method to extract the control structure from the estimated
‘maxim’-objects.
The default values and definition of the slots:
1e-8, stopping condition
for maxNR
and related optimizers.
Stop if the absolute difference
between successive iterations is less than tol
, returns
code 2.
sqrt(.Machine$double.eps), relative convergence tolerance (used by maxNR-related and optim-based optimizers). The algorithm stops if an iteration increases the value by less than a factor of reltol*(abs(val) + reltol). Returns code 2.
1e-6, stopping condition
for maxNR
and related optimizers.
Stops if norm of the gradient is
less than gradtol
, returns code 1.
1e-10, stopping/error condition
for maxNR
and related optimizers.
If qac == "stephalving"
and the quadratic
approximation leads to a worse, instead of a better value, or to
NA
, the step length
is halved and a new attempt is made. If necessary, this procedure is repeated
until step < steptol
, thereafter code 3 is returned.
1e-6, (for maxNR
related
optimizers)
controls whether Hessian is treated as negative
definite. If the
largest of the eigenvalues of the Hessian is larger than
-lambdatol
(Hessian is not negative definite),
a suitable diagonal matrix is subtracted from the
Hessian (quadratic hill-climbing) in order to enforce negative
definiteness.
"stephalving", character, Qadratic Approximation
Correction for maxNR
related optimizers. When the new
guess is worse than the initial one, program attempts to correct it:
"stephalving"
decreases the
step but keeps the direction.
"marquardt"
uses
Marquardt (1963) method by decreasing the step length while also
moving closer to the pure gradient direction. It may be faster and
more robust choice in areas where quadratic approximation behaves poorly.
1e-10, QR-decomposition tolerance
for Hessian inversion in maxNR
related optimizers.
0.01, a positive numeric, initial correction term
for Marquardt (1963) correction in
maxNR
-related optimizers
2, how much the Marquardt (1963) correction is decreased/increased at successful/unsuccessful steps for maxNR related optimizers
1e12, maximum allowed correction term
for maxNR
related optimizers.
If exceeded, the
algorithm exits with return code 3.
1, Nelder-Mead simplex method reflection factor (see Nelder & Mead, 1965)
0.5, Nelder-Mead contraction factor
2, Nelder-Mead expansion factor
NULL
or a function for "SANN"
algorithm
to generate a new candidate point;
if NULL
, Gaussian Markov kernel is used
(see argument gr
of optim
).
10, starting temperature
for the “SANN” cooling schedule. See optim
.
10, number of function evaluations at each temperature for
the “SANN” optimizer. See optim
.
123, integer used to seed random numbers, in order to ensure replicability of “SANN” optimization and to preserve the R random number stream. Use options like SANN_randomSeed=Sys.time() or SANN_randomSeed=sample(1000,1) if you want stochastic results.
General options for stochastic gradient methods:
0.1, learning rate, numeric
NULL
, batch size for Stochastic Gradient Ascent. A
positive integer, or NULL
for full-batch gradient ascent.
NULL
, gradient clipping threshold. This is
the max allowed squared Euclidean norm of the gradient. If the
actual norm of the gradient exceeds (square root of) this
threshold, the gradient will be scaled back accordingly while
preserving its direction. NULL
means no clipping.
NULL
, or integer. Stopping condition: if
the objective function is worse than its largest value so far this
many times, the algorithm stops, and returns not the last
parameter value but the one that
gave the best results so far. This is mostly useful if gradient
is computed on training data and the
objective function on validation data.
1L, integer. After how many epochs to check the patience value. 1 means to check (and hence to compute the objective function) at each epoch.
Options for SGA:
0, numeric momentum parameter for SGA. Must lie in the interval [0, 1].
Options for Adam:
0.9, numeric in [0, 1), the first moment momentum
0.999, numeric in [0, 1), the second moment momentum
General options:
150, stopping condition (the default differs for different methods). Stop if more than iterlim iterations are performed. Note that ‘iteration’ may mean different things for different optimizers.
20, maximum number of matrix rows to be printed when requesting verbosity in the optimizers.
7, maximum number of columns to be printed. This also applies to vectors that are printed horizontally.
0, the level of verbosity. Larger values print
more information; the exact output depends on the optimizer. The form print.level is also accepted by the methods for compatibility.
FALSE
, whether to store and return the
parameter
values at each epoch. If TRUE
, the stored values
can be retrieved with storedParameters
-method. The
parameters are stored as a matrix with rows corresponding to the
epochs and columns to the parameter components.
FALSE
, whether to store and return the objective
function values at each epoch. If TRUE
, the stored values
can be retrieved with storedValues
-method.
maxControl(...) creates a “MaxControl” object. The arguments must be in the form option1 = value1, option2 = value2, .... The options should be slot names, but the method also supports selected other parameter forms for compatibility reasons, e.g. “print.level” instead of “printLevel”. In case there is more than one option with a similar name, the last one overwrites the previous values. This allows the user to override default parameters in the control list. See the example in maxLik-package.
(x = "MaxControl", ...)
overwrites parameters
of an existing “MaxControl” object. The ‘...’
argument must be in the form option1 = value1, option2 =
value2, ...
. In case there are more than one option with
similar name, only the last one is taken into account. This
allows the user to override default parameters in the control
list. See example in maxLik-package.
(x = "maxim")
extracts “MaxControl”
structure from an estimated model
shows the parameter values
Typically, the control options are supplied in the form of a list, in which
case the corresponding default values are overwritten by the
user-specified ones. However, one may also create the control
structure by maxControl(opt1=value1, opt2=value2, ...)
and supply it directly to the optimizer, in which case the optimization routine takes all the values from the control object.
Several control parameters can also be supplied directly to the optimization routines.
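For instance (a sketch; the argument-passing behavior is as described above), the following calls should set the same gradient tolerance:

## three ways to tighten the gradient tolerance
f <- function(x) -sum((x - 2)^2)
r1 <- maxNR(f, start=c(0, 0), control=list(gradtol=1e-8))        # control list
r2 <- maxNR(f, start=c(0, 0), control=maxControl(gradtol=1e-8))  # MaxControl object
r3 <- maxNR(f, start=c(0, 0), gradtol=1e-8)                      # supplied directly
all.equal(coef(r1), coef(r3))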
Ott Toomet
Nelder, J. A. & Mead, R. (1965): A Simplex Method for Function Minimization, The Computer Journal 7, 308–313
Marquardt, D. W. (1963) An Algorithm for Least-Squares Estimation of Nonlinear Parameters Journal of the Society for Industrial and Applied Mathematics 11, 431–441
library(maxLik) ## Create a 'maxControl' object: maxControl(tol=1e-4, sann_tmax=7, printLevel=2) ## Optimize quadratic form t(D) %*% W %*% D with p.d. weight matrix, ## s.t. constraints sum(D) = 1 quadForm <- function(D) { return(-t(D) %*% W %*% D) } eps <- 0.1 W <- diag(3) + matrix(runif(9), 3, 3)*eps D <- rep(1/3, 3) # initial values ## create control object and use it for optimization co <- maxControl(printLevel=2, qac="marquardt", marquardt_lambda0=1) res <- maxNR(quadForm, start=D, control=co) print(summary(res)) ## Now perform the same with no trace information co <- maxControl(co, printLevel=0) res <- maxNR(quadForm, start=D, control=co) # no tracing information print(summary(res)) # should be the same as above maxControl(res) # shows the control structure
Returns the type of optimization as supplied by the optimisation routine.
maximType(x)
x |
object of class 'maxim' or another object which involves numerical optimisation. |
A text message, describing the involved optimisation algorithm
Ott Toomet
## maximize two-dimensional exponential hat. True maximum c(2,1): f <- function(a) exp(-(a[1] - 2)^2 - (a[2] - 1)^2) m <- maxNR(f, start=c(0,0)) coef(m) maximType(m) ## Now use BFGS maximisation. m <- maxBFGS(f, start=c(0,0)) maximType(m)
This is the main interface for the maxLik package, and the function that performs Maximum Likelihood estimation. It is a wrapper for different optimizers returning an object of class "maxLik". Corresponding methods handle the likelihood-specific properties of the estimates, including standard errors.
maxLik(logLik, grad = NULL, hess = NULL, start, method, constraints=NULL, ...)
logLik |
log-likelihood function. Must have the parameter vector as the first argument. Must return either a single log-likelihood value, or a numeric vector where each component is log-likelihood of the corresponding individual observation. |
grad |
gradient of log-likelihood. Must have the parameter
vector as the first argument. Must return either a single gradient
vector with length equal to the number of parameters, or a matrix
where each row is the gradient vector of the corresponding individual
observation. If |
hess |
hessian of log-likelihood. Must have the parameter
vector as the first argument. Must return a square matrix. If
|
start |
numeric vector, initial value of parameters. If it has names, these will also be used for naming the results. |
method |
maximisation method, currently either
"NR" (for Newton-Raphson),
"BFGS" (for Broyden-Fletcher-Goldfarb-Shanno),
"BFGSR" (for the BFGS algorithm implemented in R),
"BHHH" (for Berndt-Hall-Hall-Hausman),
"SANN" (for Simulated ANNealing),
"CG" (for Conjugate Gradients),
or "NM" (for Nelder-Mead).
Lower-case letters (such as "nr" for Newton-Raphson) are allowed.
The default method is "NR" for unconstrained problems, and "NM" or
"BFGS" for constrained problems, depending on if the Note that stochastic gradient ascent (SGA) is currently not supported as this method seems to be rarely used for maximum likelihood estimation. |
constraints |
either |
... |
further arguments, such as |
maxLik
supports constrained optimization in the sense that
constraints are passed further to the underlying optimization
routines, and suitable default method is selected. However, no
attempt is made to correct the resulting variance-covariance matrix.
Hence the inference may be wrong. A corresponding warning is issued
by the summary method.
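A brief sketch of passing an equality constraint through maxLik (the eqA/eqB specification mirrors the constrained maxNR example further below; expect the summary to warn that the standard errors ignore the constraint):

## equality-constrained ML: estimate normal (mu, sigma) subject to mu + sigma = 3
set.seed(2)
x <- rnorm(100, 1, 2)
loglik <- function(param) dnorm(x, mean=param[1], sd=param[2], log=TRUE)
A <- matrix(c(1, 1), 1, 2)   # constraint A %*% theta + B = 0, i.e. mu + sigma - 3 = 0
B <- -3
mC <- maxLik(loglik, start=c(mu=0, sigma=1),
             constraints=list(eqA=A, eqB=B))
summary(mC)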
object of class 'maxLik' which inherits from class 'maxim'. Useful methods include
AIC
: Akaike's Information Criterion
coef
: estimated parameter value
logLik
: log-likelihood value
nIter
: number of iterations
stdEr
: standard errors
summary
: print summary table
with estimates, standard errors, p, and z-values.
vcov
: variance-covariance matrix
The constrained maximum likelihood estimation should be considered experimental. In particular, the variance-covariance matrix is not corrected for constrained parameter space.
Ott Toomet, Arne Henningsen
maxNR
, nlm
and optim
for different non-linear optimisation routines, see
maxBFGS
for the constrained maximization examples.
## Estimate the parameter of exponential distribution t <- rexp(100, 2) loglik <- function(theta) log(theta) - theta*t gradlik <- function(theta) 1/theta - t hesslik <- function(theta) -100/theta^2 ## Estimate with numeric gradient and hessian a <- maxLik(loglik, start=1, control=list(printLevel=2)) summary( a ) ## ## Estimate with analytic gradient and hessian. ## require much smaller tolerance ## setting 'tol=0' or negative essentially disables this stopping criterion a <- maxLik(loglik, gradlik, hesslik, start=1, control=list(tol=-1, reltol=1e-12, gradtol=1e-12)) summary( a ) ## ## Next, we give an example with vector argument: ## fit normal distribution by estimating mean and standard deviation ## by maximum likelihood ## loglik <- function(param) { # param: vector of 2, c(mean, standard deviation) mu <- param[1] sigma <- param[2] ll <- -0.5*N*log(2*pi) - N*log(sigma) - sum(0.5*(x - mu)^2/sigma^2) # can use dnorm(x, mu, sigma, log=TRUE) instead ll } x <- rnorm(100, 1, 2) # use mean=1, stdd=2 N <- length(x) res <- maxLik(loglik, start=c(0,1)) # use 'wrong' start values summary(res) ## ## Same example, but now with named parameters and a fixed value ## resFix <- maxLik(loglik, start=c(mu=0, sigma=1), fixed="sigma") summary(resFix) # 'sigma' is exactly 1.000 now.
Unconstrained and equality-constrained maximization based on the quadratic approximation (Newton) method. The Newton-Raphson, BFGS (Broyden 1970, Fletcher 1970, Goldfarb 1970, Shanno 1970), and BHHH (Berndt, Hall, Hall, Hausman 1974) methods are available.
maxNR(fn, grad = NULL, hess = NULL, start, constraints = NULL, finalHessian = TRUE, bhhhHessian=FALSE, fixed = NULL, activePar = NULL, control=NULL, ... ) maxBFGSR(fn, grad = NULL, hess = NULL, start, constraints = NULL, finalHessian = TRUE, fixed = NULL, activePar = NULL, control=NULL, ... ) maxBHHH(fn, grad = NULL, hess = NULL, start, finalHessian = "BHHH", ... )
fn |
the function to be maximized.
It must have the parameter vector as the first argument and
it must return either a single number or a numeric vector (which is summed internally).
If the BHHH method is used and argument
|
grad |
gradient of the objective function.
It must have the parameter vector as the first argument and
it must return either a gradient vector of the objective function,
or a matrix, where columns correspond to individual parameters.
The column sums are treated as gradient components.
If |
hess |
Hessian matrix of the function.
It must have the parameter vector as the first argument and
it must return the Hessian matrix of the objective function.
If missing, finite-difference Hessian, based on |
start |
initial parameter values. If start values are named, those names are also carried over to the results. |
constraints |
either |
finalHessian |
how (and if) to calculate the final Hessian. Either
|
bhhhHessian |
logical. Indicating whether to use the information
equality approximation (Berndt, Hall, Hall, and Hausman, 1974) for
the Hessian. This effectively transforms |
fixed |
parameters to be treated as constants at their
|
activePar |
this argument is retained for backward compatibility only;
please use argument |
control |
list of control parameters. The control parameters used by these optimizers are
|
... |
further arguments to |
The idea of the Newton method is to approximate the function at a given location by a multidimensional quadratic function, and use the estimated maximum as the start value for the next iteration. Such an approximation requires knowledge of both the gradient and the Hessian, the latter of which can be quite costly to compute. Several methods for approximating the Hessian exist, including BFGS and BHHH.
The BHHH (information equality) approximation is only valid for log-likelihood functions. It requires the score (gradient) values by individual observations, and hence those must be returned observation-wise by grad or fn. The Hessian is approximated as the negative of the sum of the outer products of the gradients of individual observations, or, in matrix form, -t(G) %*% G, where G is the matrix of observation-wise gradients.
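A small numeric check of this approximation (a sketch; it relies on the maxBHHH default finalHessian = "BHHH", so that the stored Hessian is the BHHH approximation at the estimate):

## verify the BHHH Hessian approximation on an exponential model
t <- rexp(100, 2)
loglikInd <- function(theta) log(theta) - theta*t   # observation-wise log-likelihood
gradlikInd <- function(theta) 1/theta - t           # observation-wise gradients
a <- maxBHHH(loglikInd, gradlikInd, start=1)
G <- cbind(gradlikInd(coef(a)))                     # n x 1 matrix of gradients at the estimate
all.equal(hessian(a), -crossprod(G), check.attributes=FALSE)   # expected to be TRUE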
The functions maxNR
, maxBFGSR
, and maxBHHH
can work with constant parameters, useful if a parameter value
converges to the boundary of support, or for testing.
One way is to put
fixed
to non-NULL, specifying which parameters should be treated as
constants. The
parameters can also be fixed in runtime (only for maxNR
and maxBHHH
) by
signaling it with the
fn
return value. See Henningsen & Toomet (2011) for details.
object of class "maxim". Data can be extracted through the following methods:
maxValue |
|
coef |
estimated parameter value. |
gradient |
vector, last calculated gradient value. Should be close to 0 in case of normal convergence. |
estfun |
matrix of gradients at parameter value |
hessian |
Hessian at the maximum (the last calculated value if not converged). |
returnCode |
return code:
|
returnMessage |
a short message, describing the return code. |
activePar |
logical vector, which parameters are optimized over.
Contains only |
nIter |
number of iterations. |
maximType |
character string, type of maximization. |
maxControl |
the optimization control parameters in the form of a
|
The following components can only be extracted directly (with $
):
last.step |
a list describing the last unsuccessful step if
|
constraints |
A list, describing the constrained optimization
(
|
No attempt is made to ensure that the user-provided analytic gradient/Hessian is correct. Users are encouraged to use the compareDerivatives function, designed for this purpose. If the analytic gradient/Hessian is wrong, the algorithm may not converge, or may converge to a wrong point.
As the BHHH method uses the likelihood-specific information equality, it is only suitable for maximizing log-likelihood functions!
Quasi-Newton methods, including those mentioned above, do not work
well in non-concave regions. This is especially the case with the
implementation in maxBFGSR
. The user is advised to
experiment with various tolerance options to achieve convergence.
Ott Toomet, Arne Henningsen,
function maxBFGSR
was originally developed by Yves Croissant
(and placed in 'mlogit' package)
Berndt, E., Hall, B., Hall, R. and Hausman, J. (1974): Estimation and Inference in Nonlinear Structural Models, Annals of Social Measurement 3, 653–665.
Broyden, C.G. (1970): The Convergence of a Class of Double-rank Minimization Algorithms, Journal of the Institute of Mathematics and Its Applications 6, 76–90.
Fletcher, R. (1970): A New Approach to Variable Metric Algorithms, Computer Journal 13, 317–322.
Goldfarb, D. (1970): A Family of Variable Metric Updates Derived by Variational Means, Mathematics of Computation 24, 23–26.
Henningsen, A. and Toomet, O. (2011): maxLik: A package for maximum likelihood estimation in R. Computational Statistics 26, 443–458
Marquardt, D.W., (1963) An Algorithm for Least-Squares Estimation of Nonlinear Parameters, Journal of the Society for Industrial & Applied Mathematics 11, 2, 431–441
Shanno, D.F. (1970): Conditioning of Quasi-Newton Methods for Function Minimization, Mathematics of Computation 24, 647–656.
maxLik
for a general framework for maximum likelihood
estimation (MLE);
maxBHHH
for maximizations using the Berndt, Hall, Hall,
Hausman (1974) algorithm (which is a wrapper function to maxNR
);
maxBFGS
for maximization using the BFGS, Nelder-Mead (NM),
and Simulated Annealing (SANN) method (based on optim
),
also supporting inequality constraints;
nlm
for Newton-Raphson optimization; and
optim
for different gradient-based optimization
methods.
## Fit exponential distribution by ML t <- rexp(100, 2) # create data with parameter 2 loglik <- function(theta) sum(log(theta) - theta*t) ## Note the log-likelihood and gradient are summed over observations gradlik <- function(theta) sum(1/theta - t) hesslik <- function(theta) -100/theta^2 ## Estimate with finite-difference gradient and Hessian a <- maxNR(loglik, start=1, control=list(printLevel=2)) summary(a) ## You would probably prefer 1/mean(t) instead ;-) ## The same example with analytic gradient and Hessian a <- maxNR(loglik, gradlik, hesslik, start=1) summary(a) ## BFGS estimation with finite-difference gradient a <- maxBFGSR( loglik, start=1 ) summary(a) ## For the BHHH method we need likelihood values and gradients ## of individual observations, not the sum of those loglikInd <- function(theta) log(theta) - theta*t gradlikInd <- function(theta) 1/theta - t ## Estimate with analytic gradient a <- maxBHHH(loglikInd, gradlikInd, start=1) summary(a) ## Example with a vector argument: Estimate the mean and ## variance of a random normal sample by maximum likelihood ## Note: you might want to use maxLik instead loglik <- function(param) { # param is a 2-vector of c(mean, sd) mu <- param[1] sigma <- param[2] ll <- -0.5*N*log(2*pi) - N*log(sigma) - sum(0.5*(x - mu)^2/sigma^2) ll } x <- rnorm(100, 1, 2) # use mean=1, sd=2 N <- length(x) res <- maxNR(loglik, start=c(0,1)) # use 'wrong' start values summary(res) ## The previous example with named parameters and a fixed value resFix <- maxNR(loglik, start=c(mu=0, sigma=1), fixed="sigma") summary(resFix) # 'sigma' is exactly 1.000 now. ### Constrained optimization ### ## We maximize exp(-x^2 - y^2) where x+y = 1 hatf <- function(theta) { x <- theta[1] y <- theta[2] exp(-(x^2 + y^2)) ## Note: you may prefer exp(- theta %*% theta) instead } ## use constraints: x + y = 1 A <- matrix(c(1, 1), 1, 2) B <- -1 res <- maxNR(hatf, start=c(0,0), constraints=list(eqA=A, eqB=B), control=list(printLevel=1)) print(summary(res))
Stochastic Gradient Ascent–based optimizers
maxSGA(fn = NULL, grad = NULL, hess = NULL, start, nObs,
       constraints = NULL, finalHessian = FALSE,
       fixed = NULL, control = NULL, ... )

maxAdam(fn = NULL, grad = NULL, hess = NULL, start, nObs,
        constraints = NULL, finalHessian = FALSE,
        fixed = NULL, control = NULL, ... )
fn |
the function to be maximized. As the objective function
values are not directly used for optimization, this argument is
optional, given that grad is supplied.
|
grad |
gradient of the objective function.
It must have the parameter vector as the first argument, and it must
have an argument index which holds the indices of the observations in
the current minibatch.  If grad is not supplied, a finite-difference
gradient of fn is used instead. |
hess |
Hessian matrix of the function. Mainly for compatibility
reasons, only used for computing the final Hessian if asked to do
so via the argument finalHessian. |
start |
initial parameter values. If these have names, the names are also used for results. |
nObs |
number of observations. This is used to partition the data
into individual batches. The resulting batch
indices are forwarded to the gradient function grad through its
index argument. |
constraints |
either NULL for unconstrained maximization, or a list with two
components, eqA and eqB, describing linear equality constraints of the
form A theta + B = 0 (see maxNR for details). |
finalHessian |
how (and if) to calculate the final Hessian. Either FALSE (do not
calculate; the default), TRUE (calculate with the analytic or
finite-difference method), or "bhhh"/"BHHH" for the information-equality
approximation based on the outer product of the individual gradients.

The Hessian matrix is not often used for optimization problems where one applies SGA, but even if one is not interested in standard errors, it may provide useful information about the model performance. If computed by the finite-difference method, the Hessian computation may be very slow.
fixed |
parameters to be treated as constants at their
start values. |
control |
list of control parameters.  The ones used by these optimizers include:

SGA-specific parameters, most importantly SGA_momentum (see Details below);

Adam-specific parameters, the decay rates of the first and second moment
estimates of the gradient;

general stochastic gradient parameters, notably SG_learningRate (the step
size) and SG_batchSize (the size of the minibatch);

stopping conditions, for instance iterlim (the maximum number of epochs)
and patience-based stopping, where optimization stops after the objective
value has repeatedly failed to improve;

storage options storeValues and storeParameters that keep the objective
value and the parameter vector at each epoch.

See maxControl for the complete list of options and their default values. |
... |
further arguments to fn, grad and hess. |
Gradient Ascent (GA) is an optimization method where the algorithm
repeatedly takes small steps in the direction of the gradient: the
parameter vector is updated as
theta_{t+1} = theta_t + rho * g(theta_t),
where rho is the learning rate (SG_learningRate) and g the gradient.
In case of Stochastic GA (SGA), the gradient is not computed on the
full set of observations but on a small subset, a batch,
potentially a single observation only.  In certain circumstances
this converges much faster
than when using all observations (see
Bottou et al, 2018).

If SGA_momentum
is positive, the SGA algorithm updates the parameters
in two steps.  First, the momentum is used to update
the “velocity” v
as
v <- momentum * v + rho * g(theta), and thereafter the parameter
is updated as
theta <- theta + v.  The initial
velocity is set to 0.  (A schematic R sketch of both updates follows below.)
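As an illustration only, the two update rules can be written out in plain R as follows. This is a schematic sketch, not the package's internal code; rho denotes the learning rate and g the gradient computed on the current batch.

## schematic single SGA update (illustration, not maxSGA internals)
sga_step <- function(theta, g, rho) {
   ## plain gradient ascent step
   theta + rho * g(theta)
}
## schematic momentum update: returns both new parameter and new velocity
sga_momentum_step <- function(theta, v, g, rho, momentum) {
   v <- momentum * v + rho * g(theta)   # update the "velocity"
   list(theta = theta + v, v = v)       # initial v should be 0
}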
The Adam algorithm is more complex and uses first and second moments of stochastic gradients to automatically adjust the learning rate. See Goodfellow et al, 2016, page 301.
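As an illustration of the idea, here is a compact sketch of a generic Adam step in the form given by Goodfellow et al (2016). The names rho, beta1, beta2 and eps are generic Adam hyperparameters used for this sketch only; they are not necessarily the control-parameter names used by maxAdam.

## schematic Adam update for maximization (illustration, not maxAdam internals)
adam_step <- function(theta, g, state, rho, beta1 = 0.9, beta2 = 0.999, eps = 1e-8) {
   gt <- g(theta)
   state$t <- state$t + 1
   state$m <- beta1 * state$m + (1 - beta1) * gt      # first moment of the gradient
   state$v <- beta2 * state$v + (1 - beta2) * gt^2    # second moment of the gradient
   mhat <- state$m / (1 - beta1^state$t)              # bias correction
   vhat <- state$v / (1 - beta2^state$t)
   state$theta <- theta + rho * mhat / (sqrt(vhat) + eps)   # ascent step
   state
}
## initial state: state <- list(t = 0, m = 0, v = 0)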
The function fn
is not directly used for optimization, only
for printing or as a stopping condition. In this sense
it is up to the user to decide what the function
returns, if anything. For instance, it may be useful for fn
to compute the
objective function on either full training data, or on validation data,
and just ignore the index
argument. The latter is useful if
using patience-based stopping.
However, one may also
choose to select the observations determined by the index to
compute the objective function on the current data batch.
object of class "maxim". Data can be extracted through the following methods:
maxValue |
the value of the objective function at the maximum. |
coef |
estimated parameter value. |
gradient |
vector, last calculated gradient value. Should be close to 0 in case of normal convergence. |
estfun |
matrix of gradients at the estimated parameter value, evaluated at each observation (one row per observation). |
hessian |
Hessian at the maximum (the last calculated value if not converged). |
storedValues |
return values stored at each epoch. |
storedParameters |
return parameters stored at each epoch. |
returnCode |
a numeric code that describes the convergence or error. |
returnMessage |
a short message, describing the return code. |
activePar |
logical vector, which parameters are optimized over.
Contains only TRUE values if no parameters are fixed. |
nIter |
number of iterations. |
maximType |
character string, type of maximization. |
maxControl |
the optimization control parameters in the form of a
MaxControl object. |
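As a quick usage sketch (assuming a is a fitted object returned by maxSGA, as in the examples below), the most common extractors are:

## assuming 'a' is a fitted 'maxim' object, e.g. from maxSGA()
coef(a)             # estimated parameters
maxValue(a)         # objective function value at the estimate
nIter(a)            # number of epochs/iterations performed
returnMessage(a)    # why the algorithm stopped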
Ott Toomet, Arne Henningsen
Bottou, L.; Curtis, F. & Nocedal, J. (2018): Optimization Methods for Large-Scale Machine Learning. SIAM Review 60, 223–311.

Goodfellow, I.; Bengio, Y. & Courville, A. (2016): Deep Learning. MIT Press.

Henningsen, A. & Toomet, O. (2011): maxLik: A package for maximum likelihood estimation in R. Computational Statistics 26, 443–458.
A good starting point to learn about the usage of stochastic gradient ascent in maxLik package is the vignette “Stochastic Gradient Ascent in maxLik”.
The other related functions are
maxNR
for Newton-Raphson, a popular Hessian-based maximization;
maxBFGS
for maximization using the BFGS, Nelder-Mead (NM),
and Simulated Annealing (SANN) method (based on optim
),
also supporting inequality constraints;
maxLik
for a general framework for maximum likelihood
estimation (MLE);
optim
for different gradient-based optimization
methods.
## estimate the exponential distribution parameter by ML
set.seed(1)
t <- rexp(100, 2)
loglik <- function(theta, index) sum(log(theta) - theta*t[index])
## Note the log-likelihood and gradient are summed over observations
gradlik <- function(theta, index) sum(1/theta - t[index])
## Estimate using minibatches of 10 observations
a <- maxSGA(loglik, gradlik, start=1,
            control=list(iterlim=1000, SG_batchSize=10), nObs=100)
            # note that loglik is not really needed, and is not used
            # here, unless more print verbosity is asked
summary(a)
##
## demonstrate the usage of index, and using
## fn for computing the objective function on validation data.
## Create a linear model where variables are very unequally scaled
##
## OLS loglik function: compute the function value on validation data only
loglik <- function(beta, index) {
   e <- yValid - XValid %*% beta
   -crossprod(e)/length(y)
}
## OLS gradient: compute it on training data only
## Use 'index' to select the subset corresponding to the minibatch
gradlik <- function(beta, index) {
   e <- yTrain[index] - XTrain[index,,drop=FALSE] %*% beta
   g <- t(-2*t(XTrain[index,,drop=FALSE]) %*% e)
   -g/length(index)
}
N <- 1000
## two random variables: one with scale 1, the other with 100
X <- cbind(rnorm(N), rnorm(N, sd=100))
beta <- c(1, 1)   # true parameter values
y <- X %*% beta + rnorm(N, sd=0.2)
## training-validation split
iTrain <- sample(N, 0.8*N)
XTrain <- X[iTrain,,drop=FALSE]
XValid <- X[-iTrain,,drop=FALSE]
yTrain <- y[iTrain]
yValid <- y[-iTrain]
##
## do this without momentum: learning rate must stay small for the gradient not to explode
cat(" No momentum:\n")
a <- maxSGA(loglik, gradlik, start=c(10,10),
            control=list(printLevel=1, iterlim=50,
                         SG_batchSize=30, SG_learningRate=0.0001,
                         SGA_momentum=0),
            nObs=length(yTrain))
print(summary(a))   # the first component is off, the second one is close to the true value
## do with momentum 0.99
cat(" Momentum 0.99:\n")
a <- maxSGA(loglik, gradlik, start=c(10,10),
            control=list(printLevel=1, iterlim=50,
                         SG_batchSize=30, SG_learningRate=0.0001,
                         SGA_momentum=0.99),   # momentum 0.99
            nObs=length(yTrain))
print(summary(a))   # close to true value
Returns the function value at (estimated) maximum.
maxValue(x, ...)

## S3 method for class 'maxim'
maxValue(x, ...)
x |
a statistical model, or a result of maximisation, created
by maxLik, maxNR or another maximization routine in this package. |
... |
further arguments for other methods |
numeric, the value of the objective function at maximum. In general, it is the last calculated value in case the process did not converge.
Ott Toomet
## Estimate the exponential distribution parameter:
t <- rexp(100, 2)
loglik <- function(theta) sum(log(theta) - theta*t)
## Estimate with numeric gradient and numeric Hessian
a <- maxNR(loglik, start=1)
maxValue(a)
Returns the number of iterations for iterative models. The default
method assumes presence of a component iterations
in x
.
nIter(x, ...)

## Default S3 method:
nIter(x, ...)
x |
a statistical model, or a result of maximisation, created
by maxLik, maxNR or another maximization routine in this package. |
... |
further arguments for methods |
This is a generic function. The default method returns the component
x$iterations
.
numeric, number of iterations. Note that ‘iteration’ may mean different things for different optimizers.
Ott Toomet
## Estimate the exponential distribution parameter:
t <- rexp(100, 2)
loglik <- function(theta) sum(log(theta) - theta*t)
## Estimate with numeric gradient and numeric Hessian
a <- maxNR(loglik, start=1)
nIter(a)
Returns the number of observations for statistical models estimated by
Maximum Likelihood using maxLik
.
## S3 method for class 'maxLik'
nObs(x, ...)
x |
a statistical model estimated by Maximum Likelihood
using maxLik. |
... |
further arguments (currently ignored). |
The nObs
method for “maxLik” objects
can return the number of observations only if the log-likelihood function
(or its gradient) returns values by individual observation.
numeric, number of observations
Arne Henningsen, Ott Toomet
## fit a normal distribution by ML
# generate a variable from normally distributed random numbers
x <- rnorm( 100, 1, 2 )
# log likelihood function (for individual observations)
llf <- function( param ) {
   return( dnorm( x, mean = param[ 1 ], sd = param[ 2 ], log = TRUE ) )
}
## ML method
ml <- maxLik( llf, start = c( mu = 0, sigma = 1 ) )
# return number of observations
nObs( ml )
This function returns the number of model parameters.
## S3 method for class 'maxim'
nParam(x, free=FALSE, ...)
x |
a model returned by a maximisation method from the maxLik package. |
free |
logical, whether to report only the free parameters or the total number of parameters (default) |
... |
other arguments for methods |
Free parameters are the parameters with no equality restrictions. Some parameters may be jointly restricted (e.g. sum of two probabilities equals unity). In this case the total number of parameters may depend on the normalization.
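A small sketch of the distinction between the total and the free parameter count, reusing the normal-sample log-likelihood with a fixed sigma from the maxNR examples earlier in this document (the comments give the expected counts):

## fit the normal model while holding 'sigma' fixed at its start value
x <- rnorm(100, 1, 2)
N <- length(x)
loglik <- function(param) {
   mu <- param[1]; sigma <- param[2]
   -0.5*N*log(2*pi) - N*log(sigma) - sum(0.5*(x - mu)^2/sigma^2)
}
resFix <- maxNR(loglik, start=c(mu=0, sigma=1), fixed="sigma")
nParam(resFix)              # total number of parameters: 2
nParam(resFix, free=TRUE)   # free (non-fixed) parameters only: 1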
Number of parameters in the model
Ott Toomet
nObs
for number of observations
## fit a normal distribution by ML
# generate a variable from normally distributed random numbers
x <- rnorm( 100, 1, 2 )
# log likelihood function (for individual observations)
llf <- function( param ) {
   return( dnorm( x, mean = param[ 1 ], sd = param[ 2 ], log = TRUE ) )
}
## ML method
ml <- maxLik( llf, start = c( mu = 0, sigma = 1 ) )
# return number of parameters
nParam( ml )
Calculate (central) numeric gradient and Hessian, also for vector-valued functions.
numericGradient(f, t0, eps=1e-06, fixed, ...)

numericHessian(f, grad=NULL, t0, eps=1e-06, fixed, ...)

numericNHessian(f, t0, eps=1e-6, fixed, ...)
f |
function to be differentiated. The first argument must be
the parameter vector with respect to which it is differentiated.
For the numeric gradient, f may return a (numeric) vector; for the
Hessian it should return a numeric scalar. |
grad |
function, gradient of f. |
t0 |
vector, the parameter values |
eps |
numeric, the step for numeric differentiation |
fixed |
logical index vector, fixed parameters.
Derivative is calculated only with respect to the parameters
for which the corresponding element of fixed is FALSE. |
... |
further arguments to f. |
numericGradient
numerically differentiates a (vector-valued)
function with respect to its (vector-valued) argument.  If the
function value is an N-vector and the argument is a K-vector,
the resulting gradient is an N x K matrix.

numericHessian
checks whether a gradient function is present.
If yes, it calculates the gradient of the gradient; if not, it
calculates the full
numeric Hessian (numericNHessian
).
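For instance, a sketch with a vector-valued function: a 2-vector value and a 3-vector argument give a 2 x 3 gradient (Jacobian) matrix.

## vector-valued function of a 3-vector argument
fv <- function(t0) c(sum(t0^2), prod(t0))
numericGradient(fv, t0 = c(1, 2, 3))   # a 2 x 3 matrix of partial derivatives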
Matrix. For numericGradient
, the number of rows is equal to the
length of the function value vector, and the number of columns is
equal to the length of the parameter vector.
For numericHessian
, both the number of rows and the number of columns equal
the length of the parameter vector.
Be careful when using numerical differentiation in optimization routines. Although quite precise in simple cases, it may work very poorly in more complicated settings.
Ott Toomet
# A simple example with Gaussian bell surface
f0 <- function(t0) exp(-t0[1]^2 - t0[2]^2)
numericGradient(f0, c(1,2))
numericHessian(f0, t0=c(1,2))

# An example with the analytic gradient
gradf0 <- function(t0) -2*t0*f0(t0)
numericHessian(f0, gradf0, t0=c(1,2))
# The results should be similar as in the previous case

# The central numeric derivatives are often quite precise
compareDerivatives(f0, gradf0, t0=1:2)
# The difference is around 1e-10
This function returns the optimization objective function from a ‘maxim’ object.
objectiveFn(x, ...)

## S3 method for class 'maxim'
objectiveFn(x, ...)
x |
an optimization result, inheriting from class ‘maxim’ |
... |
other arguments for methods |
function, the function that was optimized. It can be directly called, given that all necessary variables are accessible from the current environment.
Ott Toomet
hatf <- function(theta) exp(- theta %*% theta)
res <- maxNR(hatf, start=c(0,0))
print(summary(res))
print(objectiveFn(res))
print(objectiveFn(res)(2))   # 0.01832
These functions extract success or failure information from optimization objects.
returnCode
gives a numeric code, and returnMessage a
brief description of the success or
failure of the optimization, pointing to the problems that occurred (see the
documentation of the
corresponding optimizers).
returnCode(x, ...)

## Default S3 method:
returnCode(x, ...)

## S3 method for class 'maxLik'
returnCode(x, ...)

returnMessage(x, ...)

## S3 method for class 'maxim'
returnMessage(x, ...)

## S3 method for class 'maxLik'
returnMessage(x, ...)
x |
object, usually an optimization result |
... |
further arguments for other methods |
returnMessage
and returnCode
are generic functions, with methods
for various optimisation algorithms.
The message should either describe
the convergence (stopping condition),
or the problem.
The known codes and the related messages are:

1: gradient close to zero (normal convergence).

2: successive function values within tolerance limit (normal convergence).

3: last step could not find higher value (probably not
converged).  This is related to the line search step getting too
small, usually because of hitting the boundary of the parameter
space.  It may also be related to attempts to move in a wrong
direction because of numerical errors.  In some cases it can be
helped by changing steptol.

4: iteration limit exceeded.

5: infinite value.

6: infinite gradient.

7: infinite Hessian.

8: successive function values within relative tolerance limit (normal convergence).

9: (BFGS) Hessian approximation cannot be improved because the gradient did not change.  May be related to numerical approximation problems or a wrong analytic gradient.

10: lost patience: the optimizer has hit an inferior value too many
times (see maxSGA
for more information).

100: initial value out of range.
Integer for returnCode
, character for returnMessage
.
Different optimization routines may define these codes in slightly different ways.
Ott Toomet
## maximise the exponential bell
f1 <- function(x) exp(-x^2)
a <- maxNR(f1, start=2)
returnCode(a)      # should be success (1 or 2)
returnMessage(a)

## Now try to maximise log() function
a <- maxNR(log, start=2)
returnCode(a)      # should give a failure (4)
returnMessage(a)
Retrieve the objective function value for each iteration if stored during the optimization.
storedValues(x, ...)

## S3 method for class 'maxim'
storedValues(x, ...)

storedParameters(x, ...)

## S3 method for class 'maxim'
storedParameters(x, ...)
x |
a result of maximization, created
by |
... |
further arguments for other methods |
These are generic methods.
If requested via the control parameters storeValues=TRUE
or storeParameters=TRUE
, certain
optimization methods store the objective function value and the
parameter value at each epoch.  These methods
retrieve the stored values.
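For instance, the stored path can be inspected or plotted along these lines (a usage sketch assuming a was fitted with storeValues=TRUE and storeParameters=TRUE, as in the example below):

## assuming 'a' was fitted with control=list(storeValues=TRUE, storeParameters=TRUE)
plot(storedValues(a), type = "l",
     xlab = "epoch", ylab = "objective value")   # the first value is at the initial parameter
head(storedParameters(a))                        # parameter vector at each epoch, row by row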
storedValues
: a numeric vector, one value for each
iteration
storedParameters
: a numeric matrix with rows
corresponding to the iterations and columns to the parameter
components.
In both cases, the first value stored corresponds to the initial parameter.
Ott Toomet
## Estimate the exponential distribution parameter
t <- rexp(100, 2)
loglik <- function(theta, index) sum(log(theta) - theta*t[index])
## Estimate with numeric gradient and numeric Hessian
a <- maxSGA(loglik, start=1,
            control=list(storeValues=TRUE, storeParameters=TRUE, iterlim=10),
            nObs=100)
storedValues(a)
storedParameters(a)
Summarizes the general maximization results in a way that does not assume the function is a log-likelihood.
## S3 method for class 'maxim'
summary( object, hessian=FALSE, unsucc.step=FALSE, ... )

## S3 method for class 'summary.maxim'
print(x,
      max.rows=getOption("max.rows", 20),
      max.cols=getOption("max.cols", 7),
      ... )
object |
optimization result, object of class maxim, e.g. the result of maxNR. |
hessian |
logical, whether to display Hessian matrix. |
unsucc.step |
logical, whether to describe the last unsuccessful step
if code == 3. |
x |
object of class summary.maxim. |
max.rows |
maximum number of rows to be printed. This applies to the resulting coefficients (as those are printed as a matrix where the other column is the gradient), and to the Hessian if requested. |
max.cols |
maximum number of columns to be printed. Only Hessian output, if requested, uses this argument. |
... |
currently not used. |
Object of class summary.maxim
, intended to be printed with
corresponding print method.
Ott Toomet
maxNR
, returnCode
,
returnMessage
## minimize a 2D quadratic function (by maximizing its negative):
f <- function(b) {
   x <- b[1]; y <- b[2];
   val <- -(x - 2)^2 - (y - 3)^2   # concave parabola
   attr(val, "gradient") <- c(-2*x + 4, -2*y + 6)
   attr(val, "hessian") <- matrix(c(-2, 0, 0, -2), 2, 2)
   val
}
## Note that NR finds the optimum of a quadratic function with a single
## iteration.  Use c(0,0) as initial value.
res <- maxNR( f, start = c(0,0) )
summary(res)
summary(res, hessian=TRUE)
Summarizes the Maximum Likelihood estimation, including standard errors and t-values.
## S3 method for class 'maxLik'
summary(object, eigentol=1e-12, ... )

## S3 method for class 'summary.maxLik'
coef(object, ...)
object |
object of class 'maxLik', or 'summary.maxLik', usually a result from Maximum-Likelihood estimation. |
eigentol |
The standard errors are only calculated if the ratio of the smallest and largest eigenvalue of the Hessian matrix is less than “eigentol”. Otherwise the Hessian is treated as singular. |
... |
currently not used. |
An object of class 'summary.maxLik' with the following components:
type of maximization.
number of iterations.
code of success.
a short message describing the code.
the log-likelihood value at the maximum.

numeric matrix: the first column contains the parameter estimates, the second the standard errors, the third the t-values, and the fourth the corresponding p-values.
logical vector, which parameters are treated as constants.
number of free parameters.
information about the constrained optimization, passed directly on from the
maxim object; NULL in case of
unconstrained maximization.
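As a usage sketch (assuming a is a fitted maxLik object, as in the examples below), the coefficient table of the summary can be extracted as a plain matrix with the coef method documented above:

## extract the estimate/std. error/t value/p value table as a matrix
s <- summary(a)
coef(s)          # numeric matrix, one row per parameter
coef(s)[, 1]     # first column: the point estimates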
Ott Toomet, Arne Henningsen
maxLik
for maximum likelihood estimation,
confint
for confidence intervals, and tidy
and glance
for alternative quick summaries of the ML
results.
## ML estimation of exponential distribution:
t <- rexp(100, 2)
loglik <- function(theta) log(theta) - theta*t
gradlik <- function(theta) 1/theta - t
hesslik <- function(theta) -100/theta^2
## Estimate with numeric gradient and hessian
a <- maxLik(loglik, start=1, control=list(printLevel=2))
summary(a)
## Estimate with analytic gradient and hessian
a <- maxLik(loglik, gradlik, hesslik, start=1, control=list(printLevel=2))
summary(a)
Sequentially Unconstrained Maximization Technique (SUMT) based optimization for linear equality constraints.
This implementation is primarily intended to be called from other
maximization routines, such as maxNR
.
sumt(fn, grad=NULL, hess=NULL, start,
     maxRoutine, constraints,
     SUMTTol = sqrt(.Machine$double.eps),
     SUMTPenaltyTol = sqrt(.Machine$double.eps),
     SUMTQ = 10,
     SUMTRho0 = NULL,
     printLevel=print.level, print.level = 0,
     SUMTMaxIter = 100,
     ...)
fn |
function of a (single) vector parameter. The function may have more arguments (passed by ...), but those are not treated as the parameter. |
grad |
gradient function of fn. |
hess |
function, Hessian of fn. |
start |
numeric, initial value of the parameter |
maxRoutine |
maximization algorithm, such as maxNR. |
constraints |
list, information for constrained maximization.
Currently two components are supported: eqA and eqB, describing linear
equality constraints of the form A theta + B = 0. |
SUMTTol |
stopping condition. If the estimates at successive outer iterations are close enough, i.e. maximum of the absolute value over the component difference is smaller than SUMTTol, the algorithm stops. Note this does not necessarily mean that the constraints are satisfied. If the penalty function is too “weak”, SUMT may repeatedly find the same optimum. In that case a warning is issued. The user may set SUMTTol to a lower value, e.g. to zero. |
SUMTPenaltyTol |
stopping condition. If the barrier value (also called penalty), the
penalty parameter rho multiplied by the squared violation of the
constraints, is smaller than SUMTPenaltyTol, the algorithm stops.
|
SUMTQ |
a double greater than one, controlling the growth of the penalty
parameter rho: at each outer iteration rho is multiplied by SUMTQ. |
SUMTRho0 |
Initial value for the penalty parameter rho.  If NULL (the default), a
value is chosen automatically.

One should consider supplying SUMTRho0 if the automatic choice leads to
poor convergence of the unconstrained steps. |
printLevel |
Integer, debugging information. Larger number prints more details. |
print.level |
same as ‘printLevel’, for backward compatibility |
SUMTMaxIter |
Maximum SUMT iterations |
... |
Other arguments to maxRoutine and fn. |
The Sequential Unconstrained Minimization Technique is a heuristic
for constrained optimization.  To minimize a function f
subject to
constraints, it uses a non-negative penalty function
P(theta),
such that
P(theta) is zero
iff
theta satisfies the constraints.  One iteratively minimizes
f(theta) + rho_k P(theta), where the
rho values are increased according to the rule
rho_{k+1} = q rho_k for some
constant
q > 1, until convergence is
achieved in the sense that the barrier value
rho_k P(theta_k) is close to zero.  Note that there is no
guarantee that the global constrained optimum is
found.  Standard practice recommends to use the best solution
found in “sufficiently many” replications.

Any of
the maximization algorithms in the maxLik package, such as
maxNR
, can be used for the unconstrained step.
Analytic gradient and Hessian are used if provided.
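A minimal sketch of this idea for a linear equality constraint A theta + B = 0, using maxNR for the unconstrained steps. This illustrates the principle only, not the internals of sumt; the names sumtSketch, rho0 and q, and the 1e-8 tolerance, are chosen for this sketch.

## schematic SUMT loop for maximizing fn subject to A %*% theta + B = 0
sumtSketch <- function(fn, start, A, B, rho0 = 1, q = 10, maxOuter = 20) {
   theta <- start
   rho <- rho0
   for(k in seq_len(maxOuter)) {
      ## penalized objective: subtracting the penalty because we maximize
      penalized <- function(theta) fn(theta) - rho * sum((A %*% theta + B)^2)
      res <- maxNR(penalized, start = theta)
      theta <- coef(res)
      if(rho * sum((A %*% theta + B)^2) < 1e-8)   # barrier value close to zero
         break
      rho <- q * rho   # increase the penalty for the next outer iteration
   }
   res
}

With the hatf example below, sumtSketch(hatf, c(0,0), A, B) should approach the constrained optimum near (0.5, 0.5).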
Object of class 'maxim'. In addition, a component
constraints |
A list describing the constrained optimization: the type of constrained
optimization, the final barrier value, the SUMT return code and message,
and the number of outer iterations. |
In case of equality constraints, it may be more efficient to enclose the function in a wrapper function. The wrapper calculates the full set of parameters based on a smaller set of free parameters and the constraints; see the sketch below.
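For the x + y = 1 example below, such a wrapper amounts to substituting y = 1 - x, which turns the constrained two-parameter problem into a free one-parameter problem (a sketch of the reparameterization idea):

## maximize exp(-x^2 - y^2) subject to x + y = 1 by substituting y = 1 - x
hatf1 <- function(x) exp(-(x^2 + (1 - x)^2))
res1 <- maxNR(hatf1, start = 0)
coef(res1)   # close to 0.5, hence y = 1 - x is close to 0.5 as well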
Ott Toomet, Arne Henningsen
sumt
in package clue.
## We maximize exp(-x^2 - y^2) where x+y = 1
hatf <- function(theta) {
   x <- theta[1]
   y <- theta[2]
   exp(-(x^2 + y^2))
   ## Note: you may prefer exp(- theta %*% theta) instead
}
## use constraints: x + y = 1
A <- matrix(c(1, 1), 1, 2)
B <- -1
res <- sumt(hatf, start=c(0,0), maxRoutine=maxNR,
            constraints=list(eqA=A, eqB=B))
print(summary(res))
These methods return summary information about the estimated model. Both require the tibble package to be installed.
## S3 method for class 'maxLik'
tidy(x, ...)

## S3 method for class 'maxLik'
glance(x, ...)
x |
object of class 'maxLik'. |
... |
Not used. |
For tidy()
, a tibble with columns:
The name of the estimated parameter (parameters are sequentially numbered if names missing).
The estimated parameter.
The standard error of the estimate.
The t-statistic of the estimate.

The p-value of the estimate.
This is essentially the same table as the summary
method prints,
just in the form of a tibble (data frame).
For glance()
, a one-row tibble with columns:
The degrees of freedom of the model.
The log-likelihood of the model.
Akaike's Information Criterion for the model.
The number of observations, if this is available, otherwise NA
.
David Hugh-Jones
The functions tidy
and
glance
in package generics, and
summary
to display the
“standard” summary information.
## Example with a single parameter
t <- rexp(100, 2)
loglik <- function(theta) log(theta) - theta*t
a <- maxLik(loglik, start=2)
tidy(a)
glance(a)

## Example with a parameter vector
x <- rnorm(100)
loglik <- function(theta) {
   dnorm(x, mean=theta[1], sd=theta[2], log=TRUE)
}
a <- maxLik(loglik, start=c(mu=0, sd=1))
tidy(a)
glance(a)
Extract variance-covariance matrices from maxLik
objects.
## S3 method for class 'maxLik'
vcov( object, eigentol=1e-12, ... )
object |
a ‘maxLik’ object. |
eigentol |
eigenvalue tolerance, controlling when the Hessian matrix is treated as numerically singular. |
... |
further arguments (currently ignored). |
The standard errors are only calculated if the ratio of the smallest and largest eigenvalue of the Hessian matrix is less than “eigentol”. Otherwise the Hessian is treated as singular.
the estimated variance-covariance matrix of the coefficients.  If the
estimated Hessian is singular, its values are
Inf.  The values corresponding to fixed parameters are zero.
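As a usage sketch (assuming a is a fitted maxLik object, e.g. from the example below), standard errors follow directly from the diagonal of this matrix:

## assuming 'a' is a fitted maxLik object
sqrt(diag(vcov(a)))   # standard errors; should agree with stdEr(a)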
Arne Henningsen, Ott Toomet
## ML estimation of exponential random variables
t <- rexp(100, 2)
loglik <- function(theta) log(theta) - theta*t
gradlik <- function(theta) 1/theta - t
hesslik <- function(theta) -100/theta^2
## Estimate with numeric gradient and hessian
a <- maxLik(loglik, start=1, control=list(printLevel=2))
vcov(a)
## Estimate with analytic gradient and hessian
a <- maxLik(loglik, gradlik, hesslik, start=1)
vcov(a)