Title: | Empirical Bayes Variable Selection via ICM/M Algorithm |
---|---|
Description: | Empirical Bayes variable selection via ICM/M algorithm for normal, binary logistic, and Cox's regression. The basic problem is to fit a high-dimensional regression with sparse coefficients. This package allows incorporating the Ising prior to capture the structure of predictors in the modeling process. More information can be found in the papers listed in the URL below. |
Authors: | Vitara Pungpapong [aut, cre], Min Zhang [ctb], Dabao Zhang [ctb] |
Maintainer: | Vitara Pungpapong <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.2 |
Built: | 2024-11-16 03:18:21 UTC |
Source: | https://github.com/cran/icmm |
Carries out empirical Bayes variable selection via the ICM/M algorithm. The basic problem is to fit a high-dimensional regression in which most of the coefficients are assumed to be zero. This package allows incorporating the Ising prior to capture the structure of predictors in the modeling process. The current version of this package can handle normal, binary logistic, and Cox's regression.
Package: | icmm |
Type: | Package |
Version: | 1.2 |
Date: | 2021-5-12 |
License: | GPL-2 |
LazyLoad: | yes |
Vitara Pungpapong, Min Zhang, Dabao Zhang
Maintainer: Vitara Pungpapong <[email protected]>
Pungpapong, V., Zhang, M. and Zhang, D. (2015). Selecting massive variables using an iterated conditional modes/medians algorithm. Electronic Journal of Statistics. 9:1243-1266. <doi:10.1214/15-EJS1034>.
Pungpapong, V., Zhang, M. and Zhang, D. (2020). Integrating Biological Knowledge Into Case-Control Analysis Through Iterated Conditional Modes/Medians Algorithm. Journal of Computational Biology. 27(7): 1171-1179. <doi:10.1089/cmb.2019.0319>.
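A minimal quick-start with the bundled simulated Gaussian data, mirroring the fuller examples given under icmm below:

library(icmm)
data(simGaussian)
data(initbetaGaussian)
Y<-as.matrix(simGaussian[,1])
X<-as.matrix(simGaussian[,-1])
initbeta<-as.matrix(initbetaGaussian)
# Empirical Bayes variable selection without a structure prior
fit<-icmm(Y=Y, X=X, b.start=initbeta, family="gaussian",
          ising.prior=FALSE, estalpha=FALSE, alpha=0.5, maxiter=100)
fit$coef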
Estimate hyperparameters a and b.
This function estimates the hyperparameters a and b for the Ising prior. This function is for internal use, called by the icmm function.
get.ab(beta, structure, edgeind)
beta | a (p*1) matrix of regression coefficients. |
structure | a data frame that stores the information of structure among predictors. |
edgeind | a vector that stores primary keys of structure (e.g., sort(unique(structure[,1]))). |
Estimate hyperparameters, a and b, using maximum pseudolikelihood estimators.
Return a two-dimensional vector where the first element is a and the second element is b.
Vitara Pungpapong, Min Zhang, Dabao Zhang
data(simGaussian)
data(linearrelation)
Y<-as.matrix(simGaussian[,1])
X<-as.matrix(simGaussian[,-1])
# Suppose beta is obtained from lasso
data(initbetaGaussian)
beta<-as.matrix(initbetaGaussian)
edgeind<-sort(unique(linearrelation[,1]))
hyperparameter<-get.ab(beta=beta, structure=linearrelation, edgeind=edgeind)
Estimate hyperparameter alpha.
This function estimates the hyperparameter alpha, a scale parameter in the Laplace density. This function is for internal use, called by the icmm function.
get.alpha(beta, scaledfactor)
beta | a (p*1) matrix of regression coefficients. |
scaledfactor | a scalar value of multiplicative factor. |
This function estimates a hyperparameter alpha, a scale parameter in the Laplace density, as the mode of its full conditional distribution function.
Return a scalar value of alpha.
Vitara Pungpapong, Min Zhang, Dabao Zhang
data(simGaussian)
Y<-as.matrix(simGaussian[,1])
X<-as.matrix(simGaussian[,-1])
n<-dim(X)[1]
# Obtain initial values of beta from lasso
data(initbetaGaussian)
beta<-as.matrix(initbetaGaussian)
# Initiate alpha
alpha<-0.5
# Estimate sigma
e<-Y-X%*%beta
nz<-sum(beta[,1]!=0)
sigma<-get.sigma(Y=Y, X=X, beta=beta, alpha=alpha)
# Update alpha as the mode of its full conditional distribution function
alpha<-get.alpha(beta=beta, scaledfactor=1/(sqrt(n-1)*sum(abs(beta))/sigma))
Given a sufficient statistic for a regression coefficient, this function estimates the regression coefficient without assuming a prior on the structure of predictors.
get.beta(SS, w, alpha, scaledfactor)
SS | a scalar value of sufficient statistic for a regression coefficient. |
w | a scalar value of mixing weight. |
alpha | a scalar value of hyperparameter alpha. |
scaledfactor | a scalar value of multiplicative factor. |
Empirical Bayes thresholding is employed to obtain a posterior median of a regression coefficient.
a scalar value of regression coefficient.
Vitara Pungpapong, Min Zhang, Dabao Zhang
data(simGaussian)
Y<-as.matrix(simGaussian[,1])
X<-as.matrix(simGaussian[,-1])
n<-dim(X)[1]
# Obtain initial values from lasso
data(initbetaGaussian)
beta<-as.matrix(initbetaGaussian)
# Initiate all other parameters
w<-0.5
alpha<-0.5
sigma<-get.sigma(Y=Y, X=X, beta=beta, alpha=alpha)
# Obtain a sufficient statistic
j<-1
Yres<-Y-X%*%beta+X[,j]*beta[j,1]
sxy<-t(Yres)%*%X[,j]
ssx<-sum(X[,j]^2)
SS<-sqrt(n-1)*sxy/(sigma*ssx)
beta[j,1]<-get.beta(SS=SS, w=w, alpha=alpha, scaledfactor=sigma/sqrt(n-1))
Given a sufficient statistic for a regression coefficient, this function estimates the coefficient when the Ising model is assumed in order to incorporate the prior structure of predictors.
get.beta.ising(SS, wpost, alpha, scaledfactor)
SS | a sufficient statistic for a regression coefficient. |
wpost | a posterior probability of mixing weight. |
alpha | a scalar value for hyperparameter alpha. |
scaledfactor | a scalar value for multiplicative factor. |
Given a posterior probability of mixing weight, empirical Bayes thresholding is employed to obtain a posterior median of a regression coefficient.
a scalar value of regression coefficient.
Vitara Pungpapong, Min Zhang, Dabao Zhang
data(simGaussian)
Y<-as.matrix(simGaussian[,1])
X<-as.matrix(simGaussian[,-1])
n<-dim(X)[1]
data(linearrelation)
edgeind<-sort(unique(linearrelation[,1]))
# Obtain initial values from lasso
data(initbetaGaussian)
beta<-as.matrix(initbetaGaussian)
# Initiate all other parameters
alpha<-0.5
sigma<-get.sigma(Y=Y, X=X, beta=beta, alpha=alpha)
hyperparam<-get.ab(beta, linearrelation, edgeind)
# Obtain regression coefficient
j<-1
Yres<-Y-X%*%beta+X[,j]*beta[j,1]
sxy<-t(Yres)%*%X[,j]
ssx<-sum(X[,j]^2)
SS<-sqrt(n-1)*sxy/(sigma*ssx)
wpost<-get.wpost(SS, beta, alpha, hyperparam, linearrelation, edgeind, j)
beta[j,1]<-get.beta.ising(SS=SS, wpost=wpost, alpha=alpha, scaledfactor=sigma/sqrt(n-1))
For logistic regression, given the current estimates of regression coefficients, working responses and their corresponding weights are obtained.
get.pseudodata.binomial(Y, X, beta0, beta, niter)
Y | an (n*1) numeric matrix of responses. |
X | an (n*p) numeric design matrix. |
beta0 | a scalar value of intercept term. |
beta | a (p*1) matrix of regression coefficients. |
niter | number of iterations in ICM/M algorithm. |
Return a list including elements
z | an (n*1) matrix of working responses. |
sigma2 | an (n*1) matrix of inverse of weights. |
Vitara Pungpapong, Min Zhang, Dabao Zhang
data(simBinomial)
Y<-as.matrix(simBinomial[,1])
X<-as.matrix(simBinomial[,-1])
p<-dim(X)[2]
# Obtain initial values from lasso
data(initbetaBinomial)
initbeta<-as.matrix(initbetaBinomial)
# Get pseudodata
pseudodata<-get.pseudodata.binomial(Y=Y, X=X, beta0=0, beta=initbeta, niter=1)
z<-pseudodata$z
sigma<-sqrt(pseudodata$sigma2)
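For reference, working responses and weights for logistic regression are conventionally built via the iteratively reweighted least squares (IRLS) construction; a minimal sketch of that construction is shown below, continuing from the example above. The exact parameterization used inside get.pseudodata.binomial may differ, so treat this as illustrative rather than a re-implementation.

# IRLS-style pseudodata for logistic regression (illustrative sketch only;
# the internal parameterization of get.pseudodata.binomial may differ).
# Continues from the example above, where Y, X, and initbeta are defined.
beta0<-0
eta<-as.vector(beta0+X%*%initbeta)               # linear predictor
prob<-1/(1+exp(-eta))                            # fitted probabilities
z.irls<-eta+(Y[,1]-prob)/(prob*(1-prob))         # working responses
sigma2.irls<-1/(prob*(1-prob))                   # inverse weights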
For Cox's regression model, given the current estimates of regression coefficients, working responses and their corresponding weights are obtained.
get.pseudodata.cox(Y, X, event, beta, time, ntime, sumevent)
Y | an (n*1) numeric matrix of time response. |
X | an (n*p) numeric design matrix. |
event | an (n*1) numeric matrix of status indicator: 0=right censored, 1=event at time Y. |
beta | a (p*1) matrix of regression coefficients. |
time | a vector of sorted unique values of Y. |
ntime | length of the vector time. |
sumevent | a vector of size ntime; element k is the sum of the status indicators of observations with Y equal to time[k] (see the example below). |
Return a list including elements
z | an (n*1) matrix of working responses. |
sigma2 | an (n*1) matrix of inverse of weights. |
Vitara Pungpapong, Min Zhang, Dabao Zhang
data(simCox)
Y<-as.matrix(simCox[,1])
event<-as.matrix(simCox[,2])
X<-as.matrix(simCox[,-(1:2)])
time<-sort(unique(Y))
ntime<-length(time)
# sum of event_i where y_i = time_k
sumevent<-rep(0, ntime)
for(j in 1:ntime) {
  sumevent[j]<-sum(event[Y[,1]==time[j]])
}
# Obtain initial values from lasso
data(initbetaCox)
initbeta<-as.matrix(initbetaCox)
# Get pseudodata
pseudodata<-get.pseudodata.cox(Y, X, event, initbeta, time, ntime, sumevent)
z<-pseudodata$z
sigma<-sqrt(pseudodata$sigma2)
This function estimates the standard deviation when family="gaussian". This function is for internal use, called by the icmm function.
get.sigma(Y, X, beta, alpha)
Y | an (n*1) numeric matrix of responses. |
X | an (n*p) numeric design matrix. |
beta | a (p*1) matrix of regression coefficients. |
alpha | a scalar value of hyperparameter alpha. |
Estimate the standard deviation as the mode of its full conditional distribution function when family="gaussian" is specified. This function is for internal use, called by the icmm function.
Return a scalar value of standard deviation.
Vitara Pungpapong, Min Zhang, Dabao Zhang
data(simGaussian)
Y<-as.matrix(simGaussian[,1])
X<-as.matrix(simGaussian[,-1])
alpha<-0.5
# Obtain initial values from lasso
data(initbetaGaussian)
beta<-as.matrix(initbetaGaussian)
# Obtain sigma
sigma<-get.sigma(Y=Y, X=X, beta=beta, alpha=alpha)
With the Ising prior on structured predictors, this function gets the posterior probability of mixing weight.
get.wpost(SS, beta, alpha, hyperparam, structure, edgeind, j)
SS | a scalar value of sufficient statistic for a regression coefficient. |
beta | a (p*1) matrix of regression coefficients. |
alpha | a scalar value of hyperparameter alpha. |
hyperparam | a two-dimensional vector of hyperparameters a and b. |
structure | a data frame that stores the information of structure among predictors. |
edgeind | a vector that stores primary keys of structure. |
j | an index ranging from 1 to p. This function estimates a posterior probability of the mixing weight corresponding to predictor j. |
With the Ising prior on structured predictors, the problem is transformed into the realm of empirical Bayes thresholding with Laplace prior by estimating the posterior probability of mixing weight. The posterior probability is used to find the posterior median of a regression coefficient.
Return a scalar value of a posterior probability of mixing weight for predictor j.
Vitara Pungpapong, Min Zhang, Dabao Zhang
data(simGaussian)
Y<-as.matrix(simGaussian[,1])
X<-as.matrix(simGaussian[,-1])
n<-dim(X)[1]
data(linearrelation)
edgeind<-sort(unique(linearrelation[,1]))
# Obtain initial values from lasso
data(initbetaGaussian)
beta<-as.matrix(initbetaGaussian)
# Initiate all other parameters
alpha<-0.5
sigma<-get.sigma(Y=Y, X=X, beta=beta, alpha=alpha)
hyperparam<-get.ab(beta, linearrelation, edgeind)
# Estimate the posterior probability of first predictor
j<-1
Yres<-Y-X%*%beta+X[,j]*beta[j,1]
sxy<-t(Yres)%*%X[,j]
ssx<-sum(X[,j]^2)
SS<-sqrt(n-1)*sxy/(sigma*ssx)
wpost<-get.wpost(SS=SS, beta=beta, alpha=alpha, hyperparam=hyperparam,
                 structure=linearrelation, edgeind=edgeind, j=j)
Given other parameters, this function estimates a mixing weight from the mode of its full conditional distribution function.
get.wprior(beta)
beta | a (p*1) matrix of regression coefficients. |
Given other parameters, this function estimates a mixing weight from the mode of its full conditional distribution function. This function is called when using an independent prior on predictors (no prior on structured predictors).
Return a scalar value of a mixing weight.
Vitara Pungpapong, Min Zhang, Dabao Zhang
data(simGaussian)
Y<-as.matrix(simGaussian[,1])
X<-as.matrix(simGaussian[,-1])
# Obtain initial values from lasso
data(initbetaGaussian)
beta<-as.matrix(initbetaGaussian)
# Estimate the mixing weight
w<-get.wprior(beta)
This function estimates the local posterior probability when assuming no prior on structured predictors.
get.zeta(SS, w, alpha)
SS | a scalar value of sufficient statistic for a regression coefficient. |
w | a scalar value of mixing weight. |
alpha | a scalar value of hyperparameter alpha. |
Given all other parameters, this function estimates the local posterior probability or the probability that a regression coefficient is not zero conditional on other parameters. This function is called when assuming no prior on structured predictors.
Return a scalar value of local posterior probability.
Vitara Pungpapong, Min Zhang, Dabao Zhang
data(simGaussian)
Y<-as.matrix(simGaussian[,1])
X<-as.matrix(simGaussian[,-1])
n<-dim(X)[1]
# Obtain initial values from lasso
data(initbetaGaussian)
initbeta<-as.matrix(initbetaGaussian)
# Obtain the final output from ebvs
output<-icmm(Y, X, b0.start=0, b.start=initbeta, family = "gaussian",
             ising.prior = FALSE, estalpha = FALSE, alpha = 0.5, maxiter = 100)
b0<-output$coef[1]
beta<-matrix(output$coef[-1], ncol=1)
# Get all parameters for function arguments
w<-get.wprior(beta)
alpha<-0.5
sigma<-get.sigma(Y,X,beta,alpha)
# Estimate local posterior probability
j<-1
Yres<-Y-b0-X%*%beta+X[,j]*beta[j,1]
sxy<-t(Yres)%*%X[,j]
ssx<-sum(X[,j]^2)
SS<-sqrt(n-1)*sxy/(sigma*ssx)
zeta<-get.zeta(SS=SS, w=w, alpha=alpha)
This function estimates the local posterior probability when assuming Ising prior on structured predictors.
get.zeta.ising(SS, beta, alpha, hyperparam, structure, edgeind, j)
SS | a scalar value of sufficient statistic for a regression coefficient. |
beta | a (p*1) matrix of regression coefficients. |
alpha | a scalar value of hyperparameter alpha. |
hyperparam | a two-dimensional vector of hyperparameters a and b. |
structure | a data frame that stores the information of structure among predictors. |
edgeind | a vector that stores primary keys of structure. |
j | an index ranging from 1 to p. This function estimates a local posterior probability of predictor j. |
Given all other parameters, this function estimates the local posterior probability, or the probability that a regression coefficient is not zero conditional on other parameters. This function is called when assuming the Ising prior on structured predictors.
Return a scalar value of local posterior probability.
Vitara Pungpapong, Min Zhang, Dabao Zhang
data(simGaussian)
data(linearrelation)
Y<-as.matrix(simGaussian[,1])
X<-as.matrix(simGaussian[,-1])
n<-dim(X)[1]
# Obtain initial values from lasso
data(initbetaGaussian)
initbeta<-as.matrix(initbetaGaussian)
# Get final output from ebvs
output<-icmm(Y, X, b0.start=0, b.start=initbeta, family = "gaussian",
             ising.prior = TRUE, structure=linearrelation, estalpha = FALSE,
             alpha = 0.5, maxiter = 100)
b0<-output$coef[1]
beta<-matrix(output$coef[-1], ncol=1)
# Get all parameters for function arguments
w<-get.wprior(beta)
alpha<-0.5
sigma<-get.sigma(Y,X,beta,alpha)
edgeind<-sort(unique(linearrelation[,1]))
hyperparam<-get.ab(beta, linearrelation, edgeind)
# Estimate local posterior probability
j<-1
Yres<-Y-b0-X%*%beta+X[,j]*beta[j,1]
sxy<-t(Yres)%*%X[,j]
ssx<-sum(X[,j]^2)
SS<-sqrt(n-1)*sxy/(sigma*ssx)
zeta<-get.zeta.ising(SS=SS, beta=beta, alpha=alpha, hyperparam=hyperparam,
                     structure=linearrelation, edgeind=edgeind, j=j)
Empirical Bayes variable selection via the ICM/M algorithm.
icmm(Y, X, event, b0.start, b.start, family = "gaussian", ising.prior = FALSE, structure, estalpha = FALSE, alpha = 0.5, maxiter = 100)
Y | an (n*1) numeric matrix of responses. |
X | an (n*p) numeric design matrix. |
event | an (n*1) numeric matrix of status for censored data: 0=right censored, 1=event (required when family="cox"). |
b0.start | a starting value of the intercept term (optional). |
b.start | a (p*1) matrix of starting values for regression coefficients. |
family | specification of the model. It can be one of these three models: "gaussian", "binomial", or "cox". |
ising.prior | a logical flag for Ising prior utilization. |
structure | a data frame that stores the information of structured predictors (needs to be specified when ising.prior=TRUE). |
estalpha | a logical flag specifying whether to obtain alpha within the ICM/M algorithm. |
alpha | a scalar value of the scale parameter in the Laplace density (non-zero part of the prior). The default value is 0.5. |
maxiter | a maximum number of iterations for the ICM/M algorithm. |
The main function for empirical Bayes variable selection. Iterated conditional modes/medians (ICM/M) is implemented in this function. The basic problem is to estimate regression coefficients in high-dimensional data (i.e., large p, small n) where most coefficients are assumed to be zero. This function also allows a prior on the structure of covariates to be incorporated in the model.
Return a list including elements
coef | a vector of model coefficients. The first element is the intercept term when one is included in the model (for example, when b0.start is specified). |
iterations | number of iterations of ICM/M. |
alpha | a scalar value of alpha. |
postprob | a p-vector of local posterior probabilities or zeta. |
Vitara Pungpapong, Min Zhang, Dabao Zhang
Pungpapong, V., Zhang, M. and Zhang, D. (2015). Selecting massive variables using an iterated conditional modes/medians algorithm. Electronic Journal of Statistics. 9:1243-1266. <doi:10.1214/15-EJS1034>.
Pungpapong, V., Zhang, M. and Zhang, D. (2020). Integrating Biological Knowledge Into Case-Control Analysis Through Iterated Conditional Modes/Medians Algorithm. Journal of Computational Biology. 27(7): 1171-1179. <doi:10.1089/cmb.2019.0319>.
get.ab, get.alpha, get.beta, get.beta.ising, get.pseudodata.binomial, get.pseudodata.cox, get.sigma, get.wprior, get.zeta, get.zeta.ising
# Normal linear regression model
# With no prior on structure among predictors
data(simGaussian)
Y<-as.matrix(simGaussian[,1])
X<-as.matrix(simGaussian[,-1])
# Obtain initial values from lasso
data(initbetaGaussian)
initbeta<-as.matrix(initbetaGaussian)
result<-icmm(Y=Y, X=X, b.start=initbeta, family="gaussian", ising.prior=FALSE,
             estalpha=FALSE, alpha=0.5, maxiter=100)
result$coef
result$iterations
result$alpha
result$wpost

# With prior on structure among predictors
data(linearrelation)
result<-icmm(Y=Y, X=X, b.start=initbeta, family="gaussian", ising.prior=TRUE,
             structure=linearrelation, estalpha=FALSE, alpha=0.5, maxiter=100)
result$coef
result$iterations
result$alpha
result$wpost

# Binary logistic regression model
data(simBinomial)
Y<-as.matrix(simBinomial[,1])
X<-as.matrix(simBinomial[,-1])
p<-dim(X)[2]
# Obtain initial values from lasso
data(initbetaBinomial)
initbeta<-as.matrix(initbetaBinomial)
result<-icmm(Y=Y, X=X, b0.start=0, b.start=initbeta, family="binomial",
             ising.prior=TRUE, structure=linearrelation, estalpha=FALSE,
             alpha=0.5, maxiter=100)
result$coef
result$iterations
result$alpha
result$wpost

# Cox's model
data(simCox)
Y<-as.matrix(simCox[,1])
event<-as.matrix(simCox[,2])
X<-as.matrix(simCox[,-(1:2)])
# Obtain initial values from lasso
data(initbetaCox)
initbeta<-as.matrix(initbetaCox)
result <- icmm(Y=Y, X=X, event=event, b.start=initbeta, family="cox",
               ising.prior=TRUE, structure=linearrelation, estalpha=FALSE,
               alpha=0.5, maxiter=100)
result$coef
result$iterations
result$alpha
result$wpost
Initial values for the regression coefficients obtained from a binary logistic model with lasso regularization for the simBinomial data set.
data(initbetaBinomial)
A data frame with 400 rows.
V1 | a numeric vector of the regression coefficients. |
data(initbetaBinomial)
Initial values for the regression coefficients obtained from Cox's model with lasso regularization for the simCox data set.
data(initbetaCox)
A data frame with 400 rows.
V1 | a numeric vector of the regression coefficients. |
data(initbetaCox)
Initial values for the regression coefficients obtained from the normal linear regression model with lasso regularization for the simGaussian data set.
data(initbetaGaussian)
A data frame with 400 rows.
V1 | a numeric vector of the regression coefficients. |
data(initbetaGaussian)
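For illustration, starting values of this kind can be produced with a lasso fit from the glmnet package; the sketch below is only an assumption about how such values could be generated (glmnet is not a dependency of icmm, and the exact tuning used for the shipped initbetaGaussian values is not documented here).

# A sketch only: the exact lasso fit used to produce the shipped initial
# values is not documented, so the resulting coefficients may differ.
library(glmnet)
data(simGaussian)
Y<-as.matrix(simGaussian[,1])
X<-as.matrix(simGaussian[,-1])
cvfit<-cv.glmnet(X, Y, family="gaussian", alpha=1)      # lasso (alpha=1)
initbeta<-as.matrix(coef(cvfit, s="lambda.min")[-1])    # drop the intercept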
This data frame is used as an example to store the structure of predictors or the edge set of an undirected graph. For this data frame, the linear chain is assumed for each predictor.
data(linearrelation)
A data frame with 400 observations and 2 variables as follows.
Index | an index of the predictor/node which has at least one edge. |
EdgeIndices | a string of all indices having an edge connected to Index, separated by a semicolon (;). |
This structure of predictors assumes a linear chain in which each predictor is connected to its immediate neighbors. For example, the j-th predictor is connected to the (j-1)-th and (j+1)-th predictors, so a typical entry in the data frame is Index="5" and EdgeIndices="4;6". A sketch of how such a data frame can be constructed appears below.
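A minimal sketch constructing a linear-chain structure data frame for a small number of predictors (p=5 here). The column names Index and EdgeIndices follow the format described above; whether the shipped linearrelation data stores Index as numeric or character is best checked with head(linearrelation), and here it is built as numeric.

# Build a linear-chain structure data frame for a small p (here p=5);
# predictor j is linked to its immediate neighbours j-1 and j+1.
p<-5
neighbours<-function(j) paste(setdiff(c(j-1, j+1), c(0, p+1)), collapse=";")
mystructure<-data.frame(Index=1:p,
                        EdgeIndices=sapply(1:p, neighbours),
                        stringsAsFactors=FALSE)
mystructure
edgeind<-sort(unique(mystructure[,1]))   # primary keys, as used by get.ab/icmm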
data(linearrelation)
# To see the format of linearrelation data frame
head(linearrelation)
Simulated data from the binary logistic regression model. A data frame with 100 observations and 401 variables. The included variables are:
V1 | a numeric vector of binary responses where each entry is either 0 or 1. |
V2-V401 | 400 vectors of covariates. |
data(simBinomial)
A data frame of simulated data from the binary logistic regression with 100 observations and 401 variables.
data(simBinomial)
Y<-as.matrix(simBinomial[,1])
X<-as.matrix(simBinomial[,-1])
Simulated data from Cox's regression model. A data frame with 100 observations and 402 variables. The included variables are:
V1 | a numeric vector of responses for right censored data. |
V2 | a numeric vector of status indicator: 0=right censored, 1=event at time V1. |
V3-V402 | 400 vectors of covariates. |
data(simCox)
A data frame of simulated data from Cox's regression model with 100 observations and 402 variables.
data(simCox)
Y<-as.matrix(simCox[,1])
event<-as.matrix(simCox[,2])
X<-as.matrix(simCox[,-(1:2)])
Simulated data from the normal linear regression model. A data frame with 100 observations and 401 variables. The included variables are:
V1 | a numeric vector of responses. |
V2-V401 | 400 vectors of covariates. |
data(simGaussian)
A data frame of simulated data from the normal linear regression with 100 observations and 401 variables.
data(simGaussian)
Y<-as.matrix(simGaussian[,1])
X<-as.matrix(simGaussian[,-1])