lcca.datasim {lcca}    R Documentation

Simulate random data from a latent-class causal model

Description

The generic method lcca.datasim simulates a random dataset of a given size from a latent-class causal model under user-supplied parameters. It may be used in simulations to evaluate the properties of inferential procedures over repeated samples.

Usage


lcca.datasim(obj, ...)


## Default S3 method:
lcca.datasim(obj, outcome.distribution="NORMAL", iseeds = NULL, nlevs, ncases,
   x.alpha, x.beta, case.names = NULL, item.names = NULL, ...)


## S3 method for class 'lcca'
lcca.datasim(obj, outcome.distribution="NORMAL", iseeds = NULL, 
   nlevs = obj$nlevs, ncases = obj$ncases, x.alpha = obj$x.alpha,
   x.beta = obj$x.beta, case.names = obj$case.names,
   item.names = obj$item.names, ...)

Arguments

obj

object used to select a method. Either an object of class "lcca" produced by the function lcca, or a list containing parameters from a latent-class causal model; see DETAILS.

outcome.distribution

distribution of the outcome variable; one of "NORMAL", "LOGISTIC" or "POISSON".

iseeds

two integers to initialize the random number generator; see DETAILS.

nlevs

integer vector of length nitems, where nitems is the number of response variables, indicating the number of levels or response categories for each variable.

ncases

number of cases to simulate (i.e., the sample size).

x.alpha

matrix of predictors (including a constant term, if present) for the logistic treatment model.

x.beta

matrix of predictors (including a constant term, if present) for the linear outcome model.

case.names

optional names to assign to the rows of the resulting data matrix. Should be a character vector of length ncases.

item.names

optional names to assign to the columns of the resulting data matrix. Should be a character vector of length nitems.

...

additional arguments to be passed to the methods.

Details

This generic method may be called in two ways. One way is to supply the parameters of a latent-class causal model as the first argument. The parameters are arranged as a list with four named components: rho, alpha, beta and sigma2.

The component rho should be an array of dimension c(nitems,maxlevs,nclass), where nitems is the number of items on the left-hand side of formula.treatment, maxlevs is the maximum number of levels (distinct response categories) among the items, and nclass is the number of treatment classes. The element rho[j,k,c] is the probability that an individual in class c supplies a response of k to item j.

The component alpha should be a matrix of dimension c(ncovs.alpha,nclass), where ncovs.alpha is the number of predictors in the logistic treatment model (including a constant term for the intercept, if present). The elements of alpha[,c] are the coefficients determining the log-odds of membership in class c, versus the reference class. If c is the reference class, then all elements of alpha[,c] must be zero.

The component beta should be a matrix of dimension c(ncovs.beta,nclass), where ncovs.beta is the number of predictors in the outcome model (including a constant term for the intercept, if present). The elements of beta[,c] are the coefficients for predicting the potential outcomes for class c.

The component sigma2 should be a numeric vector of length nclass containing residual variances for the potential outcomes.
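The parameter-list layout described above can be checked with a few lines of base R before any simulation is attempted. The following is only a sketch: check_lcca_params is a hypothetical helper, not a function from the lcca package, and the decision to ignore NA entries of rho is an assumption made here for illustration.

```r
# Hypothetical helper (not part of the lcca package): verifies that a
# parameter list has the components and shapes described in Details.
check_lcca_params <- function(param, tol = 1e-8) {
  stopifnot(is.list(param),
            all(c("rho", "alpha", "beta", "sigma2") %in% names(param)))
  rho <- param$rho
  stopifnot(length(dim(rho)) == 3)      # c(nitems, maxlevs, nclass)
  nclass <- dim(rho)[3]
  # For each item j and class c, the probabilities over levels k must sum
  # to 1 (NA entries are ignored here, which is an assumption).
  level.sums <- apply(rho, c(1, 3), sum, na.rm = TRUE)
  stopifnot(all(abs(level.sums - 1) < tol))
  # alpha and beta need one column per class.
  stopifnot(is.matrix(param$alpha), ncol(param$alpha) == nclass)
  stopifnot(is.matrix(param$beta), ncol(param$beta) == nclass)
  # One positive residual variance per class.
  stopifnot(length(param$sigma2) == nclass, all(param$sigma2 > 0))
  # The reference class has an all-zero column in alpha.
  stopifnot(any(colSums(abs(param$alpha)) == 0))
  invisible(TRUE)
}

# Usage with a two-class model with 4 binary items.
rho <- array(NA, c(4, 2, 2))
rho[, 1, 1] <- c(.9, .9, .1, .1)
rho[, 2, 1] <- 1 - rho[, 1, 1]
rho[, 1, 2] <- c(.1, .1, .9, .9)
rho[, 2, 2] <- 1 - rho[, 1, 2]
param <- list(rho = rho,
              alpha = cbind(0, c(0, 1, 1)),
              beta = cbind(c(1, 2, 3), c(2, 2, 3)),
              sigma2 = c(1, 1.5))
ok <- check_lcca_params(param)
```

A list that passes these checks has the shape expected for the first argument; whether lcca.datasim itself performs similar validation is not stated in this documentation.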

The second way to call this method is to supply as the first argument an object of class "lcca", which is the result of a call to lcca. In this case, a dataset will be generated with the same dimensions as the data in the original lcca call, with the same number of classes as in the original model, the same covariates, and parameters equal to the final estimates from that model.

This function uses its own internal random number generator, which is seeded by two integers, for example iseeds=c(123,456); supplying the same seeds allows results to be reproduced later. If iseeds=NULL, the function seeds itself with two random integers drawn from R. Results can therefore also be made reproducible by calling set.seed beforehand and leaving iseeds=NULL.
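The reason set.seed makes the default seeding reproducible can be illustrated in base R. This sketch assumes the two integers are drawn from R's generator in a manner similar to sample.int; draw_iseeds is a hypothetical stand-in, since the exact draw used inside the package is not documented here.

```r
# Hypothetical stand-in for the internal step that picks two integer seeds
# from R's own random number generator when iseeds = NULL.
draw_iseeds <- function() sample.int(.Machine$integer.max, 2)

# Resetting R's generator to the same state yields the same pair of seeds,
# so a simulation seeded from them would be repeated exactly.
set.seed(2024)
seeds.a <- draw_iseeds()
set.seed(2024)
seeds.b <- draw_iseeds()
same.seeds <- identical(seeds.a, seeds.b)

# Without resetting, the next draw gives a different pair.
seeds.c <- draw_iseeds()
```

Passing an explicit pair such as iseeds=c(123,456) bypasses this step entirely and does not depend on the state of R's generator.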

Value

a list with two components:

u

a matrix of integers of dimension c(ncases,nitems) containing a random sample of items from the specified latent-class treatment model.

y.obs

a numeric vector of length ncases containing the simulated observed outcomes.
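To make the shape of the returned list concrete, the following base-R sketch generates u and y.obs from a parameter list for the "NORMAL" outcome case. It is an illustration under stated assumptions, not the package's implementation: simulate_lc_sketch is a hypothetical name, class membership is assumed to follow a baseline-category logit in alpha, the observed outcome is assumed to be the potential outcome for the class a case belongs to, and all items are assumed to have the same number of levels.

```r
# Hypothetical sketch (not the lcca implementation) of the generative process.
simulate_lc_sketch <- function(param, x.alpha, x.beta) {
  ncases  <- nrow(x.alpha)
  nitems  <- dim(param$rho)[1]
  maxlevs <- dim(param$rho)[2]
  nclass  <- dim(param$rho)[3]
  # Class membership probabilities from a baseline-category logit.
  eta  <- x.alpha %*% param$alpha              # ncases x nclass
  prob <- exp(eta) / rowSums(exp(eta))
  cls  <- apply(prob, 1, function(p) sample.int(nclass, 1, prob = p))
  # Item responses: one draw per case and item from rho[j, , class].
  u <- matrix(NA_integer_, ncases, nitems)
  for (j in seq_len(nitems)) {
    for (i in seq_len(ncases)) {
      u[i, j] <- sample.int(maxlevs, 1, prob = param$rho[j, , cls[i]])
    }
  }
  # Observed outcome: normal, with mean from the beta column of the
  # case's own class and variance sigma2 for that class.
  mu <- rowSums(x.beta * t(param$beta[, cls, drop = FALSE]))
  y.obs <- rnorm(ncases, mean = mu, sd = sqrt(param$sigma2[cls]))
  list(u = u, y.obs = y.obs)
}

# Usage: two classes, 4 binary items, a constant and two covariates.
set.seed(11)
N <- 50
rho <- array(NA, c(4, 2, 2))
rho[, 1, 1] <- c(.9, .9, .1, .1)
rho[, 2, 1] <- 1 - rho[, 1, 1]
rho[, 1, 2] <- c(.1, .1, .9, .9)
rho[, 2, 2] <- 1 - rho[, 1, 2]
param <- list(rho = rho,
              alpha = cbind(0, c(0, 1, 1)),
              beta = cbind(c(1, 2, 3), c(2, 2, 3)),
              sigma2 = c(1, 1.5))
x <- cbind(1, rnorm(N), rnorm(N))
sim <- simulate_lc_sketch(param, x.alpha = x, x.beta = x)
dim(sim$u)         # c(ncases, nitems)
length(sim$y.obs)  # ncases
```

The result has the same two components, u and y.obs, with the dimensions described above; values produced by lcca.datasim will differ because it uses its own random number generator.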

Author(s)

Joe Schafer

Send questions to mchelpdesk@psu.edu

See Also

lcca

Examples

# Set up rho-parameters for a two-class model with 4 binary
# items and strong measurement.  Members of the first class
# have a high probability of endorsing (providing a response
# of 1 to) items 1 and 2. Members of the second class have
# a high probability of endorsing items 3 and 4.
rho <- array(NA, c(4,2,2) )
rho[,1,1] <- c(.9,.9,.1,.1)
rho[,2,1] <- 1 - rho[,1,1]
rho[,1,2] <- c(.1,.1,.9,.9)
rho[,2,2] <- 1 - rho[,1,2]

# create matrix of predictors for the treatment model
# consisting of a constant
# and two normally distributed covariates
N <- 1000
set.seed(102)
X1 <- rnorm(N)
X2 <- rnorm(N)
x.alpha <- cbind(1, X1, X2)

# use the same predictors in the outcome model
x.beta <- x.alpha

# Set up the logistic coefficients, with class 1 as the
# reference class
alpha <- matrix(NA, 3, 2)
alpha[,1] <- 0  # reference class
alpha[,2] <- c(0,1,1)

# Set up the linear coefficients for the outcome model
# Note:  average treatment effect in this population is 1
beta <- matrix(NA, 3, 2)
beta[,1] <- c(1,2,3)
beta[,2] <- c(2,2,3)

# set up residual variances
sigma2 <- c(1,1.5)

# generate a sample of N=1000 observations, and
# fit the two-class model to the simulated data
param <- list(rho=rho, alpha=alpha, beta=beta, sigma2=sigma2)
# nlevs has one entry per item; rho above defines 4 binary items
tmp <- lcca.datasim(param, outcome.distribution="NORMAL", nlevs=c(2,2,2,2),
  ncases=N, x.alpha=x.alpha, x.beta=x.beta)
U <- tmp$u
Y <- tmp$y.obs
fit <- lcca( U ~ X1 + X2, Y ~ X1 + X2 )
summary(fit)

[Package lcca version 2.0.0 Index]