lca.datasim {lcca}    R Documentation

Simulate random data from a latent-class model

Description

The generic method lca.datasim simulates a random dataset of a given size from a latent-class model under user-supplied parameters. It may be used in simulations to evaluate the properties of inferential procedures over repeated samples.

Usage



lca.datasim(obj, ...)


## Default S3 method:
lca.datasim(obj, iseeds = NULL, nlevs, ncases,
   groups = NULL, case.names = NULL, item.names = NULL, ...)


## S3 method for class 'lca'
lca.datasim(obj, iseeds = NULL, nlevs = obj$nlevs, 
   ncases = obj$ncases, groups = obj$ngroups, case.names = obj$case.names,
   item.names = obj$item.names, ...)

Arguments

obj

object used to select a method. Either an object of class "lca" produced by the function lca, or a list containing parameters from a latent-class model; see DETAILS.

iseeds

two integers to initialize the random number generator; see DETAILS.

nlevs

integer vector of length nitems, where nitems is the number of response variables, indicating the number of levels or response categories for each variable.

ncases

number of cases to simulate (i.e., the sample size).

groups

optional grouping variable; see DETAILS.

case.names

optional names to assign to the rows of the resulting data matrix. Should be a character vector of length ncases.

item.names

optional names to assign to the columns of the resulting data matrix. Should be a character vector of length nitems.

...

additional arguments to be passed to the methods.

Details

This generic method may be called in two ways. One way is to supply the parameters of a latent-class model as the first argument. The parameters are arranged as a list with two named components, rho and gamma, containing item-response probabilities and class prevalences, respectively. The component rho should be an array of dimension c(nitems,maxlevs,nclass,ngroups), where nitems is the number of response variables, maxlevs is the maximum number of levels (distinct response categories) among the response variables, nclass is the number of latent classes, and ngroups is the number of groups (equal to 1 if groups is not supplied). The element rho[j,k,c,g] is the probability that an individual in group g and class c supplies a response of k to item j. The component gamma should be a matrix with nclass rows and ngroups columns, with gamma[c,g] containing the prevalence of class c within group g. If ngroups=1, the last dimension of the parameter arrays may be dropped, so that rho may have dimension c(nitems,maxlevs,nclass) and gamma may be a vector of length nclass.
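
To make the array layout concrete, the following sketch builds a two-group parameter list in base R. The sizes and probability values are made up for illustration, and the checks at the end are only a sanity test on the inputs, not something lca.datasim is documented to perform.

```r
# Hypothetical sizes, chosen only for illustration
nitems <- 3; maxlevs <- 2; nclass <- 2; ngroups <- 2

# rho[j, k, c, g]: probability of response k to item j in class c, group g
rho <- array(NA_real_, dim = c(nitems, maxlevs, nclass, ngroups))
rho[, 1, 1, ] <- 0.8                # class 1: response 1 is likely on every item
rho[, 1, 2, ] <- 0.2                # class 2: response 1 is unlikely on every item
rho[, 2, , ] <- 1 - rho[, 1, , ]    # second level takes the remaining probability

# gamma[c, g]: prevalence of class c within group g (one column per group)
gamma <- cbind(c(0.4, 0.6), c(0.7, 0.3))

param <- list(rho = rho, gamma = gamma)

# Sanity checks: probabilities sum to 1 over response levels and over classes
rho_sums <- apply(rho, c(1, 3, 4), sum)
stopifnot(all(abs(rho_sums - 1) < 1e-12))
stopifnot(all(abs(colSums(gamma) - 1) < 1e-12))
```

A list built this way would be passed as the first argument, together with nlevs, ncases, and a groups variable.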

The second way to call this method is to supply as the first argument an object of class "lca", which is the result of a call to lca. In this case, a data matrix will be generated with the same dimensions as the response data (the matrix on the left-hand side of the model formula) in the original lca call, with the same number of classes as in the original model, and parameters equal to the final estimates from that model.

This function uses its own internal random number generator, which is seeded by two integers, for example, iseeds=c(123,456); supplying them explicitly allows results to be reproduced in the future. If iseeds=NULL, the function seeds itself with two random integers drawn from R. Results can therefore also be made reproducible by calling set.seed beforehand and leaving iseeds=NULL.
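
The reason set.seed is enough when iseeds is left as NULL can be seen in a small base R sketch. The helper draw_iseeds below is hypothetical: it stands in for however the package actually draws its two integers from R, which this page does not specify.

```r
# Hypothetical stand-in for the package's internal seed draw (an assumption)
draw_iseeds <- function() sample.int(.Machine$integer.max, size = 2)

set.seed(124)
first <- draw_iseeds()

set.seed(124)
second <- draw_iseeds()

# Resetting R's generator reproduces the same pair of integers
identical(first, second)   # TRUE
```

Because the pair of integers is itself drawn from R's generator, resetting that generator with set.seed fixes the pair, and therefore the simulated data. Passing a fixed pair through iseeds achieves the same thing without depending on the state of R's generator.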

The groups variable, if present, should be a vector of integers coded as 1,2,...,ngroups, where ngroups is the number of distinct groups. It may also be a factor. If groups=NULL, then ngroups is taken to be 1.
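
If group membership is stored as labels, one way to obtain the integer coding described above is through a factor. The site labels below are invented for illustration; since a factor is also accepted, the conversion to integers is optional.

```r
# Hypothetical group labels, one per case
site <- c("north", "south", "south", "east", "north")

# factor() sorts the distinct labels; as.integer() gives codes 1..ngroups
site_factor <- factor(site)
groups <- as.integer(site_factor)
ngroups <- nlevels(site_factor)

levels(site_factor)   # "east" "north" "south"
groups                # 2 3 3 1 2
```

The order of levels(site_factor) determines which column of gamma, and which slice of rho, corresponds to each label.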

Value

a matrix of integers of dimension c(ncases,nitems) containing a random sample of responses from the specified latent-class model.
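
For intuition about what the returned matrix contains, here is a minimal base R sketch of sampling from a single-group latent-class model. It is not the package's implementation (which uses its own internal random number generator), and the function name sim_lca_sketch is hypothetical. It also assumes that an item with fewer than maxlevs levels uses only the first nlevs[j] entries of rho[j, , c].

```r
# Illustrative sampler for one group; NOT the lcca implementation
sim_lca_sketch <- function(rho, gamma, nlevs, ncases) {
  nitems <- dim(rho)[1]
  nclass <- length(gamma)
  # Step 1: draw each case's latent class with probabilities gamma
  cls <- sample.int(nclass, ncases, replace = TRUE, prob = gamma)
  # Step 2: given the class, draw each item response independently
  Y <- matrix(NA_integer_, nrow = ncases, ncol = nitems)
  for (j in seq_len(nitems)) {
    for (cl in seq_len(nclass)) {
      idx <- which(cls == cl)
      if (length(idx) > 0) {
        Y[idx, j] <- sample.int(nlevs[j], length(idx), replace = TRUE,
                                prob = rho[j, seq_len(nlevs[j]), cl])
      }
    }
  }
  Y
}

# Same parameter values as in the Examples section
rho <- array(NA_real_, c(4, 2, 2))
rho[, 1, 1] <- c(.9, .9, .1, .1)
rho[, 2, 1] <- 1 - rho[, 1, 1]
rho[, 1, 2] <- c(.1, .1, .9, .9)
rho[, 2, 2] <- 1 - rho[, 1, 2]
gamma <- c(.4, .6)

set.seed(124)
Y <- sim_lca_sketch(rho, gamma, nlevs = rep(2L, 4), ncases = 1000)

# Model-implied marginal P(response = 1) for item j: sum over c of gamma[c] * rho[j, 1, c]
implied <- as.vector(rho[, 1, ] %*% gamma)   # 0.42 0.42 0.58 0.58
observed <- colMeans(Y == 1)                 # should be close to implied
```

Comparing observed with implied is a quick check that a simulated dataset matches the parameters that generated it; the same comparison can be applied to the output of lca.datasim.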

Author(s)

Joe Schafer

Send questions to mchelpdesk@psu.edu

See Also

lca

Examples

# Set up parameters for a two-class model with four binary
# items and strong measurement.  Members of the first class, which
# comprises 40% of the population, have a high probability of
# endorsing (providing a response of 1 to) items 1 and 2.
# Members of the second class, which comprises 60% of the
# population, have a high probability of endorsing items 3 and 4.
rho <- array(NA, c(4,2,2) )
rho[,1,1] <- c(.9,.9,.1,.1)
rho[,2,1] <- 1 - rho[,1,1]
rho[,1,2] <- c(.1,.1,.9,.9)
rho[,2,2] <- 1 - rho[,1,2]
param <- list( rho=rho, gamma=c(.4,.6) )

# generate a sample of N=1000 observations, and
# fit the two-class model to the simulated data
set.seed(124)
Y <- lca.datasim( param, nlevs=c(2,2,2,2), ncases=1000)
fit <- lca( Y~1 )
summary(fit)

[Package lcca version 2.0.0 Index]