A Data Science Central Community
Hi all!!
I would like to know whether anybody has simulated data in SAS. I am trying to simulate a data set with 4 variables (3 of them categorical), taking into account the correlation among them. My problem is how to build that correlation into the simulation...
Many thanks!
Sorry my answer is a bit late, but I hope other people find it helpful.
/* This program generates 300 observations of the variables y1, y2, y3, y4, beginning with the correlation matrix R, a vector of means m = (m1, m2, m3, m4)' and a vector of standard deviations s = (s1, s2, s3, s4)', all read in as variables using a CARDS statement. Each observation is y = m + U'z, where z is a vector of independent N(0,1) draws and U is the Cholesky root of the covariance matrix D*R*D (D = diagonal matrix of the standard deviations), so that Var(y) = U'U = D*R*D.
The means were arbitrarily selected as those in Liu and Gould (2002). The standard deviation vector and the correlation matrix were obtained from the data used by Allison et al. (2003). */
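data MVN_par; /* parameters for the multivariate normal data */
   /* r1-r4: one row of the correlation matrix R per observation;
      means: mean of y_i; vars: standard deviation of y_i (not the variance).
      These can be adjusted to cater for the other two scenarios, and can be
      extended to an n by n correlation matrix with n-dimensional mean and
      standard deviation vectors. Comments cannot go inside the CARDS data
      lines (they would be read as data), so they are kept here.
      NOTE: the post lists only three data rows. A 4 x 4 matrix needs a fourth
      row: by symmetry its correlations are 0.949 0.980 0.995 1, followed by
      the mean and standard deviation of y4, which were not given. */
   input r1 r2 r3 r4 means vars;
   cards;
1     0.986 0.967 0.949 92 14.3
0.986 1     0.992 0.980 88 14
0.967 0.992 1     0.995 85 14.2
;
run;

proc iml;
   use MVN_par;
   read all var {r1 r2 r3 r4} into R;
   read all var {means} into mu;       /* p x 1 vector of means */
   read all var {vars}  into sigma;    /* p x 1 vector of standard deviations */
   close MVN_par;

   p = ncol(R);                    /* p is the number of variables generated */
   n = 300;                        /* number of observations (e.g. patients) */
   diag_sig = diag(sigma);         /* D: diagonal matrix whose elements are the
                                      standard deviations of each y_i */
   DRD = diag_sig * R * diag_sig`; /* covariance matrix D*R*D */
   U = root(DRD);                  /* Cholesky root: upper triangular U with
                                      U`*U = DRD (HALF is an older name for ROOT) */

   yall = j(n, p, .);              /* preallocate the n x p result matrix */
   do i = 1 to n;
      z = rannor(j(p, 1, 1234));   /* z_1,...,z_p independent N(0,1), so
                                      Var(z) = I; the seed is used on the first call only */
      y = mu + U` * z;             /* E(y) = mu and Var(y) = U`*I*U = DRD */
      yall[i, ] = y`;              /* store the draw as row i */
   end;

   varnames = {y1 y2 y3 y4};       /* naming the variables */
   create my_MVN from yall[colname = varnames];
   append from yall;
   close my_MVN;
quit;

proc print data = my_MVN;
run;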
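For readers without SAS/IML, here is a minimal stdlib-only Python sketch of the same construction as the IML program above: build the covariance D*R*D, take its Cholesky factor L (the transpose of the IML root U), and set y = m + L z with z drawn from independent N(0,1). The function names `cholesky`, `simulate_mvn` and `to_category` are hypothetical. Since the question asks about categorical variables, the sketch also shows one common approach (an assumption, not part of the answer above): cut a simulated normal variable at chosen thresholds. The cutpoints below are made up for illustration, and the correlation between the resulting categories is typically weaker than the latent normal correlation. The usage example uses only the first three variables, because the fourth row of the posted parameters is incomplete.

```python
import math
import random


def cholesky(a):
    """Return lower-triangular L with L * L^T == a (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def simulate_mvn(R, means, sds, n, seed=1234):
    """Draw n rows of y = m + L z, where z ~ N(0, I) and L L^T = D R D."""
    p = len(R)
    # Covariance D*R*D: element (i, j) is sd_i * r_ij * sd_j
    cov = [[sds[i] * R[i][j] * sds[j] for j in range(p)] for i in range(p)]
    L = cholesky(cov)
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # L is lower triangular, so row i only uses z[0..i]
        rows.append([means[i] + sum(L[i][k] * z[k] for k in range(i + 1))
                     for i in range(p)])
    return rows


def to_category(value, cutpoints):
    """Hypothetical helper: map a value to 0, 1, ..., len(cutpoints) by thresholds."""
    return sum(value > c for c in cutpoints)


if __name__ == "__main__":
    R = [[1.0, 0.986, 0.967],
         [0.986, 1.0, 0.992],
         [0.967, 0.992, 1.0]]
    data = simulate_mvn(R, [92, 88, 85], [14.3, 14, 14.2], n=300)
    # Illustration only: keep y1 continuous, turn y2 and y3 into 3-level
    # categories at assumed cutpoints
    mixed = [[row[0], to_category(row[1], [80, 95]), to_category(row[2], [78, 92])]
             for row in data]
    print(mixed[:5])
```

A hand-written Cholesky keeps the sketch dependency-free; with NumPy available, a library routine for the factorization would be the more usual choice.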