# AnalyticBridge

A Data Science Central Community

Hi all!!

I would like to know whether anybody has simulated data in SAS. I am trying to simulate a data set with 4 variables (3 are categorical), taking into account the correlation among them. My problem is how to account for the correlation.

Many thanks!

### Replies to This Discussion

Create a crosstab (proc freq) and look at the chi-square test (CHISQ option).
Don't forget to discretize numeric variables first (proc rank groups=5; ...).
Here's an example from the SAS documentation:
proc freq data=Color order=data;
tables Eyes*(Hair Sex Country) / expected cellchi2 norow nocol chisq;
output out=ChiSqData n nmiss pchi lrchi;
weight Count;
title 'Chi-Square Tests for 3 by 5 Table of Eye and Hair Color and...';
run;
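
To make the discretizing step concrete, here is a minimal sketch rather than a definitive recipe. It assumes the SASHELP.HEART sample data set that ships with SAS; picking Cholesterol and Sex is purely illustrative, and the names heart_binned and chol_grp are made up for this example. PROC RANK with GROUPS=5 assigns each value of the numeric variable to a quintile group numbered 0 to 4, and PROC FREQ then tests the association between the binned variable and the categorical one.

```sas
/* Assumption: SASHELP.HEART is available (standard SAS sample library). */
/* heart_binned and chol_grp are hypothetical names chosen for this sketch. */
proc rank data=sashelp.heart groups=5 out=heart_binned;
   var Cholesterol;      /* numeric variable to discretize */
   ranks chol_grp;       /* new variable holding the group number, 0 to 4 */
run;

/* Crosstab of the binned variable against a categorical variable */
proc freq data=heart_binned;
   tables chol_grp*Sex / chisq expected norow nocol;
run;
```

The CHISQ option also prints Cramer's V, which gives a rough idea of how strong the association is, not just whether it is significant.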

Sorry my answer is a bit late, but I hope other people find it helpful.

/* This program generates 300 observations of the variables y1, y2, y3, y4. It starts from the correlation matrix R, a vector of means m = (m1,m2,m3,m4)' and a vector of standard deviations s = (s1,s2,s3,s4)', all read in as variables using a CARDS statement.

The means were arbitrarily selected as those in Liu and Gould (2002). The standard deviation vector and the correlation matrix were obtained from the data used by Allison et al. (2003).

*/

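data MVN_par;   /* parameters for the multivariate normal data */
/* r1-r4 hold one row of the correlation matrix; means and vars hold the mean and
   standard deviation of the corresponding variable. These can be adjusted to cater
   for the other two scenarios, and extended to an n by n correlation matrix with
   n-dimensional mean and standard deviation vectors. */
/* Row 4 of R follows by symmetry (0.949 0.980 0.995 1); add it, together with the
   mean and standard deviation of y4, to complete the 4 by 4 matrix. */
input r1 r2 r3 r4 means vars;
cards;
1      0.986  0.967  0.949  92  14.3
0.986  1      0.992  0.980  88  14
0.967  0.992  1      0.995  85  14.2
;
run;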

proc iml;
use MVN_par;
read all var {r1 r2 r3 r4} into R;
read all var {means} into mu;
read all var {vars} into sigma;      /* vars holds the standard deviations */
close MVN_par;
p = ncol(R);                         /* p is the number of variables generated */
diag_sig = diag(sigma);              /* D: diagonal matrix whose elements are the standard deviations of each yi */
DRD = diag_sig * R * diag_sig`;      /* covariance matrix D*R*D */
U = half(DRD);                       /* Cholesky factor: upper triangular U with U'U = DRD */
do i = 1 to 300;                     /* 300 can be replaced with k to generate a data set with k patients */
   z = rannor(j(p,1,1234));          /* Z1,...,Zp are independent N(0,1), so the var-cov matrix of z is the identity */
   y = mu + U`*z;                    /* y then has mean mu and var-cov matrix U'U = DRD */
   yprime = y`;
   yall = yall // yprime;
end;
varnames = {y1 y2 y3 y4};            /* naming the variables */
create my_MVN from yall [colname = varnames];
append from yall;
close my_MVN;
quit;

proc print data = my_MVN;
run;
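
A quick way to check the simulated data is to compare its sample statistics with the inputs. The sketch below assumes the PROC IML step has run and created WORK.my_MVN with columns y1 to y4. With 300 observations, the sample means, standard deviations, and correlations should be close to, but not exactly equal to, mu, sigma, and R.

```sas
/* Assumption: WORK.my_MVN exists and contains y1-y4 from the PROC IML step. */

/* Sample means and standard deviations, to compare with mu and sigma */
proc means data=my_MVN n mean std;
   var y1-y4;
run;

/* Sample correlation matrix, to compare with R */
proc corr data=my_MVN nosimple;
   var y1-y4;
run;
```

If the correlations are far from R, the first things to look at are whether the input matrix is symmetric and whether every row of MVN_par was read.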