

Hi all!!


I would like to know whether anybody has simulated data in SAS. I am trying to simulate a data set with 4 variables (3 are categorical), taking into account the correlation among them. My problem is how to account for the correlation...


Many thanks!


Replies to This Discussion

Create a crosstab (proc freq) and consider CHISQ.
Don't forget to discretize numeric variables (proc rank groups=5; ...).
Here's an example from the SAS doc:
proc freq data=Color order=data;
tables Eyes*(Hair Sex Country) / expected cellchi2 norow nocol chisq;
output out=ChiSqData n nmiss pchi lrchi;
weight Count;
title 'Chi-Square Tests for 3 by 5 Table of Eye and Hair Color and...';
run;
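
To make the discretize-then-crosstab suggestion concrete, here is a minimal sketch. The data set name mydata and the variable names x (numeric) and a, b, c (categorical) are hypothetical placeholders, not taken from the question.

```sas
/* Hypothetical input: mydata with one numeric variable x
   and three categorical variables a, b, c. */

/* Step 1: bin the numeric variable into 5 rank-based groups (coded 0 to 4). */
proc rank data=mydata out=ranked groups=5;
   var x;
   ranks x_grp;
run;

/* Step 2: cross-tabulate every pair of variables and request chi-square
   statistics. The CHISQ option also reports Cramer's V, which can be
   read as a measure of association between two categorical variables. */
proc freq data=ranked;
   tables x_grp*(a b c) a*b a*c b*c / chisq norow nocol;
run;
```

The pairwise associations estimated this way describe the existing data; they could then serve as targets when checking a simulated data set.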

Sorry my answer is a bit late, but I hope other people find it helpful.


/*  This program generates 300 observations of the variables y1, y2, y3, y4, beginning with the correlation matrix R, a vector of means m = (m1,m2,m3,m4)' and a vector of standard deviations s = (s1,s2,s3,s4)', read in as variables using a CARDS statement.

The means were arbitrarily selected as those in Liu and Gould 2002. The standard deviation vector and the correlation matrix were obtained from the data used by Allison et al (2003). */



data MVN_par;                     /* parameters for the multivariate normal data */
input r1 r2 r3 r4 means vars;     /* vars holds the standard deviations; these can be adjusted to cater for the other two scenarios */
/* These can be extended to an n by n correlation matrix and
   n-dimensional vectors of means and standard deviations.
   The fourth row (for y4) is needed as well but is not shown here. */
cards;
1      0.986  0.967  0.949  92  14.3
0.986  1      0.992  0.980  88  14
0.967  0.992  1      0.995  85  14.2
;
run;


proc iml;
use MVN_par;
read all var {r1 r2 r3 r4} into R;
read all var {means}       into mu;
read all var {vars}        into sigma;
close MVN_par;

p = ncol(R);                       /* p is the number of variables generated */
diag_sig = diag(sigma);            /* D is a diagonal matrix whose elements are the standard deviations of each yi */
DRD = diag_sig * R * diag_sig`;    /* var-cov matrix D*R*D */
U = half(DRD);                     /* upper triangular U with U`*U = DRD */

do i = 1 to 300;                   /* 300 can be replaced with k to generate the dataset with k patients */
   z = rannor(j(p,1,1234));        /* Z1,...,Zp are independent N(0,1); the var-cov matrix of z is the identity matrix */
   y = mu + U` * z;                /* y then has mean mu and var-cov matrix U`*U = DRD */
   yprime = y`;
   yall = yall // yprime;
end;

varnames = {y1 y2 y3 y4};          /* naming the variables */
create my_MVN from yall [colname = varnames];
append from yall;
close my_MVN;
quit;

proc print data = my_MVN;
run;
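
Since the original question asks for three categorical variables, one possible follow-up is to cut three of the simulated normal variables into categories. The sketch below assumes the data set my_MVN with y1 to y4 from the PROC IML step exists. The choice of 3 equal-sized categories and the names my_cat, c1, c2, c3 are illustrative assumptions, not part of the posted program.

```sas
/* Assumes my_MVN (y1-y4) was created by the PROC IML step above. */

/* Cut y1, y2, y3 into 3 rank-based categories each (coded 0, 1, 2);
   y4 is carried along unchanged as the continuous variable. */
proc rank data=my_MVN out=my_cat groups=3;
   var y1 y2 y3;
   ranks c1 c2 c3;
run;

/* Check how much of the correlation survives the discretization. */
proc corr data=my_cat spearman;
   var c1 c2 c3 y4;
run;
```

Correlations among the categorized variables are usually somewhat weaker than those of the underlying normal variables, so the input correlation matrix may need to be adjusted upward to hit a given target.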




