Programming Tips: The Bootstrap Method

Author: Ruurd Bennink - Sr. Analyst/Programmer

The idea behind the bootstrap method is to obtain as accurate an estimate as possible of the standard error of the mean (SE) when only limited data are available.

More detailed information about the bootstrap method can be found at https://en.wikipedia.org/wiki/Bootstrapping_(statistics)

For example, suppose an interim analysis is performed with just 10 subjects, with change from baseline as the primary parameter. That is not enough information to determine an accurate estimate of the standard error of the mean. The bootstrap method randomly picks 10 observations from this dataset with replacement, calculates their mean, and repeats that 10, 20, 100 or 1000 times. The more often this is repeated, the more stable the resulting estimate of the standard error of the mean becomes.
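In formula form, the procedure can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the program: n is the number of observations per resample, B is the number of bootstrap datasets, and x*(b,i) is the i-th value drawn into the b-th bootstrap dataset.

```latex
\bar{x}^{*}_{b} = \frac{1}{n}\sum_{i=1}^{n} x^{*}_{b,i},
\qquad
\bar{x}^{*} = \frac{1}{B}\sum_{b=1}^{B} \bar{x}^{*}_{b},
\qquad
\widehat{SE}_{boot} = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\left(\bar{x}^{*}_{b}-\bar{x}^{*}\right)^{2}}
```

In the example of this tip, n = 10 and B = 1000. The bootstrap SE is simply the standard deviation of the B resample means.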

Below is the actual SAS program, with comments included:

/* Create a dataset _ORIG with 10 random numbers */
DATA _orig;
  DROP j;
  DO j = 1 TO 10;
    x = Ranuni(1); /* Random numbers ranging from 0 to 1 from the uniform
                      distribution. Other distributions, e.g. the normal
                      distribution, are also fine. Because the seed > 0, each
                      run will create a dataset identical to the previous run */
    z = j;         /* Marker for the jth observation. This makes it easier to
                      identify which observations occur more often in the
                      dataset BOOT&j in the next data step */
    OUTPUT;
  END;
RUN;
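For later comparison with the bootstrap result, the classical estimates can be taken directly from the 10 original observations. This is a minimal sketch that is not part of the original program; it only assumes that _ORIG exists as created above.

```sas
/* Sketch: classical estimates from the 10 original observations.
   STDERR is the standard error of the mean, i.e. STD / SQRT(N). */
PROC MEANS DATA=_orig N MEAN STD STDERR;
  VAR x;
RUN;
```

The STDERR value printed here is the figure that the bootstrap estimate of the SE can be compared against.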

%macro bootstrap;
  %do j=1 %to 1000; /* Create 1000 datasets with 10 samples with
                       replacement from dataset _ORIG */
    DATA boot&j;
      /* Pick a number randomly, with replacement, 10 times
         from the dataset _ORIG */
      DO obsi = 1 TO 10;
        k = Ceil(ranuni(0)*10);
        /* Use the CEIL function. If the ROUND function is used, 0 would
           also be a possible outcome and the 0th observation does not exist,
           but no ERROR/WARNING will appear in the SAS log! Besides that, the
           10th observation would then have a probability of only 5% of
           being selected (observations 1 to 9 would each have 10%), which
           makes the selection non-uniform.
           Using seed = 0 means that the time of day is used to initialize
           the seed stream. If a seed > 0 is used, all datasets BOOT&j will
           be identical!
           Use only the uniform distribution here, to make sure that the
           observation numbers 1 to 10 have equal probability of being
           selected. */
        SET _orig Point=k;
        /* The POINT option points to the kth observation to be selected.
           Because k is a new random number ranging from 1 to 10 every time
           the 'Obsi' loop iterates, some observations may appear more than
           once, which reflects the resampling 'with replacement' element of
           the bootstrap method. */
        seqnum = &j; /* Identifies the dataset; later to be used as BY
                        variable for e.g. PROC MEANS */
        OUTPUT;
      END;
      STOP; /* Mandatory with the POINT= option; without it the DATA step
               never reaches an end of file and loops endlessly */
    RUN;
  %end;
%mend bootstrap;

%bootstrap;
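The remark in the program about CEIL versus ROUND can be checked empirically. The sketch below is not part of the original program: the dataset name _CHECK, the variable names, the seed 12345 and the 100000 draws are all arbitrary choices made for illustration.

```sas
/* Sketch: compare the index distributions produced by CEIL and ROUND */
DATA _check;
  DROP i u;
  DO i = 1 TO 100000;
    u = Ranuni(12345);      /* fixed seed, so the check is reproducible */
    k_ceil  = Ceil(u*10);   /* possible values 1 to 10 */
    k_round = Round(u*10);  /* possible values 0 to 10 */
    OUTPUT;
  END;
RUN;

/* One-way frequency tables of both index variables */
PROC FREQ DATA=_check;
  TABLES k_ceil k_round;
RUN;
```

The frequency table for k_ceil should show roughly 10% for each of the values 1 to 10, while k_round should show roughly 5% for 0 and for 10 and roughly 10% for each value in between.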

As a next step, PROC MEANS can be used to estimate the mean of each dataset BOOT&j. The standard deviation of those 1000 means is the bootstrap estimate of the standard error of the mean. Another possibility is to append the datasets BOOT1 to BOOT1000 into one dataset and use PROC MEANS with a BY statement (BY seqnum;):

/* Generates the statement SET BOOT1 BOOT2 ... BOOT1000; */
%macro append;
  SET %do n=1 %to 1000; BOOT&n %end;
  ;
%mend append;

DATA total;
  %append;
RUN;
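With TOTAL in place, the two PROC MEANS steps described above could look like the sketch below. The dataset name BOOTMEANS and the variable name XBAR are hypothetical names chosen for illustration. No PROC SORT is needed as long as TOTAL was built by concatenating BOOT1 to BOOT1000 in order, because it is then already sorted by seqnum.

```sas
/* Step 1: calculate one mean per bootstrap dataset */
PROC MEANS DATA=total NOPRINT;
  BY seqnum;
  VAR x;
  OUTPUT OUT=bootmeans MEAN=xbar;
RUN;

/* Step 2: the standard deviation (STD) of the 1000 means is the
   bootstrap estimate of the standard error of the mean */
PROC MEANS DATA=bootmeans N MEAN STD;
  VAR xbar;
RUN;
```

In SAS 9.2 and later, a numbered range list such as SET boot1-boot1000; should also be usable in the DATA step that creates TOTAL, as an alternative to the macro loop.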
