| Title: | Block Designs for Observational Studies |
|---|---|
| Description: | Creates block designs of fixed size J with at least one treated and one control unit per block. Blocks larger than pairs better distinguish effects caused by a treatment from unmeasured confounding in assignment of individuals to treatment. Somewhat counterintuitively, blocks larger than pairs can use more units while attaining better covariate balance and block homogeneity. A forthcoming manuscript by Brumberg and Rosenbaum details the design. |
| Authors: | Katherine Brumberg [aut, cre] (ORCID: <https://orcid.org/0000-0002-5193-6250>), Paul Rosenbaum [aut] |
| Maintainer: | Katherine Brumberg <[email protected]> |
| License: | GPL-2 |
| Version: | 1.0.0 |
| Built: | 2026-05-11 07:20:45 UTC |
| Source: | https://github.com/kkbrum/observationalblocks |
Computes balance diagnostics for a specified covariate in the output of
blockMatch. Compares treated vs control means before and after
matching, standardized differences, and within-block homogeneity.
balEq(vname, o, detail = FALSE)
vname |
Character string naming the variable to assess (must be a column
in both |
o |
A list containing |
detail |
Logical. If |
If detail = FALSE, a 1-row matrix with columns:
Mean for treated and control before matching
Equally weighted averages of within-block treated or control means after matching
Raw difference (T mean - C mean) before and after
Standardized difference of means before and after;
for comparability, both use the pooled standard deviation of vname
in the full sample before matching, where the pooling equally weights the
treated and control groups
Median and 90th percentile of within-block means of pairwise absolute differences
Percent of blocks with within-block mean pairwise difference of 0
If detail = TRUE, a list with balance (that matrix), y,
z, and d.
data(Hpylori)
# Work with a random subsample of 1000 rows
df <- Hpylori[sample(1:nrow(Hpylori), 1000), ]
# Estimated propensity score for hepatitis A antibody status
pr <- glm(hepaA ~ age + female, data = df, family = binomial)$fitted
# Six propensity score strata holding 7%, 18%, 25%, 25%, 18%, 7% of units
cochran <- cumsum(c(0, .07, .18, .25, .25, .18, .07))
df$prc <- as.integer(cut(pr, stats::quantile(pr, cochran), include.lowest = TRUE))
df$z <- df$hepaA  # treatment indicator
bd <- basicDistance(df, near = df$female)
out <- blockMatch(df, cost = bd$cost, J = 4, ratio = 4)
balEq("age", out)
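The two kinds of summary described above can be illustrated in base R. This is a minimal sketch on simulated data, not the package's internal code. The pooled standard deviation formula (square root of the average of the two group variances) and the choice to average over all pairs of units in a block are assumptions based on the wording above; `x`, `z`, and `block_x` are hypothetical.

```r
# Illustrative sketch in base R; this is not the code used inside balEq().
set.seed(1)
n <- 200
z <- rbinom(n, 1, 0.3)                      # hypothetical treatment indicator
x <- rnorm(n, mean = 50 + 5 * z, sd = 10)   # hypothetical covariate

# Pooled SD that weights the treated and control variances equally
# (assumed formula: square root of the average of the two variances)
pooled_sd <- sqrt((var(x[z == 1]) + var(x[z == 0])) / 2)

# Raw and standardized difference in means (treated minus control)
raw_diff <- mean(x[z == 1]) - mean(x[z == 0])
std_diff <- raw_diff / pooled_sd

# Within-block homogeneity for one hypothetical block of J = 4 units:
# dist() returns every pairwise absolute difference, and we average them
block_x <- c(41, 44, 44, 50)
mean_pairwise <- mean(as.vector(dist(block_x)))
```

Dividing both the before and after differences by the same pre-matching `pooled_sd` keeps the two standardized differences on a common scale, which is the comparability point made in the description.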
Compute distance matrix for matching
basicDistance(
  dat,
  xm = NULL,
  near = NULL,
  xinteger = NULL,
  prc.penalty = 1000,
  near.penalty = 100,
  integer.penalty = 20,
  compute_distance = TRUE
)
dat |
A data frame with |
xm |
A numeric matrix or data frame with |
near |
A numeric vector of length |
xinteger |
A numeric vector of length |
prc.penalty |
A single finite positive number: penalty for propensity
score stratum ( |
near.penalty |
Nonnegative penalties for |
integer.penalty |
Nonnegative penalties for |
compute_distance |
If |
This function borrows much of its functionality from the package 'iTOS'.
The documentation for the 'iTOS' functions addNearExact, addinteger, and addMahal
may be helpful.
A list with components:
dat |
The input data frame with column |
cost |
The cost/distance matrix for matching (rows = treated, cols = control),
or |
data(Hpylori)
# Work with a random subsample of 1000 rows
df <- Hpylori[sample(1:nrow(Hpylori), 1000), ]
# Estimated propensity score for hepatitis A antibody status
pr <- glm(hepaA ~ age + female, data = df, family = binomial)$fitted
# Six propensity score strata holding 7%, 18%, 25%, 25%, 18%, 7% of units
cochran <- cumsum(c(0, .07, .18, .25, .25, .18, .07))
df$prc <- as.integer(cut(pr, stats::quantile(pr, cochran), include.lowest = TRUE))
df$z <- df$hepaA  # treatment indicator
bd <- basicDistance(df, near = df$female)
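To make the role of the penalties concrete, here is a minimal base R sketch of how penalty terms could be added to a treated-by-control cost matrix. It is an illustration of the general idea, not the code used by basicDistance or by 'iTOS'. The assumed rules are that a near-exact penalty is added whenever a treated-control pair differs on the `near` covariate, and that an integer penalty is multiplied by the absolute difference of an integer-valued covariate. The vectors `z`, `near`, and `xint` are hypothetical toy data.

```r
# Illustrative sketch in base R; not the package's or iTOS's actual code.
z    <- c(1, 0, 0, 1, 0)   # hypothetical treatment indicator
near <- c(1, 1, 0, 0, 0)   # covariate to match near-exactly (e.g., female)
xint <- c(3, 3, 1, 2, 5)   # hypothetical integer-valued covariate

# Start from a zero cost matrix: rows = treated, columns = controls
cost <- matrix(0, nrow = sum(z == 1), ncol = sum(z == 0))

# Near-exact penalty (assumed rule): add near.penalty when a pair differs
near.penalty <- 100
cost <- cost + near.penalty * outer(near[z == 1], near[z == 0], "!=")

# Integer penalty (assumed rule): penalty times the absolute difference
integer.penalty <- 20
cost <- cost + integer.penalty * abs(outer(xint[z == 1], xint[z == 0], "-"))

cost
```

Because the penalties are finite, a mismatch is discouraged rather than forbidden, and larger penalties give a covariate higher priority when the matching minimizes total cost.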
Creates blocks of fixed size J, each with at least one control and one treated unit. Within each stratum, the function chooses a matching strategy based on the treated-to-control ratio: direct matching when one group dominates, or a two-stage seed-and-add approach when groups are more balanced.
blockMatch(dat, cost, J = 4, ratio = 4, solver = "rlemon", rseed = 12345)
dat |
A data frame with |
cost |
Distance matrix: one row per treated unit and one column
per control, with |
J |
Target number of individuals per matched block. Each block has at least one control and at least one treated. |
ratio |
Minimum matching ratio, greater than or equal to |
solver |
Either |
rseed |
Single finite number.
Fix |
A list with components:
m |
A data frame of the matched sample, with columns |
all |
The full |
Cochran, W. G. (1968). The effectiveness of adjustment by subclassification in removing bias in observational studies. Biometrics, 24(2), 295–313.
data(Hpylori)
# Work with a random subsample of 1000 rows
df <- Hpylori[sample(1:nrow(Hpylori), 1000), ]
# Estimated propensity score for hepatitis A antibody status
pr <- glm(hepaA ~ age + female, data = df, family = binomial)$fitted
# Six propensity score strata holding 7%, 18%, 25%, 25%, 18%, 7% of units
cochran <- cumsum(c(0, .07, .18, .25, .25, .18, .07))
df$prc <- as.integer(cut(pr, stats::quantile(pr, cochran), include.lowest = TRUE))
df$z <- df$hepaA  # treatment indicator
bd <- basicDistance(df, near = df$female)
out <- blockMatch(df, cost = bd$cost, J = 4, ratio = 4)
# Matched versus unmatched units by treatment status
table(out$all$matched, out$all$hepaA)
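The stratification step in the example can be examined on its own. The sketch below uses simulated scores in place of fitted propensity scores (the vector `pr` here is hypothetical) and shows that the `cochran` cut points yield six strata holding roughly 7%, 18%, 25%, 25%, 18%, and 7% of the units.

```r
# Illustrative sketch with simulated scores; `pr` here is hypothetical.
set.seed(2)
pr <- runif(1000)   # stand-in for fitted propensity scores

# Cumulative cut points running from 0 to 1 in seven steps
cochran <- cumsum(c(0, .07, .18, .25, .25, .18, .07))

# Convert scores to stratum labels 1..6 using the empirical quantiles;
# include.lowest = TRUE keeps the minimum score in stratum 1
prc <- as.integer(cut(pr, stats::quantile(pr, cochran), include.lowest = TRUE))

# Observed share of units in each stratum
shares <- table(prc) / length(pr)
shares
```

The outer strata are deliberately narrow, so units with extreme propensity scores are grouped more finely than units near the middle of the distribution.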
Solves an integer program when there are nt treated and nc
control units. The smaller group is exhausted (all of those units are
placed in blocks). Subject to that, the integer program
maximizes the number of units used from the larger group.
blockSizes(nt, nc, J)
nt |
Number of treated units. |
nc |
Number of control units. |
J |
Block size (number of units per matched block). |
This function reproduces some calculations in Section 4 of the forthcoming paper “Constructing Observational Block Designs When the Propensity Score Exhibits Limited Overlap” by Brumberg and Rosenbaum.
If either nt or nc is 0, or if
nt + nc < J, a warning is issued and the function returns a degenerate
result with zero blocks and zero counts.
A list with components:
detail |
Named vector with |
counts |
Named integer vector of length |
blockSizes(nt = 2, nc = 10, J = 5)
blockSizes(nt = 10, nc = 2, J = 5)
blockSizes(nt = 6, nc = 6, J = 5)
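The counting logic behind the description can be sketched with simple arithmetic. The helper below is hypothetical and is not blockSizes itself: it does not solve an integer program and does not report how many blocks of each composition are used. It assumes only the constraints stated above: every block has exactly J units with at least one treated and one control unit, the smaller group is used in full, and the number of units taken from the larger group is maximized. When the groups are tied, the sketch arbitrarily treats the treated group as the smaller one.

```r
# Illustrative sketch: simple counting, not the package's integer program.
# `max_blocks_sketch` is a hypothetical helper name.
max_blocks_sketch <- function(nt, nc, J) {
  none <- c(blocks = 0, treated = 0, control = 0)
  ns <- min(nt, nc)   # smaller group, to be used in full
  nl <- max(nt, nc)   # larger group
  if (ns == 0 || nt + nc < J) return(none)

  # With the smaller group exhausted, B blocks use J * B - ns larger-group
  # units, so maximizing those units means maximizing B. B cannot exceed ns
  # (one smaller-group unit per block) or the total number of units over J.
  B <- min(ns, floor((nt + nc) / J))

  # Each block holds at most J - 1 smaller-group units, so B blocks must
  # have room for all ns of them; otherwise the sketch gives up.
  if (B * (J - 1) < ns) return(none)

  from_larger <- J * B - ns
  if (nt <= nc) {
    c(blocks = B, treated = ns, control = from_larger)
  } else {
    c(blocks = B, treated = from_larger, control = ns)
  }
}

max_blocks_sketch(nt = 2, nc = 10, J = 5)
max_blocks_sketch(nt = 10, nc = 2, J = 5)
max_blocks_sketch(nt = 6, nc = 6, J = 5)
```

In the first call, for instance, two blocks of one treated and four control units use both treated units and eight of the ten controls, whereas a single block of two treated and three controls would use only three controls.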
Motivated by the study by Bui et al. (2016), these data from NHANES 1999-2000 concern evidence about the possible fecal-oral transmission of Helicobacter pylori.
data(Hpylori)
A data frame with observations (age >= 3, complete cases on key variables) on the following 11 variables.
| Variable | Description |
|---|---|
| SEQN | NHANES id number |
| female | 1 if female, 0 if male |
| age | Age in years |
| education | Education level. Ordered factor with levels `<9` < `9-11` < `HS/GED` < `SomeCol` < `College` < `Age<20` |
| income | Family income relative to poverty. Ordered factor with levels `<2`, `>=2`, `Missing` |
| black | 1 if black, 0 otherwise |
| hispanic | 1 if hispanic, 0 otherwise |
| born | Country of birth. Ordered factor with levels `US` < `Mexico` < `Other` |
| peopleroom1 | 1 if people per room > 1, 0 otherwise |
| hepaA | Hepatitis A antibody, 1 if positive, 0 if negative |
| helioBP | Helicobacter pylori. |
Does oral consumption of fecal matter – perhaps because someone prepared food without washing their hands – cause infection with Helicobacter pylori, a type of bacteria that infects the stomach and may cause peptic ulcers or gastric cancer? It is difficult to study this question, because there is no record of incidents in which small amounts of fecal matter were ingested. It is known that hepatitis A virus is mostly transmitted by the fecal-oral route. Following prior studies, Bui et al. (2016) used antibodies for hepatitis A as an indicator of a higher level of ingestion of fecal matter, and examined its relationship with Helicobacter pylori, adjusting for possible confounders, such as age, country of birth, or a crowded home.
NHANES, US National Health and Nutrition Examination Survey, 1999-2000. https://wwwn.cdc.gov/nchs/nhanes/
Bui, D., Brown, H. E., Harris, R. B. and Oren, E. (2016) Serologic evidence for fecal–oral transmission of Helicobacter pylori. The American Journal of Tropical Medicine and Hygiene, 94(1), 82–88. doi:10.4269/ajtmh.15-0297 https://pmc.ncbi.nlm.nih.gov/articles/PMC4710451/
data(Hpylori)
boxplot(Hpylori$helioBP ~ Hpylori$hepaA)