Title: | Obtain Unweighted Natural Strata that Balance Many Covariates |
---|---|
Description: | Natural strata can be used in observational studies to balance the distributions of many covariates across any number of treatment groups and any number of comparisons. These strata have proportional amounts of units within each stratum across the treatments, allowing for simple interpretation and aggregation across strata. Within each stratum, the units are chosen using randomized rounding of a linear program that balances many covariates. To solve the linear program, the 'Gurobi' commercial optimization software is recommended, but not required. The 'gurobi' R package can be installed following the instructions at <https://www.gurobi.com/documentation/9.1/refman/ins_the_r_package.html>. |
Authors: | Katherine Brumberg [aut, cre] |
Maintainer: | Katherine Brumberg <[email protected]> |
License: | GPL-3 |
Version: | 2.0.0 |
Built: | 2025-03-10 05:37:12 UTC |
Source: | https://github.com/kkbrum/natstrat |
Reports standardized differences in means between the treated and control group before and after choosing a subset of controls. These differences are reported both across strata and within strata. This function can also generate love plots of the same quantities.
check_balance( z, X, st, selected, treated = 1, control = 0, denom_variance = "treated", plot = FALSE, message = TRUE )
check_balance( z, X, st, selected, treated = 1, control = 0, denom_variance = "treated", plot = FALSE, message = TRUE )
z |
a factor with the |
X |
a data frame containing the covariates in the columns over which balance is desired. The number
of rows should equal the length of |
st |
a stratum vector with the |
selected |
a boolean vector including whether each unit was selected as part of the treated and control
groups for analysis. Should be the same length as |
treated |
which treatment value should be considered the treated units. This
must be one of the values of |
control |
which treatment value should be considered the control units. This
must be one of the values of |
denom_variance |
character stating what variance to use in the standardization:
either the default "treated", meaning the standardization will use the
treated variance (across all strata), where the treated group is declared in
the |
plot |
a boolean denoting whether to generate love plots for the standardized differences. |
message |
a boolean denoting whether to print a message about the level of balance achieved |
List containing:
matrix with one row per covariate and two columns: one for the standardized difference before a subset of controls were selected and one for after.
matrix similar to sd_across
, but with separate
standardized differences for each stratum for each covariate.
matrix similar to sd_across
, but taking the
average of the standardized differences within the strata, weighted by stratum size.
ggplot object plotting sd_across
, only exists if plot = TRUE
.
a named list of ggplot objects plotting sd_strata
,
one for each stratum, only exists if plot = TRUE
.
ggplot object plotting sd_strata_avg
, only exists if plot = TRUE
.
ggplot object with two facets displaying sd_across
and
sd_strata_avg
with one y-axis, only exists if plot = TRUE
.
data('nh0506') # Create strata age_cat <- cut(nh0506$age, breaks = c(19, 39, 50, 85), labels = c('< 40 years', '40 - 50 years', '> 50 years')) strata <- age_cat : nh0506$sex # Balance age, race, education, poverty ratio, and bmi both across and within the levels of strata constraints <- generate_constraints( balance_formulas = list(age + race + education + povertyr + bmi ~ 1 + strata), z = nh0506$z, data = nh0506) # Choose one control for every treated unit in each stratum, # balancing the covariates as described by the constraints results <- optimize_controls(z = nh0506$z, X = constraints$X, st = strata, importances = constraints$importances, ratio = 1) cov_data <- nh0506[, c('sex', 'age', 'race', 'education', 'povertyr', 'bmi')] # Check balance stand_diffs <- check_balance(z = nh0506$z, X = cov_data, st = strata, selected = results$selected, plot = TRUE)
data('nh0506') # Create strata age_cat <- cut(nh0506$age, breaks = c(19, 39, 50, 85), labels = c('< 40 years', '40 - 50 years', '> 50 years')) strata <- age_cat : nh0506$sex # Balance age, race, education, poverty ratio, and bmi both across and within the levels of strata constraints <- generate_constraints( balance_formulas = list(age + race + education + povertyr + bmi ~ 1 + strata), z = nh0506$z, data = nh0506) # Choose one control for every treated unit in each stratum, # balancing the covariates as described by the constraints results <- optimize_controls(z = nh0506$z, X = constraints$X, st = strata, importances = constraints$importances, ratio = 1) cov_data <- nh0506[, c('sex', 'age', 'race', 'education', 'povertyr', 'bmi')] # Check balance stand_diffs <- check_balance(z = nh0506$z, X = cov_data, st = strata, selected = results$selected, plot = TRUE)
Create a distance matrix between strata levels created from the interactions
of factors. Used as input to generate_qs()
.
create_dist_matrix(...)
create_dist_matrix(...)
... |
any number of matrices that contain the distances between levels of a single stratifying factor. These should have both column and row names which correspond to the levels of the stratifying factor. |
Matrix containing the distances between all levels of the factor of all interactions between the inputted factors. The row and column names correspond to the levels of the strata, formed by combining the level name of each stratifying factor separated with ':'.
data('nh0506') age_cat <- cut(nh0506$age, breaks = c(19, 39, 50, 85), labels = c('< 40 years', '40 - 50 years', '> 50 years')) age_dist <- matrix(data = c(0, 1, 2, 1, 0, 1, 2, 1, 0), nrow = 3, byrow = TRUE, dimnames = list(levels(age_cat), levels(age_cat))) sex_dist <- matrix(data = c(0, 1, 1, 0), nrow = 2, dimnames = list(levels(nh0506$sex), levels(nh0506$sex))) strata_dist <- create_dist_matrix(age_dist, sex_dist)
data('nh0506') age_cat <- cut(nh0506$age, breaks = c(19, 39, 50, 85), labels = c('< 40 years', '40 - 50 years', '> 50 years')) age_dist <- matrix(data = c(0, 1, 2, 1, 0, 1, 2, 1, 0), nrow = 3, byrow = TRUE, dimnames = list(levels(age_cat), levels(age_cat))) sex_dist <- matrix(data = c(0, 1, 1, 0), nrow = 2, dimnames = list(levels(nh0506$sex), levels(nh0506$sex))) strata_dist <- create_dist_matrix(age_dist, sex_dist)
This function generates constraints that encourage covariate balance as specified.
The main inputs are formula like objects, where the left hand side indicates
the covariate to be balanced and the right hand side indicates the
groups within which to balance. The constraints are
weighted and standardized by stand()
to be used in optimize_controls()
. Missingness
indicators can also be added and weighted for any covariate that has NA
values.
generate_constraints( balance_formulas, z, data, default_rhs = NULL, weight_by_size = 0, denom_variance = "treated", treated = 1, autogen_missing = 50 )
generate_constraints( balance_formulas, z, data, default_rhs = NULL, weight_by_size = 0, denom_variance = "treated", treated = 1, autogen_missing = 50 )
balance_formulas |
a list of formulas where the left hand side represents the covariate to be balanced, and the terms on the right hand side represent the populations within which the covariate should be balanced. More information can be found in the details below. |
z |
a factor with the |
data |
a data frame containing the relevant covariates in the columns. The number
of rows should equal the length of |
default_rhs |
the list of |
weight_by_size |
numeric between 0 and 1 stating how to adjust constraints for the size of the population they represent. Default is 0, meaning imbalance within populations is viewed in absolute terms, not relative to the population size. The program may thus prioritize balancing the covariate in larger populations compared to smaller populations. A value of 1 means that imbalance will be measured relative to the population's size, not in absolute terms, implying that it is equally important to balance in every population. |
denom_variance |
character stating what variance to use in the standardization:
either the default "treated", meaning the standardization will use the
treated variance (across all strata), where the treated group is declared in
the |
treated |
which treatment value should be considered the treated group. This
must be one of the values of |
autogen_missing |
whether to automatically generate missingness constraints
and how heavily to prioritize them. Should be a numeric
or |
A list with two named components:
X
a matrix with constraints as columns and the same number of rows as the inputs. The column names provide information about the constraints, including the covariate names and the factor and level to which it pertains.
importances
a named vector with names corresponding to the constraint names
and values corresponding to how heavily that constraint should be prioritized,
based on the information provided through balance_formulas
, weight_by_size
,
and autogen_missing
.
The balance_formulas
argument can include formulas beyond those interpreted
by R
to be formulas
. Their interpretation is also different, as
explained below:
The left hand side of the formula contains the covariate
to be balanced. It can also be the sum of multiple covariates, in which case
each term will be balanced individually according to the right hand side. Additionally,
'.' on the left hand side will designate that all covariates in data
should be balanced according to the designated or default right hand side
(as usual, terms may be subtracted to remove them).
The right hand side should be the sum of factor, character,
or boolean variables. The covariate of the left hand side will be balanced within
each level of each term on the right hand side. The right hand side can also
contain '.', meaning the covariate will be balanced across all levels of all
categorical, character, or boolean variables found in data
(as usual,
terms may be subtracted to remove them). In the most common case, the user
will have one term on the right hand side consisting of the strata within
which balance in desired.
The formulas can contain coefficients specifying how much to weight a certain set of constraints. Coefficients of the left hand side terms will weight all constraints generated for that covariate, and coefficients of the right hand side will weight the constraints generated for each level of that term.
The intercept term, 1, is automatically included on the right hand side of the formula, and designates that the covariate of the left hand side will be balanced across all control units. You may enter a different numeric > 0 that will signify how much to weight the constraint, or you may enter "- 1" or "+ 0" to remove the intercept and its associated constraint, as per usual.
data('nh0506') # Create strata age_cat <- cut(nh0506$age, breaks = c(19, 39, 50, 85), labels = c('< 40 years', '40 - 50 years', '> 50 years')) strata <- age_cat : nh0506$sex # Balance age, race, education, poverty ratio, and bmi both across and within the levels of strata constraints <- generate_constraints( balance_formulas = list(age + race + education + povertyr + bmi ~ 1 + strata), z = nh0506$z, data = nh0506) # Balance age and race both across and within the levels of strata, # with balance for race being prioritized twice as much as for age, # and balance across strata prioritized twice as much as within. # Balance education across and within strata, # with balance within strata prioritized twice as much as across. # Balance poverty ratio and bmi only within the levels of strata, # as specified in the default_rhs argument constraints <- generate_constraints( balance_formulas = list(age + 2 * race ~ 2 + strata, education ~ 1 + 2 * strata, 'povertyr', 'bmi'), z = nh0506$z, data = nh0506, default_rhs = '0 + strata')
data('nh0506') # Create strata age_cat <- cut(nh0506$age, breaks = c(19, 39, 50, 85), labels = c('< 40 years', '40 - 50 years', '> 50 years')) strata <- age_cat : nh0506$sex # Balance age, race, education, poverty ratio, and bmi both across and within the levels of strata constraints <- generate_constraints( balance_formulas = list(age + race + education + povertyr + bmi ~ 1 + strata), z = nh0506$z, data = nh0506) # Balance age and race both across and within the levels of strata, # with balance for race being prioritized twice as much as for age, # and balance across strata prioritized twice as much as within. # Balance education across and within strata, # with balance within strata prioritized twice as much as across. # Balance poverty ratio and bmi only within the levels of strata, # as specified in the default_rhs argument constraints <- generate_constraints( balance_formulas = list(age + 2 * race ~ 2 + strata, education ~ 1 + 2 * strata, 'povertyr', 'bmi'), z = nh0506$z, data = nh0506, default_rhs = '0 + strata')
Figure out how many units to take from each stratum when some strata are deficient.
The result should be used as an input to optimize_controls()
.
generate_qs( z, st, ratio, treated = 1, max_ratio = NULL, max_extra_s = 5, strata_dist = NULL )
generate_qs( z, st, ratio, treated = 1, max_ratio = NULL, max_extra_s = 5, strata_dist = NULL )
z |
a factor with the |
st |
a stratum vector with the |
ratio |
a numeric or vector specifying the desired ratio of controls to 'treated' in
each stratum. If there is one control group and all treated units should be included,
this can be a numeric. Otherwise, this should be
a vector with one entry per treatment group, in the same order as the levels of
|
treated |
which treatment value should be considered the treated units. This
must be one of the values of |
max_ratio |
a numeric or vector specifying the maximum ratio to allow in a stratum to achieve
the overall |
max_extra_s |
single numeric or named vector or matrix with values corresponding to the maximum desired number
of extra controls to be chosen from each stratum to achieve the overall
|
strata_dist |
matrix with both row and column names with names corresponding
to the stratum values from |
A named vector stating how many controls to take from each stratum.
NHANES 2005-2006 data on smoking and homocysteine levels in adults.
nh0506
nh0506
A data frame with 2928 rows and 11 variables:
NHANES identification number.
smoking status treatment indicator: 1 = daily smoker, 0 = never smoker.
factor with levels "Male" and "Female".
age in years, 20-85, with 85 recorded for everyone >= 85 years.
factor with levels "Mexican American", "Other Hispanic", "Non-Hispanic White", "Non-Hispanic Black", and "Other Race - Including Multi-Racial".
factor with levels "< Grade 9", "9-11th grade", "High school grad/GED", "Some college or AA degree", "College graduate or above".
ratio of family income to the poverty level, capped at 5 times poverty, has missing entries.
BMI (body mass index), has missing entries.
cigarettes smoked per day, 0 for never smokers.
blood cotinine level, a biomarker of recent exposure to tobacco.
homocysteine level.
The code used to generate this data is documented in the source version of this package under 'data-raw/'. This data is composed of adults aged at least 20 years. Individuals who have smoked at least 100 cigarettes but do not now smoke at least 10 cigarettes daily are excluded. Individuals with missing homocysteine values, cotinine values, or smoking information are excluded. After filtering for all these criteria, one individual with unknown education remains and is also excluded. Missing values remain in the poverty ratio and bmi covariates.
https://wwwn.cdc.gov/nchs/nhanes/ContinuousNhanes/Default.aspx?BeginYear=2005
data('nh0506')
data('nh0506')
NHANES 2005-2006 data on smoking and homocysteine levels in adults, comparing daily smokers to never smokers and occasional smokers.
nh0506_3groups
nh0506_3groups
A data frame with 4457 rows and 11 variables:
NHANES identification number.
smoking status treatment factor: 0 = never smoker, 1 = some smoking, 2 = daily smoker.
factor with levels "Male" and "Female".
age in years, 20-85, with 85 recorded for everyone >= 85 years.
factor with levels "Mexican American", "Other Hispanic", "Non-Hispanic White", "Non-Hispanic Black", and "Other Race - Including Multi-Racial".
factor with levels "< Grade 9", "9-11th grade", "High school grad/GED", "Some college or AA degree", "College graduate or above".
ratio of family income to the poverty level, capped at 5 times poverty, has missing entries.
BMI (body mass index), has missing entries.
cigarettes smoked per day, 0 for never smokers.
blood cotinine level, a biomarker of recent exposure to tobacco.
homocysteine level.
The code used to generate this data is documented in the source version of this package under 'data-raw/'. This data is composed of adults aged at least 20 years. Individuals who have smoked at least 100 cigarettes but do not now smoke at least 10 cigarettes daily are excluded. Individuals with missing homocysteine values, cotinine values, or smoking information are excluded. After filtering for all these criteria, five individuals with unknown education remain and are also excluded. Missing values remain in the poverty ratio and bmi covariates.
https://wwwn.cdc.gov/nchs/nhanes/ContinuousNhanes/Default.aspx?BeginYear=2005
data('nh0506_3groups')
data('nh0506_3groups')
Select control units within strata that optimize covariate balance. Uses randomized rounding of a linear program or a mixed integer linear program.
optimize_controls( z, X, st, importances = NULL, treated = 1, ratio = NULL, q_s = NULL, treated_star = NULL, q_star_s = NULL, weight_star = 1, integer = FALSE, solver = "Rglpk", seed = NULL, runs = 1, time_limit = Inf, threads = 1, correct_sizes = TRUE, low_memory = FALSE )
optimize_controls( z, X, st, importances = NULL, treated = 1, ratio = NULL, q_s = NULL, treated_star = NULL, q_star_s = NULL, weight_star = 1, integer = FALSE, solver = "Rglpk", seed = NULL, runs = 1, time_limit = Inf, threads = 1, correct_sizes = TRUE, low_memory = FALSE )
z |
a factor with the |
X |
a matrix or data frame containing constraints in the columns. The number
of rows should equal the length of |
st |
a stratum vector with the |
importances |
a vector with length equal to the number of constraints or columns
in |
treated |
which treatment value should be considered the treated units. This
must be one of the values of |
ratio |
a numeric or vector specifying the desired ratio of controls to 'treated' in
each stratum. If there is one control group and all treated units should be included,
this can be a numeric. Otherwise, this should be
a vector with one entry per treatment group, in the same order as the levels of
|
q_s |
a named vector or matrix indicating how many units are to be selected from each stratum.
If there is one control group and all treated units are desired, this can be a vector; otherwise,
this should have one row per treatment group, where the order of the rows matches the order of
the levels of |
treated_star |
which treatment value should be considered the treated units
for the supplemental comparison. This
must be one of the values of |
q_star_s |
a named vector or matrix,
indicating how many supplemental units are to be selected from each stratum.
The matrix should have one row per treatment group, where the order of the rows matches the order of
the levels of |
weight_star |
a numeric stating how much to prioritize balance between the supplemental units as compared to balance between the main units. If multiple supplemental comparisons are desired, this should be a vector with one entry per supplemental comparison. |
integer |
a logical stating whether to use a mixed integer programming solver
instead of randomized rounding. Default is |
solver |
a character stating which solver to use to run the linear program. Options are "Rglpk" (default) or "gurobi". You must have the 'gurobi' package installed to use the "gurobi" option. If available, this is the recommended solver. |
seed |
the seed to use when doing the randomized rounding of the linear program.
This will allow results to be reproduced if desired. The default is |
runs |
the number of times to run randomized rounding of the linear solution. The objective values of all runs will be reported, but the detailed results will only be reported for the run with the lowest objective value. The default is 1. |
time_limit |
numeric stating maximum amount of seconds for which the
program is allowed to run before aborting. Default is |
threads |
The maximum number of threads that should be used. This is only
applicable if |
correct_sizes |
boolean stating whether the desired sample sizes should
be exactly correct (if |
low_memory |
boolean stating whether some outputs should not be included
due to the scale of the problem being too large compared to memory space.
If |
List containing:
objective
objective value of the randomized rounding or mixed integer linear program solution.
objective_wo_importances
objective value of the randomized rounding or mixed integer linear program solution not weighted by the variable importances.
eps
the amount of imbalance obtained in each constraint from the linear program. The row names specify the covariate, the population of interest, and, if there are more than two comparison groups, which groups are being compared.
eps_star
same as eps
but for the supplemental units instead of the units
in the main comparison. If there are multiple supplemental comparisons, this is a list.
If there are none, this is NULL
.
importances
the importance of each on the balance constraints.
weight_star
the importance of balancing in the supplemental comparison
relative to the main comparison. If there are multiple supplemental comparisons,
this is a vector. If there are none, this is NULL
.
selected
whether each unit was selected for the main comparison.
selected_star
whether each unit was selected for the supplement.
If there are multiple supplemental comparisons, this is a list.
If there are none, this is NULL
.
pr
the linear program weight assigned to each unit for the main comparison.
pr_star
the linear program weight assigned to each unit for the supplement.
If there are multiple supplemental comparisons, this is a list.
If there are none, this is NULL
.
rrdetails
A list containing:
seed
the seed used before commencing the random sampling.
run_objectives
the objective values for each run of randomized rounding.
run_objectives_wo_importances
the objective values for each run of randomized rounding, not scaled by constraint importances.
lpdetails
the full return of the function Rglpk_solve_LP()
or gurobi()
plus information about the epsilons and objective values
for the linear program solution.
data('nh0506') # Create strata age_cat <- cut(nh0506$age, breaks = c(19, 39, 50, 85), labels = c('< 40 years', '40 - 50 years', '> 50 years')) strata <- age_cat : nh0506$sex # Balance age, race, education, poverty ratio, and bmi both across and within the levels of strata constraints <- generate_constraints( balance_formulas = list(age + race + education + povertyr + bmi ~ 1 + strata), z = nh0506$z, data = nh0506) # Choose one control for every treated unit in each stratum, # balancing the covariates as described by the constraints results <- optimize_controls(z = nh0506$z, X = constraints$X, st = strata, importances = constraints$importances, ratio = 1) # If you want to use a ratio that's not feasible, # you can supply a vector of the desired number of controls per stratum, q_s, # typically generated by creating a distance matrix between strata and using # generate_qs(): ## Not run: age_dist <- matrix(data = c(0, 1, 2, 1, 0, 1, 2, 1, 0), nrow = 3, byrow = TRUE, dimnames = list(levels(age_cat), levels(age_cat))) sex_dist <- matrix(data = c(0, 1, 1, 0), nrow = 2, dimnames = list(levels(nh0506$sex), levels(nh0506$sex))) strata_dist <- create_dist_matrix(age_dist, sex_dist) qs <- generate_qs(z = nh0506$z, st = strata, ratio = 2.5, max_ratio = 2.6, max_extra_s = 0, strata_dist = strata_dist) results <- optimize_controls(z = nh0506$z, X = constraints$X, st = strata, importances = constraints$importances, q_s = qs) ## End(Not run) # We can also have multiple treatment and control groups, # as well as multiple simultaneous comparisons: ## Not run: data('nh0506_3groups') strata2 <- cut(nh0506_3groups$age, breaks = c(19, 39, 50, 85), labels = c('< 40 years', '40 - 50 years', '> 50 years')) constraints2 <- generate_constraints( balance_formulas = list(age + race + education + povertyr + bmi + sex ~ 1 + strata2), z = nh0506_3groups$z, data = nh0506_3groups, treated = 'daily smoker') q_star_s <- matrix(c(rep(table(nh0506_3groups$z, strata2)['some smoking', ] - table(nh0506_3groups$z, strata2)['daily smoker', ], 2), rep(0, 3)), byrow = TRUE, nrow = 3, dimnames = list(levels(nh0506_3groups$z), levels(strata2))) results <- optimize_controls(z = nh0506_3groups$z, X = constraints2$X, importances = constraints2$importances, st = strata2, ratio = 1, treated = 'daily smoker', treated_star = 'some smoking', q_star_s = q_star_s, correct_sizes = FALSE) ## End(Not run)
data('nh0506') # Create strata age_cat <- cut(nh0506$age, breaks = c(19, 39, 50, 85), labels = c('< 40 years', '40 - 50 years', '> 50 years')) strata <- age_cat : nh0506$sex # Balance age, race, education, poverty ratio, and bmi both across and within the levels of strata constraints <- generate_constraints( balance_formulas = list(age + race + education + povertyr + bmi ~ 1 + strata), z = nh0506$z, data = nh0506) # Choose one control for every treated unit in each stratum, # balancing the covariates as described by the constraints results <- optimize_controls(z = nh0506$z, X = constraints$X, st = strata, importances = constraints$importances, ratio = 1) # If you want to use a ratio that's not feasible, # you can supply a vector of the desired number of controls per stratum, q_s, # typically generated by creating a distance matrix between strata and using # generate_qs(): ## Not run: age_dist <- matrix(data = c(0, 1, 2, 1, 0, 1, 2, 1, 0), nrow = 3, byrow = TRUE, dimnames = list(levels(age_cat), levels(age_cat))) sex_dist <- matrix(data = c(0, 1, 1, 0), nrow = 2, dimnames = list(levels(nh0506$sex), levels(nh0506$sex))) strata_dist <- create_dist_matrix(age_dist, sex_dist) qs <- generate_qs(z = nh0506$z, st = strata, ratio = 2.5, max_ratio = 2.6, max_extra_s = 0, strata_dist = strata_dist) results <- optimize_controls(z = nh0506$z, X = constraints$X, st = strata, importances = constraints$importances, q_s = qs) ## End(Not run) # We can also have multiple treatment and control groups, # as well as multiple simultaneous comparisons: ## Not run: data('nh0506_3groups') strata2 <- cut(nh0506_3groups$age, breaks = c(19, 39, 50, 85), labels = c('< 40 years', '40 - 50 years', '> 50 years')) constraints2 <- generate_constraints( balance_formulas = list(age + race + education + povertyr + bmi + sex ~ 1 + strata2), z = nh0506_3groups$z, data = nh0506_3groups, treated = 'daily smoker') q_star_s <- matrix(c(rep(table(nh0506_3groups$z, strata2)['some smoking', ] - table(nh0506_3groups$z, strata2)['daily smoker', ], 2), rep(0, 3)), byrow = TRUE, nrow = 3, dimnames = list(levels(nh0506_3groups$z), levels(strata2))) results <- optimize_controls(z = nh0506_3groups$z, X = constraints2$X, importances = constraints2$importances, st = strata2, ratio = 1, treated = 'daily smoker', treated_star = 'some smoking', q_star_s = q_star_s, correct_sizes = FALSE) ## End(Not run)
This function is used by generate_constraints()
to standardize
covariate vectors to become balance constraints.
The function divides the covariate values by the treated or average group
standard deviation (across strata).
stand(z, x, denom_variance = "treated", treated = 1, autogen_missing = 50)
stand(z, x, denom_variance = "treated", treated = 1, autogen_missing = 50)
z |
a factor with the |
x |
a covariate vector with |
denom_variance |
character stating what variance to use in the standardization:
either the default "treated", meaning the standardization will use the
treated variance (across all strata), where the treated group is declared in
the |
treated |
which treatment value should be considered the treated units. This
must be one of the values of |
autogen_missing |
whether to automatically generate missingness constraint
and how heavily to prioritize it. Should be a numeric
or |
A list with two components:
a balance constraint for the standardized covariate values of all unites.
a corresponding balance constraint for the rate of missingness if
autogen_missing
not NULL
, otherwise NULL
.