3.5 - Bias, Confounding and Effect Modification

Consider the figure below. If the true value is the center of the target, the measured responses on the first target may be considered reliable or precise (they have negligible random error), but all the responses missed the true value by a wide margin. A biased estimate has been obtained. In contrast, the target on the right has more random error in the measurements; however, the results are valid, lacking systematic error. The average response is exactly in the center of the target. The middle target depicts our goal: observations that are both reliable (small random error) and valid (without systematic error).

Accuracy for a Sample Size of 5

Bias, confounding and effect modification in epidemiology

When examining the relationship between an explanatory factor and an outcome, we are interested in identifying factors that may modify the factor's effect on the outcome (effect modifiers). We must also be aware of potential bias or confounding in a study because these can cause a reported association (or lack thereof) to be misleading. Bias and confounding are related to the measurement and study design. Let's define these terms:

Bias: A systematic error in the design, recruitment, data collection or analysis that results in a mistaken estimation of the true effect of the exposure on the outcome.

Confounding: A situation in which the effect or association between an exposure and outcome is distorted by the presence of another variable. Positive confounding (when the observed association is biased away from the null) and negative confounding (when the observed association is biased toward the null) both occur.

Effect modification: A situation in which a variable differentially (positively and negatively) modifies the observed effect of a risk factor on disease status. Different groups have different risk estimates when effect modification is present.

If the method used to select subjects or collect data results in an incorrect association,

THINK >> Bias!

If an observed association is not correct because a different (lurking) variable is associated with both the potential risk factor and the outcome, but it is not a causal factor itself,

THINK >> Confounding!

If an effect is real but the magnitude of the effect is different for different groups of individuals (e.g., males vs females or blacks vs whites),

THINK >> Effect modification!

Bias Resulting from Study Design

Bias limits validity (the ability to measure the truth within the study design) and generalizability (the ability to confidently apply the results to a larger population) of study results. Bias is rarely eliminated during analysis. There are two major types of bias:

Selection bias: systematic error in the selection or retention of participants

Examples of selection bias in case-control studies:

Information bias (misclassification bias): Systematic error due to inaccurate measurement or classification of disease, exposure, or other variables.

Misclassification can be differential or non-differential.

Differential misclassification: The probability of misclassification varies for the different study groups, i.e., misclassification is conditional upon exposure or disease status.

Are we more likely to misclassify cases than controls? For example, if you interview cases in-person for a long period of time, extracting exact information while the controls are interviewed over the phone for a shorter period of time using standard questions, this can lead to differential misclassification of exposure status between controls and cases.

Nondifferential misclassification: The probability of misclassification does not vary for the different study groups; it is not conditional upon exposure or disease status, but appears random. Using the above example, if half the subjects (cases and controls) were randomly selected to be interviewed by phone and the other half were interviewed in person, the misclassification would be nondifferential.

Either type of misclassification can produce misleading results.
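To make the distinction concrete, here is a minimal sketch that applies exposure measurement error to a 2 × 2 case-control table and recomputes the odds ratio. All counts, sensitivities and specificities are hypothetical values chosen for illustration; none of them come from this lesson. The function names `odds_ratio` and `observe` are also made up for the sketch.

```python
def odds_ratio(a, b, c, d):
    """a, b = exposed/unexposed cases; c, d = exposed/unexposed controls."""
    return (a * d) / (b * c)


def observe(exposed, unexposed, sensitivity, specificity):
    """Expected counts after exposure status is measured with error."""
    seen_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    seen_unexposed = (exposed + unexposed) - seen_exposed
    return seen_exposed, seen_unexposed


# Hypothetical true exposure counts (assumption, not study data).
cases = (100, 100)      # exposed, unexposed
controls = (50, 150)    # exposed, unexposed
true_or = odds_ratio(*cases, *controls)

# Nondifferential: same error rates for cases and controls.
nondiff_or = odds_ratio(*observe(*cases, 0.80, 0.90),
                        *observe(*controls, 0.80, 0.90))

# Differential: exposure is detected better among cases than controls.
diff_or = odds_ratio(*observe(*cases, 0.95, 0.90),
                     *observe(*controls, 0.70, 0.90))

print(f"true OR = {true_or:.2f}")
print(f"nondifferential OR = {nondiff_or:.2f}")
print(f"differential OR = {diff_or:.2f}")
```

In this hypothetical setup the nondifferential error pulls the odds ratio toward 1 and the differential error pushes it away from 1. Other error patterns can behave differently, so this is an illustration rather than a general rule.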

Confounding and Confounders

Confounding: A situation in which a measure of association or relationship between exposure and outcome is distorted by the presence of another variable. Positive confounding (when the observed association is biased away from the null) and negative confounding (when the observed association is biased toward the null) both occur.

Confounder: An extraneous variable that wholly or partially accounts for the observed effect of a risk factor on disease status. The presence of a confounder can lead to inaccurate results.

A confounder meets all three conditions listed below:

  1. It is a risk factor for the disease, independent of the putative risk factor.
  2. It is associated with the putative risk factor.
  3. It is not in the causal pathway between exposure and disease.

The first two of these conditions can be tested with data. The third is more biological and conceptual.

Confounding masks the true effect of a risk factor on a disease or outcome due to the presence of another variable. We identify potential confounders from our:

  1. Knowledge
  2. Prior experience with data
  3. Three criteria for confounders

Example 3-6: Confounding

Hypothesis: Diabetes is a positive risk factor for coronary heart disease.

We survey patients as part of a cross-sectional study, asking whether they have coronary heart disease and whether they are diabetic. We generate a 2 × 2 table (below):

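Crude Diabetes-CHD association
Diabetes (Prevalent Diabetes) by CHD (Prevalent Coronary Heart Disease), frequencies

Diabetes   CHD = 0   CHD = 1   Total
0          2249      91        2340
1          190       26        216
Total      2439      117       2556

For the cell Diabetes = 0, CHD = 1: Percent 3.56, Row Pct 3.89, Col Pct 77.78. The Diabetes = 0 row holds 91.55% of the sample.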

Prevalence Ratio:
\(PR = P_{\text{diabetes}} / P_{\text{no diabetes}} = 12.0 / 3.9 = 3.10\)

Odds ratio \(= (2249 \times 26) / (91 \times 190) = 3.38\)

'0' indicates those who do not have coronary heart disease, '1' is for those with coronary heart disease; similarly for diabetes, '0' is the absence, and '1' the presence of diabetes.

The prevalence of coronary heart disease among people without diabetes is 91 divided by 2340; that is, 3.9% of all people without diabetes have coronary heart disease. Similarly, the prevalence among those with diabetes is 12.04%. Our prevalence ratio, considering whether diabetes is a risk factor for coronary heart disease, is 12.04 / 3.9 = 3.1. The prevalence of coronary heart disease in people with diabetes is 3.1 times as great as it is in people without diabetes.

We can also use the 2 x 2 table to calculate an odds ratio as shown above:

( 2249 × 26) / ( 91 × 190) = 3.38

The odds of having diabetes among those with coronary heart disease is 3.38 times as high as the odds of having diabetes among those who do not have coronary heart disease.
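As a quick check of the arithmetic, the prevalence ratio and odds ratio can be recomputed from the four cell counts of the crude diabetes-CHD table (26, 190, 91, 2249). This is a minimal sketch; the variable and function names are illustrative only.

```python
# Cell counts from the crude diabetes-CHD table in this example.
chd_diabetic, no_chd_diabetic = 26, 190
chd_nondiabetic, no_chd_nondiabetic = 91, 2249


def prevalence(with_disease, without_disease):
    """Proportion of a group that has the disease."""
    return with_disease / (with_disease + without_disease)


p_diabetic = prevalence(chd_diabetic, no_chd_diabetic)
p_nondiabetic = prevalence(chd_nondiabetic, no_chd_nondiabetic)

prevalence_ratio = p_diabetic / p_nondiabetic
odds_ratio = (chd_diabetic * no_chd_nondiabetic) / (chd_nondiabetic * no_chd_diabetic)

print(f"prevalence among diabetics     = {p_diabetic:.4f}")
print(f"prevalence among non-diabetics = {p_nondiabetic:.4f}")
print(f"prevalence ratio = {prevalence_ratio:.2f}")
print(f"odds ratio       = {odds_ratio:.2f}")
```

The two measures differ because the odds ratio compares odds rather than proportions; the two are close only when the outcome is rare in both groups.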

Which of these do you use? They come up with slightly different estimates.

It depends upon your primary purpose. Is your purpose to compare prevalences? Or do you wish to address the odds of diabetes as related to coronary health status?

Now, let's add hypertension as a potential confounder.

Ask: "Is hypertension a risk factor for CHD (among non-diabetics)?"

First of all, prior knowledge tells us that hypertension is related to many heart related diseases. Prior knowledge is an important first step but let's test this with data.

We consider the 2 × 2 table below:

Is hypertension a risk factor for CHD (among non-diabetics)?
Statistics for a table of Hypert by CHD

Statistic DF Value Prob
Chi-square 1 7.435 0.006
Likelihood Ratio Chi-square 1 6.998 0.008
Continuity Adj. Chi-square 1 6.811 0.009
Mantel-Haenszel Chi-square 1 7.432 0.006
Fisher's Exact Test (Left) 0.997
Fisher's Exact Test (Right) 5.45E-03
Fisher's Exact Test (2-Tail) 9.66E-03
Phi Coefficient 0.056
Contingency Coefficient 0.056
Cramer's V 0.056

Effective Sample Size = 2331
Frequency Missing = 49

We are evaluating the relationship of CHD to hypertension in non-diabetics. You can calculate the prevalence ratios and odds ratios as suits your purpose.

These data show a positive relationship between hypertension and CHD in non-diabetics (note the small p-values).

This leads us to our next question: "Is diabetes (exposure) associated with hypertension?" We can answer this with our data as well (below):

HYPERT (Hypertension) by DIABETES (Diabetes)
Cell contents: Frequency (Percent, Row Pct, Col Pct)

HYPERT   DIABETES = 0                 DIABETES = 1               Total
0        1650 (63.66, 95.10, 69.59)   85 (3.28, 4.90, 38.46)     1735 (66.94)
1        721 (27.82, 84.13, 30.41)    136 (5.25, 15.87, 61.54)   857 (33.06)
Total    2371 (91.47)                 221 (8.53)                 2592 (100.00)

Is diabetes (exposure) associated with HYP?
Statistics for a table of Hypert by Diabetes
Statistic DF Value Prob
Chi-square 1 88.515 0.001
Likelihood Ratio Chi-square 1 82.438 0.001
Continuity Adj. Chi-square 1 87.114 0.001
Mantel-Haenszel Chi-square 1 88.481 0.001
Fisher's Exact Test (Left) 1.000
Fisher's Exact Test (Right) 1.01E-19
Fisher's Exact Test (2-Tail) 1.79E-19
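
The Pearson chi-square statistic reported for the hypertension-by-diabetes table can be reproduced from its four cell counts (1650, 85, 721, 136). The following is a minimal sketch using only the standard library; the function name `pearson_chi_square` is illustrative, and the p-value formula shown applies only to one degree of freedom.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Rows: hypertension 0/1; columns: diabetes 0/1 (counts from the table above).
hypert_by_diabetes = [[1650, 85], [721, 136]]

chi2 = pearson_chi_square(hypert_by_diabetes)
# Upper-tail probability of a chi-square variable with 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-square = {chi2:.3f}")
print(f"p-value    = {p_value:.3g}")
```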

Example 3-7: A cross-sectional study

Stratification and Adjustment - Diabetes and CHD relationship confounded by hypertension:

Earlier we arrived at a crude odds ratio of 3.38.

Crude Diabetes-CHD association
Diabetes   CHD Yes   CHD No   Total
Yes        26        190      216
No         91        2249     2340
Total      117       2439     2556
\(OR = (26 \times 2249) / (91 \times 190) = 3.38\)

Now we will use an extended Mantel-Haenszel method to adjust for hypertension and produce an adjusted odds ratio. When we do so, the adjusted OR = 2.84.

The Mantel-Haenszel method takes into account the effect of the strata, presence or absence of hypertension.

If we limit the analysis to normotensives we get an odds ratio of 2.4.

Diabetes & CHD Among Normotensives
Diabetes   CHD Yes   CHD No   Total
Yes        6         77       83
No         51        1572     1623
Total      57        1649     1706
\(OR = (6 \times 1572) / (77 \times 51) = 2.40\)

Among hypertensives, we get an odds ratio of 3.04.

Diabetes & CHD Among Hypertensives
Diabetes   CHD Yes   CHD No   Total
Yes        20        113      133
No         39        669      708
Total      59        782      841
\(OR = (20 \times 669) / (39 \times 113) = 3.04\)

Both estimates of the odds ratio are lower than the odds ratio based on the entire sample. If you stratify a sample, without losing any data, wouldn't you expect to find the crude odds ratio to be a weighted average of the stratified odds ratios?

This is an example of confounding: the stratified results are both on the same side of the crude odds ratio. This is positive confounding because the unstratified estimate is biased away from the null value of 1.0. The odds ratio that accounts for the effect of hypertension is 2.84, from the Mantel-Haenszel method, whereas the crude odds ratio of 3.38 was biased away from the null. (In some studies you are looking for a positive association; in others, a negative association, that is, a protective effect. Either way, the estimate differs from the null of 1.0.)

This is one way to demonstrate the presence of confounding. You may have a priori knowledge of confounded effects, or you may examine the data and determine whether confounding exists. Either way, when confounding is present, as in this example, the adjusted odds ratio should be reported. In this example, we report the odds ratio for the association of diabetes with CHD as 2.84, adjusted for hypertension.
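
The stratum-specific and adjusted odds ratios in this example can be reproduced from the two stratified tables. The sketch below uses the standard Mantel-Haenszel pooled odds ratio estimator, which is an assumption about what the lesson's "extended Mantel-Haenszel method" computes for this point estimate; the function names are illustrative.

```python
# Each stratum: (a, b, c, d) =
# (diabetic with CHD, diabetic without CHD,
#  non-diabetic with CHD, non-diabetic without CHD)
strata = {
    "normotensive": (6, 77, 51, 1572),
    "hypertensive": (20, 113, 39, 669),
}


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a single 2 x 2 table."""
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio across 2 x 2 strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


stratum_ors = {name: odds_ratio(*cells) for name, cells in strata.items()}
adjusted_or = mantel_haenszel_or(strata.values())

for name, value in stratum_ors.items():
    print(f"OR among {name}s = {value:.2f}")
print(f"Mantel-Haenszel adjusted OR = {adjusted_or:.2f}")
```

The pooled estimate falls between the two stratum-specific odds ratios and below the crude value of 3.38, which matches the pattern described above.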

If you are analyzing data using multivariable logistic regression, a rule of thumb is to include the potential confounder in the multivariable model if the odds ratio changes by 10% or more. The question is not so much the statistical significance as the amount by which the confounding variable changes the effect. If a variable changes the effect by 10% or more, then we consider it a confounder and leave it in the model.
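
The 10% rule of thumb can be applied to the crude (3.38) and adjusted (2.84) odds ratios from this example. This is a small sketch; the choice to express the change relative to the adjusted estimate is an assumption, since some authors divide by the crude estimate instead, and `percent_change` is an illustrative name.

```python
def percent_change(crude, adjusted):
    """Percent change between crude and adjusted estimates,
    relative to the adjusted estimate (one of several conventions)."""
    return abs(crude - adjusted) / adjusted * 100


crude_or = 3.38      # crude diabetes-CHD odds ratio
adjusted_or = 2.84   # odds ratio adjusted for hypertension

change = percent_change(crude_or, adjusted_or)
treat_as_confounder = change >= 10

print(f"change in estimate = {change:.1f}%")
print(f"treat hypertension as a confounder: {treat_as_confounder}")
```

Under either convention the change here exceeds 10%, so hypertension would be kept in the model.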

We will talk more about this later, but briefly here are some methods to control for a confounding variable (known a priori):

Controlling potential confounding starts with a good study design including anticipating potential confounders.

Effect Modification (interaction)

Effect modification: A situation in which the effect of a factor is different for different groups. We see evidence of this when the crude estimate of the association (odds ratio, rate ratio, risk ratio) is very close to a weighted average of group-specific estimates of the association. Effect modification is similar to statistical interaction, but in epidemiology, effect modification is related to the biology of disease, not just a data observation.

In the previous example, we saw both stratum-specific estimates of the odds ratio went to one side of the crude odds ratio. With effect modification, we expect the crude odds ratio to be between the estimates of the odds ratio for the stratum-specific estimates.
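
The comparison described above can be written as a simple check of whether the crude estimate lies between the stratum-specific estimates. This is only a screening heuristic sketched for illustration, not a formal test; the function name and the second set of numbers are hypothetical.

```python
def crude_between_strata(crude, stratum_estimates):
    """True when the crude estimate lies within the range of the
    stratum-specific estimates."""
    return min(stratum_estimates) <= crude <= max(stratum_estimates)


# Values from the diabetes-CHD example: both strata fall below the crude OR.
example_result = crude_between_strata(3.38, [2.40, 3.04])

# Hypothetical values (not from this lesson) where the crude OR sits
# between two clearly different stratum-specific estimates.
hypothetical_result = crude_between_strata(2.0, [1.2, 3.5])

print(f"diabetes-CHD example: crude between strata = {example_result}")
print(f"hypothetical example: crude between strata = {hypothetical_result}")
```

A result of False points toward confounding, as in the previous example; a result of True with clearly different stratum estimates is the pattern expected under effect modification.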

Effect modifier: A variable that differentially (positively and negatively) modifies the observed effect of a risk factor on disease status.

Consider the following examples:

  1. The immunization status of an individual modifies the effect of exposure to a pathogen and specific types of infectious diseases. Why?
  2. Breast cancer occurs in both men and women. Breast cancer occurs in men at a rate of approximately 1.5/100,000 men and in women at a rate of approximately 122.1/100,000 women. This is about an 80-fold difference. We can build a statistical model that shows that gender interacts with other risk factors for breast cancer, but why is this the case? Obviously, there are many biological reasons why this interaction should be present. This is the part that we want to look at from an epidemiological perspective. Consider whether the biology supports a statistical interaction that you might observe.

Think about it!

Why study effect modification? Why do we care?

If you do not properly identify and handle an effect modifier, you will get an incorrect crude estimate. The (incorrect) crude estimator (e.g., RR, OR) is a weighted average of the (correct) stratum-specific estimators. If you do not sort out the stratum-specific results, you miss an opportunity to understand the biologic or psychosocial nature of the relationship between risk factors and outcome.

To consider effect modification in the design and conduct of a study:

  1. Collect information on potential effect modifiers.
  2. Power the study to test potential effect modifiers - if a priori you think that the effect may differ depending on the stratum, power the study to detect a difference.
  3. Don't match on a potentially important effect modifier - if you do, you can't examine its effect.

To consider effect modification in the analysis of data:

  1. Again, consider what potential effect modifiers might be.
  2. Stratify the data by potential effect modifiers and calculate stratum-specific estimates of the effect of the risk factor on the outcome; determine whether effect modification is present.
  3. If so, present stratum-specific estimates. Use the Breslow-Day test for homogeneity of the odds ratios (from the extended Mantel-Haenszel method) or the -2 log-likelihood test from logistic regression to test the statistical significance of potential effect modifiers and to calculate the estimators of the exposure-disease association according to the levels of significant effect modifiers. Alternatively, if assumptions are met, use proportional hazards regression to produce an adjusted hazard ratio.
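
As a rough illustration of testing homogeneity of odds ratios across strata, the sketch below applies Woolf's test to the normotensive and hypertensive tables from Example 3-7. Note that this is a simpler substitute for the Breslow-Day test named above, not the same procedure, and it relies on a large-sample approximation. The function names are illustrative.

```python
import math

# (a, b, c, d) = (diabetic with CHD, diabetic without CHD,
#                 non-diabetic with CHD, non-diabetic without CHD)
strata = [
    (6, 77, 51, 1572),    # normotensives
    (20, 113, 39, 669),   # hypertensives
]


def log_or_and_weight(a, b, c, d):
    """Log odds ratio and its inverse-variance weight for one table."""
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, 1 / variance


def woolf_homogeneity(tables):
    """Woolf's chi-square statistic for homogeneity of odds ratios."""
    pairs = [log_or_and_weight(*table) for table in tables]
    total_weight = sum(weight for _, weight in pairs)
    pooled = sum(weight * log_or for log_or, weight in pairs) / total_weight
    return sum(weight * (log_or - pooled) ** 2 for log_or, weight in pairs)


x2 = woolf_homogeneity(strata)
# With two strata the statistic has 1 degree of freedom.
p_value = math.erfc(math.sqrt(x2 / 2))

print(f"Woolf chi-square = {x2:.3f}")
print(f"p-value          = {p_value:.3f}")
```

For these two strata the statistic is small and the p-value is large, so there is no evidence that the odds ratio differs by hypertension status. That is consistent with treating hypertension as a confounder rather than an effect modifier in that example.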