Final biostats project

 


Shane Ragland

Biostatistics research project

 

Statistical analysis of physical factors of patients who underwent a pulmonary bronchoscopy

 

Objective:

Physical characteristics were recorded for 304 healthcare workers who perform pulmonary bronchoscopies and are suspected of having contracted pulmonary tuberculosis because of inadequate precautions during the procedures. The first hypothesis examines whether there is a correlation or linear relationship between Body Mass Index (BMI) and age. The second examines whether TB diagnosis and smoking history are independent. The third examines whether population mean BMI differs among the three levels of smoking history.

Literature Review:

Healthcare workers who perform pulmonary bronchoscopies, or who are present while patients undergo them, are advised to take precautions during the procedure. Pulmonary tuberculosis (PTB) is a highly contagious disease, and the procedure can release infectious particles into the air. Face masks and equipment that filter out these particles are recommended during the procedure, and this precaution is especially important when a patient has pulmonary tuberculosis. However, if neither the patient nor the doctor knows that the patient has TB, the patient may only be diagnosed later, which means the healthcare workers who performed the procedure may have been exposed to the disease. (Na et al., 2016) The data used in this study come from a retrospective study of 1,954 healthcare workers for whom CT and bronchoscopy information was available at Pusan National University Hospital in Busan, South Korea. (Na et al., 2016) South Korea has a particularly high incidence rate of PTB, so determining the risks of exposure is especially important. Of these, 304 people are thought to have been exposed to PTB because of inadequate precautions. The paper states that there were no significant differences in either age or body mass index within the study population. (Na et al., 2016) The smoking history of the patients was recorded as never smoked (0), past smoker (1), or current smoker (2). Whether the patient was later diagnosed with PTB was determined from hospital records and recorded as diagnosed (1) or undiagnosed (0). (Na et al., 2016)

Statistical Analysis:

Descriptive Statistics for Numeric Variables

Variable      N   N Miss      Minimum         Mean       Median      Maximum      Std Dev
BMI         304        0   15.0000000   21.8473684   21.6000000   44.3000000    2.9693583
Age         304        0   19.0000000   55.0888158   57.0000000   88.0000000   16.4226964
Smoking     304        0            0    0.6480263            0    2.0000000    0.8430238
TB          304        0            0    0.5230263    1.0000000    1.0000000    0.5002930

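For both continuous variables, age and BMI, the mean is close to the median, which suggests that their distributions are roughly symmetric; this alone does not establish that they are normally distributed. The BMI data have a range of 29.3 (44.3 - 15.0) and the age data have a range of 69 (88 - 19). The BMI maximum of 44.3 lies about 7.6 standard deviations above the mean, so at least one unusually high BMI value is present. Smoking and TB are coded categories rather than continuous measurements, so their means summarize the codes: the TB mean of 0.523 is the proportion of the 304 people coded as diagnosed (1).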
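The columns of the descriptive statistics table can be reproduced for any numeric variable with Python's standard `statistics` module. This is a minimal sketch, not the SAS procedure used for the report: the `describe` helper is hypothetical, missing values are assumed to be stored as `None`, the standard deviation uses the sample divisor n - 1 (also SAS's default), and the BMI values are invented for illustration.

```python
import statistics
from typing import Dict, Optional, Sequence


def describe(values: Sequence[Optional[float]]) -> Dict[str, float]:
    """Summarize one numeric variable like the descriptive statistics table.

    Missing observations are assumed to be stored as None.
    """
    present = [v for v in values if v is not None]
    return {
        "N": len(present),
        "N Miss": len(values) - len(present),
        "Minimum": min(present),
        "Mean": statistics.mean(present),
        "Median": statistics.median(present),
        "Maximum": max(present),
        # statistics.stdev uses the sample divisor n - 1.
        "Std Dev": statistics.stdev(present),
        "Range": max(present) - min(present),
    }


# Hypothetical BMI values for illustration only (not the study data).
summary = describe([18.5, 21.6, 22.0, None, 24.9, 44.3])
for name, value in summary.items():
    print(f"{name:8} {value}")
```

With the study data, the `Range` entry is simply Maximum minus Minimum from the table, for example 44.3 - 15.0 = 29.3 for BMI.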

Correlation and Linear Regression model of BMI and Age

To determine whether age is correlated with BMI and has a linear relationship to it, a Pearson correlation and a simple linear regression model were used. These methods were chosen because both BMI and age are continuous variables, and together they measure the strength and form of any linear association between them. As shown in the correlation table below, the Pearson correlation coefficient, r, is only -0.004. As a rule of thumb, a strong positive correlation is indicated by a coefficient between 0.7 and 1, and a strong negative correlation by a coefficient between -0.7 and -1. A coefficient of -0.004 lies in neither interval and is very close to 0, so age and BMI are essentially uncorrelated in this sample.
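The Pearson coefficient is the sample covariance of the two variables divided by the product of their standard deviations. A minimal Python sketch follows; `pearson_r` and `is_strong` are hypothetical helpers, the 0.7 cutoff is the rule of thumb from the paragraph above, and the ages and BMIs are invented illustration values, not the study data. (Python 3.10 and later also provide `statistics.correlation`.)

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Cross-products and squared deviations from each mean.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def is_strong(r: float, cutoff: float = 0.7) -> bool:
    """Rule of thumb from the text: |r| of at least 0.7 counts as strong."""
    return abs(r) >= cutoff


# Hypothetical ages and BMIs for illustration only (not the study data).
ages = [25, 34, 47, 58, 66, 79]
bmis = [22.1, 20.8, 23.0, 21.5, 22.4, 21.0]
r = pearson_r(ages, bmis)
print(round(r, 3), is_strong(r))
```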

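The linear regression analysis leads to the same conclusion. The R-squared value is r² = (-0.004)² ≈ 0.000016, so age explains essentially none of the variance in BMI; the value of -0.003 is consistent with the adjusted R-squared, which can fall below zero when a predictor explains almost nothing. The slope of -0.0007 means that each additional year of age is associated with a predicted decrease in BMI of only about 0.0007 kg/m². The intercept of 21.89 is the predicted BMI at age 0, an extrapolation well outside the observed ages of 19 to 88. The slope's p-value of 0.944 is far above 0.05, so there is no evidence of a linear relationship between age and BMI, and the model does not explain the variance in the data.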
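As a consistency check, the regression results can be recovered from the descriptive statistics using the standard identities for simple linear regression, taking n = 304 observations, p = 1 predictor, and r as the rounded value -0.0040:

```latex
\begin{aligned}
b_1 &= r \, \frac{s_{\mathrm{BMI}}}{s_{\mathrm{Age}}}
     = -0.0040 \times \frac{2.9694}{16.4227} \approx -0.00072 \\
b_0 &= \overline{\mathrm{BMI}} - b_1 \, \overline{\mathrm{Age}}
     = 21.8474 + 0.00072 \times 55.0888 \approx 21.887 \\
R^2 &= r^2 = (-0.0040)^2 = 0.000016 \\
R^2_{\mathrm{adj}} &= 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
     = 1 - (1 - 0.000016) \times \frac{303}{302} \approx -0.0033
\end{aligned}
```

Both coefficients agree with the SAS parameter estimates, and the adjusted R-squared rounds to -0.003.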

 

Pearson Correlation Coefficients, N = 304
           BMI
Age   -0.00400

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             21.88719          0.59800     36.60     <.0001
Age          1          -0.00072294          0.01040     -0.07      0.944
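The parameter estimates above come from an ordinary least-squares fit. The sketch below fits BMI = intercept + slope × age and also returns R-squared, adjusted R-squared, and the slope's standard error and t statistic (n - 2 degrees of freedom). `simple_ols` is a hypothetical helper and the input values are invented, so its numbers will not match the table.

```python
import math
from typing import Dict, Sequence


def simple_ols(x: Sequence[float], y: Sequence[float]) -> Dict[str, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual (error) and total sums of squares.
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    r2 = 1.0 - sse / sst
    # Adjusted R-squared for one predictor; can be negative when r2 is near zero.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    # Standard error and t statistic of the slope, with n - 2 degrees of freedom.
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    t_slope = slope / se_slope if se_slope > 0 else math.copysign(math.inf, slope)
    return {
        "intercept": intercept,
        "slope": slope,
        "r2": r2,
        "adj_r2": adj_r2,
        "se_slope": se_slope,
        "t_slope": t_slope,
    }


# Hypothetical ages and BMIs for illustration only (not the study data).
fit = simple_ols([25, 34, 47, 58, 66, 79], [22.1, 20.8, 23.0, 21.5, 22.4, 21.0])
print({k: round(v, 4) for k, v in fit.items()})
```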

 

Chi-Square Test: TB versus Smoking

H0: Tuberculosis diagnosis and smoking history are independent.

HA: Tuberculosis diagnosis and smoking history are not independent.

To test whether the two categorical variables, TB and Smoking, are independent, a chi-square test of independence was conducted. SAS calculated the test statistic χ² = 0.999 with 2 degrees of freedom and a p-value of P(χ² > 0.999) = 0.6068. Every expected cell count exceeds 5 (the smallest is about 24.3), so the chi-square approximation is appropriate. At the 0.05 significance level, the null hypothesis is not rejected (0.6068 > 0.05). In conclusion, the test provides no evidence of an association between tuberculosis diagnosis and smoking history; the data are consistent with the two being independent.

Table of Smoking by TB
Each cell: Frequency / Percent / Row Pct / Col Pct

Smoking   TB = 0                         TB = 1                         Total
0          86 / 28.29 / 47.78 / 59.31     94 / 30.92 / 52.22 / 59.12    180 / 59.21
1          27 /  8.88 / 52.94 / 18.62     24 /  7.89 / 47.06 / 15.09     51 / 16.78
2          32 / 10.53 / 43.84 / 22.07     41 / 13.49 / 56.16 / 25.79     73 / 24.01
Total     145 / 47.70                    159 / 52.30                    304 / 100.00

Statistics for Table of Smoking by TB

Statistic DF Value Prob
Chi-Square 2 0.9990 0.6068
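The chi-square statistic can be recomputed directly from the observed counts in the Smoking-by-TB table. In this minimal sketch the helper names are hypothetical. The p-value uses the fact that a chi-square distribution with 2 degrees of freedom is exponential with mean 2, so P(χ² > x) = exp(-x/2); that shortcut holds only for 2 degrees of freedom, and other tables need a general tail function such as `scipy.stats.chi2.sf`.

```python
import math
from typing import Sequence, Tuple


def chi_square_independence(table: Sequence[Sequence[int]]) -> Tuple[float, int, float]:
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns (statistic, degrees of freedom, smallest expected count).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    min_expected = math.inf
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n.
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, min_expected


def chi_square_sf_df2(stat: float) -> float:
    """P(X > stat) for a chi-square variable with exactly 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


# Observed counts from the table (rows: Smoking 0, 1, 2; columns: TB 0, 1).
observed = [[86, 94], [27, 24], [32, 41]]
stat, df, min_expected = chi_square_independence(observed)
p_value = chi_square_sf_df2(stat) if df == 2 else float("nan")
print(f"chi2 = {stat:.4f}, df = {df}, p = {p_value:.4f}, min expected = {min_expected:.1f}")
# prints: chi2 = 0.9990, df = 2, p = 0.6068, min expected = 24.3
```

The recomputed statistic and p-value match the SAS output for the table.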

 

ANOVA Test: Smoking Versus BMI

H0: The mean BMI is equal for all levels of smoking history (never, past, and current).

HA: At least one level of smoking history has a mean BMI different from the others.

To test whether population mean BMI differs among the three levels of smoking history, a one-way ANOVA was used. SAS calculated a test statistic of F = 0.60 with 2 and 301 degrees of freedom and a p-value of P(F > 0.60) = 0.5492. At the 0.05 significance level, the null hypothesis is not rejected (0.5492 > 0.05). Thus, there is no statistically significant difference in mean BMI among never, past, and current smokers.

Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
Model     2        10.616569      5.308284      0.60   0.5492
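The F statistic compares the between-group (Model) mean square with the within-group (Error) mean square. The sketch below uses a hypothetical `one_way_anova` helper in plain Python. Because this study has three smoking groups (2 numerator degrees of freedom), the p-value can use the closed form P(F > f) = (1 + 2f/d)^(-d/2), where d is the error degrees of freedom; this shortcut holds only for exactly 2 numerator degrees of freedom. The grouped BMI values are invented for illustration.

```python
from typing import Dict, Sequence


def one_way_anova(groups: Sequence[Sequence[float]]) -> Dict[str, float]:
    """One-way ANOVA: F test that all group means are equal."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group (Model) and within-group (Error) sums of squares.
    ss_model = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_model, df_error = k - 1, n - k
    ms_model, ms_error = ss_model / df_model, ss_error / df_error
    return {
        "df_model": df_model,
        "df_error": df_error,
        "ss_model": ss_model,
        "ms_model": ms_model,
        "F": ms_model / ms_error,
    }


def f_sf_df1_2(f: float, df_error: float) -> float:
    """P(F > f) for an F distribution with exactly 2 numerator degrees of freedom."""
    return (1.0 + 2.0 * f / df_error) ** (-df_error / 2.0)


# Hypothetical BMI values for never, past, and current smokers (not the study data).
result = one_way_anova([[20.0, 22.0, 24.0], [21.0, 23.0, 25.0], [19.0, 21.0, 23.0]])
p_value = f_sf_df1_2(result["F"], result["df_error"])
print(result, round(p_value, 4))
```

For these invented groups, F = 0.75 on 2 and 6 degrees of freedom and p = 0.512. With 304 observations in three groups the error degrees of freedom would be 301, and `f_sf_df1_2(0.60, 301)` gives about 0.549, in line with the reported 0.5492 (the small gap comes from F being rounded to 0.60).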

 

Conclusion:

By testing for correlation, linear relationships, independence, and differences in means, one can begin to draw inferences about this data set. Pearson's correlation coefficient (r = -0.004) and the regression slope (p = 0.944) indicate that age and BMI are essentially unrelated. The chi-square test found no significant association between smoking history and TB diagnosis, so these data give no evidence that smoking history affects the risk of contracting TB. The ANOVA found no significant difference in mean BMI among never, past, and current smokers. Together, these results describe how age, BMI, and smoking history relate to TB and to each other in this sample.
