Presentation:
Shane Ragland
Biostatistics research project
Statistical analysis of physical factors of patients who underwent a pulmonary bronchoscopy
Objective:
The physical characteristics of 304 healthcare workers who perform pulmonary bronchoscopies, and who are suspected of having contracted pulmonary tuberculosis through inadequate precautions during the procedure, were recorded. This project addresses three questions. The first is whether there is any correlation or linear relationship between the workers' Body Mass Index (BMI) and age. The second is whether tuberculosis (TB) diagnosis and smoking history are independent of each other. The third is whether the population mean BMI differs across the three levels of smoking history.
Literature Review:
Healthcare workers who perform, or are present during, a pulmonary bronchoscopy are advised to take precautions. Pulmonary tuberculosis (PTB) is highly contagious, and the procedure can leave infectious particles airborne. Face masks and equipment that filter out these particles are recommended during the procedure, and such precautions are especially important when the patient has PTB. However, if neither the patient nor the doctor knows that the patient has TB, the patient may only be diagnosed later, which means the healthcare workers who performed the procedure may already have been exposed to the disease. (Na et al., 2016) The paper that provides the data used in this study is a retrospective study of 1,954 healthcare workers for whom CT and bronchoscopy information was available from Pusan National University Hospital in Busan, South Korea. (Na et al., 2016) South Korea has a particularly high incidence rate of PTB, so determining the risks of exposure is especially important. Of the people in the study, 304 are thought to have been exposed to PTB through inadequate precautions. The paper states that there were no significant differences in either age or body mass index within the study population. (Na et al., 2016) The smoking history of the patients was recorded as never smoked (0), past smoker (1), or current smoker (2). Whether the patient was later diagnosed with PTB was determined from hospital records and recorded as either diagnosed (1) or undiagnosed (0). (Na et al., 2016)
Statistical Analysis:
Descriptive Statistics for Numeric Variables
Variable  N    N Miss  Minimum     Mean        Median      Maximum     Std Dev
BMI       304  0       15.0000000  21.8473684  21.6000000  44.3000000  2.9693583
Age       304  0       19.0000000  55.0888158  57.0000000  88.0000000  16.4226964
Smoking   304  0       0           0.6480263   0           2.0000000   0.8430238
TB        304  0       0           0.5230263   1.0000000   1.0000000   0.5002930
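For both continuous variables, age and BMI, the mean is close to the median, which suggests that their distributions are roughly symmetric. This alone does not establish that the data follow a normal bell curve; in particular, the BMI maximum of 44.3 lies about 7.6 standard deviations above the mean, indicating at least one extreme value. The BMI data have a range of 29.3 (44.3 - 15.0) and the age data have a range of 69 (88 - 19).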
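The range and mean-versus-median comparisons can be reproduced directly from the descriptive table. The Python sketch below is an illustration only (the original analysis was done in SAS); it also computes how many standard deviations each maximum lies above the mean, which is one quick, informal screen for extreme values.

```python
# Summary statistics copied from the "Descriptive Statistics for Numeric Variables" table.
summary = {
    "BMI": {"min": 15.0, "mean": 21.8473684, "median": 21.6, "max": 44.3, "sd": 2.9693583},
    "Age": {"min": 19.0, "mean": 55.0888158, "median": 57.0, "max": 88.0, "sd": 16.4226964},
}


def describe(stats):
    """Return (range, mean minus median, SDs by which the maximum exceeds the mean)."""
    value_range = stats["max"] - stats["min"]
    mean_minus_median = stats["mean"] - stats["median"]
    max_z = (stats["max"] - stats["mean"]) / stats["sd"]
    return value_range, mean_minus_median, max_z


for name, stats in summary.items():
    value_range, gap, max_z = describe(stats)
    print(f"{name}: range = {value_range:.1f}, mean - median = {gap:+.2f}, "
          f"maximum is {max_z:.1f} SD above the mean")
```

A small mean-minus-median gap is consistent with symmetry, but only a histogram or normal probability plot of the raw data could support a claim of normality.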
Correlation and Linear Regression model of BMI and Age
To determine whether age is correlated with, and linearly related to, BMI, a Pearson correlation and a simple linear regression were computed. These methods were chosen because both BMI and age are continuous variables. As shown in the correlation table below, the Pearson correlation coefficient, r, is only -0.004. By a common rule of thumb, a strong positive correlation is indicated by a coefficient between 0.7 and 1, and a strong negative correlation by a coefficient between -1 and -0.7. The coefficient of -0.004 lies in neither interval and is essentially 0, which indicates practically no linear association between BMI and age.
This is confirmed by the linear regression analysis. For a simple linear regression, R^{2} equals r^{2}, so R^{2} ≈ (-0.004)^{2} ≈ 0.000016: the model explains essentially none of the variance in BMI. The estimated slope is -0.0007, which suggests that BMI decreases by about 0.0007 for each additional year of age. The intercept of 21.89 is the predicted BMI at age 0, an extrapolation well outside the observed age range of 19 to 88. The slope is not statistically significant (t = -0.07, p = 0.944), so, consistent with the near-zero correlation, the model does not explain the variation in BMI well.
Pearson Correlation Coefficients, N = 304
          BMI
Age       -0.00400
Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept  1   21.88719            0.59800         36.60    <.0001
Age        1   -0.00072294         0.01040         -0.07    0.944
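Because the least-squares line always passes through the point of means (mean age, mean BMI), the regression output can be cross-checked against the descriptive statistics. The Python sketch below is an illustration rather than part of the original SAS analysis. It shows that the reported intercept of 21.88719 is reproduced only when the Age slope is negative, and that a correlation of magnitude about 0.004 gives R^{2} ≈ 0.000016 and |t| ≈ 0.07 with 302 degrees of freedom.

```python
import math

# Values copied from the descriptive-statistics and Parameter Estimates tables.
n = 304
mean_age, sd_age = 55.0888158, 16.4226964
mean_bmi, sd_bmi = 21.8473684, 2.9693583
reported_intercept = 21.88719
slope_magnitude = 0.00072294  # |Age estimate|; its sign is what this check tests


def implied_intercept(slope):
    """The least-squares line passes through (mean_age, mean_bmi), so b0 = ybar - b1 * xbar."""
    return mean_bmi - slope * mean_age


for slope in (slope_magnitude, -slope_magnitude):
    print(f"slope = {slope:+.8f} -> implied intercept = {implied_intercept(slope):.5f}")

# The slope satisfies b1 = r * s_y / s_x, so |r| = |b1| * s_x / s_y.
r_abs = slope_magnitude * sd_age / sd_bmi
# In simple linear regression R^2 = r^2, and the slope's t statistic is
# r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom.
r_squared = r_abs ** 2
t_abs = r_abs * math.sqrt(n - 2) / math.sqrt(1 - r_squared)
print(f"|r| = {r_abs:.4f}, R^2 = {r_squared:.6f}, |t| = {t_abs:.2f}")
```

The same identities explain why the correlation and the regression slope always share a sign and a p-value in a simple linear regression.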
Chi-Square: TB Versus Smoking
H_{0}: Tuberculosis diagnosis and smoking history are independent.
H_{A}: Tuberculosis diagnosis and smoking history are not independent.
To test whether the two categorical variables, TB and smoking, are independent of each other, a chi-square test of independence was conducted. With 3 smoking levels and 2 TB outcomes, the test has (3 - 1)(2 - 1) = 2 degrees of freedom. SAS calculated the test statistic χ^{2} = 0.999 and the P-value P(χ^{2} > 0.999) = 0.6068. Because 0.6068 > 0.05, the null hypothesis is not rejected at the 0.05 significance level. In conclusion, the chi-square test provides no evidence of an association between tuberculosis diagnosis and smoking history; the data are consistent with independence, although failing to reject H_{0} does not prove it.


Statistics for Table of Smoking by TB
Statistic   DF  Value   Prob
Chi-Square  2   0.9990  0.6068
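The chi-square statistic compares each observed count with the count expected under independence, (row total × column total) / N, and sums (observed - expected)^{2} / expected over all cells. The plain-Python sketch below illustrates this on hypothetical counts, since the study's cross-tabulation is not reproduced in this presentation. With 2 degrees of freedom the chi-square distribution is exponential with mean 2, so the P-value has the closed form e^{-x/2}; applying it to the statistic in the table gives 0.6068.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi2_sf_df2(x):
    """Upper-tail probability P(X > x) for a chi-square variable with exactly 2 df."""
    return math.exp(-x / 2)


# Hypothetical counts for illustration only (not the study data).
# Rows: never, past, current smoker; columns: TB undiagnosed (0), diagnosed (1).
counts = [[20, 10], [10, 20], [15, 15]]
stat, dof = chi_square_independence(counts)
print(f"chi-square = {stat:.3f}, df = {dof}, P-value = {chi2_sf_df2(stat):.4f}")

# Tail probability for the statistic reported by SAS (df = 2).
print(f"P(chi-square > 0.999) = {chi2_sf_df2(0.999):.4f}")
```

In practice a library routine such as scipy.stats.chi2_contingency performs the same test and also returns the expected counts; it applies Yates' continuity correction only when there is 1 degree of freedom, so it would not alter a 3 × 2 table like this one.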
ANOVA Test: Smoking Versus BMI
H_{0}: The mean BMI is equal across all three levels of smoking history (never, past, and current).
H_{A}: The mean BMI differs between at least two levels of smoking history.
To test whether the population mean BMI differs across the three levels of smoking history, a one-way ANOVA was used. SAS calculated a test statistic of F = 0.60 and a P-value of P(F > 0.60) = 0.5492. Because 0.5492 > 0.05, the null hypothesis is not rejected at the 0.05 significance level. Thus, there is no statistically significant evidence that mean BMI differs among never, past, and current smokers.
Source  DF  Sum of Squares  Mean Square  F Value  Pr > F
Model   2   10.616569       5.308284     0.60     0.5492
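Only the Model row of the ANOVA table is shown, but the rest can be recovered: the corrected total sum of squares equals (n - 1) times the sample variance of BMI, and the error sum of squares is what remains after the model. The Python sketch below assumes that the Std Dev in the descriptive table is the usual n - 1 sample standard deviation (the SAS default). It rebuilds F with 2 and 301 degrees of freedom and evaluates the upper tail with the closed form P(F > f) = (1 + 2f/d_{2})^{-d_{2}/2}, which holds only when the numerator has 2 degrees of freedom; other cases would need a general routine such as scipy.stats.f.sf.

```python
# Reported ANOVA and descriptive values (BMI across three smoking groups, N = 304).
n, k = 304, 3
ss_model = 10.616569
sd_bmi = 2.9693583


def anova_from_summary(ss_model, sd_total, n, k):
    """Recover the one-way ANOVA F statistic from the model SS and the overall sample SD."""
    ss_total = (n - 1) * sd_total ** 2  # corrected total sum of squares
    ss_error = ss_total - ss_model      # within-group (error) sum of squares
    df_model, df_error = k - 1, n - k
    f_stat = (ss_model / df_model) / (ss_error / df_error)
    return f_stat, df_model, df_error


def f_sf_df1_2(f, df_error):
    """P(F > f) for an F distribution with 2 numerator df: (1 + 2f/d2)^(-d2/2)."""
    return (1 + 2 * f / df_error) ** (-df_error / 2)


f_stat, df_model, df_error = anova_from_summary(ss_model, sd_bmi, n, k)
print(f"F({df_model}, {df_error}) = {f_stat:.3f}, P-value = {f_sf_df1_2(f_stat, df_error):.4f}")
```

With these inputs the sketch gives F of about 0.60 and a P-value of about 0.549, in line with the table.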
Conclusion:
By testing for correlation, linear relationships, independence, and differences in means, one can begin to make inferences about this data set. Pearson's correlation coefficient showed that age and BMI are essentially uncorrelated, and the linear regression likewise found no significant relationship between them. The chi-square test found no evidence of an association between smoking history and TB diagnosis, so these data do not indicate that smoking history affects the likelihood of a TB diagnosis. The ANOVA found no statistically significant difference in mean BMI among the three levels of smoking history (never, past, and current). Together, these results help clarify how age, BMI, and smoking history relate to TB and to each other.