Data Analysis Paper

 

Mixing of specific Arabidopsis thaliana genotypes stabilizes yield in diseased and non-diseased sample populations

Taylor Schilling

BIO 342

27 November 2016

Dr. Zeynep

 

 

On my honor, I have not given, nor received, nor witnessed any unauthorized assistance on this work. -Taylor Schilling

Objective

Statistical data analysis was performed on the raw data from the study Impact of disease on diversity and productivity of plant populations (Creissen et al. 2016). The purpose of this analysis is to better understand the overall spread of the data from this experiment as well as the relationships between the variables used. The productivity of Arabidopsis thaliana can be shown by seed mass; in order to find out if vegetative production can also be measured by rosette size, the linear relationship between rosette size and seed mass was assessed using linear regression. Further, ANOVA tests were used to find out whether population means of rosette size differ by genotype and by the number of genotypes per pot. This shows whether the specific genotype and the number of genotypes present actually affect the population means of rosette size and, more importantly, whether the study was successful in finding significant data regarding its purpose. Last, a chi-square test was done using the number of days to flower (separated into low, medium and high categories) and the type of genotypic mix. This shows whether the genotype mix affects the number of days A. thaliana takes to flower and further reinforces the ability of this study to produce results that are applicable to the overall population.

Lit Review

In Impact of disease on diversity and productivity of plant populations, Creissen et al. (2016) studied the effects that diversity in plant genotypes had on stabilizing plant productivity in Arabidopsis thaliana while under attack by the pathogen Hyaloperonospora arabidopsidis (Hpa). They focused on plant competition and its effects on plant production when the pathogen is introduced, as well as the effect of biodiversity on the system’s ability to buffer against the disease.

The research shows that pathogens promote plant biodiversity and prevent competitive exclusion – at least when a resistant genotype is present. Biodiversity is reduced when less competitive species are diseased. Additionally, species richness lessens the effect of disease and increases plant productivity. Four specific genotypes (Van-0, Ga-0, NFA-10 and NFA-8) of A. thaliana were chosen based on their fitness and planted in pots. There were four plants per pot – 20 pots of each of the 11 monocultures and mixtures in each pathogen treatment (220 pots total). The researchers measured the diseased leaf area after six and ten days, rosette leaf size, plant height and flowering time.

The results of this study show that, when diseased, the yield ultimately depended on the number and combination of certain genotypes. Hpa reduced seed production in all mixes with the most susceptible genotypes, NFA-8 and NFA-10. This is shown in the decrease in rosette diameter. The resistant genotypes, Ga-0 and Van-0, showed increased competitive ability in the 2-way and 4-way mixes. It is also important to note that Ga-0 is the most competitive with or without Hpa, while NFA-8 is highly competitive without Hpa, though less so in the presence of the pathogen. Without the pathogen, the pots with only highly competitive genotypes have the lowest yield while pots with less competitive genotypes have the highest yield. With the disease, the combination of the somewhat susceptible NFA-10 and the fully resistant Van-0 had the highest yield in monoculture and 2-way mix. Additionally, the study found that 2-way genotypic mixes had overall higher yields than monoculture and 4-way genotype mixed pots. In fact, 4-way mixes had the lowest yield without Hpa and the same yield as monoculture with Hpa. This shows that not only the combination of genotypes matters in plant yield, but the number of different genotypes in the mix matters as well.

This research supports the ability of resistant genotypes to maintain productivity, stability and diversity. Plants show more resistance to change (known as ecological resistance) and buffer negative effects during events such as the introduction of a pathogen in order to maintain their well-being and ensure their survival. With a pathogen, a resistant genotype yields highly, especially when there is a mixture of resistant and susceptible genotypes. Disease helps to maintain genotypic diversity, which in turn enhances productivity because disease pressure leads to compensatory actions: in this case, the over-yielding of one genotype (for example, Van-0) compensating for the loss of another (NFA-10). These compensatory interactions are highest when the genotypes have different competitive abilities. They compensate depending on their specific response to disease, which ultimately leads to more production. Therefore, pathogens promote biodiversity by inhibiting competitive exclusion and supporting complementation. Mixtures, then, may reduce the effect of pathogens as well as the competitiveness between plants, as seen with the 2-way mix between Van-0 and NFA-10.

This research is applicable to those who work in agriculture because it helps them decide which plants and plant genotypes to plant to get the best yield and yield stability possible. It also highlights the importance of the genotypes and the number of genotypes in a mix for buffering the effect of a pathogen and achieving the highest productivity possible.

Data Analysis

Descriptive Statistics: Days to Flower

Minimum   Quartile 1   Median   Quartile 3   Maximum
42        49           52       62           90
Mean      Standard Deviation   Interquartile Range   Variance
55.4      8.7                  13.0                  74.8

The number of days that plants took to flower ranges from 42 to 90 days. The middle 50% of the sample flowered in 49 to 62 days, with a median of 52 days. The mean, 55.4 days, is somewhat higher than the median. The interquartile range, the difference between quartile 3 and quartile 1, is 13 days; one interquartile range below quartile 1 is 36 days and one above quartile 3 is 75 days. The standard deviation, which measures the typical dispersion of days around the mean, is 8.7 days.
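As a rough illustration of how the summary values above relate to each other, the following Python sketch computes the same set of statistics for a small sample. The list `sample` is hypothetical (it is not the study's raw data) and was chosen only so that its quartiles match the table; the function name `describe` is also an assumption made for this example.

```python
import statistics


def describe(values):
    """Return the five-number summary plus mean, SD, IQR, and variance."""
    # "inclusive" treats the data as the whole population of interest;
    # other quartile conventions can give slightly different cut points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),            # sample standard deviation
        "iqr": q3 - q1,                            # width of the middle 50%
        "variance": statistics.variance(values),   # sample variance = sd ** 2
    }


# Hypothetical days-to-flower values, NOT the raw data from the study.
sample = [42, 45, 49, 50, 52, 55, 62, 70, 90]
summary = describe(sample)
print(summary)
```

The interquartile range is simply quartile 3 minus quartile 1, and the variance is the square of the standard deviation, which is why the two rows of the table carry overlapping information.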

Descriptive Statistics: Rosette Size

Minimum   Quartile 1   Median   Quartile 3   Maximum
25        55           67       82           138
Mean      Standard Deviation   Interquartile Range   Variance
69.4      18.1                 27.0                  328.8

The rosette size ranges from 25 to 138 mm. The middle 50% of rosette sizes fall between 55 and 82 mm, with a median of 67 mm. The mean is 69.4 mm. The interquartile range, the difference between quartile 3 and quartile 1, is 27 mm; one interquartile range below quartile 1 and above quartile 3 gives 28 to 109 mm. Because it ignores the extremes, the interquartile range is less affected by outliers than the full range. The standard deviation of rosette size around the mean is 18.1 mm.

Descriptive Statistics: Seed Mass

Minimum   Quartile 1   Median   Quartile 3   Maximum
0.01      0.20         0.28     0.35         0.68
Mean      Standard Deviation   Interquartile Range   Variance
0.28      0.11                 0.15                  0.01

The seed mass ranges from 0.01 g to 0.68 g. The middle 50% of seed masses fall between 0.20 and 0.35 g, with a median of approximately 0.28 g. The mean is also 0.28 g. The interquartile range, the difference between quartile 3 and quartile 1, is 0.15 g; one interquartile range below quartile 1 and above quartile 3 gives 0.05 to 0.50 g. Because it ignores the extremes, the interquartile range is less affected by outliers than the full range. The standard deviation of seed mass around the mean is 0.11 g.

Research question: Is there a linear relationship between the overall seed mass and rosette size in the plants?

Correlation

                    Seed Mass (g)
Rosette Size (mm)   0.49041
P-value             <0.0001


Null hypothesis: There is no relationship between the overall seed mass and rosette size in these plants.

Alternative hypothesis: There is a relationship between the overall seed mass and rosette size in these plants.

The correlation between seed mass and rosette size is ~0.49. This is a moderate, positive correlation: as rosette size increases, seed mass tends to increase as well. Further, the P-value is less than 0.0001, which is below the alpha value of 0.05, so the correlation is statistically significant and the null hypothesis is rejected. Thus, there is statistical evidence to support a relationship between seed mass and rosette size. As such, it is reasonable to fit a linear regression.

Regression

Seed Mass = 0.06989 + 0.00302*Rosette Size

P-value   R-square
<0.0001   0.2405

 

               Parameter Estimate   P-value
Intercept      0.06989              <0.0001
Rosette Size   0.00302              <0.0001


Null hypothesis: There is no linear relationship between the overall seed mass and rosette size in these plants.

Alternative hypothesis: There is a linear relationship between the overall seed mass and rosette size in these plants.

The linear regression P-value (<0.0001) is less than the alpha value (0.05), meaning that the null hypothesis is rejected and there is a statistically significant linear relationship between seed mass and rosette size. The regression line shows that with every millimeter increase in rosette size, the predicted seed mass increases by 0.00302 g. At a rosette size of 0 mm, the predicted seed mass is the intercept, 0.06989 g, although this is an extrapolation below the smallest rosette observed (25 mm). The R-square value, however, is 0.2405, which means that rosette size explains only 24.05% of the variation in seed mass and the remaining 75.95% is unexplained. This regression line is therefore a weak model of seed mass, even though the relationship is statistically significant, because the majority of the variation is unexplained.
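To make the regression output easier to check, here is a short Python sketch that uses only the numbers reported above: the intercept, the slope, and the correlation coefficient. The function name `predict_seed_mass` is an assumption made for this example. In a simple linear regression with one predictor, R-square equals the correlation coefficient squared, so the two tables can be checked against each other.

```python
INTERCEPT = 0.06989   # g, intercept of the fitted regression line
SLOPE = 0.00302       # g of seed mass per mm of rosette size
R = 0.49041           # correlation between seed mass and rosette size


def predict_seed_mass(rosette_mm):
    """Predicted seed mass (g) for a given rosette size (mm)."""
    return INTERCEPT + SLOPE * rosette_mm


r_squared = R ** 2             # share of seed-mass variation explained
unexplained = 1 - r_squared    # share left unexplained by rosette size
at_mean = predict_seed_mass(69.4)   # prediction at the mean rosette size

print(f"R-square = {r_squared:.4f}")        # R-square = 0.2405
print(f"Unexplained = {unexplained:.4f}")   # Unexplained = 0.7595
print(f"Predicted seed mass at 69.4 mm = {at_mean:.2f} g")  # 0.28 g
```

The prediction at the mean rosette size (69.4 mm) comes out to about 0.28 g, which matches the mean seed mass in the descriptive statistics, as expected for a least-squares line.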

Research question: Are the population means of rosette size significantly different for each genotype?

Null hypothesis: The population means of rosette size are the same for each genotype.

Alternative hypothesis: The population means of rosette size are different for each genotype.

Because the P-value (<0.0001) is less than alpha (0.05), the null hypothesis is rejected. There is statistically significant evidence that the population means of rosette size are not all the same across genotypes (Ga-0, NFA-10, NFA-8, Van-0); that is, at least one genotype differs from the others.
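For readers who want to see what a one-way ANOVA computes, the following Python sketch calculates the F statistic by hand. The rosette sizes in `rosettes` are hypothetical values invented for illustration (they are not the study's raw data), and the function name `one_way_anova_f` is an assumption. Turning F into a P-value requires the F distribution, which statistical software provides; this sketch stops at the F statistic.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical rosette sizes (mm) per genotype, NOT the study's data.
rosettes = {
    "Ga-0": [88, 92, 95, 101],
    "NFA-10": [60, 64, 58, 66],
    "NFA-8": [70, 75, 69, 78],
    "Van-0": [62, 59, 67, 64],
}
f_stat, df_between, df_within = one_way_anova_f(list(rosettes.values()))
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F means the genotype means differ by much more than the plants within a genotype differ from each other, which is what a very small P-value reflects.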

Research question: Are the population means of rosette size significantly different for each number of genotypes per pot?

Null hypothesis: The population means of rosette size are the same for each number of genotypes per pot.

Alternative hypothesis: The population means of rosette size are different for each number of genotypes per pot.

The P-value (<0.0001) is less than the alpha value (0.05), meaning that the null hypothesis is rejected and that there is statistically significant evidence that the population means of rosette size are not all the same across the numbers of genotypes per pot (1 genotype/pot, 2 genotypes/pot and 4 genotypes/pot).

Research question: Do the plants take different numbers of days to flower between pots that are monocultures, 2-way genotypic mixes, and 4-way genotypic mixes?

Low 42-49 days to flower (minimum to quartile 1)
Medium 50-62 days to flower (above quartile 1 to quartile 3)
High 63-90 days to flower (above quartile 3 to maximum)

(Taken from 5-number summary of days to flower)

Observed and expected values
         Mono        2-way mix   4-way mix   Total
Low      86 (91)     263 (274)   97 (82)     446
Medium   106 (113)   392 (343)   59 (102)    558
High     63 (51)     116 (155)   74 (46)     252
Total    255         771         230         1256


Null hypothesis: The number of days it takes to flower and genotypic mix are independent.

Alternative hypothesis: The number of days it takes to flower depends on the genotypic mix.

By comparing the observed and expected values of days to flower (categories: low, medium and high number of days to flower) and type of genotype mix (categories: mono, 2-way mix, 4-way mix), the P-value (5.66 × 10^-12) is found to be less than alpha (0.05). Therefore, the null hypothesis is rejected and there is statistically significant evidence that the days to flower and genotype mix are associated.
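The chi-square calculation can be sketched in Python using the observed counts from the table. This is an illustration under stated assumptions, not the exact procedure used for the reported P-value: the function names are invented for this example, the row and column totals are recomputed from the cell counts (so they can differ slightly from the printed totals), and the expected values are left unrounded. The P-value uses a closed-form formula for the chi-square upper tail that is valid only when the degrees of freedom are even, which is the case here (4).

```python
import math

# Observed counts: rows = low, medium, high days to flower;
# columns = mono, 2-way mix, 4-way mix.
observed = [
    [86, 263, 97],
    [106, 392, 59],
    [63, 116, 74],
]


def chi_square_independence(table):
    """Return (chi2, dof, expected) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count = row total * column total / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof, expected


def chi2_upper_tail_even_dof(x, dof):
    """P(chi-square > x); this closed form holds only for even dof."""
    if dof % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(dof // 2))


chi2, dof, expected = chi_square_independence(observed)
p_value = chi2_upper_tail_even_dof(chi2, dof)
print(f"chi-square = {chi2:.2f}, dof = {dof}, P-value = {p_value:.2e}")
```

Rounded to whole numbers, the expected counts produced this way agree with the values in parentheses in the table, and the resulting P-value is far below 0.05, consistent with rejecting the null hypothesis.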

Conclusion

In order to better understand the raw data gathered by Creissen et al. (2016) during their study, correlation, regression, ANOVA and chi-square tests were performed. The conclusions made were that there is a linear relationship between seed mass and rosette size; population means of rosette size are different across genotypes; population means of rosette size are different across the numbers of genotypes per pot; and plants take different numbers of days to flower depending on whether they are in a monoculture, 2-way mix or 4-way mix. The study was therefore successful at gathering significant results that can be related to overall populations.

 

Creissen, H. E., Jorgensen, T. H., and Brown, J. K. M. (2016). Impact of disease on diversity and productivity of plant populations. Functional Ecology 30, 649-657.

Displaying Data in Different Ways

This TED Talk shows data in two creative ways: maps and pictures of the speaker’s plant research. The speaker shows data taken from around the world and uses colors and symbols to represent the data for each region. Shown below, the first image shows the predicted growth of the population. The second and third images are maps with red coloring. In the first map, the red shows areas that were agriculturally successful but no longer are due to lack of rainfall. The second map shows the areas predicted to be agriculturally unsuccessful in 2050. Later in the presentation, she shows data using photos of plants. On the left side of the arrow are watered and un-watered plants without an expressed gene, while on the right side of the arrow are watered and un-watered plants with the expressed gene. You can observe the growth differences, especially between the un-watered plants.

[Screenshots from the talk: projected population growth, two maps of agricultural areas shaded in red, and photos of watered and un-watered plants]

Experimental vs. Observational Journal Article

Title of study: Nitrogen Loads to Estuaries: Using Loading Models to Assess the Effectiveness of Management Options to Restore Estuarine Water Quality

I chose an experimental study on how effectively specific methods improve water quality in an estuarine system, specifically Waquoit Bay, Massachusetts. The methods tested were diverting nitrogenous runoff from impervious surfaces, changing zoning ordinances, preserving forested tracts of land and wetlands, harvesting macroalgae, dredging estuary channels, and eradicating waterfowl.

This article is an example of using an experimental method because it is a case study using a control, dependent and independent variables. The researchers manipulated the subjects in order to find the effects and get results. Most of the results were shown as numeric values represented in graphs and tables.

Land cover data were obtained to show the amount of nitrogen released as well as the amount of nitrogen that remains in Waquoit Bay. The results for the effectiveness of each method were gathered differently because the methods were not related. For instance, the method for reducing fertilizer inputs was to apply different dosages of fertilizer to farms, lawns and golf courses, whereas the method for preserving vegetated tracts was to use NLM to calculate nitrogen amounts from a hypothetical forested plot and compare them to other areas (e.g., a golf course or residential area).

The researchers concluded that the methods holding the most potential for lessening the nitrogen loads in Waquoit Bay, as well as in other estuaries, include improving the septic system, having zoning regulations, preserving forested tracts and freshwater bodies, and conserving salt marshes. The methods that are somewhat less promising in reducing nitrogen loads are installing wastewater treatment plants, regulating the use of fertilizer, and harvesting macroalgae. Those that would be the least successful are diverting runoff from impervious surfaces, dredging, and eradicating waterfowl. It is mentioned that these methods are focused on this particular region and that similar studies should be done on other estuaries. Each estuary is different and therefore may react differently to each method.

(Bowen, J. L., & Valiela, I. (2004). Nitrogen loads to estuaries: Using loading models to assess the effectiveness of management options to restore estuarine water quality. Estuaries, 27(3), 482-500. doi:http://dx.doi.org/10.1007/BF02803540)

Why Do We Care About Biostatistics?


While biostatistics works in the background for most people, it is incorporated into our lives every day. In fact, it is a vital field in that the work biostatisticians do improves health and ultimately saves lives around the globe.

The role of a biostatistician revolves around three essential tasks: collecting, analyzing, and interpreting human health data. Ultimately, the goal of processing these data is to understand and improve human health. Oftentimes the data come from clinical trials and from the epidemiological traits of a particular disease. While this seems simple, the process is quite complex and accuracy is crucial. Much of this research is done for medical professionals and those working in the public health sectors of the government. Biostatisticians must know how certain factors such as drugs, diet, age, environment, gender and race affect the behavior of a disease; similarly, they study the geographical patterns, development, behavior and mortality rate of said disease. Knowing this information aids in developing prevention and treatment measures. Some biostatisticians have already researched these fields, while others use the research as background information to test their own theories. Indeed, biostatisticians must decide how to set up these experiments. They then not only need to know how to analyze the data mathematically, but also be able to communicate the results to other professionals in a way that is understandable and can be used to improve certain conditions. It is therefore evident that their research must be accurate because it has the potential to affect entire countries and even the world.

Biostatisticians feed their research results to government officials and medical professionals: essentially, people who manage health facilities, create health policies and make clinical decisions. These people use the information as a basis for their decision-making. Notably, many health professionals have been found to lack proficiency in biostatistics and often make inaccurate calculations, even though they are supposed to know the material. Research shows that proficiency in biostatistics improves the quality of research and connects theory with practice. The resulting research can also provide evidence for commonly held beliefs. For instance, most people know that cigarette smoke negatively affects the lungs, and biostatistics supplies the evidence that supports this observation. Thus, it is essential for medical professionals to understand biostatistics so that it can be applied correctly.

Research done by biostatisticians aids in predicting disease behavior and opens up prevention and treatment options. With such knowledge, measures can be taken to stop the spread of disease around the world. Recently, numbers were crunched with regard to Ebola and the Zika virus. These diseases were unfamiliar, dangerous and spreading very quickly, so statistics on the current situations had to be calculated for damage control and prevention. With any disease, these numbers help with isolating the disease as much as possible so that it will not become a pandemic. With that being said, if I plan on doing any medical research during my career, it is critical that I know how to use biostatistics to yield correct results. It will also help me to be more aware of epidemiology locally and globally, as well as to know what types of questions to ask if I were interested in doing research. I will be able to tell whether or not treatments are effective and can therefore be more efficient in working towards a solution.

With regard to PA school, where I plan on going after graduating from Rollins, it will be important for me to understand biostatistics in order to accurately interpret statistical information in the medical field. Employers in the medical field are now looking for candidates with knowledge of biostatistics, as health professionals are being pushed to focus more on research in order to broaden knowledge and practice of treatments. I also need to be able to understand clinical guidelines, advertisements, and research publications as well as explain risk to patients and interpret test results. With this knowledge, I will be a stronger candidate when applying to this job field. My understanding of biostatistics will also help me be a better clinician in that I will understand treatments and how they affect my patients. If a treatment is not working, I can react and adjust accordingly so that the patient’s condition improves.

 

 

Citations

Biostatistics Department. Vanderbilt University, 2015, https://medschool.vanderbilt.edu/biostatistics/content/what-biostatistics. Accessed 31 August 2016.

“Biostatistics in Public Health.” UCLA School of Public Health. http://www.ph.ucla.edu/epi/faculty/detels/PH150/Cumberland_biostatpubhlth.pdf.

Gore, A.D., et al. “Application of biostatistics in research by teaching faculty and final-year postgraduate students in colleges of modern medicine: A cross-sectional study”. International Journal of Applied and Basic Medical Research. 2012, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3657982/citedby/. Accessed 31 August 2016.

Lorenzo, A., et al. “Five Reasons for Choosing Biostatistics”. The World of Statistics, Fresh Biostats, http://www.worldofstatistics.org/2013/05/13/five-reasons-for-choosing-biostatistics/. Accessed 31 August 2016.

Perry, A. K. “What is the role of biostatistics in modern medicine?” How Stuff Works, 10 Oct. 2011. http://health.howstuffworks.com/medicine/modern-treatments/biostatistics-in-modern-medicine.htm.

Swift, L., et al. “Do doctors need statistics? Doctors’ use of and attitudes to probability and statistics.” Statistics in Medicine, vol. 28, 2009. http://www.ncbi.nlm.nih.gov/pubmed/19452567. Accessed 31 August 2016.

WHO and Center University of Pittsburgh. Supercourse, 11 October 2004. https://www.bibalex.org/supercourse/index.htm. Accessed 31 August 2016.