Jason K. Moore edited this page Jul 27, 2015 · 40 revisions

The main headings are tabs on the top bar.

Tab 1: Load Data

This tab has a file loader button and a button for selecting sample data.

  • Load any properly formatted CSV file or load one of four datasets from the agricolae package
  • Display the data as a segmented table
  • Only continuous variables and factors are supported.
  • Show the code used to load the data

Types of Data

These are the types of data we will support for analyses.

  1. A pair of continuous variables. (single-variable linear regression)
  2. A continuous dependent variable + one factor variable that has been randomized (CRD for ANOVA). Note that "randomized" does NOT mean that this is designated as a random variable; it refers to the DOE. example
  3. A continuous dependent variable + two factor variables that have been randomized (CRD with factorial treatment structure for ANOVA) example
  4. A continuous dependent variable + one factor variable + one factor block variable (RCBD for ANOVA) example
  5. (MIXED EFFECT MODEL) A continuous dependent variable + one factor variable + one factor block + one RANDOM factor variable (e.g. location or year) (RCBD for mixed-effects ANOVA) concept example
  6. (MIXED EFFECT MODEL) A continuous dependent variable + one factor variable + one factor block + two RANDOM factor variables (e.g. location and year) (RCBD for mixed-effects ANOVA) concept example
  7. A continuous dependent variable + two factor variables + one factor block variable (RCBD with factorial treatment structure for ANOVA) example
  8. A continuous dependent variable + two factor variables + one factor plot variable (Split Plot CRD for ANOVA) example
  9. A continuous dependent variable + two factor variables + one factor block variable (Split Plot RCBD for ANOVA) example

Example Code

Custom data:

my_data <- read.csv('path/to/data.csv')

Sample data:

library(agricolae)  # load "agricolae" package for the sample data
data("plots")  # plots, corn, cotton, etc
my_data <- plots
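
Since only continuous variables and factors are supported, the loader could classify each column after reading the file. The following is a minimal base R sketch, not the app's actual code: `classify_columns()` is a hypothetical helper, the small data frame stands in for a loaded CSV file, and treating character columns as factors is an assumption.

```r
# Hypothetical helper: label each column as "continuous", "factor", or "unsupported".
classify_columns <- function(df) {
  vapply(df, function(col) {
    if (is.numeric(col)) {
      "continuous"
    } else if (is.factor(col) || is.character(col)) {
      "factor"  # assumption: text columns are meant to be grouping variables
    } else {
      "unsupported"
    }
  }, character(1))
}

# Small made-up data frame standing in for a loaded CSV file.
my_data <- data.frame(
  yield = c(4.2, 5.1, 3.9, 4.8),
  variety = c("a", "b", "a", "b"),
  stringsAsFactors = FALSE
)
col_types <- classify_columns(my_data)
print(col_types)
```

Columns reported as "unsupported" could be hidden from the variable selectors in Tab 2.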

Tab 2: Analysis

Each sub-heading here represents a section in the side panel.

Side Panel Step 1: Experiment Design

The user will select from a drop down the type of experiment design they used:

  1. Generic data for correlation analysis
  2. Completely Randomized Design (CRD)
  3. Randomized Complete Block Design (RCBD)
  4. Randomized Complete Block Design (RCBD) Mixed Effect
  5. Split Plot CRD
  6. Split Plot RCBD

Side Panel Step 2: Analysis Type

If option 1 is selected in Step 1, this is set to "linear regression". If option 2, 3, 5, or 6 is selected, ANOVA is selected automatically. If option 4 is selected, the analysis will use a mixed-effects model, e.g. lme or something similar.
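
The mapping from design to analysis type could be sketched as a simple lookup. The function name `analysis_type()` and the short design labels are hypothetical; only the mapping itself comes from the rule described here.

```r
# Hypothetical lookup from the Step 1 design choice to the Step 2 analysis type.
analysis_type <- function(design) {
  switch(design,
    "generic" = "linear regression",
    "CRD" = ,               # empty entries fall through to the next value
    "RCBD" = ,
    "split-plot CRD" = ,
    "split-plot RCBD" = "ANOVA",
    "RCBD mixed" = "mixed-effects model",
    stop("unknown experiment design: ", design)
  )
}

print(analysis_type("RCBD"))
print(analysis_type("RCBD mixed"))
```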

Side Panel Step 3: Select the dependent variable

This is the same for all data types, i.e. the user selects a single dependent variable. A list of all the variables available in the data set is shown and the user selects one. This variable should always be continuous.

Side Panel Step 4: Select the independent variable(s)

Linear Regression

The user can select any one of the remaining columns in the data set from a list. This column should be a continuous variable.

This will create a base formula like:

y ~ x

Completely Randomized Design

The user can select 1 or 2 variables from the remaining columns for the treatment(s). These should be factors. They can also select an interaction if there are two variables.

This will create a base formula like:

y ~ x

or

y ~ x + z

or

y ~ x + z + x:z

Randomized Complete Block Design

The user can select 1 or 2 variables from the remaining columns for the treatment(s) and select a variable for the block. All these should be factors. In addition, the user can add an interaction term if two variables are selected.

This will create a base formula like:

y ~ x + blk

or

y ~ x + z + blk

or

y ~ x + z + x:z + blk
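
The CRD and RCBD formulas above could be assembled from the user's selections with base R's `reformulate()`. This is a sketch under assumptions: `build_formula()` and its argument names are hypothetical and not part of the app.

```r
# Hypothetical helper: build a model formula from the user's selections.
build_formula <- function(response, treatments, block = NULL, interaction = FALSE) {
  terms <- treatments
  if (interaction && length(treatments) == 2) {
    # add the two-way interaction term, e.g. "x:z"
    terms <- c(terms, paste(treatments, collapse = ":"))
  }
  terms <- c(terms, block)  # block = NULL adds nothing (the CRD case)
  reformulate(terms, response = response)
}

f_crd <- build_formula("y", c("x", "z"), interaction = TRUE)
f_rcbd <- build_formula("y", "x", block = "blk")
print(f_crd)
print(f_rcbd)
```

Building the formula from term labels rather than pasting a whole string keeps the interaction and block handling in one place.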

Single Random Variable Mixed Effect Model

The user selects a continuous independent variable, a fixed factor variable, and a random factor variable.

y ~ x + z + (1|w)

Double Random Variable Mixed Effect Model

The user selects a continuous independent variable, a fixed factor variable, one factor block and two random factor variables.

y ~ x + z + blk + (1|w) + (1|v)
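
The `(1|w)` notation is the random-intercept syntax used by the lme4 package, so one possible way to build and fit such a model is sketched below. Using lme4 is an assumption (this page has not settled on a package), `mixed_formula()` is a hypothetical helper, and the data are simulated.

```r
# Hypothetical helper: append random-intercept terms such as (1 | w) to the fixed terms.
mixed_formula <- function(response, fixed, random) {
  random_terms <- paste0("(1 | ", random, ")")
  as.formula(paste(response, "~", paste(c(fixed, random_terms), collapse = " + ")))
}

f_mixed <- mixed_formula("y", fixed = c("x", "z"), random = "w")
print(f_mixed)

# Simulated data: factor x, factor z, and a random grouping factor w.
set.seed(1)
d <- expand.grid(x = factor(1:3), z = factor(1:4), w = factor(1:5))
d$y <- rnorm(nrow(d)) + as.integer(d$x) + rnorm(5)[as.integer(d$w)]

# Fit only if lme4 is installed; the choice of lme4 is an assumption.
if (requireNamespace("lme4", quietly = TRUE)) {
  fit_mixed <- lme4::lmer(f_mixed, data = d)
  print(summary(fit_mixed))
}
```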

Split Plot CRD

There are three drop downs for variable selection: main plot (A), sub plot (B), and replication (R). These produce the formula:

y ~ A + B + A:B + Error(A:R)

Split Plot RCBD

There are three drop downs for factor variable selection: main plot (A), sub plot (B), block (blk).

y ~ A + B + A:B + blk + Error(A:blk)
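
As an illustration of how a split-plot model with an `Error()` stratum is fitted in base R, here is a sketch on simulated split-plot CRD data. One deliberate substitution, stated as an assumption: instead of writing `Error(A:R)`, the code creates a single factor `main_plot` that labels each A-by-R combination, which defines the same main-plot grouping.

```r
set.seed(42)
# Simulated split-plot CRD: 2 main-plot treatments (A), 3 sub-plot
# treatments (B), 3 replications (R).
d <- expand.grid(A = factor(c("a1", "a2")),
                 B = factor(c("b1", "b2", "b3")),
                 R = factor(1:3))
d$y <- rnorm(nrow(d), mean = 10)

# One label per main plot (each A-by-R combination); same grouping as A:R.
d$main_plot <- interaction(d$A, d$R)

# A is tested against main-plot error; B and A:B against sub-plot error.
fit_sp <- aov(y ~ A + B + A:B + Error(main_plot), data = d)
print(summary(fit_sp))
```

The summary is split into a main-plot stratum and a "Within" stratum, so the GUI would need to display both tables.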

Side Panel Step 5: Variable Types

Each variable that was selected in the previous step will be listed along with a radio button allowing for different types: continuous or factor (grouping).

Note: This section is likely no longer needed because the variable types are restricted by the analysis type.

Linear Regression

Both should be continuous so no selections are allowed.

ANOVA, CRD, RCBD, Split CRD, Split RCBD

The dependent variable should be continuous and the independent variables should be factors. No selections allowed.

Side Panel Step 6: Transformations

This will allow the user to apply transformations to any of the model's continuous variables. The variables that were selected will have radio buttons or a drop down beside them with options to apply log, sqrt, power. This will modify the formula, e.g. for ANOVA:

y^2 ~ x1 + ... + xn + x1*x1 + ... + xn*xn

or

log(y) ~ x1 + ... + xn + x1*x1 + ... + xn*xn

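For linear regression we should allow transforming the dependent and independent variables. On the right-hand side of an R formula a power must be wrapped in I(), because a bare ^ is interpreted as a formula operator:

y ~ I(x^2)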
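
One way the selected transformations could be turned into formula text is sketched below. `apply_transform()` is a hypothetical helper, and the built-in `mtcars` data set stands in for the user's data.

```r
# Hypothetical helper: wrap a variable name in the chosen transformation.
apply_transform <- function(var, type = c("none", "log", "sqrt", "power"), power = 2) {
  type <- match.arg(type)
  switch(type,
    none = var,
    log = paste0("log(", var, ")"),
    sqrt = paste0("sqrt(", var, ")"),
    power = paste0("I(", var, "^", power, ")")  # I() protects ^ inside a formula
  )
}

f <- as.formula(paste(apply_transform("mpg", "log"), "~",
                      apply_transform("wt", "power")))
fit <- lm(f, data = mtcars)
print(names(coef(fit)))

# Without I(), ^ is a formula operator and wt^2 collapses to plain wt.
print(names(coef(lm(mpg ~ wt^2, data = mtcars))))
```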

Side Panel Step 7: Select Alpha

A text box that lets the user set the alpha value to something other than 0.05. The default is 0.05. This will be used in the confidence interval calculations and the post hoc analyses.
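
Because the value comes from a free-text box, it needs validating before use. A minimal sketch, assuming a hypothetical helper `parse_alpha()` that falls back to the default when the input is not a number strictly between 0 and 1:

```r
# Hypothetical helper: parse the text-box value and fall back to the default.
parse_alpha <- function(input, default = 0.05) {
  alpha <- suppressWarnings(as.numeric(input))  # non-numeric text becomes NA
  if (length(alpha) != 1 || is.na(alpha) || alpha <= 0 || alpha >= 1) {
    return(default)
  }
  alpha
}

alpha <- parse_alpha("0.01")
conf_level <- 1 - alpha  # the value to pass as confint(fit, level = conf_level)
print(conf_level)
```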

Side Panel Step 8: Run Analysis

The user presses the "Run Analysis" button, which then displays the code needed to run the analyses and the assumption tests <== or are the assumption tests initiated in Tab #3 (see below)?

The code produced follows this form:

fit <- aov(formula = y ~ A + B + A:B, data = my_data)
anova(fit)
confint(fit, level = 0.95)

Several things are displayed in the main panel:

  1. The text output of the anova() function.
  2. The 95% confidence intervals.
  3. The text output of the shapiro.test() functions.
  4. The text output of the leveneTest function.
shapiro.test(residuals(fit))  # normality is checked on the model residuals, not on the factors
library(car)
leveneTest(y ~ A * B, data = my_data)  # <== I don't think this code is correct
***

R example for both Shapiro and Levene's test:

Inform R that Block and Trt are factors:

data.set$Block <- as.factor(data.set$Block)
data.set$Trt <- as.factor(data.set$Trt)

The ANOVA:

ANOVA.model<-lm(DependentVariable ~ Trt + Block, data.set)
anova(ANOVA.model)

TESTING ASSUMPTIONS:

  1. Generate residual and predicted values
data.set$resids <- residuals(ANOVA.model)
data.set$preds <- predict(ANOVA.model)
data.set$sq_preds <- data.set$preds^2
  2. Look at a plot of residual vs. predicted values
plot(resids ~ preds, data = data.set,
     xlab = "Predicted Values", ylab = "Residuals")
  3. Perform a Shapiro-Wilk test for normality of residuals
shapiro.test(data.set$resids)
  4. Perform Levene's test for homogeneity of variances
library(car)
leveneTest(DependentVariable ~ Trt + Block, data.set, show.table = TRUE)
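
For reference, the same assumption checks can be sketched with base R only. This is a swapped-in illustration rather than the code the GUI will emit: Levene's test is computed by hand as a one-way ANOVA on the absolute deviations from the group medians (the median-centred variant), and the built-in `PlantGrowth` data set stands in for the user's data.

```r
# One-way layout: plant weight by treatment group.
fit <- lm(weight ~ group, data = PlantGrowth)

# Shapiro-Wilk test for normality, applied to the model residuals.
sw <- shapiro.test(residuals(fit))
print(sw)

# Levene's test by hand: absolute deviations from each group's median,
# followed by a one-way ANOVA on those deviations.
abs_dev <- with(PlantGrowth, abs(weight - ave(weight, group, FUN = median)))
levene <- anova(lm(abs_dev ~ group, data = PlantGrowth))
print(levene)
```

A small p-value in the second table would suggest unequal variances between the groups.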

Tab 4: Post Hoc Analysis

This will have a button to run post hoc analyses.

TODO : This needs some help. Not sure if I know exactly what to do here.

We could let the user be in charge of choosing the factors to run LSD.test() on and not have any checking. This would be the easiest to implement.

  1. Significant variables are identified.
  2. For each significant variable, LSD.test() is run on that variable.
  3. The text output of the $groups attribute is shown, i.e. a table that shows whether the factors are significantly different from each other.
  4. Are any plots needed from this? Maybe just identify the "letters" for each factor level, determined by LSD.test, so they can be called as labels in a bar graph. The plots will be produced under Tab 5.
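
The steps above could be sketched as follows. `significant_terms()` is a hypothetical helper, the built-in `PlantGrowth` data set stands in for the user's data, and the LSD step only runs if agricolae is installed. The sketch handles main effects only; how interaction terms should be passed to LSD.test() is left open.

```r
fit <- aov(weight ~ group, data = PlantGrowth)
alpha <- 0.05

# Hypothetical helper: names of model terms whose ANOVA p-value is below alpha.
significant_terms <- function(fit, alpha) {
  tab <- anova(fit)
  p <- tab[["Pr(>F)"]]
  keep <- !is.na(p) & p < alpha  # the Residuals row has an NA p-value
  trimws(rownames(tab)[keep])
}

sig <- significant_terms(fit, alpha)
print(sig)

# Run LSD.test() on each significant term and show the grouping letters.
if (requireNamespace("agricolae", quietly = TRUE)) {
  for (term in sig) {
    lsd <- agricolae::LSD.test(fit, term, alpha = alpha)
    print(lsd$groups)
  }
}
```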

Tab 5: Plots

The plots populate and update when you run the various analyses.

  1. Standard plots for a fit.
  2. A bar chart that compares the means and standard errors of the variables. (labeled with the significance letters if possible)
  3. Effects plots.
  4. Interaction plots (if any interactions are present). See page 5 of the example for a visual of what an interaction plot is.
plot(fit)

# bar chart
# TODO: Add example code.

# effects and interactions
library(effects)
ef <- list()
for (term in attr(terms(fit), "term.labels")) {
  ef[[term]] <- effect(term, fit)
}
lapply(ef, plot)
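
For the bar chart of means and standard errors, one possible base graphics sketch is shown here. It uses the built-in `PlantGrowth` data set as a stand-in for the user's data and leaves out the significance letters, which would come from the post hoc step.

```r
# Group means and standard errors for the bar chart.
means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
ses <- tapply(PlantGrowth$weight, PlantGrowth$group,
              function(v) sd(v) / sqrt(length(v)))

# barplot() returns the x positions of the bar midpoints.
mids <- barplot(means, ylim = c(0, max(means + ses) * 1.2),
                xlab = "Treatment", ylab = "Mean weight")

# Error bars: flat-headed arrows from mean - SE to mean + SE.
arrows(mids, means - ses, mids, means + ses,
       angle = 90, code = 3, length = 0.05)
```

The letters could later be added with `text()` at the `mids` positions, just above each error bar.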

Tab 6: Downloads

  1. A button to download a PDF report that includes R code and graphs.
  2. A button to download an R script that will execute the analyses created by the GUI.
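
The downloadable R script could be assembled from the same pieces of code that the GUI displays. A minimal sketch, assuming a hypothetical helper `build_script()` and writing to a temporary file in place of the real download handler:

```r
# Hypothetical helper: assemble the lines of a standalone R script.
build_script <- function(data_path, formula_text, alpha = 0.05) {
  c(
    sprintf("my_data <- read.csv(%s)", deparse(data_path)),  # deparse() adds the quotes
    sprintf("fit <- aov(formula = %s, data = my_data)", formula_text),
    "anova(fit)",
    sprintf("confint(fit, level = %s)", format(1 - alpha))
  )
}

script <- build_script("path/to/data.csv", "y ~ A + B + A:B", alpha = 0.05)

# Write the script to a temporary file and show its contents.
out_file <- tempfile(fileext = ".R")
writeLines(script, out_file)
cat(readLines(out_file), sep = "\n")
```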

Tab 7: Help

Help text for all the functionality.

Tab 8: About

Image and link to USAID plus a disclaimer.
