lab1.Rmd

---
title: "Lab 1"
subtitle: "Resampling"
date: "Assigned 10/14/20, Due 10/21/20"
output:
  html_document: 
    toc: true
    toc_float: true
    theme: "journal"
    css: "website-custom.css"
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE,
                      warning = FALSE,
                      message = FALSE)

library(tidyverse)
library(tidymodels)
```

### Read in the `train.csv` data.

```{r, data}


```

### 1. Initial Split

Split the data into a training set and a testing set as two named objects. Produce the `class` type for the initial split object and the training and test sets.

```{r, initial_split}
set.seed(3000)

```

### 2. Use code to show the proportion of the `train.csv` data that went to each of the training and test sets.

```{r}


```

### 3. *k*-fold cross-validation

Use 10-fold cross-validation to resample the training data.

```{r, resample}
set.seed(3000)

```

### 4. Use `{purrr}` to add the following columns to your *k*-fold CV object:
* *analysis_n* = the *n* of the analysis set for each fold
* *assessment_n* = the *n* of the assessment set for each fold
* *analysis_p* = the proportion of the analysis set for each fold
* *assessment_p* = the proportion of the assessment set for each fold
* *sped_p* = the proportion of students receiving special education services (`sp_ed_fg`) in the analysis and assessment sets for each fold

```{r, purrr}


```

### 5. Please demonstrate that that there are **no** common values in the `id` columns of the `assessment` data between `Fold01` & `Fold02`, and `Fold09` & `Fold10` (of your 10-fold cross-validation object).

```{r}


```

### 6. Try to answer these next questions without running similar code on real data.

For the following code `vfold_cv(fictional_train, v = 20)`:

* What is the proportion in the analysis set for each fold?
* What is the proportion in the assessment set for each fold?

### 7. Use Monte Carlo CV to resample the training data with 20 resamples and .30 of each resample reserved for the assessment sets.

```{r}
set.seed(3000)


```

### 8. Please demonstrate that that there **are** common values in the `id` columns of the `assessment` data between `Resample 8` & `Resample 12`, and `Resample 2` & `Resample 20`in your MC CV object.

```{r}


```

### 9. You plan on doing bootstrap resampling with a training set with *n* = 500.

* What is the sample size of an analysis set for a given bootstrap resample?
* What is the sample size of an assessment set for a given bootstrap resample?
* If each row was selected only once for an analysis set:
  + what would be the size of the analysis set?
  + and what would be the size of the assessment set?