Wrangling

Libraries

library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0     ✔ purrr   1.0.1
✔ tibble  3.1.8     ✔ dplyr   1.1.0
✔ tidyr   1.2.1     ✔ stringr 1.5.0
✔ readr   2.1.3     ✔ forcats 0.5.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
library(here)
here() starts at /Users/nathangreenslit/Desktop/UGA/Spring 2023/MADA/nathangreenslit-MADA-portfolio
library(recipes) # For creating unordered/ordered factors

Attaching package: 'recipes'

The following object is masked from 'package:stringr':

    fixed

The following object is masked from 'package:stats':

    step
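The comment on `library(recipes)` mentions unordered and ordered factors. The distinction can be sketched in base R alone, without recipes. The severity levels are taken from the `summary()` output further down, and the vector `x` is made up for illustration.

```r
# Severity levels as they appear in the data summary (None < Mild < Moderate < Severe)
severity_levels <- c("None", "Mild", "Moderate", "Severe")

# Made-up example values
x <- c("Mild", "Severe", "None", "Moderate", "Mild")

# Unordered factor: levels are only labels, so comparisons like > are not meaningful
x_unordered <- factor(x, levels = severity_levels)

# Ordered factor: the level order encodes severity, so comparisons work
x_ordered <- factor(x, levels = severity_levels, ordered = TRUE)

# Which observations are more severe than "Mild"?
x_ordered > "Mild"
```

Ordered factors also change how a model encodes the variable: with R's default `options("contrasts")`, unordered factors use `contr.treatment` and ordered factors use `contr.poly`.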

Load Data

d <- readRDS(here("fluanalysis", "data", "SympAct_Any_Pos.Rda"))
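The file is read with `readRDS()` even though its extension is `.Rda`, which by convention is used for files written with `save()`. The extension itself does not determine the format: `readRDS()` reads a single object written by `saveRDS()`, while `save()`/`load()` store and restore objects under their original names. A minimal sketch contrasting the two, using a made-up data frame and temporary files:

```r
# Made-up data frame for illustration
obj <- data.frame(id = 1:3, temp = c(98.2, 98.5, 99.3))

# saveRDS()/readRDS(): one object, assigned to whatever name you choose on read
rds_path <- tempfile(fileext = ".rds")
saveRDS(obj, rds_path)
d_rds <- readRDS(rds_path)

# save()/load(): objects are restored under their original names into an environment
rda_path <- tempfile(fileext = ".Rda")
save(obj, file = rda_path)
env <- new.env()
loaded_names <- load(rda_path, envir = env) # load() returns the restored names

loaded_names               # "obj"
identical(d_rds, env$obj)  # TRUE
```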

Remove Columns Not of Interest

d1 <- d %>%
  # Remove columns not of interest
  select(-contains(c("FluA", "FluB", "Score", "Total", "Dxname", "Activity", "Unique.Visit"))) %>%
  # Drop rows with a missing value in any remaining column
  drop_na()
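A minimal sketch of how this pipeline behaves, using a small data frame whose name (`toy`) and columns are made up for illustration. Two details matter: `contains()` given a character vector drops a column if it matches any of the strings (and ignores case by default), and because `drop_na()` runs after `select()`, missing values in the removed columns no longer cause rows to be dropped.

```r
library(dplyr)
library(tidyr)

# Hypothetical columns that mimic the kinds found in the real data set
toy <- data.frame(
  Unique.Visit = c("a", "b", "c"),
  FluA = c(1, 0, 1),
  ScoreTotal = c(3, NA, 5),
  Cough = factor(c("Yes", "No", NA)),
  BodyTemp = c(98.6, 99.1, 100.2)
)

toy_clean <- toy %>%
  # contains() matches literal substrings; a column matching ANY string is dropped
  select(-contains(c("FluA", "Score", "Unique.Visit"))) %>%
  # Only rows with NAs in the remaining columns (Cough, BodyTemp) are removed
  drop_na()

names(toy_clean) # "Cough" "BodyTemp"
nrow(toy_clean)  # 2
```

If the two steps were reversed, row 2 would also be lost because of its missing `ScoreTotal`.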

Now we have a data set with no missing values, containing flu symptoms (categorical: Yes/No presence or None/Mild/Moderate/Severe intensity) and body temperature (continuous).

Remove Yes/No Variables That Duplicate Multi-Level Variables

d2 <- d1 %>%
  # Drop the Yes/No columns; the corresponding multi-level columns are kept
  select(!c(WeaknessYN, CoughYN, MyalgiaYN, CoughYN2))
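The heading implies that `WeaknessYN`, `CoughYN` and `MyalgiaYN` are Yes/No summaries of the multi-level `Weakness`, `CoughIntensity` and `Myalgia` columns. Assuming a Yes/No value simply means "intensity is not None" (an assumption, not confirmed by the source), the Yes/No column is fully determined by the intensity column, as this toy example with made-up values shows:

```r
# Made-up intensity values using the levels from the data summary
weakness <- factor(c("None", "Mild", "Severe", "Moderate", "None", "Mild"),
                   levels = c("None", "Mild", "Moderate", "Severe"))

# ASSUMED relationship: Yes/No is "Yes" whenever intensity is not "None"
weakness_yn <- factor(ifelse(weakness == "None", "No", "Yes"), levels = c("No", "Yes"))

# Each intensity level maps to exactly one Yes/No value, so the Yes/No
# column adds no information beyond the intensity column
tab <- table(weakness, weakness_yn)
tab
```

On the real data, `table(d1$Weakness, d1$WeaknessYN)` would show whether that assumption holds.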

Remove Binary Predictors with Fewer Than 50 Entries in One Category

Check the Data Set

summary(d2)
 SwollenLymphNodes ChestCongestion ChillsSweats NasalCongestion Sneeze   
 No :418           No :323         No :130      No :167         No :339  
 Yes:312           Yes:407         Yes:600      Yes:563         Yes:391  
                                                                         
                                                                         
                                                                         
                                                                         
 Fatigue   SubjectiveFever Headache      Weakness    CoughIntensity
 No : 64   No :230         No :115   None    : 49   None    : 47   
 Yes:666   Yes:500         Yes:615   Mild    :223   Mild    :154   
                                     Moderate:338   Moderate:357   
                                     Severe  :120   Severe  :172   
                                                                   
                                                                   
     Myalgia    RunnyNose AbPain    ChestPain Diarrhea  EyePn     Insomnia 
 None    : 79   No :211   No :639   No :497   No :631   No :617   No :315  
 Mild    :213   Yes:519   Yes: 91   Yes:233   Yes: 99   Yes:113   Yes:415  
 Moderate:325                                                              
 Severe  :113                                                              
                                                                           
                                                                           
 ItchyEye  Nausea    EarPn     Hearing   Pharyngitis Breathless ToothPn  
 No :551   No :475   No :568   No :700   No :119     No :436    No :565  
 Yes:179   Yes:255   Yes:162   Yes: 30   Yes:611     Yes:294    Yes:165  
                                                                         
                                                                         
                                                                         
                                                                         
 Vision    Vomit     Wheeze       BodyTemp     
 No :711   No :652   No :510   Min.   : 97.20  
 Yes: 19   Yes: 78   Yes:220   1st Qu.: 98.20  
                               Median : 98.50  
                               Mean   : 98.94  
                               3rd Qu.: 99.30  
                               Max.   :103.10  

We can see that Vision (19 "Yes") and Hearing (30 "Yes") have fewer than 50 entries in one category, so we remove them.
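Rather than reading the counts off `summary()`, the same check can be done programmatically. A minimal sketch in base R: `sparse_binary_factors()` is a hypothetical helper, and the toy data only reproduces the counts for three columns from the summary above.

```r
# Hypothetical helper: names of two-level factor columns whose smaller
# level has fewer than `min_n` observations
sparse_binary_factors <- function(df, min_n = 50) {
  is_binary <- vapply(df, function(x) is.factor(x) && nlevels(x) == 2, logical(1))
  too_sparse <- vapply(df[is_binary], function(x) min(table(x)) < min_n, logical(1))
  names(too_sparse)[too_sparse]
}

# Toy data with 730 rows, copying the Vision, Hearing and Vomit counts above
toy <- data.frame(
  Vision   = factor(rep(c("No", "Yes"), c(711, 19))),
  Hearing  = factor(rep(c("No", "Yes"), c(700, 30))),
  Vomit    = factor(rep(c("No", "Yes"), c(652, 78))),
  BodyTemp = seq(97.2, 103.1, length.out = 730) # numeric, so it is skipped
)

sparse_binary_factors(toy) # "Vision" "Hearing"
```

On the real data, `setdiff(names(d2), sparse_binary_factors(d2))` would give the columns to keep, assuming all binary predictors are stored as two-level factors.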

Remove Vision and Hearing

d3<- 
  d2 %>%
  select(!c(Vision, Hearing))

Now we have a data frame with 730 observations and 26 variables.

Save RDS

saveRDS(d3, file = here("fluanalysis", "data", "SypAct_clean.rds"))
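Saving as RDS rather than CSV keeps column types and factor level definitions intact, including levels that have no observations. A minimal sketch with a made-up data frame and temporary files:

```r
# Made-up data: a factor with four defined levels, only two of them observed
df <- data.frame(
  Weakness = factor(c("Mild", "Severe"), levels = c("None", "Mild", "Moderate", "Severe")),
  BodyTemp = c(98.5, 100.1)
)

# RDS round trip: the object comes back exactly as saved
rds_path <- tempfile(fileext = ".rds")
saveRDS(df, file = rds_path)
from_rds <- readRDS(rds_path)

# CSV round trip: only the labels are written, so unused levels and their order are lost
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path, stringsAsFactors = TRUE)

levels(from_rds$Weakness) # "None" "Mild" "Moderate" "Severe"
levels(from_csv$Weakness) # "Mild" "Severe"
```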