Is the Binghamton Community Eating Enough Vegetables? A Mixed-Methods Study of Veggie Meter Scores and Lay Beliefs on Food

A Mixed-Methods Review

Authors

Ryan Horowitz, Vidhu Kumar, Anna Vishnevetsky, Kaitlyn Zerrenner

Published

November 18, 2025

Abstract

The current foodscape of the United States, along with the nutritional knowledge and perceptions of university students across the country, remains critically understudied. University students today are especially susceptible to poor nutritional choices given the pervasive influence of the ultra-processed food (UPF) industry. A notable gap in the current literature is the lack of research on the dietary quality of Binghamton University students. This study therefore examines the overall dietary scores (veggie scores) of Binghamton University students and how they correlate with lay beliefs and sociodemographic variables. It employs a complementary, multiphase, concurrent mixed-methods review (MMR) together with objective biological sampling to determine what relationships, if any, exist among sociodemographics, nutritional beliefs, and veggie scores measured with the VeggieMeter®. Biological data were collected from willing participants who met the study's inclusion criteria, and survey responses were collected both in person and online. Quantitative data were cleaned and analyzed in R, and qualitative data were coded and analyzed in NVivo. All data were then aggregated and cross-referenced as codes, which were examined in relation to one another and to sociodemographic variables. Finally, these values were tested against veggie scores and the full data set was analyzed. The researchers hypothesized that participants with higher nutritional knowledge and perception scores would have higher objective biological (veggie) scores, and several correlations related to this relationship were examined to identify its possible drivers. These findings may help educate the public on the link between nutritional knowledge and personal health.

Keywords

health status, veggie score, age, sex, college students

1 Results

Participants in the quantitative portion (n = 208), including both Binghamton University students and Broome County residents, were between 18 and 89 years of age, with a mean age of 27.22 years. The quantitative sample was majority female, with 111 females and 82 males. The Binghamton University population identifies as 80.1% White, 4.97% Asian, 5.38% Hispanic or Latino, 5.09% Black or African American, 2.99% two or more races, 0.0465% Native Hawaiian or Other Pacific Islander, and 0.0748% American Indian or Alaska Native (Binghamton University | Data USA, n.d.). By comparison, the qualitative sample was 50.7% White, 30.2% Asian, 9.5% two or more races, 6.3% Black, 1.6% other, and 1.6% Middle Eastern. No Binghamton University survey participants reported being Hispanic or Latino, or Native Hawaiian or Other Pacific Islander. The sample's proportion of White participants was similar to that of the Broome County population. Nonetheless, several racial groups were entirely absent from the sample or substantially underrepresented.
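Descriptive figures like those above (sample size, age range, mean age, and category counts with percentages) can be produced with a few dplyr verbs. The sketch below runs on made-up toy data; the column names `age` and `sex` and all values are hypothetical, not the study's.

```r
library(dplyr)

# Hypothetical toy data standing in for the quantitative sample;
# column names and values are illustrative only
demo <- tibble(
  age = c(18, 19, 22, 35, 89, NA),
  sex = c("female", "male", "female", "female", "male", NA)
)

# n() counts every row; min/max/mean skip missing ages
age_summary <- demo %>%
  summarise(
    n        = n(),
    min_age  = min(age, na.rm = TRUE),
    max_age  = max(age, na.rm = TRUE),
    mean_age = round(mean(age, na.rm = TRUE), 2)
  )

# count() keeps missing values as their own NA row, so the category
# counts add back up to n; pct is each category's share of all rows
sex_summary <- demo %>%
  count(sex) %>%
  mutate(pct = round(100 * n / sum(n), 1))

print(age_summary)
print(sex_summary)
```

Comparing `sum(sex_summary$n)` with the overall n is a quick check that every participant is accounted for in a categorical breakdown.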

1.1 Load

library(tidyverse)  # also attaches dplyr, readr, forcats, stringr, ggplot2, tibble, lubridate, tidyr, purrr
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.1     ✔ stringr   1.5.2
✔ ggplot2   4.0.0     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(psych)

Attaching package: 'psych'

The following objects are masked from 'package:ggplot2':

    %+%, alpha
library(knitr)
library(scales)     # for number formatting like comma(); masks psych::alpha(), so call psych::alpha() explicitly

Attaching package: 'scales'

The following objects are masked from 'package:psych':

    alpha, rescale

The following object is masked from 'package:purrr':

    discard

The following object is masked from 'package:readr':

    col_factor
library(english)    # to convert numbers to words

Attaching package: 'english'

The following object is masked from 'package:scales':

    ordinal
library(NHANES)
library(haven)
library(readxl)     # read_excel()
library(tableone)
library(ggpubr)

1.2 Import Data

library(readxl)
library(dplyr)
primary_data <- read_excel("10.20.2025.perceptionsdata.team5.clean.xlsx", col_names = TRUE)
secondary_data <- read_excel("10.20.2025.data.team5.clean.xlsx", col_names = TRUE)
New names:
• `` -> `...1`
# source: (Hei & McCarty, 2025) https://shanemccarty.github.io/FRIplaybook/import-once.html
# explanation: import perceptions survey data as data frame primary_data and Veggie Meter survey data as data frame secondary_data
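The `New names:` message means the Veggie Meter spreadsheet has a column with a blank header, which readxl renamed `...1` (often an exported row index). Before joining on PASSWORD, it may help to drop that column and tidy the key. The sketch below assumes `...1` is only a row index and that leading or trailing spaces in PASSWORD carry no meaning; both assumptions should be checked against the raw file, and the toy values are made up.

```r
library(dplyr)
library(stringr)

# Hypothetical toy stand-in for secondary_data: `...1` mimics the
# blank-header column that read_excel() renamed; values are made up
secondary_toy <- tibble(
  `...1`      = 1:3,
  PASSWORD    = c(" apple", "pear ", "plum"),
  VEGGIESCORE = c(250, 310, 420)
)

secondary_clean <- secondary_toy %>%
  select(-`...1`) %>%                    # assumes `...1` is only a row index
  mutate(PASSWORD = str_trim(PASSWORD))  # drop leading/trailing spaces in the key

print(secondary_clean)
```

The same `str_trim()` step would need to be applied to `primary_data$PASSWORD` so that both keys are cleaned identically before matching.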

1.3 Transform

1.3.1 Combine Data Sets

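## combine data sets primary_data and secondary_data by variable "PASSWORD"
## primary_data also has a VEGGIESCORE column, so the joined result keeps both,
## suffixed VEGGIESCORE.x (primary_data) and VEGGIESCORE.y (secondary_data)
combined <- primary_data %>%
  left_join(
    secondary_data %>% select("PASSWORD", "VEGGIESCORE", "AGE_1"),
    by = "PASSWORD"
  )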
Warning in left_join(., secondary_data %>% select("PASSWORD", "VEGGIESCORE", : Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 1 of `x` matches multiple rows in `y`.
ℹ Row 86 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.
# source: (McCarty, 2025) https://shanemccarty.github.io/FRIplaybook/merge.html
# explanation: left-join the Veggie Meter score and age (AGE_1) onto each perceptions-survey row, matching on the PASSWORD shared by both surveys
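The many-to-many warning means at least one PASSWORD value appears more than once in both tables. Two quick checks can show which keys are duplicated and whether missing passwords are involved. By default dplyr joins treat `NA` keys as equal to each other (`na_matches = "na"`), so blank passwords in both surveys would all match one another. The sketch below uses hypothetical toy tables; whether duplicates in the real files come from repeat visits, typos, or missing values is not known from this section.

```r
library(dplyr)

# Hypothetical toy tables: "kiwi" is duplicated in both, and each has a missing key
perceptions <- tibble(PASSWORD = c("kiwi", "kiwi", "fig", NA),
                      KNOW1    = c(1, 0, 1, 1))
veggie      <- tibble(PASSWORD    = c("kiwi", "kiwi", "fig", NA),
                      VEGGIESCORE = c(300, 320, 410, 280))

# 1. Keys that occur more than once in each table (NA is counted as a key too)
dup_perceptions <- perceptions %>% count(PASSWORD) %>% filter(n > 1)
dup_veggie      <- veggie %>% count(PASSWORD) %>% filter(n > 1)

# 2. Stop NA keys from matching each other; the duplicated "kiwi" rows still
#    fan out (2 x 2 = 4 rows), which relationship = "many-to-many" declares
#    explicitly once such duplicates are confirmed to be expected
joined <- perceptions %>%
  left_join(veggie, by = "PASSWORD",
            na_matches = "never", relationship = "many-to-many")

print(dup_perceptions)
print(joined)
```

Here the row with a missing PASSWORD keeps `VEGGIESCORE = NA` instead of borrowing the score of an unrelated participant whose password was also missing.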

1.3.2 Summarize Data

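primary_data[primary_data == -99] <- NA  # treat the -99 code as missing
combined[combined == -99] <- NA          # combined was built before this recode, so recode it as well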
head(primary_data, 10)
# A tibble: 10 × 143
   StartDate           EndDate             Status IPAddress       Progress
   <dttm>              <dttm>               <dbl> <chr>              <dbl>
 1 2025-10-01 11:08:40 2025-10-01 11:15:05      0 149.125.187.238      100
 2 2025-10-01 11:28:22 2025-10-01 11:39:49      0 174.254.255.221      100
 3 2025-10-01 11:50:07 2025-10-01 11:50:19      0 172.59.177.124       100
 4 2025-10-01 11:48:30 2025-10-01 11:50:28      0 172.59.177.124       100
 5 2025-10-01 11:38:50 2025-10-01 11:51:45      0 74.254.78.49         100
 6 2025-10-01 11:41:56 2025-10-01 11:53:54      0 149.125.86.214       100
 7 2025-10-01 11:55:57 2025-10-01 12:04:56      0 149.125.51.245       100
 8 2025-10-01 11:52:59 2025-10-01 12:07:43      0 174.254.252.174      100
 9 2025-10-01 12:42:10 2025-10-01 12:51:48      0 104.28.55.254        100
10 2025-10-01 12:54:37 2025-10-01 13:10:53      0 174.254.234.155      100
# ℹ 138 more variables: `Duration (in seconds)` <dbl>, Finished <dbl>,
#   RecordedDate <dttm>, ResponseId <chr>, RecipientLastName <lgl>,
#   RecipientFirstName <lgl>, RecipientEmail <lgl>, ExternalReference <lgl>,
#   LocationLatitude <dbl>, LocationLongitude <dbl>, DistributionChannel <chr>,
#   UserLanguage <chr>, Q_RecaptchaScore <dbl>, CONSENT <dbl>,
#   AFFILIATION <dbl>, MEAL_PLAN <dbl>, `MEAL_PLAN_-50_TEXT` <lgl>,
#   VEGGIESCORE <dbl>, PASSWORD <chr>, UNHEALTHY_QUAL <chr>, …
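The output above prints participants' IP addresses alongside their answers. A column-wise alternative to the matrix-style -99 recode, which also drops identifying Qualtrics metadata (`IPAddress` and the location columns) before anything is printed, is sketched below on toy data. Treating -99 as the only missing-value code is carried over from the recode line above; which metadata columns to drop is a judgment call for the study team.

```r
library(dplyr)

# Hypothetical toy survey export; -99 marks an unanswered item
survey_toy <- tibble(
  IPAddress         = c("10.0.0.1", "10.0.0.2"),
  LocationLatitude  = c(42.1, 42.2),
  LocationLongitude = c(-75.9, -75.8),
  KNOW1             = c(1, -99),
  HEALTHSTATUS      = c(-99, 4)
)

survey_clean <- survey_toy %>%
  # drop identifying metadata columns before printing or sharing
  select(-IPAddress, -starts_with("Location")) %>%
  # turn the -99 code into NA in every numeric column
  mutate(across(where(is.numeric), ~ na_if(.x, -99)))

print(survey_clean)
```

Unlike the matrix-style assignment, this leaves character columns untouched, so a text value of "-99" would need its own `na_if(.x, "-99")` step.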
names(primary_data)
  [1] "StartDate"               "EndDate"                
  [3] "Status"                  "IPAddress"              
  [5] "Progress"                "Duration (in seconds)"  
  [7] "Finished"                "RecordedDate"           
  [9] "ResponseId"              "RecipientLastName"      
 [11] "RecipientFirstName"      "RecipientEmail"         
 [13] "ExternalReference"       "LocationLatitude"       
 [15] "LocationLongitude"       "DistributionChannel"    
 [17] "UserLanguage"            "Q_RecaptchaScore"       
 [19] "CONSENT"                 "AFFILIATION"            
 [21] "MEAL_PLAN"               "MEAL_PLAN_-50_TEXT"     
 [23] "VEGGIESCORE"             "PASSWORD"               
 [25] "UNHEALTHY_QUAL"          "HEALTHY_QUAL"           
 [27] "DECISION_QUAL"           "CHOICES_QUAL"           
 [29] "DINING"                  "HEALTHY"                
 [31] "KNOW1"                   "KNOW2"                  
 [33] "KNOW3"                   "KNOW4"                  
 [35] "KNOW5"                   "KNOW6"                  
 [37] "KNOW7"                   "KNOW8"                  
 [39] "NOVA_ACCURACY_0_GROUP"   "NOVA_ACCURACY_1_GROUP"  
 [41] "NOVA_ACCURACY_2_GROUP"   "NOVA_ACCURACY_3_GROUP"  
 [43] "NOVA_ACCURACY_4_GROUP"   "NOVA_ACCURACY_0_1_RANK" 
 [45] "NOVA_ACCURACY_0_4_RANK"  "NOVA_ACCURACY_0_5_RANK" 
 [47] "NOVA_ACCURACY_0_6_RANK"  "NOVA_ACCURACY_0_7_RANK" 
 [49] "NOVA_ACCURACY_0_8_RANK"  "NOVA_ACCURACY_0_9_RANK" 
 [51] "NOVA_ACCURACY_0_10_RANK" "NOVA_ACCURACY_0_13_RANK"
 [53] "NOVA_ACCURACY_1_1_RANK"  "NOVA_ACCURACY_1_4_RANK" 
 [55] "NOVA_ACCURACY_1_5_RANK"  "NOVA_ACCURACY_1_6_RANK" 
 [57] "NOVA_ACCURACY_1_7_RANK"  "NOVA_ACCURACY_1_8_RANK" 
 [59] "NOVA_ACCURACY_1_9_RANK"  "NOVA_ACCURACY_1_10_RANK"
 [61] "NOVA_ACCURACY_1_13_RANK" "NOVA_ACCURACY_2_1_RANK" 
 [63] "NOVA_ACCURACY_2_4_RANK"  "NOVA_ACCURACY_2_5_RANK" 
 [65] "NOVA_ACCURACY_2_6_RANK"  "NOVA_ACCURACY_2_7_RANK" 
 [67] "NOVA_ACCURACY_2_8_RANK"  "NOVA_ACCURACY_2_9_RANK" 
 [69] "NOVA_ACCURACY_2_10_RANK" "NOVA_ACCURACY_2_13_RANK"
 [71] "NOVA_ACCURACY_3_1_RANK"  "NOVA_ACCURACY_3_4_RANK" 
 [73] "NOVA_ACCURACY_3_5_RANK"  "NOVA_ACCURACY_3_6_RANK" 
 [75] "NOVA_ACCURACY_3_7_RANK"  "NOVA_ACCURACY_3_8_RANK" 
 [77] "NOVA_ACCURACY_3_9_RANK"  "NOVA_ACCURACY_3_10_RANK"
 [79] "NOVA_ACCURACY_3_13_RANK" "NOVA_ACCURACY_4_1_RANK" 
 [81] "NOVA_ACCURACY_4_4_RANK"  "NOVA_ACCURACY_4_5_RANK" 
 [83] "NOVA_ACCURACY_4_6_RANK"  "NOVA_ACCURACY_4_7_RANK" 
 [85] "NOVA_ACCURACY_4_8_RANK"  "NOVA_ACCURACY_4_9_RANK" 
 [87] "NOVA_ACCURACY_4_10_RANK" "NOVA_ACCURACY_4_13_RANK"
 [89] "CALORIES_1"              "CALORIES_2"             
 [91] "CALORIES_3"              "CALORIES_4"             
 [93] "CALORIES_5"              "CALORIES_6"             
 [95] "CALORIES_7"              "CALORIES_8"             
 [97] "CALORIES_9"              "SATFAT_1"               
 [99] "SATFAT_2"                "SATFAT_3"               
[101] "SATFAT_4"                "SATFAT_5"               
[103] "SATFAT_6"                "SATFAT_7"               
[105] "SATFAT_8"                "SATFAT_9"               
[107] "CARBS_1"                 "CARBS_2"                
[109] "CARBS_3"                 "CARBS_4"                
[111] "CARBS_5"                 "CARBS_6"                
[113] "CARBS_7"                 "CARBS_8"                
[115] "CARBS_9"                 "SUGAR_1"                
[117] "SUGAR_2"                 "SUGAR_3"                
[119] "SUGAR_4"                 "SUGAR_5"                
[121] "SUGAR_6"                 "SUGAR_7"                
[123] "SUGAR_8"                 "SUGAR_9"                
[125] "PROTEIN_1"               "PROTEIN_2"              
[127] "PROTEIN_3"               "PROTEIN_4"              
[129] "PROTEIN_5"               "PROTEIN_6"              
[131] "PROTEIN_7"               "PROTEIN_8"              
[133] "PROTEIN_9"               "ILLNESS_1"              
[135] "AGE"                     "HEALTHSTATUS"           
[137] "HOPE"                    "ANXIETY"                
[139] "GENDER"                  "RACIALIZED"             
[141] "RACIALIZED_8_TEXT"       "SOCIALSTATUS"           
[143] "INCOME"                 
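Judging from their summary statistics, KNOW1 to KNOW8 look like items scored 0 or 1, so one plausible nutritional knowledge score is the sum of those items per respondent. The sketch below assumes that scoring rule, which this section does not state; `KNOW_TOTAL` and `KNOW_ANSWERED` are hypothetical column names and the toy responses are made up. `num_range("KNOW", 1:8)` is used rather than `starts_with("KNOW")` so that a newly created `KNOW_TOTAL` column is never swept into the sum.

```r
library(dplyr)

# Hypothetical toy responses to eight items scored 0 or 1 (NA = unanswered)
know_toy <- tibble(
  KNOW1 = c(1, 0, NA), KNOW2 = c(1, 1, 1), KNOW3 = c(1, 1, 1), KNOW4 = c(0, 1, 1),
  KNOW5 = c(1, 0, 1),  KNOW6 = c(1, 1, 1), KNOW7 = c(1, 1, 0), KNOW8 = c(0, 1, 1)
)

know_scored <- know_toy %>%
  mutate(
    # sum of the eight items; NA when any item is missing (na.rm = FALSE)
    KNOW_TOTAL    = rowSums(pick(num_range("KNOW", 1:8)), na.rm = FALSE),
    # how many of the eight items were answered, to inform a missing-data rule
    KNOW_ANSWERED = rowSums(!is.na(pick(num_range("KNOW", 1:8))))
  )

print(know_scored)
```

Whether an incomplete set of answers should be scored as missing, prorated, or summed with `na.rm = TRUE` is a design choice that would need to match the study's protocol.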
summary(primary_data)
   StartDate                      EndDate                        Status 
 Min.   :2025-09-26 13:34:04   Min.   :2025-09-26 13:57:36   Min.   :0  
 1st Qu.:2025-10-03 12:11:36   1st Qu.:2025-10-03 12:21:32   1st Qu.:0  
 Median :2025-10-08 13:24:48   Median :2025-10-08 13:27:30   Median :0  
 Mean   :2025-10-08 13:34:56   Mean   :2025-10-08 13:57:32   Mean   :0  
 3rd Qu.:2025-10-13 13:05:30   3rd Qu.:2025-10-13 13:16:45   3rd Qu.:0  
 Max.   :2025-10-19 00:06:38   Max.   :2025-10-19 00:29:46   Max.   :0  
                                                                        
  IPAddress            Progress      Duration (in seconds)    Finished     
 Length:187         Min.   :  3.00   Min.   :    5.0       Min.   :0.0000  
 Class :character   1st Qu.: 16.00   1st Qu.:   23.0       1st Qu.:0.0000  
 Mode  :character   Median :100.00   Median :  124.0       Median :1.0000  
                    Mean   : 70.88   Mean   : 1355.5       Mean   :0.6364  
                    3rd Qu.:100.00   3rd Qu.:  778.5       3rd Qu.:1.0000  
                    Max.   :100.00   Max.   :80341.0       Max.   :1.0000  
                                                                           
  RecordedDate                  ResponseId        RecipientLastName
 Min.   :2025-10-01 11:15:05   Length:187         Mode:logical     
 1st Qu.:2025-10-08 12:42:28   Class :character   NA's:187         
 Median :2025-10-10 14:03:43   Mode  :character                    
 Mean   :2025-10-11 03:03:06                                       
 3rd Qu.:2025-10-15 14:45:23                                       
 Max.   :2025-10-19 11:36:44                                       
                                                                   
 RecipientFirstName RecipientEmail ExternalReference LocationLatitude
 Mode:logical       Mode:logical   Mode:logical      Min.   :39.95   
 NA's:187           NA's:187       NA's:187          1st Qu.:40.77   
                                                     Median :41.53   
                                                     Mean   :41.60   
                                                     3rd Qu.:42.10   
                                                     Max.   :43.18   
                                                                     
 LocationLongitude DistributionChannel UserLanguage       Q_RecaptchaScore
 Min.   :-76.22    Length:187          Length:187         Min.   :0.4000  
 1st Qu.:-75.89    Class :character    Class :character   1st Qu.:1.0000  
 Median :-74.05    Mode  :character    Mode  :character   Median :1.0000  
 Mean   :-74.73                                           Mean   :0.9775  
 3rd Qu.:-73.90                                           3rd Qu.:1.0000  
 Max.   :-71.06                                           Max.   :1.0000  
                                                                          
    CONSENT        AFFILIATION      MEAL_PLAN      MEAL_PLAN_-50_TEXT
 Min.   :0.0000   Min.   :1.000   Min.   :-50.00   Mode:logical      
 1st Qu.:1.0000   1st Qu.:1.000   1st Qu.:  3.00   NA's:187          
 Median :1.0000   Median :1.000   Median :  3.00                     
 Mean   :0.9462   Mean   :1.382   Mean   :  1.36                     
 3rd Qu.:1.0000   3rd Qu.:1.000   3rd Qu.:  6.00                     
 Max.   :1.0000   Max.   :5.000   Max.   :  9.00                     
 NA's   :1        NA's   :17      NA's   :62                         
  VEGGIESCORE       PASSWORD         UNHEALTHY_QUAL     HEALTHY_QUAL      
 Min.   :0.0000   Length:187         Length:187         Length:187        
 1st Qu.:1.0000   Class :character   Class :character   Class :character  
 Median :1.0000   Mode  :character   Mode  :character   Mode  :character  
 Mean   :0.9278                                                           
 3rd Qu.:1.0000                                                           
 Max.   :1.0000                                                           
 NA's   :90                                                               
 DECISION_QUAL      CHOICES_QUAL           DINING         HEALTHY     
 Length:187         Length:187         Min.   :1.000   Min.   :1.000  
 Class :character   Class :character   1st Qu.:2.000   1st Qu.:1.000  
 Mode  :character   Mode  :character   Median :2.000   Median :2.000  
                                       Mean   :2.036   Mean   :1.929  
                                       3rd Qu.:2.000   3rd Qu.:2.000  
                                       Max.   :4.000   Max.   :4.000  
                                       NA's   :159     NA's   :159    
     KNOW1            KNOW2            KNOW3            KNOW4       
 Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
 1st Qu.:0.0000   1st Qu.:1.0000   1st Qu.:1.0000   1st Qu.:1.0000  
 Median :0.0000   Median :1.0000   Median :1.0000   Median :1.0000  
 Mean   :0.3671   Mean   :0.7975   Mean   :0.9747   Mean   :0.9241  
 3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
 Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
 NA's   :108      NA's   :108      NA's   :108      NA's   :108     
     KNOW5            KNOW6            KNOW7            KNOW8       
 Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
 1st Qu.:0.0000   1st Qu.:1.0000   1st Qu.:1.0000   1st Qu.:1.0000  
 Median :1.0000   Median :1.0000   Median :1.0000   Median :1.0000  
 Mean   :0.6835   Mean   :0.8861   Mean   :0.9241   Mean   :0.8481  
 3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
 Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
 NA's   :108      NA's   :108      NA's   :108      NA's   :108     
 NOVA_ACCURACY_0_GROUP NOVA_ACCURACY_1_GROUP NOVA_ACCURACY_2_GROUP
 Length:187            Length:187            Length:187           
 Class :character      Class :character      Class :character     
 Mode  :character      Mode  :character      Mode  :character     
                                                                  
                                                                  
                                                                  
                                                                  
 NOVA_ACCURACY_3_GROUP NOVA_ACCURACY_4_GROUP NOVA_ACCURACY_0_1_RANK
 Length:187            Length:187            Min.   :1.000         
 Class :character      Class :character      1st Qu.:1.500         
 Mode  :character      Mode  :character      Median :2.000         
                                             Mean   :1.667         
                                             3rd Qu.:2.000         
                                             Max.   :2.000         
                                             NA's   :184           
 NOVA_ACCURACY_0_4_RANK NOVA_ACCURACY_0_5_RANK NOVA_ACCURACY_0_6_RANK
 Min.   :1.000          Min.   :1.0            Min.   :1.000         
 1st Qu.:1.000          1st Qu.:1.5            1st Qu.:2.000         
 Median :2.000          Median :2.0            Median :3.000         
 Mean   :2.286          Mean   :2.0            Mean   :3.097         
 3rd Qu.:3.000          3rd Qu.:2.5            3rd Qu.:4.000         
 Max.   :5.000          Max.   :3.0            Max.   :6.000         
 NA's   :166            NA's   :185            NA's   :156           
 NOVA_ACCURACY_0_7_RANK NOVA_ACCURACY_0_8_RANK NOVA_ACCURACY_0_9_RANK
 Min.   :1.00           Min.   :1.00           Min.   :1.000         
 1st Qu.:1.00           1st Qu.:1.00           1st Qu.:2.000         
 Median :2.00           Median :2.00           Median :3.000         
 Mean   :2.30           Mean   :2.22           Mean   :3.222         
 3rd Qu.:2.75           3rd Qu.:3.00           3rd Qu.:5.000         
 Max.   :8.00           Max.   :7.00           Max.   :5.000         
 NA's   :157            NA's   :137            NA's   :178           
 NOVA_ACCURACY_0_10_RANK NOVA_ACCURACY_0_13_RANK NOVA_ACCURACY_1_1_RANK
 Min.   :1.000           Min.   :1.000           Min.   :1.000         
 1st Qu.:1.000           1st Qu.:2.500           1st Qu.:1.500         
 Median :2.000           Median :3.000           Median :3.000         
 Mean   :2.244           Mean   :3.455           Mean   :2.947         
 3rd Qu.:3.000           3rd Qu.:4.500           3rd Qu.:4.000         
 Max.   :5.000           Max.   :6.000           Max.   :8.000         
 NA's   :142             NA's   :176             NA's   :168           
 NOVA_ACCURACY_1_4_RANK NOVA_ACCURACY_1_5_RANK NOVA_ACCURACY_1_6_RANK
 Min.   :1.000          Min.   :1.000          Min.   :1.0           
 1st Qu.:1.000          1st Qu.:1.750          1st Qu.:2.0           
 Median :2.000          Median :2.000          Median :2.0           
 Mean   :2.217          Mean   :2.833          Mean   :2.5           
 3rd Qu.:3.000          3rd Qu.:3.250          3rd Qu.:3.0           
 Max.   :6.000          Max.   :9.000          Max.   :6.0           
 NA's   :164            NA's   :175            NA's   :155           
 NOVA_ACCURACY_1_7_RANK NOVA_ACCURACY_1_8_RANK NOVA_ACCURACY_1_9_RANK
 Min.   :1.000          Min.   :1.000          Min.   :1.00          
 1st Qu.:1.000          1st Qu.:1.000          1st Qu.:1.00          
 Median :2.000          Median :1.500          Median :2.00          
 Mean   :2.533          Mean   :2.222          Mean   :2.45          
 3rd Qu.:3.750          3rd Qu.:2.750          3rd Qu.:3.00          
 Max.   :7.000          Max.   :6.000          Max.   :6.00          
 NA's   :157            NA's   :169            NA's   :167           
 NOVA_ACCURACY_1_10_RANK NOVA_ACCURACY_1_13_RANK NOVA_ACCURACY_2_1_RANK
 Min.   :1.000           Min.   :1.000           Min.   :1.000         
 1st Qu.:1.000           1st Qu.:1.000           1st Qu.:1.000         
 Median :3.000           Median :2.000           Median :2.000         
 Mean   :2.619           Mean   :2.316           Mean   :1.852         
 3rd Qu.:3.000           3rd Qu.:3.000           3rd Qu.:2.000         
 Max.   :8.000           Max.   :7.000           Max.   :5.000         
 NA's   :166             NA's   :149             NA's   :160           
 NOVA_ACCURACY_2_4_RANK NOVA_ACCURACY_2_5_RANK NOVA_ACCURACY_2_6_RANK
 Min.   :1.0            Min.   :1.000          Min.   :1             
 1st Qu.:1.0            1st Qu.:1.000          1st Qu.:1             
 Median :2.0            Median :2.000          Median :2             
 Mean   :2.4            Mean   :1.808          Mean   :2             
 3rd Qu.:3.0            3rd Qu.:2.000          3rd Qu.:3             
 Max.   :6.0            Max.   :4.000          Max.   :3             
 NA's   :167            NA's   :161            NA's   :182           
 NOVA_ACCURACY_2_7_RANK NOVA_ACCURACY_2_8_RANK NOVA_ACCURACY_2_9_RANK
 Min.   :1.000          Min.   :1.000          Min.   :1             
 1st Qu.:1.000          1st Qu.:1.000          1st Qu.:1             
 Median :2.000          Median :1.000          Median :2             
 Mean   :1.778          Mean   :1.667          Mean   :2             
 3rd Qu.:2.000          3rd Qu.:2.000          3rd Qu.:2             
 Max.   :3.000          Max.   :3.000          Max.   :5             
 NA's   :178            NA's   :184            NA's   :154           
 NOVA_ACCURACY_2_10_RANK NOVA_ACCURACY_2_13_RANK NOVA_ACCURACY_3_1_RANK
 Min.   :1               Min.   :1.000           Min.   :1.000         
 1st Qu.:1               1st Qu.:1.500           1st Qu.:1.000         
 Median :1               Median :2.000           Median :1.000         
 Mean   :1               Mean   :2.316           Mean   :1.273         
 3rd Qu.:1               3rd Qu.:3.000           3rd Qu.:1.750         
 Max.   :1               Max.   :4.000           Max.   :2.000         
 NA's   :183             NA's   :168             NA's   :165           
 NOVA_ACCURACY_3_4_RANK NOVA_ACCURACY_3_5_RANK NOVA_ACCURACY_3_6_RANK
 Min.   :1.000          Min.   :1.000          Min.   :3             
 1st Qu.:1.000          1st Qu.:1.000          1st Qu.:3             
 Median :1.500          Median :1.000          Median :3             
 Mean   :1.833          Mean   :1.452          Mean   :3             
 3rd Qu.:2.750          3rd Qu.:2.000          3rd Qu.:3             
 Max.   :3.000          Max.   :3.000          Max.   :3             
 NA's   :181            NA's   :156            NA's   :186           
 NOVA_ACCURACY_3_7_RANK NOVA_ACCURACY_3_8_RANK NOVA_ACCURACY_3_9_RANK
 Min.   :2              Mode:logical           Min.   :1.000         
 1st Qu.:2              NA's:187               1st Qu.:1.000         
 Median :2                                     Median :2.000         
 Mean   :2                                     Mean   :1.556         
 3rd Qu.:2                                     3rd Qu.:2.000         
 Max.   :2                                     Max.   :2.000         
 NA's   :186                                   NA's   :178           
 NOVA_ACCURACY_3_10_RANK NOVA_ACCURACY_3_13_RANK NOVA_ACCURACY_4_1_RANK
 Mode:logical            Min.   :1.0             Min.   :1             
 NA's:187                1st Qu.:1.5             1st Qu.:1             
                         Median :2.0             Median :1             
                         Mean   :2.0             Mean   :1             
                         3rd Qu.:2.5             3rd Qu.:1             
                         Max.   :3.0             Max.   :1             
                         NA's   :184             NA's   :186           
 NOVA_ACCURACY_4_4_RANK NOVA_ACCURACY_4_5_RANK NOVA_ACCURACY_4_6_RANK
 Min.   :3.0            Min.   :1              Min.   :1             
 1st Qu.:3.5            1st Qu.:1              1st Qu.:1             
 Median :4.0            Median :1              Median :1             
 Mean   :4.0            Mean   :1              Mean   :1             
 3rd Qu.:4.5            3rd Qu.:1              3rd Qu.:1             
 Max.   :5.0            Max.   :1              Max.   :1             
 NA's   :185            NA's   :186            NA's   :184           
 NOVA_ACCURACY_4_7_RANK NOVA_ACCURACY_4_8_RANK NOVA_ACCURACY_4_9_RANK
 Min.   :2.00           Min.   :6              Min.   :7             
 1st Qu.:2.25           1st Qu.:6              1st Qu.:7             
 Median :2.50           Median :6              Median :7             
 Mean   :2.50           Mean   :6              Mean   :7             
 3rd Qu.:2.75           3rd Qu.:6              3rd Qu.:7             
 Max.   :3.00           Max.   :6              Max.   :7             
 NA's   :185            NA's   :186            NA's   :186           
 NOVA_ACCURACY_4_10_RANK NOVA_ACCURACY_4_13_RANK   CALORIES_1   
 Min.   :2.0             Min.   :2               Min.   : 50.0  
 1st Qu.:2.5             1st Qu.:2               1st Qu.:102.0  
 Median :3.0             Median :2               Median :185.0  
 Mean   :3.0             Mean   :2               Mean   :191.3  
 3rd Qu.:3.5             3rd Qu.:2               3rd Qu.:251.5  
 Max.   :4.0             Max.   :2               Max.   :500.0  
 NA's   :185             NA's   :186             NA's   :116    
   CALORIES_2      CALORIES_3      CALORIES_4      CALORIES_5   
 Min.   : 23.0   Min.   : 31.0   Min.   : 20.0   Min.   : 21.0  
 1st Qu.: 97.5   1st Qu.:170.5   1st Qu.:100.0   1st Qu.:150.0  
 Median :139.0   Median :207.0   Median :132.0   Median :197.0  
 Mean   :142.9   Mean   :233.0   Mean   :142.8   Mean   :200.1  
 3rd Qu.:180.5   3rd Qu.:286.0   3rd Qu.:182.0   3rd Qu.:246.0  
 Max.   :342.0   Max.   :500.0   Max.   :334.0   Max.   :497.0  
 NA's   :116     NA's   :116     NA's   :116     NA's   :116    
   CALORIES_6       CALORIES_7      CALORIES_8      CALORIES_9   
 Min.   : 16.00   Min.   :129.0   Min.   : 19.0   Min.   : 30.0  
 1st Qu.: 59.00   1st Qu.:247.5   1st Qu.: 72.0   1st Qu.:135.0  
 Median : 90.00   Median :286.0   Median :105.0   Median :181.0  
 Mean   : 99.69   Mean   :298.6   Mean   :131.5   Mean   :203.3  
 3rd Qu.:123.00   3rd Qu.:348.5   3rd Qu.:162.5   3rd Qu.:265.0  
 Max.   :293.00   Max.   :500.0   Max.   :500.0   Max.   :500.0  
 NA's   :116      NA's   :116     NA's   :116     NA's   :116    
    SATFAT_1        SATFAT_2       SATFAT_3        SATFAT_4     
 Min.   : 0.00   Min.   : 0.0   Min.   : 0.00   Min.   : 0.000  
 1st Qu.: 8.25   1st Qu.: 3.0   1st Qu.: 6.00   1st Qu.: 2.000  
 Median :10.50   Median : 4.0   Median :11.00   Median : 4.000  
 Mean   :11.38   Mean   : 5.2   Mean   :10.51   Mean   : 3.883  
 3rd Qu.:15.00   3rd Qu.: 7.0   3rd Qu.:15.00   3rd Qu.: 5.000  
 Max.   :20.00   Max.   :17.0   Max.   :20.00   Max.   :15.000  
 NA's   :121     NA's   :122    NA's   :122     NA's   :127     
    SATFAT_5         SATFAT_6       SATFAT_7        SATFAT_8     
 Min.   : 0.000   Min.   : 0.0   Min.   : 0.00   Min.   : 0.000  
 1st Qu.: 3.000   1st Qu.: 1.0   1st Qu.: 8.00   1st Qu.: 1.750  
 Median : 4.000   Median : 2.0   Median :12.00   Median : 3.000  
 Mean   : 5.422   Mean   : 3.4   Mean   :11.34   Mean   : 3.783  
 3rd Qu.: 7.000   3rd Qu.: 4.0   3rd Qu.:15.00   3rd Qu.: 5.000  
 Max.   :20.000   Max.   :15.0   Max.   :20.00   Max.   :14.000  
 NA's   :123      NA's   :127    NA's   :122     NA's   :127     
    SATFAT_9         CARBS_1          CARBS_2          CARBS_3     
 Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 2.00  
 1st Qu.: 3.000   1st Qu.: 2.000   1st Qu.: 4.000   1st Qu.:12.00  
 Median : 5.000   Median : 6.000   Median : 6.000   Median :17.00  
 Mean   : 6.238   Mean   : 7.933   Mean   : 7.397   Mean   :16.94  
 3rd Qu.: 9.000   3rd Qu.:11.250   3rd Qu.:10.000   3rd Qu.:21.00  
 Max.   :16.000   Max.   :30.000   Max.   :25.000   Max.   :30.00  
 NA's   :124      NA's   :127      NA's   :129      NA's   :122    
    CARBS_4         CARBS_5          CARBS_6          CARBS_7     
 Min.   : 2.00   Min.   : 0.000   Min.   : 0.000   Min.   : 2.00  
 1st Qu.: 8.00   1st Qu.: 4.000   1st Qu.: 3.000   1st Qu.: 9.00  
 Median :13.50   Median : 7.000   Median : 5.000   Median :14.00  
 Mean   :14.86   Mean   : 8.367   Mean   : 6.576   Mean   :13.73  
 3rd Qu.:21.00   3rd Qu.:10.250   3rd Qu.: 9.000   3rd Qu.:17.25  
 Max.   :30.00   Max.   :24.000   Max.   :21.000   Max.   :30.00  
 NA's   :123     NA's   :127      NA's   :128      NA's   :123    
    CARBS_8          CARBS_9         SUGAR_1          SUGAR_2      
 Min.   : 0.000   Min.   : 4.00   Min.   : 0.000   Min.   : 0.000  
 1st Qu.: 3.000   1st Qu.:10.50   1st Qu.: 2.000   1st Qu.: 2.000  
 Median : 5.000   Median :15.00   Median : 3.500   Median : 3.000  
 Mean   : 6.456   Mean   :15.95   Mean   : 4.661   Mean   : 3.368  
 3rd Qu.: 8.000   3rd Qu.:22.00   3rd Qu.: 6.250   3rd Qu.: 5.000  
 Max.   :20.000   Max.   :30.00   Max.   :15.000   Max.   :11.000  
 NA's   :130      NA's   :124     NA's   :131      NA's   :130     
    SUGAR_3          SUGAR_4          SUGAR_5         SUGAR_6      
 Min.   : 0.000   Min.   : 0.000   Min.   : 0.00   Min.   : 0.000  
 1st Qu.: 3.000   1st Qu.: 2.000   1st Qu.: 3.00   1st Qu.: 2.250  
 Median : 4.500   Median : 3.000   Median : 5.00   Median : 4.000  
 Mean   : 5.167   Mean   : 4.379   Mean   : 5.15   Mean   : 4.707  
 3rd Qu.: 7.250   3rd Qu.: 6.750   3rd Qu.: 7.00   3rd Qu.: 6.000  
 Max.   :15.000   Max.   :13.000   Max.   :12.00   Max.   :13.000  
 NA's   :127      NA's   :129      NA's   :127     NA's   :129     
    SUGAR_7          SUGAR_8          SUGAR_9         PROTEIN_1    
 Min.   : 2.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.00  
 1st Qu.: 4.000   1st Qu.: 2.000   1st Qu.: 3.000   1st Qu.: 7.00  
 Median : 6.000   Median : 3.000   Median : 4.000   Median :10.00  
 Mean   : 6.677   Mean   : 3.661   Mean   : 4.772   Mean   :11.34  
 3rd Qu.: 9.000   3rd Qu.: 5.000   3rd Qu.: 6.000   3rd Qu.:16.00  
 Max.   :15.000   Max.   :13.000   Max.   :15.000   Max.   :30.00  
 NA's   :125      NA's   :128      NA's   :130      NA's   :122    
   PROTEIN_2       PROTEIN_3        PROTEIN_4       PROTEIN_5    
 Min.   : 3.00   Min.   : 0.000   Min.   : 0.00   Min.   : 2.00  
 1st Qu.: 9.00   1st Qu.: 1.750   1st Qu.: 1.75   1st Qu.:13.00  
 Median :12.00   Median : 2.500   Median : 3.00   Median :17.00  
 Mean   :13.17   Mean   : 3.732   Mean   : 4.75   Mean   :17.95  
 3rd Qu.:17.00   3rd Qu.: 5.000   3rd Qu.: 6.00   3rd Qu.:23.00  
 Max.   :30.00   Max.   :14.000   Max.   :19.00   Max.   :30.00  
 NA's   :122     NA's   :131      NA's   :135     NA's   :122    
   PROTEIN_6       PROTEIN_7       PROTEIN_8        PROTEIN_9     
 Min.   : 0.00   Min.   : 5.00   Min.   : 0.000   Min.   : 0.000  
 1st Qu.: 2.00   1st Qu.:11.00   1st Qu.: 2.000   1st Qu.: 2.000  
 Median : 3.00   Median :15.00   Median : 3.000   Median : 3.000  
 Mean   : 3.92   Mean   :16.48   Mean   : 4.868   Mean   : 4.966  
 3rd Qu.: 5.00   3rd Qu.:22.00   3rd Qu.: 5.000   3rd Qu.: 7.750  
 Max.   :17.00   Max.   :30.00   Max.   :30.000   Max.   :30.000  
 NA's   :137     NA's   :123     NA's   :134      NA's   :129     
   ILLNESS_1          AGE             HEALTHSTATUS      HOPE      
 Min.   : 1.000   Length:187         Min.   :1.0   Min.   :1.000  
 1st Qu.: 1.000   Class :character   1st Qu.:3.0   1st Qu.:3.000  
 Median : 1.000   Mode  :character   Median :3.0   Median :4.000  
 Mean   : 2.394                      Mean   :3.2   Mean   :3.625  
 3rd Qu.: 4.000                      3rd Qu.:4.0   3rd Qu.:4.000  
 Max.   :10.000                      Max.   :5.0   Max.   :5.000  
 NA's   :121                         NA's   :122   NA's   :123    
    ANXIETY          GENDER        RACIALIZED        RACIALIZED_8_TEXT 
 Min.   :1.000   Min.   :0.0000   Length:187         Length:187        
 1st Qu.:2.750   1st Qu.:0.0000   Class :character   Class :character  
 Median :3.000   Median :0.0000   Mode  :character   Mode  :character  
 Mean   :3.359   Mean   :0.2923                                        
 3rd Qu.:4.000   3rd Qu.:1.0000                                        
 Max.   :5.000   Max.   :1.0000                                        
 NA's   :123     NA's   :122                                           
  SOCIALSTATUS     INCOME     
 Min.   :3.0   Min.   :1.000  
 1st Qu.:6.0   1st Qu.:3.000  
 Median :7.0   Median :3.000  
 Mean   :6.6   Mean   :3.185  
 3rd Qu.:7.0   3rd Qu.:4.000  
 Max.   :9.0   Max.   :5.000  
 NA's   :122   NA's   :122    
secondary_data[secondary_data == -99] <- NA
head(secondary_data, 10)
# A tibble: 10 × 31
   ...1                EndDate             Status IPAddress       Progress
   <dttm>              <dttm>               <dbl> <chr>              <dbl>
 1 2025-10-01 11:18:55 2025-10-01 11:19:59      0 172.59.182.54        100
 2 2025-10-01 11:23:43 2025-10-01 11:25:09      0 149.125.156.227      100
 3 2025-10-01 11:27:33 2025-10-01 11:28:19      0 174.254.255.221      100
 4 2025-10-01 11:37:23 2025-10-01 11:38:47      0 149.125.141.26       100
 5 2025-10-01 11:49:28 2025-10-01 11:50:04      0 172.59.177.124       100
 6 2025-10-01 11:51:53 2025-10-01 11:52:57      0 149.125.202.46       100
 7 2025-10-01 11:54:52 2025-10-01 11:55:55      0 149.125.51.245       100
 8 2025-10-01 12:01:07 2025-10-01 12:05:09      0 149.125.46.181       100
 9 2025-10-01 12:07:36 2025-10-01 12:08:38      0 149.125.54.62        100
10 2025-10-01 12:06:25 2025-10-01 12:11:11      0 149.125.141.18       100
# ℹ 26 more variables: `Duration (in seconds)` <dbl>, Finished <dbl>,
#   RecordedDate <dttm>, ResponseId <chr>, RecipientLastName <lgl>,
#   RecipientFirstName <lgl>, RecipientEmail <lgl>, ExternalReference <lgl>,
#   LocationLatitude <dbl>, LocationLongitude <dbl>, DistributionChannel <chr>,
#   UserLanguage <chr>, Consent <dbl>, PASSWORD <chr>, VEGGIESCORE <chr>,
#   CONSUMPTION <dbl>, SEX <dbl>, AGE_1 <dbl>, LOCATION <dbl>,
#   HEALTHSTATUS <lgl>, HOPE <lgl>, ANXIETY <lgl>, RACIALIZED <lgl>, …
names(secondary_data)
 [1] "...1"                  "EndDate"               "Status"               
 [4] "IPAddress"             "Progress"              "Duration (in seconds)"
 [7] "Finished"              "RecordedDate"          "ResponseId"           
[10] "RecipientLastName"     "RecipientFirstName"    "RecipientEmail"       
[13] "ExternalReference"     "LocationLatitude"      "LocationLongitude"    
[16] "DistributionChannel"   "UserLanguage"          "Consent"              
[19] "PASSWORD"              "VEGGIESCORE"           "CONSUMPTION"          
[22] "SEX"                   "AGE_1"                 "LOCATION"             
[25] "HEALTHSTATUS"          "HOPE"                  "ANXIETY"              
[28] "RACIALIZED"            "RACIALIZED_8_TEXT"     "SOCIALSTATUS"         
[31] "INCOME"               
summary(secondary_data)
      ...1                        EndDate                        Status        
 Min.   :2025-09-26 13:57:24   Min.   :2025-09-26 13:57:33   Min.   :0.000000  
 1st Qu.:2025-10-03 13:17:06   1st Qu.:2025-10-03 13:19:31   1st Qu.:0.000000  
 Median :2025-10-09 15:38:01   Median :2025-10-09 15:56:26   Median :0.000000  
 Mean   :2025-10-09 23:35:47   Mean   :2025-10-10 00:09:12   Mean   :0.009709  
 3rd Qu.:2025-10-15 14:53:10   3rd Qu.:2025-10-15 14:56:13   3rd Qu.:0.000000  
 Max.   :2025-10-18 12:50:27   Max.   :2025-10-18 16:37:22   Max.   :1.000000  
                                                                               
  IPAddress            Progress      Duration (in seconds)    Finished     
 Length:309         Min.   :  7.00   Min.   :     5        Min.   :0.0000  
 Class :character   1st Qu.:100.00   1st Qu.:    99        1st Qu.:1.0000  
 Mode  :character   Median :100.00   Median :   171        Median :1.0000  
                    Mean   : 92.52   Mean   :  2005        Mean   :0.8964  
                    3rd Qu.:100.00   3rd Qu.:   249        3rd Qu.:1.0000  
                    Max.   :100.00   Max.   :348270        Max.   :1.0000  
                                                                           
  RecordedDate                  ResponseId        RecipientLastName
 Min.   :2025-10-01 11:19:59   Length:309         Mode:logical     
 1st Qu.:2025-10-03 13:58:10   Class :character   NA's:309         
 Median :2025-10-10 12:01:18   Mode  :character                    
 Mean   :2025-10-10 17:33:06                                       
 3rd Qu.:2025-10-16 18:13:46                                       
 Max.   :2025-10-19 12:00:20                                       
                                                                   
 RecipientFirstName RecipientEmail ExternalReference LocationLatitude
 Mode:logical       Mode:logical   Mode:logical      Min.   :38.05   
 NA's:309           NA's:309       NA's:309          1st Qu.:40.78   
                                                     Median :41.60   
                                                     Mean   :41.65   
                                                     3rd Qu.:42.10   
                                                     Max.   :47.61   
                                                                     
 LocationLongitude DistributionChannel UserLanguage          Consent      
 Min.   :-122.33   Length:309          Length:309         Min.   :0.0000  
 1st Qu.: -75.89   Class :character    Class :character   1st Qu.:1.0000  
 Median : -74.36   Mode  :character    Mode  :character   Median :1.0000  
 Mean   : -75.12                                          Mean   :0.9968  
 3rd Qu.: -73.91                                          3rd Qu.:1.0000  
 Max.   : -71.06                                          Max.   :1.0000  
                                                          NA's   :1       
   PASSWORD         VEGGIESCORE         CONSUMPTION         SEX        
 Length:309         Length:309         Min.   :1.000   Min.   :0.0000  
 Class :character   Class :character   1st Qu.:4.000   1st Qu.:0.0000  
 Mode  :character   Mode  :character   Median :5.000   Median :1.0000  
                                       Mean   :4.492   Mean   :0.5607  
                                       3rd Qu.:5.000   3rd Qu.:1.0000  
                                       Max.   :6.000   Max.   :1.0000  
                                       NA's   :110     NA's   :29      
     AGE_1          LOCATION      HEALTHSTATUS     HOPE         ANXIETY       
 Min.   :18.00   Min.   :0.0000   Mode:logical   Mode:logical   Mode:logical  
 1st Qu.:18.00   1st Qu.:0.0000   NA's:309       NA's:309       NA's:309      
 Median :19.00   Median :1.0000                                               
 Mean   :25.09   Mean   :0.7044                                               
 3rd Qu.:21.00   3rd Qu.:1.0000                                               
 Max.   :89.00   Max.   :1.0000                                               
 NA's   :26      NA's   :106                                                  
 RACIALIZED     RACIALIZED_8_TEXT SOCIALSTATUS    INCOME       
 Mode:logical   Mode:logical      Mode:logical   Mode:logical  
 NA's:309       NA's:309          NA's:309       NA's:309      
                                                               
                                                               
                                                               
                                                               
                                                               
# source: (Hei & McCarty, 2025) https://shanemccarty.github.io/FRIplaybook/import-once.html 
# explanation: replaced the -99 missing-value code with NA in primary_data and secondary_data; used names() and summary() to list the column names and summarize each column's descriptive statistics, based on code from the FRI playbook (Hei & McCarty, 2025)
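One way to confirm that the -99 recode worked is to count the codes per column before the replacement and the NA values after it. The sketch below is in base R and uses a small hypothetical data frame (`toy_survey`, not study data). It assumes, as the code above implies, that -99 is the survey's missing-value code.

```r
# Hypothetical survey extract (not study data); -99 marks a missing response
toy_survey <- data.frame(
  HOPE    = c(4, -99, 3, 5),
  ANXIETY = c(-99, -99, 2, 4),
  INCOME  = c(3, 2, -99, 1)
)

# Count -99 codes per column before recoding
codes_before <- colSums(toy_survey == -99, na.rm = TRUE)

# Replace every -99 with NA, the same idiom used above
toy_survey[toy_survey == -99] <- NA

# Count NA values per column after recoding; these should match codes_before
na_after <- colSums(is.na(toy_survey))
print(rbind(codes_before, na_after))
```

Running the same two `colSums()` calls on `secondary_data` would show how much missingness each survey item carries.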

1.3.3 Transform Veggie Scores

secondary_data$VEGGIESCORE <- as.numeric(secondary_data$VEGGIESCORE)
Warning: NAs introduced by coercion
# add a VEGGIE column to secondary_data and keep only scores from 50 to 800
secondary_data <- secondary_data %>%
  mutate(
    VEGGIE = as.numeric(VEGGIESCORE)
  ) %>%
  filter(VEGGIE >= 50 & VEGGIE <= 800)
# add a filtered VEGGIE column to combined
combined <- combined %>%
  mutate(
    VEGGIE = as.numeric(VEGGIESCORE.y),
    HEALTHSTATUS = as.numeric(HEALTHSTATUS)
  ) %>%
  filter(VEGGIE >= 50 & VEGGIE <= 800)
# source: (Estreich, 2025) https://shanemccarty.github.io/FRIplaybook/dplyr.html 
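# explanation: created a numeric VEGGIE column in secondary_data and combined, keeping only scores between 50 and 800 (this removes outlying values as well as entries that could not be converted to numbers)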
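For scores that `as.numeric()` could not parse, the condition `VEGGIE >= 50 & VEGGIE <= 800` evaluates to `NA`. `filter()` drops those rows along with the out-of-range ones. The base-R sketch below labels which rule excludes each entry, which makes it easier to report how many scores were removed and why. The raw text entries are hypothetical.

```r
# Hypothetical raw text entries (not study data)
raw_scores <- c("320", "45", "n/a", "810", "275", "")

# Coerce to numeric; non-numeric entries become NA (warning suppressed here)
veggie <- suppressWarnings(as.numeric(raw_scores))

# Label why each entry would be kept or dropped by the 50-800 filter
reason <- ifelse(is.na(veggie), "not numeric",
                 ifelse(veggie >= 50 & veggie <= 800, "kept", "out of range"))
table(reason)

# In base R, which() drops NA comparisons, mirroring dplyr::filter()
kept <- veggie[which(veggie >= 50 & veggie <= 800)]
```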

1.3.4 Transform Variables for Significance Tests

library(dplyr)
library(tidyr) # drop_na() is provided by tidyr
# filter 'GENDER' to only include two groups for an independent-samples t-test
primary_data <- primary_data %>%
  filter(
    GENDER %in% c(0,1)
  ) %>%
  mutate(
    GENDER = factor(GENDER, levels = c(0,1), labels = c("Male" , "Female"))
  ) %>%
  drop_na(
    GENDER, HEALTHSTATUS
  )
# source: https://nyu-cdsc.github.io/learningr/assets/data-transformation.pdf 
# explanation: filtered column 'GENDER' in primary_data to only include two values 'Male' and 'Female'
library(dplyr)
# filter 'GENDER' to only include two groups for the Pearson correlation
combined <- combined %>%
  filter(
    GENDER %in% c(0,1)
  ) %>%
  mutate(
    GENDER = factor(GENDER, levels = c(0,1), labels = c("Male" , "Female"))
  )
# source: https://nyu-cdsc.github.io/learningr/assets/data-transformation.pdf 
# explanation: filtered column 'GENDER' in combined to only include two values 'Male' and 'Female'
library(dplyr)
# filter 'SEX' to only include two groups for independent samples t-test
secondary_data <- secondary_data %>%
  filter(
    SEX %in% c(0, 1)
  ) %>%
  mutate(
    SEX = factor(SEX, levels = c(0, 1), labels = c("Male", "Female"))
  ) %>%
  drop_na(
    VEGGIE, SEX
  )
# source: https://nyu-cdsc.github.io/learningr/assets/data-transformation.pdf 
# explanation: filtered column 'SEX' to only include two values 'Male' and 'Female'
library(dplyr)
# filter 'LOCATION' to only include two groups 
secondary_data <- secondary_data %>%
  filter(
    LOCATION %in% c(0, 1)
  ) %>%
  mutate(
    LOCATION = factor(LOCATION, levels = c(0, 1),
                      labels = c("Farmers Market", "Binghamton University"))
  ) %>%
  drop_na(
    VEGGIESCORE, LOCATION
  )

summary(secondary_data$LOCATION)
       Farmers Market Binghamton University 
                   55                   138 
summary(secondary_data$VEGGIESCORE)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   65.0   245.0   299.0   298.5   342.0   752.0 
# source: https://nyu-cdsc.github.io/learningr/assets/data-transformation.pdf 
# explanation: filtered column 'LOCATION' to only include two values, 'Farmers Market' and 'Binghamton University'

1.4 Visualize

# summarize 'SEX' (group counts), and 'VEGGIE' and 'AGE_1' (five-number summary and mean)
summary(secondary_data$SEX)
  Male Female 
    82    111 
summary(secondary_data$VEGGIE)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   65.0   245.0   299.0   298.5   342.0   752.0 
summary(secondary_data$AGE_1)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  18.00   19.00   20.00   27.22   25.00   89.00 
# source: (McCarty, 2025) https://shanemccarty.github.io/FRIplaybook/ggplot2.html#summarize-data
# explanation: group counts for 'SEX' and five-number summaries with means for 'VEGGIE' and 'AGE_1' (summary() does not report the standard deviation)
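`summary()` reports the mean but not the standard deviation. The sketch below computes both in base R, overall with `sapply()` and by sex with `aggregate()`. The data frame is a hypothetical stand-in that only borrows the column names of `secondary_data`.

```r
# Hypothetical stand-in for secondary_data (values are illustrative only)
toy <- data.frame(
  SEX    = factor(c("Male", "Male", "Female", "Female", "Female"),
                  levels = c("Male", "Female")),
  VEGGIE = c(300, 340, 250, 290, 310),
  AGE_1  = c(19, 21, 18, 20, 22)
)

# Overall mean and SD for each numeric variable (rows: mean, sd)
overall <- sapply(toy[, c("VEGGIE", "AGE_1")],
                  function(x) c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE)))
print(overall)

# Mean and SD of VEGGIE within each sex
by_sex <- aggregate(VEGGIE ~ SEX, data = toy,
                    FUN = function(x) c(mean = mean(x), sd = sd(x)))
print(by_sex)

# Group sizes, which is what summary() reports for a factor
table(toy$SEX)
```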

1.4.1 Histograms

ggplot(
  data = combined,
  mapping = aes(
    x = HEALTHSTATUS,
    y = VEGGIE)) +
  geom_bar(
    stat = 'summary',
    fun = 'mean',
    fill = "#005a43") +
  theme_bw()

The bar graph shows health status on the x-axis and average veggie score on the y-axis, with the bars shown in green. A reported health status of 5 (excellent) is associated with the highest average veggie score, while a reported health status of 1 (poor) is associated with the lowest.

A bar graph depicting the relationship between perceived health status and average veggie score of Binghamton University students.
ggsave("health_veggie_plot.png", width = 8, height = 6)
#source 1: The FRI Playbook (McCarty, 2025)
#explanation: creating a bar graph to display reported health status and average veggie score
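Each bar is a group mean, so it helps to know how many participants sit behind each bar. A mean drawn from only a few people can be unstable. The base-R sketch below builds the same per-level means that `stat = 'summary', fun = 'mean'` plots and adds a count column. The (`HEALTHSTATUS`, `VEGGIE`) pairs are hypothetical.

```r
# Hypothetical pairs; HEALTHSTATUS runs from 1 (Poor) to 5 (Excellent)
toy <- data.frame(
  HEALTHSTATUS = c(1, 2, 3, 3, 4, 4, 5),
  VEGGIE       = c(210, 260, 280, 300, 310, 330, 400)
)

# Mean veggie score and number of participants per health-status level
bar_table <- data.frame(
  HEALTHSTATUS = sort(unique(toy$HEALTHSTATUS)),
  mean_veggie  = as.vector(tapply(toy$VEGGIE, toy$HEALTHSTATUS, mean)),
  n            = as.vector(table(toy$HEALTHSTATUS))
)
print(bar_table)
```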
#| label: histogram
#| fig-cap: Figures 1 and 2. Two histograms depicting the distribution of participant age and veggie scores.
#| fig-alt: The first histogram has participant age on the x-axis and frequency on the y-axis. The second histogram has participant veggie scores on the x-axis and frequency on the y-axis. The age histogram appears right-skewed and is not normally distributed. The veggie score histogram appears approximately normally distributed, with a single high outlier on the right.
library(ggplot2)
# create a histogram for age
ggplot(secondary_data, aes(x = AGE_1)) + geom_histogram(binwidth = .5) + theme_bw() + ggtitle("Age of Participants") + xlab("Age") 

ggsave("plot/age_hist.png" , width = 8, height = 6)
# create a histogram for veggie score
ggplot(secondary_data, aes(x = VEGGIE)) + geom_histogram(binwidth = 10) + theme_bw() + ggtitle("Participant Veggie Scores") + xlab("Veggie Score")

ggsave("plot/veggie_hist.png")
Saving 8 x 4 in image
# source: https://rstudio.github.io/cheatsheets/data-visualization.pdf
# explanation: made histograms to check for normal distributions 
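Histograms give a visual check of normality. A formal complement is the Shapiro-Wilk test, `shapiro.test()` in base R, which accepts between 3 and 5000 values. The sketch below uses simulated stand-ins rather than study data: roughly normal scores and right-skewed ages. It assumes the test would be applied to `VEGGIE` and `AGE_1` in the same way. `qqnorm()` and `qqline()` provide the matching visual check.

```r
set.seed(42)
# Simulated stand-ins (illustrative only): roughly normal scores, right-skewed ages
sim_veggie <- rnorm(150, mean = 300, sd = 80)
sim_age    <- 18 + rexp(150, rate = 1 / 6)

# Shapiro-Wilk test: a small p-value suggests a departure from normality
sw_veggie <- shapiro.test(sim_veggie)
sw_age    <- shapiro.test(sim_age)
c(veggie_p = sw_veggie$p.value, age_p = sw_age$p.value)
```

With samples of this size the test detects even mild skew, so it reads best next to the histograms rather than as a standalone verdict.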

1.4.2 Density Plots

#| label: sex-density-plot
#| fig-cap: Figure 3. Density plot depicting the distributions of veggie scores of males in orange and females in pink.
#| fig-alt: Both the male and female distributions appear approximately normal, with the female distribution right-skewed. An independent samples t-test showed a statistically significant difference between male and female veggie scores (t = 2.06, p = 0.041).
library(ggpubr)
library(ggplot2)
# create a density plot for SEX and VEGGIE
plot.sex.veggie <- ggdensity(secondary_data, x = "VEGGIE" ,
          add = "mean" , rug = TRUE ,
          color = "SEX" , fill = "SEX" ,
          palette = c("#ff8c6b" , "#e8a7d0"),
          title = "Distribution of Veggie Scores by Sex" ,
          xlab = "Veggie Score" ,
          add.params = list(linewidth = 1 , alpha = 1,
                            linetype = "dashed")) # style the mean lines as dashed
# set x-axis breaks every 100 from 100 to 800
plot.sex.veggie <- plot.sex.veggie + scale_x_continuous(breaks = seq(100, 800, by = 100))
print(plot.sex.veggie)

ggsave("plot/veggiescore_sex_plot.png",
       plot = plot.sex.veggie, 
       width = 10, height = 4, dpi = 300)
# source: https://stackoverflow.com/questions/21563864/ggplot2-overlay-density-plots-r (2014)
# explanation: created a veggie score distribution plot to visualize the difference between male and female participants' veggie scores
#| label: location-density-plot
#| fig-cap: Figure 4. Density plot depicting the distributions of veggie scores taken at the farmers market in orange and at Binghamton University in pink.
#| fig-alt: The Binghamton University distribution appears approximately normal with a slight right skew, while the farmers market distribution appears heavily right-skewed. An independent samples t-test did not show a statistically significant difference between farmers market and Binghamton University veggie scores (t = -1.81, p = 0.075).
library(ggpubr)
library(ggplot2)
# create a second density plot for LOCATION and VEGGIE
plot.location.veggie <- ggdensity(secondary_data, x = "VEGGIE" ,
          add = "mean" , rug = TRUE ,
          color = "LOCATION" , fill = "LOCATION" ,
          palette = c("#ff8c6b" , "#e8a7d0"),
          title = "Distribution of Veggie Scores by Location" ,
          xlab = "Veggie Score" ,
          add.params = list(linewidth = 1 , alpha = 1, linetype = "dashed")) # style the mean lines as dashed
# set x-axis breaks every 100 from 100 to 800
plot.location.veggie <- plot.location.veggie + scale_x_continuous(breaks = seq(100, 800, by = 100))
print(plot.location.veggie)

ggsave("plot/veggiescore_location_plot.png")
Saving 8 x 4 in image
# source: https://stackoverflow.com/questions/21563864/ggplot2-overlay-density-plots-r (2014)
# explanation: created a veggie score distribution plot to compare the veggie scores of participants surveyed at the farmers market and at Binghamton University
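The farmers market distribution looks heavily right-skewed, and that group is smaller (55 versus 138 after filtering). A rank-based comparison could therefore be reported alongside the Welch t-test as a robustness check. One option is the Wilcoxon rank-sum (Mann-Whitney) test, `wilcox.test()` in base R. This is a swapped-in technique rather than one the analysis above uses, and the scores below are hypothetical.

```r
# Hypothetical scores for the two collection sites (illustrative only)
toy <- data.frame(
  VEGGIE   = c(220, 240, 250, 265, 600, 700,   # right-skewed group
               280, 290, 300, 310, 320, 330),
  LOCATION = factor(rep(c("Farmers Market", "Binghamton University"), each = 6))
)

# Welch t-test: compares means, so a long right tail pulls it around
welch <- t.test(VEGGIE ~ LOCATION, data = toy)

# Wilcoxon rank-sum test: compares ranks, so extreme values matter less
wilcox <- wilcox.test(VEGGIE ~ LOCATION, data = toy, exact = FALSE)

c(welch_p = welch$p.value, wilcox_p = wilcox$p.value)
```

If the two tests disagree on the real data, that disagreement is itself worth noting when interpreting the location comparison.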

1.4.3 Scatter Plot

#| label: age-veggie-scatter-plot
#| fig-cap: Figure 5. Scatter plot depicting the relationship between age and veggie score. A linear regression did not show a statistically significant relationship between veggie score and age (F = 1.47, p = .227).
#| fig-alt: There appears to be a very weak, negative relationship between age and veggie score.
library(ggplot2)
# create a scatter plot of age and veggiescore
plot.age.veggie <- ggplot(secondary_data, aes(x = AGE_1, y = VEGGIE)) +
  geom_point() +
  geom_smooth(method = "lm") +
  ggtitle("Veggie Score and Age") +
  theme_bw() +
  xlab("Age") +
  ylab("Veggie Score")
print(plot.age.veggie)
`geom_smooth()` using formula = 'y ~ x'

ggsave("plot/veggiescore_age_plot.png")
Saving 8 x 4 in image
`geom_smooth()` using formula = 'y ~ x'
# source: https://stackoverflow.com/questions/21563864/ggplot2-overlay-density-plots-r (2014)
# explanation: created a scatter plot depicting the relationship between age and veggie score 
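The Figure 5 caption reports an F statistic and p-value from a linear regression of veggie score on age. The sketch below shows where those numbers come from in base R, using `lm()` and the `fstatistic` element stored by `summary()`. The data are simulated with no true age effect, and the model formula `VEGGIE ~ AGE_1` is an assumption about the one used in the analysis.

```r
set.seed(1)
# Simulated stand-in with no true age effect (illustrative only)
toy <- data.frame(AGE_1 = sample(18:60, 120, replace = TRUE))
toy$VEGGIE <- 300 + rnorm(120, sd = 80)

fit <- lm(VEGGIE ~ AGE_1, data = toy)
fstat <- summary(fit)$fstatistic   # named vector: value, numdf, dendf

# p-value for the overall F test (upper tail of the F distribution)
p_value <- pf(fstat[["value"]], fstat[["numdf"]], fstat[["dendf"]], lower.tail = FALSE)

round(c(F = fstat[["value"]], p = p_value, slope = unname(coef(fit)["AGE_1"])), 3)
```

With a single predictor, the F statistic is the square of the slope's t statistic, so the F test and the slope's t-test give the same p-value.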
#| label: health-veggie-scatter-plot
#| fig-cap: Figure 6. Scatter plot depicting the relationship between perceived health status and veggie score. Point colors correspond to participant gender, orange for male and pink for female. A Pearson correlation did not show a statistically significant relationship between veggie score and perceived health status. An independent samples t-test also did not show a statistically significant difference between the perceived health status of male and female participants.
library(ggplot2)
library(tidyr)
# create a scatter plot of health status and veggie score, colored by gender
veggie.health.plot <- combined %>%
  drop_na(HEALTHSTATUS, VEGGIE, GENDER) %>%
  ggplot(aes(x = HEALTHSTATUS, y = VEGGIE)) +
  geom_jitter(aes(color = GENDER), width = 0.2, alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "#ceedc5ff") +
  ggtitle("Scatter Plot of Health Status vs. Veggie Score") +
  xlab("Health Status") +
  ylab("Veggie Score") +
  theme_bw() +
  scale_x_continuous(
    breaks = c(1, 2, 3, 4, 5),
    labels = c("Poor", "Fair", "Good", "Very Good", "Excellent")) +
  # named values map colors to gender levels directly
  scale_color_manual(
    name = "Gender",
    values = c("Male" = "#ff8c6b", "Female" = "#e8a7d0")
  )
print(veggie.health.plot)

ggsave("plot/health_vs_veggie_scatter.png",
       plot = veggie.health.plot,
       width = 7,
       height = 3.7,
       dpi = 300)
# source: https://stackoverflow.com/questions/21563864/ggplot2-overlay-density-plots-r (2014)
# explanation: created a scatter plot depicting the relationship between perceived health status and veggie score, filtered by participant gender

1.5 Model

1.5.1 T-tests

# run independent sample t-test
t_test_results <- t.test(VEGGIE ~ SEX, 
                         data = secondary_data, 
                         var.equal = FALSE) # Use TRUE if Levene's test p > 0.05
print(t_test_results)

    Welch Two Sample t-test

data:  VEGGIE by SEX
t = 2.0588, df = 189.06, p-value = 0.04088
alternative hypothesis: true difference in means between group Male and group Female is not equal to 0
95 percent confidence interval:
  1.071344 50.100047
sample estimates:
  mean in group Male mean in group Female 
            313.2073             287.6216 
# source: https://www.datacamp.com/tutorial/t-tests-r-tutorial
# explanation: ran an independent samples t-test on differences in veggie score by sex

An independent samples (Welch) t-test was conducted to compare veggie scores between male and female respondents. The test revealed a statistically significant difference in veggie scores by sex (t = 2.06, df = 189.06, p = 0.041), indicating that veggie scores varied by sex in the sample. Male participants had a mean veggie score of 313.21 and female participants had a mean veggie score of 287.62, so male participants, on average, had higher veggie scores than female participants (95% CI for the difference: 1.07 to 50.10).
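The Welch test reported above does not assume equal variances in the two groups, which is why its degrees of freedom (189.06) are not a whole number. The sketch below uses simulated scores rather than the study data, and a hypothetical helper `welch_t()` that is not part of the original analysis. It computes the Welch t-statistic and the Welch-Satterthwaite degrees of freedom by hand and checks them against base R's `t.test()`.

```r
# Hypothetical helper (not from the original analysis): Welch's t-test by hand
welch_t <- function(x, y) {
  se2_x <- var(x) / length(x)  # squared standard error of mean(x)
  se2_y <- var(y) / length(y)  # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)
  # Welch-Satterthwaite approximation to the degrees of freedom
  welch_df <- (se2_x + se2_y)^2 /
    (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))
  p_value <- 2 * pt(-abs(t_stat), df = welch_df)  # two-sided p-value
  list(t = t_stat, df = welch_df, p = p_value)
}

set.seed(42)
# Simulated scores for illustration only (NOT the study data)
male <- rnorm(80, mean = 313, sd = 90)
female <- rnorm(110, mean = 288, sd = 85)

manual <- welch_t(male, female)
builtin <- t.test(male, female, var.equal = FALSE)
```

Because the standard error is built from each group's own variance, unequal spreads or group sizes do not bias the test the way they can in the pooled-variance (Student) version.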

# run independent samples t-test for 'GENDER' and 'HEALTH STATUS'
t_test_results2 <- t.test(HEALTHSTATUS ~ GENDER,
                          data = primary_data,
                          var.equal = FALSE)
print(t_test_results2)

    Welch Two Sample t-test

data:  HEALTHSTATUS by GENDER
t = -1.5622, df = 33.982, p-value = 0.1275
alternative hypothesis: true difference in means between group Male and group Female is not equal to 0
95 percent confidence interval:
 -0.8898141  0.1163587
sample estimates:
  mean in group Male mean in group Female 
            3.086957             3.473684 
# source: https://www.datacamp.com/tutorial/t-tests-r-tutorial
# explanation: ran an independent sample t-test for gender and health status, to see if, because males had higher veggie scores, they would have higher perceived health status
# run independent sample t-test
t_test_results <- t.test(VEGGIE ~ LOCATION, 
                         data = secondary_data, 
                         var.equal = FALSE) # Use TRUE if Levene's test p > 0.05
print(t_test_results)

    Welch Two Sample t-test

data:  VEGGIE by LOCATION
t = -1.8057, df = 78.155, p-value = 0.07482
alternative hypothesis: true difference in means between group Farmers Market and group Binghamton University is not equal to 0
95 percent confidence interval:
 -60.684406   2.958714
sample estimates:
       mean in group Farmers Market mean in group Binghamton University 
                           277.8545                            306.7174 
# source: https://www.datacamp.com/tutorial/t-tests-r-tutorial
# explanation: ran an independent samples t-test on differences of veggie score between location surveyed

Another independent samples t-test was run to examine whether veggie scores differed by surveying location (Binghamton University or the Broome County Farmers' Market). The test showed no statistically significant difference between the two locations, suggesting that Binghamton University students and Broome County residents did not have significantly different veggie scores (t = -1.8057, p = 0.07482). Participants surveyed at Binghamton University nonetheless had a higher mean veggie score (m = 306.7174) than those surveyed at the Broome County Farmers' Market (m = 277.8545), although this difference was not statistically significant.
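The confidence interval and the p-value in the location test tell the same story: the interval reported by `t.test()` (-60.68 to 2.96) includes zero, which is consistent with p > 0.05. As a rough check, the sketch below re-derives both from the rounded values printed in the output above. It assumes those printed values are accurate enough, so small differences from the reported interval come only from that rounding.

```r
# Values copied from the VEGGIE ~ LOCATION output above (rounded as printed)
mean_market <- 277.8545   # Farmers Market group
mean_campus <- 306.7174   # Binghamton University group
t_stat <- -1.8057
df_welch <- 78.155

mean_diff <- mean_market - mean_campus          # same group order as t.test()
se_diff <- mean_diff / t_stat                    # since t = difference / SE
ci_95 <- mean_diff + c(-1, 1) * qt(0.975, df_welch) * se_diff
p_value <- 2 * pt(-abs(t_stat), df_welch)        # two-sided p-value

round(ci_95, 2)
round(p_value, 3)
```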

1.5.2 Correlational Tests

# linear model for age and veggiescore
lm_age_veggie <- lm(AGE_1 ~ VEGGIE, data = secondary_data)
summary(lm_age_veggie)

Call:
lm(formula = AGE_1 ~ VEGGIE, data = secondary_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-12.362  -8.853  -7.120  -1.086  61.250 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 31.92011    4.04000   7.901 2.13e-13 ***
VEGGIE      -0.01574    0.01298  -1.213    0.227    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 15.94 on 191 degrees of freedom
Multiple R-squared:  0.00764,   Adjusted R-squared:  0.002444 
F-statistic:  1.47 on 1 and 191 DF,  p-value: 0.2268
# source: https://www.datacamp.com/tutorial/linear-regression-R
# explanation: ran a linear regression on the relationship between veggie score and age 

A linear model of age and veggie score found no statistically significant relationship between the two variables (F = 1.47, p = 0.227, R² = 0.008), so veggie score did not vary detectably with age in this sample. Figure 2 likewise shows little to no correlation between age and veggie score in the sample.
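The model above is fit as `lm(AGE_1 ~ VEGGIE)`, so age is the response, while the text and figure describe veggie score varying with age. With a single predictor, this choice does not change the significance test: the overall F-test p-value is the same in either direction and equals the p-value of a Pearson correlation test, and F is the square of the slope's t value (1.47 ≈ (-1.213)²). Only the slope and its units differ. The sketch below checks this on simulated data, not the study data.

```r
set.seed(1)
# Simulated data for illustration only (NOT the study data)
veggie <- rnorm(193, mean = 300, sd = 90)
age <- 32 - 0.015 * veggie + rnorm(193, sd = 16)

fit_age_on_veggie <- lm(age ~ veggie)  # age as response, as in lm_age_veggie
fit_veggie_on_age <- lm(veggie ~ age)  # veggie score as response

# Overall F-test p-value from a fitted regression
f_test_p <- function(fit) {
  f <- summary(fit)$fstatistic
  pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
}

p_age_on_veggie <- f_test_p(fit_age_on_veggie)
p_veggie_on_age <- f_test_p(fit_veggie_on_age)
p_pearson <- cor.test(age, veggie)$p.value

# For one predictor, F equals the squared t value of the slope
f_value <- summary(fit_age_on_veggie)$fstatistic[["value"]]
t_slope <- coef(summary(fit_age_on_veggie))["veggie", "t value"]
```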

# Pearson correlation for perceived health status and veggie score
# (cor.test() drops incomplete pairs automatically, so no `use` argument is needed)
cor.test(combined$HEALTHSTATUS, combined$VEGGIE, method = "pearson")

    Pearson's product-moment correlation

data:  combined$HEALTHSTATUS and combined$VEGGIE
t = 0.91818, df = 48, p-value = 0.3631
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.1525464  0.3952727
sample estimates:
      cor 
0.1313798 
#source: (cor Function in R, 2025) https://www.r-bloggers.com/2025/03/cor-function-in-r-calculate-correlation-coefficients-in-r/
#explanation: running a pearson correlation test to examine statistical significance of the relationship between reported health status and veggie score

Finally, a Pearson correlation showed no statistically significant relationship between reported health status and veggie score (r = 0.13, p = 0.3631). Looking at Figure 1, it may appear as though a higher reported health status is associated with a higher average veggie score. For instance, among those who reported a health status of 5, meaning excellent, the average veggie score was about 325, whereas the average veggie score for participants who reported a health status of 1, meaning poor, was just over 250. However, the test fails to reject the null hypothesis that reported health status is not correlated with veggie score. This does not show that the two are unrelated: with only 50 complete pairs, the 95% confidence interval for r (-0.15 to 0.40) is wide, so the data are consistent with anything from a weak negative to a moderate positive correlation.
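The t-statistic and confidence interval printed by `cor.test()` follow directly from r and the number of complete pairs. The sketch below reproduces them from the reported r = 0.1313798, assuming n = 50 (the printed df = 48 equals n - 2). The interval comes from the Fisher z-transformation, which is also what `cor.test()` uses for Pearson correlations.

```r
# Reported correlation from the cor.test() output above
r <- 0.1313798
n <- 50  # df = 48 = n - 2 for a Pearson correlation test

# t-statistic for H0: rho = 0, and its two-sided p-value
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# 95% confidence interval via the Fisher z-transformation
z <- atanh(r)
se_z <- 1 / sqrt(n - 3)
ci_95 <- tanh(z + c(-1, 1) * qnorm(0.975) * se_z)

round(c(t = t_stat, p = p_value), 4)
round(ci_95, 4)
```

The width of that interval, rather than the p-value alone, is what limits how strongly the null result can be interpreted with 50 participants.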

2 Discussion

2.1 Main Findings

2.1.1 Qualitative

The main qualitative findings of this study are that students primarily conceptualize healthy foods in terms of macronutrients, micronutrients, and processing, and that they make actual meal decisions based on food labels (macronutrient and micronutrient content), convenience, cravings, emotional satisfaction, and price. Lay beliefs overlapped substantially across dimensions: participants often conceptualized healthy foods using nutrient-based reasoning and also used the same nutrient-based reasoning when making actual meal decisions. However, many students also set aside their beliefs about healthy foods when actually selecting meals, choosing instead based on convenience, cravings, and poor food offerings. These findings suggest a substantial belief-behavior gap, in which conceptual understanding of nutrition does not directly translate into dietary practice. Although the chi-squared analysis showed no significant association between lay beliefs and gender, participants' responses indicate that many environmental factors influenced their decision-making when selecting meals.
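The chi-squared analysis mentioned above compares how often a coded lay belief appears across gender groups. The exact coding table used in the study is not shown here, so the sketch below assumes a hypothetical 2 x 2 table of made-up counts (a single nutrient-based belief code, coded or not coded, by gender). It computes expected counts under independence and the Pearson statistic by hand, then checks them against `chisq.test()` without Yates' continuity correction.

```r
# Hypothetical counts for illustration only (NOT the study data)
counts <- matrix(c(18, 7,
                   21, 6),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(gender = c("Male", "Female"),
                                 nutrient_belief = c("Coded", "Not coded")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson chi-squared statistic and its p-value
chi_sq <- sum((counts - expected)^2 / expected)
chi_df <- (nrow(counts) - 1) * (ncol(counts) - 1)
p_value <- pchisq(chi_sq, df = chi_df, lower.tail = FALSE)

builtin <- chisq.test(counts, correct = FALSE)
# If any expected count falls below about 5, as is common with small
# qualitative samples, fisher.test(counts) is a safer alternative.
```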

2.1.2 Quantitative

The main quantitative finding of this study is that male participants, on average, had statistically significantly higher veggie scores than female participants. Because the Veggie Meter estimates skin carotenoid levels, which reflect fruit and vegetable intake, this suggests that males in this Broome County sample consumed more fruits and vegetables than females did. Statistical tests also showed that survey location was not a statistically significant predictor of veggie score, suggesting that Binghamton University students are not any more or less likely to consume fruits and vegetables than Broome County residents. Age also did not significantly predict veggie score. Overall, the quantitative findings suggest that sex may be a useful indicator of fruit and vegetable consumption among those in Broome County.

2.2 Contextualization

Because the existing literature is mixed, these findings are both supported and challenged by other research. This study found that males tended to consume more fruits and vegetables than females did. In contrast, a 2003 study of older adults found that women ate significantly more fruit and vegetables than men: only 16% of men ate the recommended five or more servings of fruit and vegetables a day, compared with about 34% of women (Baker & Wardle, 2003). Similarly, a 2013 study that collected dietary scores found that adult women had higher total dietary scores than men (Hiza et al., 2013). These conflicting findings suggest that more work is needed to identify sex-based indicators of dietary behavior. The qualitative findings, by contrast, reinforce the existing literature, which shows that young adults frequently make food choices based on price, convenience, and environmental access rather than solely on nutritional information (Li et al., 2022). These findings also align with prior work showing that lay beliefs, emotional reasoning, and environmental factors are important determinants of food selection and, by extension, health.

2.3 Hypothesis

For the quantitative question, it was hypothesized that sex, age, and location would predict veggie scores. More specifically, it was hypothesized that males would have higher veggie scores than females, that older participants would have higher veggie scores on average, and that Broome County residents would have higher veggie scores on average than Binghamton University students. The first hypothesis was supported, but the other two showed no significant differences and were therefore not supported. For the qualitative portion, the researchers hypothesized that a variety of lay beliefs would help explain the variability in veggie meter scores. In other words, it was hypothesized that students' knowledge and beliefs would not translate directly into dietary practices, such as eating fruits and vegetables, that would produce a higher veggie meter score. This hypothesis was supported: participants reported a variety of environmental factors that directly influenced their decisions when selecting nutritious foods such as fruits and vegetables.

2.4 Theoretical Findings

These findings align with the socio-ecological model, which holds that food choices are shaped both by individual factors (personal beliefs and nutrition knowledge) and by external factors (time, convenience, and current food offerings). The finding that these external factors often override individual preferences is consistent with the ecological determinants of health (Wardle et al., 2000).

2.5 Evidence-Based Conclusions

This study found, using an independent samples t-test (alpha = 0.05), that veggie score varies by sex (male or female). The test generated a t-statistic of 2.0588 and a p-value of 0.04088. The mean veggie score for male participants (m = 313.2073) was higher than that of female participants (m = 287.6216). A second independent samples t-test (alpha = 0.05) examined whether veggie scores differed by survey location (Binghamton University or the Broome County Farmers' Market) and found no significant difference (t = -1.8057, p = 0.07482). The mean veggie score of those surveyed at the Broome County Farmers' Market (m = 277.8545) was nonetheless lower than that of those surveyed at Binghamton University (m = 306.7174).
Additionally, a Pearson correlation test was conducted to examine the relationship between perceived health status and veggie score. This test was not significant at an alpha level of 0.05 (r = 0.1314, p = 0.3631), suggesting no detectable relationship between perceived health status and veggie score among study participants. A linear regression was also conducted to examine the relationship between veggie score and participants' age. The regression was not significant at an alpha level of 0.05 (F = 1.47, p = 0.2268). Together, these results provide no evidence that veggie score varies with participants' perceived health status or age.

2.6 Broader Context and Future Work

The quantitative findings of this study suggest that future work should further examine location and age as predictors of fruit and vegetable consumption. This study focused on only one university and one county in New York State and therefore is not fully representative of New York State, of universities, or of the country as a whole. Future studies should include more than one university and/or multiple counties to obtain more representative samples and more generalizable results. The qualitative findings suggest that future nutritional interventions need to go beyond simply providing factual information about nutrition and food selection: they should pair that information with environmental strategies, such as improving on-campus dining offerings and promoting tasty, convenient, healthy recipes for those who do not rely on campus dining. In short, future nutrition interventions should address both individual knowledge and beliefs and the environmental factors that shape how students feel and act in real-world contexts.

2.7 Limitations

Discrepancies in this research may stem from limited literature on the Veggie Meter, including, but not limited to, methods for reducing variability in veggie scores, how long carotenoids persist in the skin and blood, and unforeseen environmental effects on veggie score readings. Sample size is also a significant limitation for the qualitative results of this study. Because of the study design, quantitative data were collected first (the Veggie Meter reading was the first measure given to participants), and all participants who were Binghamton University students were then invited to complete a second survey, from which the qualitative data were derived. Unfortunately, substantial attrition occurred between these two stages: many participants who completed the Veggie Meter reading and entered their biological data did not complete the second survey and therefore provided no qualitative data. In addition, the large number of blank or null responses in the qualitative data raises concerns about sample size and the validity of the findings. A larger sample would increase this study's statistical power, and a more diverse sample would improve its representativeness, as the current sample underrepresented some racial groups. Furthermore, research using larger, more representative samples is needed to clarify whether location, sex, and age are true predictors of veggie score. Future studies may also examine veggie scores longitudinally to better understand variability in carotenoid levels.
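One way a follow-up study could plan for adequate statistical power is an a priori sample-size calculation. The sketch below uses base R's `power.t.test()` under assumed planning values that are not estimates from this study: a 25-point difference in mean veggie score (roughly the size of the sex difference observed above) and a within-group standard deviation of 90 points. `power.t.test()` assumes equal group sizes and a common standard deviation, so it only approximates a Welch comparison.

```r
# Planning values below are ASSUMPTIONS, not estimates from this study:
# a 25-point difference in mean veggie score and a within-group SD of 90.
planned <- power.t.test(delta = 25, sd = 90, sig.level = 0.05, power = 0.80,
                        type = "two.sample", alternative = "two.sided")

# power.t.test() returns the required sample size per group; round up
n_per_group <- ceiling(planned$n)
n_per_group
```

Under these assumptions, each group needs roughly 200 participants. Halving the assumed standard deviation would cut that by about a factor of four, so a pilot estimate of score variability matters as much as the target difference.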

2.8 Broader Implications

Dietary interventions on Binghamton University's campus and in Broome County should consider sex when developing community interventions, as sex appears to be associated with fruit and vegetable consumption. Additionally, because veggie scores of participants surveyed on campus were not significantly different from those of participants surveyed at the Farmers' Market, this study provides no evidence that Binghamton University students eat fewer fruits and vegetables than residents of the surrounding county. This does, however, raise questions about the quality of fruit and vegetable access in Broome County as a whole. Finally, as the qualitative findings indicate, nutrition interventions should pair factual information with environmental strategies, such as better on-campus dining offerings and tasty, convenient, nutritious recipes for those who do not rely on campus dining, so that they address individual knowledge and beliefs as well as the environmental factors that shape how students feel and act in real-world contexts.