The current foodscape of the United States, along with the nutritional knowledge and perceptions of university students across the country, remains critically understudied. University students today are especially susceptible to making poor nutritional choices because of the current environment of the Ultra-Processed Food (UPF) industry. A large gap in the current literature is the lack of research on the dietary quality of Binghamton University students. Thus, this study examines the overall dietary scores, or veggie scores, of Binghamton University students and how they correlate with lay beliefs and various sociodemographic variables. The study employs a complementary multiphase concurrent mixed-method review (MMR) as well as objective biological sampling to determine the relationships, if any, between sociodemographics, nutritional beliefs, and veggie scores derived using the VeggieMeter®. Biological data were collected from willing participants who fit various criteria, and survey responses were collected both in person and online. Once collected, the quantitative data were cleaned and analyzed in R, and the qualitative data were coded and analyzed using NVivo. All data were aggregated and cross-referenced in the form of codes, which were examined in the context of other codes as well as sociodemographic variables. These values were then tested against veggie score, and a complete analysis of the data was conducted. The researchers hypothesized that participants with higher nutritional knowledge and perception scores would receive higher objective biological scores. Several correlations tied to this relationship were examined to determine its driving causes. From these data, the public can be effectively educated on the important link between nutritional knowledge and personal health.
Keywords
health status, veggie score, age, sex, college students
1 Results
Participants for the quantitative portion, including both Binghamton University students and Broome County residents, were between 18 and 89 years of age (n = 208) with a mean of 27.22 years. The sex distribution for the quantitative portion was majority female, with 111 females and 82 males. The Binghamton University population identifies as 80.1% White, 4.97% Asian, 5.38% Hispanic or Latino, 5.09% Black or African American, 2.99% two or more races, 0.0465% Native Hawaiian or Other Pacific Islander, and 0.0748% American Indian or Alaska Native (Binghamton University | Data USA, n.d.). The sample population for the qualitative portion, in comparison, was 50.7% White, 30.2% Asian, 9.5% two or more races, 6.3% Black, 1.6% other, and 1.6% Middle Eastern. No Binghamton University survey participants reported being Hispanic or Latino or Native Hawaiian or Pacific Islander. The sample was similar to the Broome County population in terms of the proportion of White students. Despite this, several racial groups were entirely unrepresented or greatly underrepresented.
1.1 Load
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.1 ✔ stringr 1.5.2
✔ ggplot2 4.0.0 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(psych)
Attaching package: 'psych'
The following objects are masked from 'package:ggplot2':
%+%, alpha
library(knitr)
library(tibble)
library(dplyr)
library(tidyr)
library(scales) # for number formatting like comma()
Attaching package: 'scales'
The following objects are masked from 'package:psych':
alpha, rescale
The following object is masked from 'package:purrr':
discard
The following object is masked from 'package:readr':
col_factor
library(english) # to convert numbers to words
Attaching package: 'english'
The following object is masked from 'package:scales':
ordinal
library(stringr) # for text functions like str_c()
library(NHANES)
library(haven)
library(readxl)
library(tableone)
library(ggpubr)
# source: (Hei & McCarty, 2025) https://shanemccarty.github.io/FRIplaybook/import-once.html # explanation: import perceptions survey data as data frame primary_data and Veggie Meter survey data as data frame secondary_data
1.3 Transform
1.3.1 Combine Data Sets
## combine data sets primary_data and secondary_data by variable "PASSWORD"
combined <- primary_data %>%
  left_join(
    secondary_data %>% select("PASSWORD", "VEGGIESCORE", "AGE_1"),
    by = "PASSWORD"
  )
Warning in left_join(., secondary_data %>% select("PASSWORD", "VEGGIESCORE", : Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 1 of `x` matches multiple rows in `y`.
ℹ Row 86 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
"many-to-many"` to silence this warning.
# source: (McCarty, 2025) https://shanemccarty.github.io/FRIplaybook/merge.html
# explanation: made a combined data set using the passwords shared by both the perceptions and Veggie Meter surveys.
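The many-to-many warning above indicates that some PASSWORD values appear more than once in both tables, so matching rows are multiplied in the join. The following is a minimal sketch of one way to inspect duplicated keys before merging. It uses invented toy data frames (`primary_toy`, `secondary_toy`) and a hypothetical helper (`dup_keys`) rather than the study data, and keeping the first row per key is only one possible policy, assumed here for illustration.

```r
library(dplyr)

# Toy stand-ins for the two surveys (all values invented for illustration).
primary_toy <- data.frame(
  PASSWORD = c("a1", "b2", "b2", "c3"),
  HEALTHSTATUS = c(3, 4, 4, 5)
)
secondary_toy <- data.frame(
  PASSWORD = c("a1", "b2", "c3", "c3"),
  VEGGIESCORE = c(250, 310, 290, 295)
)

# Hypothetical helper: list keys that occur more than once in a table.
dup_keys <- function(df) {
  df %>%
    count(PASSWORD, name = "n_rows") %>%
    filter(n_rows > 1)
}

dup_primary <- dup_keys(primary_toy)
dup_secondary <- dup_keys(secondary_toy)
print(dup_primary)
print(dup_secondary)

# Assumed policy: keep the first row per key on the right-hand side,
# then declare the expected relationship so dplyr errors if it is violated.
secondary_unique <- secondary_toy %>%
  distinct(PASSWORD, .keep_all = TRUE)

combined_toy <- primary_toy %>%
  left_join(secondary_unique, by = "PASSWORD", relationship = "many-to-one")
print(combined_toy)
```

Declaring `relationship = "many-to-one"` turns an unexpected duplicate on the right-hand side into an error instead of a warning, which makes silent row multiplication harder to miss.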
...1 EndDate Status
Min. :2025-09-26 13:57:24 Min. :2025-09-26 13:57:33 Min. :0.000000
1st Qu.:2025-10-03 13:17:06 1st Qu.:2025-10-03 13:19:31 1st Qu.:0.000000
Median :2025-10-09 15:38:01 Median :2025-10-09 15:56:26 Median :0.000000
Mean :2025-10-09 23:35:47 Mean :2025-10-10 00:09:12 Mean :0.009709
3rd Qu.:2025-10-15 14:53:10 3rd Qu.:2025-10-15 14:56:13 3rd Qu.:0.000000
Max. :2025-10-18 12:50:27 Max. :2025-10-18 16:37:22 Max. :1.000000
IPAddress Progress Duration (in seconds) Finished
Length:309 Min. : 7.00 Min. : 5 Min. :0.0000
Class :character 1st Qu.:100.00 1st Qu.: 99 1st Qu.:1.0000
Mode :character Median :100.00 Median : 171 Median :1.0000
Mean : 92.52 Mean : 2005 Mean :0.8964
3rd Qu.:100.00 3rd Qu.: 249 3rd Qu.:1.0000
Max. :100.00 Max. :348270 Max. :1.0000
RecordedDate ResponseId RecipientLastName
Min. :2025-10-01 11:19:59 Length:309 Mode:logical
1st Qu.:2025-10-03 13:58:10 Class :character NA's:309
Median :2025-10-10 12:01:18 Mode :character
Mean :2025-10-10 17:33:06
3rd Qu.:2025-10-16 18:13:46
Max. :2025-10-19 12:00:20
RecipientFirstName RecipientEmail ExternalReference LocationLatitude
Mode:logical Mode:logical Mode:logical Min. :38.05
NA's:309 NA's:309 NA's:309 1st Qu.:40.78
Median :41.60
Mean :41.65
3rd Qu.:42.10
Max. :47.61
LocationLongitude DistributionChannel UserLanguage Consent
Min. :-122.33 Length:309 Length:309 Min. :0.0000
1st Qu.: -75.89 Class :character Class :character 1st Qu.:1.0000
Median : -74.36 Mode :character Mode :character Median :1.0000
Mean : -75.12 Mean :0.9968
3rd Qu.: -73.91 3rd Qu.:1.0000
Max. : -71.06 Max. :1.0000
NA's :1
PASSWORD VEGGIESCORE CONSUMPTION SEX
Length:309 Length:309 Min. :1.000 Min. :0.0000
Class :character Class :character 1st Qu.:4.000 1st Qu.:0.0000
Mode :character Mode :character Median :5.000 Median :1.0000
Mean :4.492 Mean :0.5607
3rd Qu.:5.000 3rd Qu.:1.0000
Max. :6.000 Max. :1.0000
NA's :110 NA's :29
AGE_1 LOCATION HEALTHSTATUS HOPE ANXIETY
Min. :18.00 Min. :0.0000 Mode:logical Mode:logical Mode:logical
1st Qu.:18.00 1st Qu.:0.0000 NA's:309 NA's:309 NA's:309
Median :19.00 Median :1.0000
Mean :25.09 Mean :0.7044
3rd Qu.:21.00 3rd Qu.:1.0000
Max. :89.00 Max. :1.0000
NA's :26 NA's :106
RACIALIZED RACIALIZED_8_TEXT SOCIALSTATUS INCOME
Mode:logical Mode:logical Mode:logical Mode:logical
NA's:309 NA's:309 NA's:309 NA's:309
# source: (Hei & McCarty, 2025) https://shanemccarty.github.io/FRIplaybook/import-once.html # explanation: Filtered out NA responses from primary_data and secondary_data; used names() and summary() functions to see the column names and summarize the column descriptive statistics from the data frame based on code from the FRI playbook (Hei & McCarty, 2025)
# add a filtered VEGGIE column to secondary_data
secondary_data <- secondary_data %>%
  mutate(VEGGIE = as.numeric(VEGGIESCORE)) %>%
  filter(VEGGIE >= 50 & VEGGIE <= 800)
# add a filtered VEGGIE column to combined
combined <- combined %>%
  mutate(
    VEGGIE = as.numeric(VEGGIESCORE.y),
    HEALTHSTATUS = as.numeric(HEALTHSTATUS)
  ) %>%
  filter(VEGGIE >= 50 & VEGGIE <= 800)
# source: (Estreich, 2025) https://shanemccarty.github.io/FRIplaybook/dplyr.html
# explanation: filtered out outlier values of 'VEGGIESCORE' (below 50 or above 800) in secondary_data and combined
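Because VEGGIESCORE is stored as text, converting it with `as.numeric()` turns any non-numeric entry into `NA`, and the range filter then drops those rows along with the out-of-range scores. A minimal sketch of how one might count what each step removes, using invented toy values (`raw_scores`) rather than the study data:

```r
# Toy scores stored as text (values invented for illustration).
raw_scores <- c("250", "310", "n/a", "30", "900", "")

# as.numeric() turns non-numeric text into NA (the coercion warning is silenced here).
veggie <- suppressWarnings(as.numeric(raw_scores))

# Count rows lost at each step of the 50-800 filter.
n_not_numeric <- sum(is.na(veggie))
n_out_of_range <- sum(!is.na(veggie) & (veggie < 50 | veggie > 800))
kept <- veggie[!is.na(veggie) & veggie >= 50 & veggie <= 800]

cat("non-numeric:", n_not_numeric,
    " out of range:", n_out_of_range,
    " kept:", length(kept), "\n")
```

Reporting these counts alongside the filtered sample size makes it easier to reconcile the number of participants at each stage of the analysis.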
1.3.4 Transform Variables for Significance Tests
library(dplyr)
# filter 'GENDER' to only include two groups for independent samples t-test
primary_data <- primary_data %>%
  filter(GENDER %in% c(0, 1)) %>%
  mutate(GENDER = factor(GENDER, levels = c(0, 1), labels = c("Male", "Female"))) %>%
  drop_na(GENDER, HEALTHSTATUS)
# source: https://nyu-cdsc.github.io/learningr/assets/data-transformation.pdf
# explanation: filtered column 'GENDER' in primary_data to only include two values 'Male' and 'Female'
library(dplyr)
# filter 'GENDER' to only include two groups for Pearson correlation
combined <- combined %>%
  filter(GENDER %in% c(0, 1)) %>%
  mutate(GENDER = factor(GENDER, levels = c(0, 1), labels = c("Male", "Female")))
# source: https://nyu-cdsc.github.io/learningr/assets/data-transformation.pdf
# explanation: filtered column 'GENDER' in combined to only include two values 'Male' and 'Female'
library(dplyr)
# filter 'SEX' to only include two groups for independent samples t-test
secondary_data <- secondary_data %>%
  filter(SEX %in% c(0, 1)) %>%
  mutate(SEX = factor(SEX, levels = c(0, 1), labels = c("Male", "Female"))) %>%
  drop_na(VEGGIE, SEX)
# source: https://nyu-cdsc.github.io/learningr/assets/data-transformation.pdf
# explanation: filtered column 'SEX' to only include two values 'Male' and 'Female'
library(dplyr)
# filter 'LOCATION' to only include two groups
secondary_data <- secondary_data %>%
  filter(LOCATION %in% c(0, 1)) %>%
  mutate(LOCATION = factor(LOCATION, levels = c(0, 1), labels = c("Farmers Market", "Binghamton University"))) %>%
  drop_na(VEGGIESCORE, LOCATION)
summary(secondary_data$LOCATION)
Farmers Market Binghamton University
55 138
summary(secondary_data$VEGGIESCORE)
Min. 1st Qu. Median Mean 3rd Qu. Max.
65.0 245.0 299.0 298.5 342.0 752.0
# source: https://nyu-cdsc.github.io/learningr/assets/data-transformation.pdf # explanation: filtered column 'LOCATION' to only include two values 'FARMERS MARKET' and 'BINGHAMTON UNIVERSITY'
1.4 Visualize
# summarize 'SEX' (group counts), 'VEGGIE', and 'AGE_1' (quartiles and mean)
summary(secondary_data$SEX)
Male Female
82 111
summary(secondary_data$VEGGIE)
Min. 1st Qu. Median Mean 3rd Qu. Max.
65.0 245.0 299.0 298.5 342.0 752.0
summary(secondary_data$AGE_1)
Min. 1st Qu. Median Mean 3rd Qu. Max.
18.00 19.00 20.00 27.22 25.00 89.00
# source: (McCarty, 2025) https://shanemccarty.github.io/FRIplaybook/ggplot2.html#summarize-data
# explanation: summary statistics for variables 'SEX', 'VEGGIE', and 'AGE_1'
A bar graph depicting the relationship between perceived health status and average veggie score of Binghamton University students.
theme_bw()
ggsave("health_veggie_plot.png", width = 8, height = 6)
# source 1: The FRI Playbook (McCarty, 2025)
# explanation: creating a bar graph to display reported health status and average veggie score
library(ggplot2)
#| label: histogram
#| fig-cap: Figures 1 and 2. Two histograms depicting the distribution of veggie scores and age of participants.
#| fig-alt: The first histogram has participant age on the x-axis and frequency on the y-axis. The second histogram has participant veggie scores on the x-axis and frequency on the y-axis. The age histogram appears to be right-skewed and is not a normal distribution. The veggie score histogram appears to be about normally distributed with a single right-skewing outlier.
# create a histogram for age
ggplot(secondary_data, aes(x = AGE_1)) +
  geom_histogram(binwidth = .5) +
  theme_bw() +
  ggtitle("Age of Participants") +
  xlab("Age")
# source: https://rstudio.github.io/cheatsheets/data-visualization.pdf
# explanation: made histograms to check for normal distributions
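The histograms above are a visual check for normality. A minimal sketch of a complementary check with a Shapiro-Wilk test and a normal Q-Q plot follows. It assumes simulated right-skewed data (`sim_age`), not the study data, and is offered as one option rather than the procedure the authors used.

```r
set.seed(1)

# Simulated right-skewed "ages" (not the study data): 18 plus an exponential tail.
sim_age <- 18 + rexp(200, rate = 1 / 9)

# Shapiro-Wilk test: a small p-value suggests the sample is not normally distributed.
sw <- shapiro.test(sim_age)
print(sw)

# Normal Q-Q plot: points bending away from the reference line indicate skew.
qqnorm(sim_age)
qqline(sim_age)
```

With samples of a few hundred observations, formal tests can flag even mild departures from normality, so the plot and the test are best read together.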
1.4.2 Density Plots
library(ggpubr)
library(ggplot2)
#| label: sex density plot
#| fig-cap: Figure 3. Density plot depicting the distributions of veggie scores of males in orange and females in pink.
#| fig-alt: Both male and female distributions appear to be about normal, with the female distribution being right skewed. An independent samples t-test showed a statistically significant difference between male and female veggie scores (t = 2.0588, p < 0.05).
# create a second density plot for SEX and VEGGIESCORE
plot.sex.veggie <- ggdensity(
  secondary_data,
  x = "VEGGIE",
  add = "mean",
  rug = TRUE,
  color = "SEX",
  fill = "SEX",
  palette = c("#ff8c6b", "#e8a7d0"),
  title = "Distribution of Veggie Scores by Sex",
  xlab = "Veggie Score",
  add.params = list(linewidth = 1, alpha = 1, linetype = "dashed") # change appearance of average lines
)
# customize x-axis breaks to run from 100 to 800
plot.sex.veggie <- plot.sex.veggie +
  scale_x_continuous(breaks = seq(100, 800, by = 100))
print(plot.sex.veggie)
ggsave("plot/veggiescore_sex_plot.png", plot = plot.sex.veggie, width = 10, height = 4, dpi = 300)
# source: https://stackoverflow.com/questions/21563864/ggplot2-overlay-density-plots-r (2014)
# explanation: created a veggie score distribution plot to visualize the difference between male and female participants' veggie scores
library(ggpubr)
library(ggplot2)
#| label: location density plot
#| fig-cap: Figure 4. Density plot depicting the distributions of veggie scores taken at the farmers market in orange and at Binghamton University in pink.
#| fig-alt: The Binghamton University plot appears to be about normally distributed with a slight right skew, while the farmers market plot appears to be heavily right skewed. An independent samples t-test did not show a statistically significant difference between farmers market and Binghamton University veggie scores (t = -1.8057, p = 0.07482).
# create a second density plot for LOCATION and VEGGIESCORE
plot.location.veggie <- ggdensity(
  secondary_data,
  x = "VEGGIE",
  add = "mean",
  rug = TRUE,
  color = "LOCATION",
  fill = "LOCATION",
  palette = c("#ff8c6b", "#e8a7d0"),
  title = "Distribution of Veggie Scores by LOCATION",
  xlab = "Veggie Score",
  add.params = list(linewidth = 1, alpha = 1, linetype = "dashed") # change appearance of average lines
)
# customize x-axis breaks to run from 100 to 800
plot.location.veggie <- plot.location.veggie +
  scale_x_continuous(breaks = seq(100, 800, by = 100))
print(plot.location.veggie)
ggsave("plot/veggiescore_location_plot.png")
Saving 8 x 4 in image
# source: https://stackoverflow.com/questions/21563864/ggplot2-overlay-density-plots-r (2014)
# explanation: created a veggie score distribution plot to visualize the difference in veggie scores between participants surveyed at the farmers market and at Binghamton University
1.4.3 Scatter Plot
library(ggplot2)
#| label: Age and veggie score scatter plot
#| fig-cap: Figure 5. Scatter plot depicting the relationship between age and veggie score. A linear regression was run which did not show a statistically significant relationship between veggie score and age (F = 1.47, p = .227)
#| fig-alt: There appears to be a very weak, negative relationship between age and veggie score
# create a scatter plot of age and veggie score
plot.age.veggie <- ggplot(secondary_data, aes(x = AGE_1, y = VEGGIE)) +
  geom_point() +
  geom_smooth(method = "lm") +
  ggtitle("Veggie Score and Age") +
  theme_bw() +
  xlab("Age") +
  ylab("Veggie Score")
print(plot.age.veggie)
`geom_smooth()` using formula = 'y ~ x'
ggsave("plot/veggiescore_age_plot.png")
Saving 8 x 4 in image
`geom_smooth()` using formula = 'y ~ x'
# source: https://stackoverflow.com/questions/21563864/ggplot2-overlay-density-plots-r (2014)
# explanation: created a scatter plot depicting the relationship between age and veggie score
library(ggplot2)
library(tidyr)
#| label: veggie score and health status scatter plot
#| fig-cap: Figure 6. Scatter plot depicting the relationship between perceived health status and veggie score. The colors of points correspond to gender of participants: orange for male, pink for female. A Pearson correlation was run which did not show a statistically significant relationship between veggie score and perceived health status (r = 0.13, p = .363). An independent samples t-test was also run which did not show a statistically significant difference between the perceived health status of male and female participants.
# create a scatterplot of health status and veggie score
veggie.health.plot <- combined %>%
  drop_na(HEALTHSTATUS, VEGGIE, GENDER) %>%
  ggplot(aes(x = HEALTHSTATUS, y = VEGGIE)) +
  geom_jitter(aes(color = GENDER), width = 0.2, alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE, color = "#ceedc5ff") +
  ggtitle("Scatter Plot of Health Status vs. Veggie Score") +
  xlab("Health Status") +
  ylab("Veggie Score") +
  theme_bw() +
  scale_x_continuous(
    breaks = c(1, 2, 3, 4, 5),
    labels = c("Poor", "Fair", "Good", "Very Good", "Excellent")
  ) +
  scale_color_manual(
    name = "Gender",
    values = c("Male" = "#ff8c6b", "Female" = "#e8a7d0"),
    labels = c("Male", "Female")
  )
print(veggie.health.plot)
# source: https://stackoverflow.com/questions/21563864/ggplot2-overlay-density-plots-r (2014)
# explanation: created a scatter plot depicting the relationship between perceived health status and veggie score, colored by participant gender
1.5 Model
1.5.1 T-tests
# run independent samples t-test
t_test_results <- t.test(VEGGIE ~ SEX, data = secondary_data, var.equal = FALSE) # Use TRUE if Levene's test p > 0.05
print(t_test_results)
Welch Two Sample t-test
data: VEGGIE by SEX
t = 2.0588, df = 189.06, p-value = 0.04088
alternative hypothesis: true difference in means between group Male and group Female is not equal to 0
95 percent confidence interval:
1.071344 50.100047
sample estimates:
mean in group Male mean in group Female
313.2073 287.6216
# source: https://www.datacamp.com/tutorial/t-tests-r-tutorial
# explanation: ran an independent samples t-test on differences in veggie score by sex
An independent samples (Welch) t-test was conducted to compare veggie scores between male and female respondents. The test revealed a statistically significant difference (t = 2.06, df = 189.06, p = 0.041), suggesting that veggie scores varied by sex in the sample. Male participants had a mean veggie score of 313.21 and female participants had a mean veggie score of 287.62, so male participants, on average, had higher veggie scores than female participants.
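As a quick arithmetic check, the p-value and confidence interval in the Welch output above can be recovered from the reported group means, t statistic, and degrees of freedom. The sketch below uses only numbers copied from that output; the variable names are illustrative.

```r
# Values copied from the Welch t-test output above.
mean_male <- 313.2073
mean_female <- 287.6216
t_stat <- 2.0588
df <- 189.06

diff_means <- mean_male - mean_female  # difference in group means
se_diff <- diff_means / t_stat         # standard error implied by t = difference / SE

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval for the difference in means.
ci <- diff_means + c(-1, 1) * qt(0.975, df) * se_diff

print(round(c(difference = diff_means, se = se_diff, p = p_value,
              lower = ci[1], upper = ci[2]), 3))
```

The interval excludes zero, which is the same information the p-value below 0.05 conveys, while also showing that the plausible size of the difference ranges from about 1 to about 50 points.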
# run independent samples t-test for 'GENDER' and 'HEALTHSTATUS'
t_test_results2 <- t.test(HEALTHSTATUS ~ GENDER, data = primary_data, var.equal = FALSE)
print(t_test_results2)
Welch Two Sample t-test
data: HEALTHSTATUS by GENDER
t = -1.5622, df = 33.982, p-value = 0.1275
alternative hypothesis: true difference in means between group Male and group Female is not equal to 0
95 percent confidence interval:
-0.8898141 0.1163587
sample estimates:
mean in group Male mean in group Female
3.086957 3.473684
# source: https://www.datacamp.com/tutorial/t-tests-r-tutorial
# explanation: ran an independent samples t-test for gender and health status to see whether males, who had higher veggie scores, would also have higher perceived health status
# run independent samples t-test
t_test_results <- t.test(VEGGIE ~ LOCATION, data = secondary_data, var.equal = FALSE) # Use TRUE if Levene's test p > 0.05
print(t_test_results)
Welch Two Sample t-test
data: VEGGIE by LOCATION
t = -1.8057, df = 78.155, p-value = 0.07482
alternative hypothesis: true difference in means between group Farmers Market and group Binghamton University is not equal to 0
95 percent confidence interval:
-60.684406 2.958714
sample estimates:
mean in group Farmers Market mean in group Binghamton University
277.8545 306.7174
# source: https://www.datacamp.com/tutorial/t-tests-r-tutorial
# explanation: ran an independent samples t-test on differences in veggie score between surveying locations
A second independent samples t-test examined whether veggie score differed by surveying location (Binghamton University or the Broome County Farmers' Market). The difference was not statistically significant (t = -1.8057, p = 0.07482), suggesting that Binghamton University students and Broome County residents did not have significantly different veggie scores. Participants surveyed at Binghamton University had a higher mean veggie score (M = 306.72) than those surveyed at the Broome County Farmers' Market (M = 277.85), but the sample does not provide sufficient evidence that the two groups differ.
1.5.2 Correlational Tests
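# linear model for age and veggie score
# (AGE_1 is the response here; with a single predictor, the F-test and its p-value are the same whichever variable is the response)
lm_age_veggie <- lm(AGE_1 ~ VEGGIE, data = secondary_data)
summary(lm_age_veggie)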
Call:
lm(formula = AGE_1 ~ VEGGIE, data = secondary_data)
Residuals:
Min 1Q Median 3Q Max
-12.362 -8.853 -7.120 -1.086 61.250
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 31.92011 4.04000 7.901 2.13e-13 ***
VEGGIE -0.01574 0.01298 -1.213 0.227
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 15.94 on 191 degrees of freedom
Multiple R-squared: 0.00764, Adjusted R-squared: 0.002444
F-statistic: 1.47 on 1 and 191 DF, p-value: 0.2268
# source: https://www.datacamp.com/tutorial/linear-regression-R
# explanation: ran a linear regression on the relationship between veggie score and age
A linear model for age and veggie score found no statistically significant relationship between the two variables (F(1, 191) = 1.47, p = 0.227); veggie score did not vary detectably with age in this sample. Figure 5 shows little to no correlation between age and veggie score.
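The regression output above can be connected to a correlation coefficient with a little arithmetic: with a single predictor, F equals the squared slope t statistic, and the implied correlation follows from t and the residual degrees of freedom. The sketch below uses the reported values for that check, then uses simulated data (not the study data) to illustrate that the slope test gives the same p-value whichever variable is treated as the response.

```r
# Values copied from the regression output above.
t_slope <- -1.213
df_res <- 191

r_implied <- t_slope / sqrt(t_slope^2 + df_res)  # correlation implied by the slope t-test
r_squared <- r_implied^2                          # compare with Multiple R-squared (0.00764)
f_stat <- t_slope^2                               # with one predictor, F = t^2

print(round(c(r = r_implied, r_squared = r_squared, F = f_stat), 4))

# Simulated data (not the study data): the slope p-value is symmetric in x and y.
set.seed(42)
x <- rnorm(100)
y <- 0.2 * x + rnorm(100)
p_y_on_x <- summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
p_x_on_y <- summary(lm(x ~ y))$coefficients["y", "Pr(>|t|)"]
print(c(p_y_on_x = p_y_on_x, p_x_on_y = p_x_on_y))
```

The implied correlation of about -0.09 is consistent with the very weak negative trend visible in the scatter plot.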
# Pearson correlation for perceived health status and veggie score
# (cor.test() drops incomplete pairs itself, so no 'use' argument is needed)
cor.test(combined$HEALTHSTATUS, combined$VEGGIE, method = "pearson")
Pearson's product-moment correlation
data: combined$HEALTHSTATUS and combined$VEGGIE
t = 0.91818, df = 48, p-value = 0.3631
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.1525464 0.3952727
sample estimates:
cor
0.1313798
# source: (cor Function in R, 2025) https://www.r-bloggers.com/2025/03/cor-function-in-r-calculate-correlation-coefficients-in-r/
# explanation: running a Pearson correlation test to examine statistical significance of the relationship between reported health status and veggie score
Finally, a Pearson correlation showed no statistically significant relationship between reported health status and veggie score (r = 0.13, p = 0.3631). Looking at Figure 1, it may appear as though a higher reported health status would be associated with a higher average veggie score. For instance, among those who reported a health status of 5, meaning excellent, the average veggie score was about 325. On the other hand, the average veggie score for participants who reported a health status of 1, meaning poor, was just over 250. However, the test fails to reject the null hypothesis that reported health status is not correlated with veggie score. In this sample, there is no evidence that reported health status varies with veggie score.
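The t statistic, p-value, and confidence interval reported for the Pearson correlation can be reproduced from r and the degrees of freedom alone. A minimal sketch, using only values copied from the output above; the interval uses Fisher's z transform, which is the approach documented for `cor.test()`:

```r
# Values copied from the Pearson correlation output above.
r <- 0.1313798
df <- 48
n <- df + 2  # number of complete pairs

# t statistic and two-sided p-value for H0: correlation = 0.
t_stat <- r * sqrt(df) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval via Fisher's z transform.
z <- atanh(r)
se_z <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se_z)

print(round(c(t = t_stat, p = p_value, lower = ci[1], upper = ci[2]), 4))
```

The interval runs from roughly -0.15 to 0.40, so with 50 complete pairs the data are compatible with anything from a weak negative to a moderate positive correlation.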
2 Discussion
2.1 Main Findings
2.1.1 Qualitative
The main qualitative findings of this study are that students primarily conceptualize healthy foods in terms of macronutrients, micronutrients, and processing, and that they make their actual meal decisions based on the food label (macronutrient and micronutrient content), convenience, cravings, emotional satisfaction, and price. There was substantial overlap of lay beliefs across dimensions: participants often conceptualized healthy foods using nutrient-based reasoning and also made actual meal decisions with the same nutrient-based reasoning. However, many students disregarded their beliefs about healthy foods when actually selecting meals, opting instead to make decisions based on convenience, cravings, and poor offerings. These findings suggest a notable belief-behavior gap, in which conceptual understanding of nutrition fails to translate directly into proper dietary practice. Although the chi-squared analysis showed no significant association between lay beliefs and gender, participants' responses indicate that many environmental factors influenced students' decision-making when selecting meals.
2.1.2 Quantitative
The main quantitative finding of this study is that males, on average, had statistically significantly higher veggie scores than females. Higher veggie scores suggest that males in the sample consume more fruits and vegetables than females do. Statistical tests also showed that location was not a statistically significant indicator of veggie score, suggesting that Binghamton University students are not any more or less likely to consume fruits and vegetables than Broome County residents. Additionally, age was not found to be a significant predictor of veggie score. The quantitative findings of this study suggest that sex may be a viable indicator of fruit and vegetable consumption among those in Broome County.
2.2 Contextualization
These findings are both challenged and supported by the existing literature. This study found that males tended to consume more fruits and vegetables than females did. In contrast, a 2003 study of older adults found that women ate significantly more fruit and vegetables than men: only 16% of men ate the recommended five or more servings of fruit and vegetables a day, compared with about 34% of women (Baker & Wardle, 2003). Similarly, a 2013 study that collected dietary scores found that adult women had higher total dietary scores than men did (Hiza et al., 2013). These discrepancies suggest that more work is needed to identify indicators of dietary behaviors by sex. On the other hand, the qualitative findings reinforce the existing literature, which shows that young adults frequently make food choices based on price, convenience, and environmental access rather than solely on nutritional information (Li et al., 2022). These findings also align with the preexisting literature in indicating that lay beliefs, emotional reasoning, and environmental factors are important determinants of food selection and, by extension, health.
2.3 Hypothesis
For the quantitative question, it was hypothesized that sex, age, and location would predict veggie scores. More specifically, it was hypothesized that males would have higher veggie scores than females, that older participants would have higher veggie scores on average, and that Broome County residents would also have higher veggie scores on average. The first hypothesis was supported, but the latter two showed no significant differences and were therefore not supported. For the qualitative portion, the researchers hypothesized that a variety of lay beliefs would help explain the variability of Veggie Meter scores. In other words, it was hypothesized that students’ knowledge and beliefs would not directly translate into proper dietary practices, such as consuming fruits and vegetables, which would lead to a higher Veggie Meter score. This hypothesis was supported: participants reported a variety of environmental factors that directly influenced their decision-making when selecting nutritious foods such as fruits and vegetables.
2.4 Theoretical Findings
These findings align with the socio-ecological model, which postulates that students make food choices based on a variety of individual factors (personal beliefs and nutrition knowledge) as well as a variety of external factors (time, convenience, and current offerings). The finding that these external factors often impede individual choices is consistent with the ecological determinants of health (Wardle et al., 2000).
2.5 Evidence-Based Conclusions
This study found, using an independent-samples t-test (alpha = 0.05), that veggie score varies by sex (t = 2.0588, p = 0.04088). The mean veggie score for male participants (m = 313.2073) was higher than that for female participants (m = 287.6216). A second independent-samples t-test (alpha = 0.05) examined the relationship between the location surveyed (Binghamton University or the Broome County Farmers Market) and participant veggie scores. It found no significant difference in veggie scores between those sampled at Binghamton University and those sampled at the Broome County Farmers Market (t = 1.8057, p = 0.07482), although the mean veggie score of those surveyed at the Farmers Market (m = 277.8545) was lower than that of those surveyed at Binghamton University (m = 306.7174).
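For readers who want to reproduce this kind of comparison, the sketch below runs an independent-samples t-test in R and recomputes the statistic by hand. It uses simulated data, not the study data. The data frame `toy`, the column name `SEX`, the group sizes, and the simulated means are all assumptions for illustration. The sketch also assumes the Welch (unequal-variance) form, which is the default in `t.test()`; the study does not state which variant was used.

```r
set.seed(2)  # illustrative data only, NOT the study data

# Hypothetical data frame; SEX is an assumed column name
toy <- data.frame(
  SEX    = rep(c("Male", "Female"), times = c(40, 50)),
  VEGGIE = c(rnorm(40, mean = 310, sd = 85), rnorm(50, mean = 290, sd = 85))
)

# Welch two-sample t-test (var.equal = FALSE is the default)
res <- t.test(VEGGIE ~ SEX, data = toy)

# Recompute the Welch t statistic by hand from group summaries
m <- tapply(toy$VEGGIE, toy$SEX, mean)
v <- tapply(toy$VEGGIE, toy$SEX, var)
k <- tapply(toy$VEGGIE, toy$SEX, length)

# t.test() orders groups alphabetically, so the difference is Female - Male
t_manual <- as.numeric(
  (m["Female"] - m["Male"]) /
    sqrt(v["Female"] / k["Female"] + v["Male"] / k["Male"])
)

print(all.equal(as.numeric(res$statistic), t_manual))
print(res$p.value)
```

The sign of the t statistic depends only on the order of the groups, so it is the group means that show which group scored higher.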
Additionally, a Pearson correlation test was conducted to determine the relationship between perceived health status and veggie score. This test did not produce significant results at an alpha level of 0.05 (r = 0.1314, p = 0.3631), suggesting that there is no detectable relationship between perceived health status and veggie score among study participants. A linear regression was also conducted to examine the relationship between veggie score and participants' age. It likewise did not produce significant results at an alpha level of 0.05 (F = 1.47, p = 0.2268). Together, these results suggest that veggie score does not vary by participants' perceived health status or age.
2.6 Broader Context and Future Work
The quantitative findings of this study suggest that future work should further examine location and age as predictors of fruit and vegetable consumption. This study focused on only one university and one county in New York State and is therefore not fully representative of New York State, universities in general, or the country as a whole. Future work should broaden its scope to include more than one university and/or multiple counties to create more representative samples and generalizable results. The qualitative findings show that future nutritional interventions need to go beyond simply providing factual information about nutrition and food selection. They need to pair factual information with environmental strategies, such as improving the offerings of on-campus dining and promoting tasty, convenient, healthy recipes for those who do not rely on campus dining. In short, future nutrition interventions should address both individual knowledge and beliefs and the environmental factors that shape how students feel and act in real-world contexts.
2.7 Limitations
Discrepancies in this research may stem from a lack of literature on the Veggie Meter, including, but not limited to, methods for reducing variability in veggie scores, the duration of carotenoid presence in the skin and blood, and unforeseen environmental effects on veggie score readings. Additionally, the sample size presents a significant limitation for the qualitative results of this study. Under the study design, the quantitative data were collected first (the Veggie Meter reading was part of the first survey given to participants), and all participants who were Binghamton University students were then invited to complete a second survey, from which the qualitative data were derived. Unfortunately, significant attrition was observed between these two surveys: many participants who completed the Veggie Meter reading and entered their biological data did not fill out the second survey and therefore provided no qualitative data. A larger, more diverse sample would strengthen this study's statistical power and generalizability, as the current sample underrepresented some racial groups. In addition, the large number of blank or null responses in the qualitative data raises concerns about the effective sample size and the validity of the findings. Further research using larger, more representative samples is necessary to clarify whether location, sex, and age are true predictors of veggie score. Additional studies may also examine veggie scores longitudinally to better understand variability in carotenoid levels.
2.8 Broader Implications
Dietary interventions on Binghamton University’s campus and in Broome County should take sex into account, as sex appears to be associated with fruit and vegetable consumption. Additionally, because veggie scores did not differ significantly by location, the results give no indication that Binghamton University’s campus has worse access to fruits and vegetables than the surrounding county. This does, however, raise questions about the quality of fruit and vegetable access in Broome County as a whole. The qualitative findings show that future nutritional interventions need to go beyond simply providing factual information about nutrition and food selection. They need to pair factual information with environmental strategies, such as improving the offerings of on-campus dining and promoting tasty, convenient, and nutritious recipes for those who do not rely on campus dining. In short, future nutrition interventions should address individual knowledge and beliefs as well as the environmental factors that shape how students feel and act in real-world contexts.