Tidy Tuesday on Coffee Ratings Dataset

Exploratory Data Analysis on Coffee Ratings

lruolin
02-09-2021

Objective

Load packages

library(pacman)
p_load(tidyverse,skimr,tidytuesdayR, ggthemes, GGally, broom)

Import

tuesdata <- tidytuesdayR::tt_load(2020, week = 28)

    Downloading file 1 of 1: `coffee_ratings.csv`
coffee_ratings <- tuesdata$coffee_ratings

Understanding the data

Skimming the data using the skimr package.

skim(coffee_ratings)
Table 1: Data summary
Name coffee_ratings
Number of rows 1339
Number of columns 43
_______________________
Column type frequency:
character 24
numeric 19
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
species 0 1.00 7 7 0 2 0
owner 7 0.99 3 50 0 315 0
country_of_origin 1 1.00 4 28 0 36 0
farm_name 359 0.73 1 73 0 571 0
lot_number 1063 0.21 1 71 0 227 0
mill 315 0.76 1 77 0 460 0
ico_number 151 0.89 1 40 0 847 0
company 209 0.84 3 73 0 281 0
altitude 226 0.83 1 41 0 396 0
region 59 0.96 2 76 0 356 0
producer 231 0.83 1 100 0 691 0
bag_weight 0 1.00 1 8 0 56 0
in_country_partner 0 1.00 7 85 0 27 0
harvest_year 47 0.96 3 24 0 46 0
grading_date 0 1.00 13 20 0 567 0
owner_1 7 0.99 3 50 0 319 0
variety 226 0.83 4 21 0 29 0
processing_method 170 0.87 5 25 0 5 0
color 218 0.84 4 12 0 4 0
expiration 0 1.00 13 20 0 566 0
certification_body 0 1.00 7 85 0 26 0
certification_address 0 1.00 40 40 0 32 0
certification_contact 0 1.00 40 40 0 29 0
unit_of_measurement 0 1.00 1 2 0 2 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
total_cup_points 0 1.00 82.09 3.50 0 81.08 82.50 83.67 90.58 ▁▁▁▁▇
number_of_bags 0 1.00 154.18 129.99 0 14.00 175.00 275.00 1062.00 ▇▇▁▁▁
aroma 0 1.00 7.57 0.38 0 7.42 7.58 7.75 8.75 ▁▁▁▁▇
flavor 0 1.00 7.52 0.40 0 7.33 7.58 7.75 8.83 ▁▁▁▁▇
aftertaste 0 1.00 7.40 0.40 0 7.25 7.42 7.58 8.67 ▁▁▁▁▇
acidity 0 1.00 7.54 0.38 0 7.33 7.58 7.75 8.75 ▁▁▁▁▇
body 0 1.00 7.52 0.37 0 7.33 7.50 7.67 8.58 ▁▁▁▁▇
balance 0 1.00 7.52 0.41 0 7.33 7.50 7.75 8.75 ▁▁▁▁▇
uniformity 0 1.00 9.83 0.55 0 10.00 10.00 10.00 10.00 ▁▁▁▁▇
clean_cup 0 1.00 9.84 0.76 0 10.00 10.00 10.00 10.00 ▁▁▁▁▇
sweetness 0 1.00 9.86 0.62 0 10.00 10.00 10.00 10.00 ▁▁▁▁▇
cupper_points 0 1.00 7.50 0.47 0 7.25 7.50 7.75 10.00 ▁▁▁▇▁
moisture 0 1.00 0.09 0.05 0 0.09 0.11 0.12 0.28 ▃▇▅▁▁
category_one_defects 0 1.00 0.48 2.55 0 0.00 0.00 0.00 63.00 ▇▁▁▁▁
quakers 1 1.00 0.17 0.83 0 0.00 0.00 0.00 11.00 ▇▁▁▁▁
category_two_defects 0 1.00 3.56 5.31 0 0.00 2.00 4.00 55.00 ▇▁▁▁▁
altitude_low_meters 230 0.83 1750.71 8669.44 1 1100.00 1310.64 1600.00 190164.00 ▇▁▁▁▁
altitude_high_meters 230 0.83 1799.35 8668.81 1 1100.00 1350.00 1650.00 190164.00 ▇▁▁▁▁
altitude_mean_meters 230 0.83 1775.03 8668.63 1 1100.00 1310.64 1600.00 190164.00 ▇▁▁▁▁

This is an incomplete dataset. I am not familiar with all the terms, such as ICO number, altitude, certification details.

To address my focal questions, I would need to take note that there are missing values in:

The distribution for scoring criteria is quite right-skewed. The total cup points is also very right skewed, most of the coffee graded are probably good coffee, so this may not be a representative dataset since it only contains information on above average coffee, but does not show data for average and sub-par coffee.

Species: Arabica vs Robusta

coffee_ratings %>% 
  select(species) %>% 
  count(species) %>% # equivalent to df %>% group_by(a, b) %>% summarise(n = n()).
  mutate(percentage = n/sum(n)*100) %>%  # need not group by first
  ggplot(aes(species,percentage)) +
  geom_col(aes(fill = species)) +
  scale_fill_few() + # ggthemes: Color scales from Few's "Practical Rules for Using Color in Charts"
  labs(title = "Distribution of Arabica and Robusta coffe",
       subtitle = "Most of the coffee graded are Arabica coffee",
       x = "Species",
       y = "Percentage of samples",
       caption = "Source: Coffee Quality Institute") +
  theme_clean()

Even though there is very little representation from Robusta coffee, which is considered to be a more inferior type, out of curiosity and for data exploratory purposes, I will look at the averate total cup score. Personally, I prefer the Robusta type of coffee unique to Singapore and Malaysia because of the way coffee beans are fried with butter and sugar, which gives it a unique aromatic taste.

coffee_ratings %>% 
  select(species,total_cup_points) %>% 
  group_by(species) %>% 
  summarise(mean = mean(total_cup_points)) %>% 
  ggplot(aes(x = species, y = mean, label = round(mean,1))) +
  geom_col(aes(fill = species)) +
  geom_text(aes(label = round(mean,1)), vjust = -0.5) +
  scale_fill_few() +
  labs(title = "Mean Total Cup Points for Arabica and Robusta",
       subtitle = "Arabica has higher mean score than Robusta",
       caption  = "Source: Coffee Quality Institute") +
  ylim(0,100) +
  theme_clean()

Processing method

To compare like with like, I will look the effect of processing methods on scores for Arabica coffee only.

arabica <- coffee_ratings %>% 
  filter(species == "Arabica")

The plot below shows what the commonly used processing methods are.

arabica %>% 
  filter(!is.na(processing_method)) %>% 
  count(processing_method) %>% 
  mutate(percentage = n/sum(n)*100) %>% 
  arrange(desc(percentage)) %>% 
  ggplot(aes(reorder(processing_method, percentage),percentage),
         label = round(percentage,1)) +
  geom_col(aes(fill = processing_method), width = 0.75) +
  scale_color_few() +
  geom_text(aes(label = round(percentage,1), hjust = -0.15)) +
  labs(title = "Distribution by Processing Method",
       subtitle = "Most of the Arabica Coffee were either Wet or Dry Processed",
       caption = "Source: Coffee Quality Institute",
       x = NULL) +
  coord_flip() +
  theme_clean() +
  theme(legend.position = "none")

I did some reading online (see Reference section below), and found that there were three main types of processing methods:

For the purpose of comparing the scores across processing methods, I will just look at Wet vs Dry processing.

However, it is important to compare like with like for different processing methods. What does the total cup points mean? The total cup points could be used as a classifier:

I will add in the class into the dataset to compare effect of processing method in the class with the most datapoints.

EDA on total cup points

sensory <- coffee_ratings %>% 
  select(total_cup_points, species, country_of_origin,
         processing_method:category_two_defects)

sensory %>% 
  ggplot(aes(total_cup_points)) +
  geom_histogram(fill = "chocolate4") +
  theme_few()
min(sensory$total_cup_points)  # 0 : has missing values
[1] 0
table(sensory$total_cup_points) # 1 missing value, lowest is 59.83

    0 59.83 63.08 67.92 68.33 69.17 69.33 70.67 70.75    71 71.08 
    1     1     1     1     1     2     1     1     1     1     1 
71.75 72.33 72.58 72.83 72.92 73.42  73.5 73.67 73.75 73.83    74 
    1     1     1     1     1     1     1     1     1     1     1 
74.33 74.42 74.67 74.75 74.83 74.92    75 75.08 75.17  75.5 75.58 
    2     1     1     2     1     1     1     1     3     1     2 
75.67 75.83    76 76.08 76.17 76.25 76.33 76.42  76.5 76.75 76.83 
    1     1     1     1     3     1     2     1     1     1     1 
   77 77.17 77.25 77.33 77.42  77.5 77.58 77.67 77.83 77.92    78 
    1     2     2     3     1     1     1     1     3     3     8 
78.08 78.17 78.25 78.33 78.42  78.5 78.58 78.67 78.75 78.83 78.92 
    2     1     2     5     2     3     7     2     6     1     2 
   79 79.08 79.17 79.25 79.33 79.42  79.5 79.58 79.67 79.75 79.83 
    6     6     8     2     6     3     5     4     8    13     5 
79.92    80 80.08 80.17 80.25 80.33 80.42  80.5 80.58 80.67 80.75 
    9     8     8    11    11     8     7    12     9    11    12 
80.83 80.92    81 81.08 81.17 81.25 81.33 81.42  81.5 81.58 81.67 
    7    18    15    12    15    10    12    17    26    17    25 
81.75 81.83 81.92    82 82.08 82.17 82.25 82.33 82.42  82.5 82.58 
   12    26    18    21    17    21    22    29    32    23    21 
82.67 82.75 82.83 82.92    83 83.08 83.17 83.25 83.33 83.38 83.42 
   26    30    19    26    39    18    38    25    20     1    20 
 83.5 83.58 83.67 83.75 83.83 83.92    84 84.08 84.13 84.17 84.25 
   25    16    21    20    21    16    18     8     1    21    19 
84.33 84.42  84.5 84.58 84.67 84.75 84.83 84.92    85 85.08 85.17 
   12     8    13    14    19     5     5     9    10     8     2 
85.25 85.33 85.42  85.5 85.58 85.75 85.83 85.92    86 86.08 86.17 
    3     8     5     5     3     3     4     3     6     3     4 
86.25 86.33 86.42  86.5 86.58 86.67 86.83 86.92 87.08 87.17 87.25 
    5     1     1     1     2     1     1     2     2     2     3 
87.33 87.42 87.58 87.83 87.92 88.08 88.25 88.42 88.67 88.75 88.83 
    1     1     1     1     3     1     1     1     1     1     2 
   89 89.75 89.92 90.58 
    1     1     1     1 

Creating a classification variable

sensory_with_category <- sensory %>% 
  filter(total_cup_points != 0) %>% # remove zero score
  mutate(classification = ifelse(total_cup_points > 95, "Super Premium Specialty",
                                 ifelse(total_cup_points >90, "Premium Specialty",
                                        ifelse(total_cup_points >85, "Specialty",
                                               ifelse(total_cup_points >80, "Premium",
                                                      ifelse(total_cup_points >75, "Usual Good Quality",
                                                             ifelse(total_cup_points >70, "Average Quality",
                                                                    ifelse(total_cup_points >60, "Exchange grade",
                                                                           "Commercial grade"))))))))

Understanding the coffee with the highest score:

sensory_with_category %>% 
  select(total_cup_points, classification) %>% 
  arrange(desc(total_cup_points))
# A tibble: 1,338 x 2
   total_cup_points classification   
              <dbl> <chr>            
 1             90.6 Premium Specialty
 2             89.9 Specialty        
 3             89.8 Specialty        
 4             89   Specialty        
 5             88.8 Specialty        
 6             88.8 Specialty        
 7             88.8 Specialty        
 8             88.7 Specialty        
 9             88.4 Specialty        
10             88.2 Specialty        
# … with 1,328 more rows
min(coffee_ratings$total_cup_points)
[1] 0
# which coffee had the highest score?
coffee_ratings %>% 
  filter(total_cup_points == max(coffee_ratings$total_cup_points)) %>% 
  t() # transpose
                      [,1]                                      
total_cup_points      "90.58"                                   
species               "Arabica"                                 
owner                 "metad plc"                               
country_of_origin     "Ethiopia"                                
farm_name             "metad plc"                               
lot_number            NA                                        
mill                  "metad plc"                               
ico_number            "2014/2015"                               
company               "metad agricultural developmet plc"       
altitude              "1950-2200"                               
region                "guji-hambela"                            
producer              "METAD PLC"                               
number_of_bags        "300"                                     
bag_weight            "60 kg"                                   
in_country_partner    "METAD Agricultural Development plc"      
harvest_year          "2014"                                    
grading_date          "April 4th, 2015"                         
owner_1               "metad plc"                               
variety               NA                                        
processing_method     "Washed / Wet"                            
aroma                 "8.67"                                    
flavor                "8.83"                                    
aftertaste            "8.67"                                    
acidity               "8.75"                                    
body                  "8.5"                                     
balance               "8.42"                                    
uniformity            "10"                                      
clean_cup             "10"                                      
sweetness             "10"                                      
cupper_points         "8.75"                                    
moisture              "0.12"                                    
category_one_defects  "0"                                       
quakers               "0"                                       
color                 "Green"                                   
category_two_defects  "0"                                       
expiration            "April 3rd, 2016"                         
certification_body    "METAD Agricultural Development plc"      
certification_address "309fcf77415a3661ae83e027f7e5f05dad786e44"
certification_contact "19fef5a731de2db57d16da10287413f5f99bc2dd"
unit_of_measurement   "m"                                       
altitude_low_meters   "1950"                                    
altitude_high_meters  "2200"                                    
altitude_mean_meters  "2075"                                    
# which coffee had the lowest score?
coffee_ratings %>% 
  filter(total_cup_points == 59.83) %>% 
  t() # transpose
                      [,1]                                      
total_cup_points      "59.83"                                   
species               "Arabica"                                 
owner                 "juan luis alvarado romero"               
country_of_origin     "Guatemala"                               
farm_name             "finca el limon"                          
lot_number            NA                                        
mill                  "beneficio serben"                        
ico_number            "11/853/165"                              
company               "unicafe"                                 
altitude              "4650"                                    
region                "nuevo oriente"                           
producer              "WILLIAM ESTUARDO MARTINEZ PACHECO"       
number_of_bags        "275"                                     
bag_weight            "1 kg"                                    
in_country_partner    "Asociacion Nacional Del Café"            
harvest_year          "2012"                                    
grading_date          "May 24th, 2012"                          
owner_1               "Juan Luis Alvarado Romero"               
variety               "Catuai"                                  
processing_method     "Washed / Wet"                            
aroma                 "7.5"                                     
flavor                "6.67"                                    
aftertaste            "6.67"                                    
acidity               "7.67"                                    
body                  "7.33"                                    
balance               "6.67"                                    
uniformity            "8"                                       
clean_cup             "1.33"                                    
sweetness             "1.33"                                    
cupper_points         "6.67"                                    
moisture              "0.1"                                     
category_one_defects  "0"                                       
quakers               "0"                                       
color                 "Green"                                   
category_two_defects  "4"                                       
expiration            "May 24th, 2013"                          
certification_body    "Asociacion Nacional Del Café"            
certification_address "b1f20fe3a819fd6b2ee0eb8fdc3da256604f1e53"
certification_contact "724f04ad10ed31dbb9d260f0dfd221ba48be8a95"
unit_of_measurement   "ft"                                      
altitude_low_meters   "1417.32"                                 
altitude_high_meters  "1417.32"                                 
altitude_mean_meters  "1417.32"                                 
# min score is actually 0, which is a missing datapoint.
# distribution of types of coffee
sensory_with_category %>% 
  filter(species == "Arabica",
         processing_method %in% c("Natural / Dry", "Washed / Wet")) %>% 
  count(classification, processing_method) %>% 
  ggplot(aes(fct_reorder(classification, n), n, label = n)) + 
  geom_col(aes(fill = classification)) +
  scale_color_few() +
  labs(title = "Distribution of types of Arabica coffees, by processing method",
       subtitle = "Most of the premium coffee (with cup scores 80 - 84) are processed by Washed/Wet method.",
       caption = "Source: Coffee Quality Institute") +
  facet_grid(processing_method ~. ) +
  theme_clean() +
  coord_flip() +
  theme(legend.position = "none")

The Premium category has the most number of datapoints, and I will focus on this category for analysis.

plot_sensory_total_boxplot <- sensory_with_category %>% 
  filter(classification == "Premium",
         species == "Arabica",
         processing_method %in% c("Natural / Dry", "Washed / Wet")) %>% 
  mutate(processing_mtd_fct = ifelse(processing_method == c("Natural / Dry"), "Dry",
                                     "Wet")) %>% 
  select(total_cup_points, processing_mtd_fct) %>% 
  ggplot(aes(x = processing_mtd_fct, y = total_cup_points)) +
  geom_boxplot(aes(col = processing_mtd_fct),notch = T) +
  stat_summary(fun.data = "mean_cl_normal",
           geom = "errorbar",
           fun.args = (conf.int = 0.95),
           color = "forestgreen") +
  geom_jitter(aes(col = processing_mtd_fct), alpha = 0.3) +
  scale_color_manual(values = c("Dry" = "chocolate4",
                                "Wet" = "cadetblue4")) +
  labs(title = "Comparison of Mean Total Cup Points for Dry vs Wet Processing in Arabica Coffee",
       subtitle = "The Mean Total Cup Points are very similar for both processing methods",
       caption = "Source: Coffee Quality Institute",
       x = "Processing Method",
       y = "Total Cup Points") +
  theme_few() +
  theme(legend.position = "none")

plot_sensory_total_boxplot 

plot_sensory_boxplot <- sensory_with_category %>% 
  filter(classification == "Premium",
         species == "Arabica",
         processing_method %in% c("Natural / Dry", "Washed / Wet")) %>% 
  mutate(processing_mtd_fct = ifelse(processing_method == c("Natural / Dry"), "Dry",
                                     "Wet")) %>% 
  select(-quakers, -color, - category_one_defects, 
         - category_two_defects, - processing_method) %>% 
  pivot_longer(cols = aroma:cupper_points,
               names_to = "parameters",
               values_to = "score") %>% 
  mutate(parameters_fct = factor(parameters,
                                 levels = c("acidity", "aroma", "clean_cup",
                                            "sweetness", "uniformity", "aftertaste",
                                            "balance", "body", "cupper_points", "flavor"
                                 ))) %>% 
  ggplot(aes(x = processing_mtd_fct, y = score)) +
  geom_boxplot(aes(col = processing_mtd_fct), notch = T, size = 1) +
  geom_jitter(aes(col = processing_mtd_fct), alpha = 0.1) +
  scale_color_manual(values = c("Dry" = "chocolate4",
                                "Wet" = "cadetblue4")) +
  facet_wrap(vars(parameters_fct), scales = "free", ncol= 5) +
  labs(x = NULL,
       title = "Comparison of mean score for Arabica coffee: Dry vs Wet Processing",
       subtitle = "Wet processed coffee has higher average scores for acidity, aroma, clean_cup, sweetness, uniformity.",
       caption = "Source: Coffee Quality Institute") +
  theme_few() +
  theme(legend.position = "none")

plot_sensory_boxplot

Country of origin/Owner

# plot to see which countries are above/below mean rating

arabica_dotplot <- arabica %>% 
  filter(!is.na(country_of_origin)) %>% # 1 missing value
  group_by(country_of_origin) %>% 
  summarise(mean_rating = mean(total_cup_points)) %>% 
  mutate(above_below_mean = as.factor(ifelse(mean_rating > mean(arabica$total_cup_points),
                                             "above_mean", "below_mean"))) %>% 
  ggplot(aes(x = reorder(country_of_origin, mean_rating), 
             y = mean_rating, 
             col = above_below_mean,
             label = round(mean_rating,1))) +
  geom_point(aes(col = above_below_mean), stat = "identity", size = 9) +
  scale_color_few() +
  geom_text(col = "black", size = 4) +
  geom_hline(aes(yintercept = mean(arabica$total_cup_points)), size = 2,
             col = "grey")+
  labs(title = "Dot plot for Arabica Coffee Ratings",
       subtitle = "Countries with ratings above mean values are coloured blue,\nand countries below mean values are colored orange.",
       x =  "Country of Origin",
       y = "Mean Rating",
       caption = "Source: Coffee Quality Institute") +
  coord_flip() +
  theme_clean() +
  theme(legend.position = "none",
        axis.title = element_text(size = 16, face = "bold"),
        axis.text = element_text(size = 14),
        title = element_text(size = 20, face = "bold"))

arabica_dotplot

# sensory scores for arabica coffee, top scorers for sensory

sensory_by_country <- coffee_ratings %>% 
  filter(species == "Arabica",
         !total_cup_points %in% 0,
         !is.na(country_of_origin),
         !is.na(owner)) %>% 
  select(country_of_origin, owner, 
         total_cup_points, aroma:cupper_points)
skim(sensory_by_country)
Table 2: Data summary
Name sensory_by_country
Number of rows 1302
Number of columns 13
_______________________
Column type frequency:
character 2
numeric 11
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
country_of_origin 0 1 4 28 0 36 0
owner 0 1 3 50 0 305 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
total_cup_points 0 1 82.18 2.69 59.83 81.17 82.50 83.67 90.58 ▁▁▁▇▁
aroma 0 1 7.57 0.32 5.08 7.42 7.58 7.75 8.75 ▁▁▂▇▁
flavor 0 1 7.52 0.34 6.08 7.33 7.58 7.75 8.83 ▁▂▇▃▁
aftertaste 0 1 7.40 0.35 6.17 7.25 7.42 7.58 8.67 ▁▃▇▂▁
acidity 0 1 7.54 0.32 5.25 7.33 7.50 7.75 8.75 ▁▁▃▇▁
body 0 1 7.52 0.29 5.25 7.33 7.50 7.67 8.58 ▁▁▁▇▁
balance 0 1 7.52 0.35 6.08 7.33 7.50 7.75 8.75 ▁▂▇▃▁
uniformity 0 1 9.84 0.49 6.00 10.00 10.00 10.00 10.00 ▁▁▁▁▇
clean_cup 0 1 9.84 0.72 0.00 10.00 10.00 10.00 10.00 ▁▁▁▁▇
sweetness 0 1 9.91 0.46 1.33 10.00 10.00 10.00 10.00 ▁▁▁▁▇
cupper_points 0 1 7.50 0.43 5.17 7.25 7.50 7.75 10.00 ▁▂▇▁▁

Looking at the coffee with clean cup score = 0: Is it really that the coffee had a score of 0? Or was it a data entry mistake?

# there is one datapoint in which clean_cup score = 0
coffee_ratings %>% 
  filter(clean_cup == 0) %>% 
  t() # transpose
                      [,1]                                                  
total_cup_points      "68.33"                                               
species               "Arabica"                                             
owner                 "juan carlos garcia lopez"                            
country_of_origin     "Mexico"                                              
farm_name             "el centenario"                                       
lot_number            NA                                                    
mill                  "la esperanza, municipio juchique de ferrer, veracruz"
ico_number            "1104328663"                                          
company               "terra mia"                                           
altitude              "900"                                                 
region                "juchique de ferrer"                                  
producer              "JUAN CARLOS GARCÍA LOPEZ"                            
number_of_bags        " 12"                                                 
bag_weight            "1 kg"                                                
in_country_partner    "AMECAFE"                                             
harvest_year          "2012"                                                
grading_date          "September 17th, 2012"                                
owner_1               "JUAN CARLOS GARCIA LOPEZ"                            
variety               "Bourbon"                                             
processing_method     "Washed / Wet"                                        
aroma                 "7.08"                                                
flavor                "6.83"                                                
aftertaste            "6.25"                                                
acidity               "7.42"                                                
body                  "7.25"                                                
balance               "6.75"                                                
uniformity            "10"                                                  
clean_cup             "0"                                                   
sweetness             "10"                                                  
cupper_points         "6.75"                                                
moisture              "0.11"                                                
category_one_defects  "0"                                                   
quakers               "0"                                                   
color                 "None"                                                
category_two_defects  "20"                                                  
expiration            "September 17th, 2013"                                
certification_body    "AMECAFE"                                             
certification_address "59e396ad6e22a1c22b248f958e1da2bd8af85272"            
certification_contact "0eb4ee5b3f47b20b049548a2fd1e7d4a2b70d0a7"            
unit_of_measurement   "m"                                                   
altitude_low_meters   " 900"                                                
altitude_high_meters  " 900"                                                
altitude_mean_meters  " 900"                                                
                      [,2]                                      
total_cup_points      " 0.00"                                   
species               "Arabica"                                 
owner                 "bismarck castro"                         
country_of_origin     "Honduras"                                
farm_name             "los hicaques"                            
lot_number            "103"                                     
mill                  "cigrah s.a de c.v."                      
ico_number            "13-111-053"                              
company               "cigrah s.a de c.v"                       
altitude              "1400"                                    
region                "comayagua"                               
producer              "Reinerio Zepeda"                         
number_of_bags        "275"                                     
bag_weight            "69 kg"                                   
in_country_partner    "Instituto Hondureño del Café"            
harvest_year          "2017"                                    
grading_date          "April 28th, 2017"                        
owner_1               "Bismarck Castro"                         
variety               "Caturra"                                 
processing_method     NA                                        
aroma                 "0.00"                                    
flavor                "0.00"                                    
aftertaste            "0.00"                                    
acidity               "0.00"                                    
body                  "0.00"                                    
balance               "0.00"                                    
uniformity            " 0"                                      
clean_cup             "0"                                       
sweetness             " 0"                                      
cupper_points         "0.00"                                    
moisture              "0.12"                                    
category_one_defects  "0"                                       
quakers               "0"                                       
color                 "Green"                                   
category_two_defects  " 2"                                      
expiration            "April 28th, 2018"                        
certification_body    "Instituto Hondureño del Café"            
certification_address "b4660a57e9f8cc613ae5b8f02bfce8634c763ab4"
certification_contact "7f521ca403540f81ec99daec7da19c2788393880"
unit_of_measurement   "m"                                       
altitude_low_meters   "1400"                                    
altitude_high_meters  "1400"                                    
altitude_mean_meters  "1400"                                    
# one is missing value, already filtered out for total_cup_points = 0
# the remaining one looks like it really has 0 for clean cup score

7.08 + 6.83 + 6.25 + 7.42 + 7.25 + 6.75 + 10 + 10  + 6.75 # 68.33
[1] 68.33

It turned out that total cup points is a summation of scores for aroma, flavor, aftertaste, acidity, body, balance, uniformity, clean_cup, sweetness and cupper_points.

country_mean_score <- sensory_by_country %>% 
  group_by(country_of_origin, owner) %>% 
  summarise(mean_score = mean(total_cup_points)) %>% 
  arrange(desc(mean_score)) 

country_mean_score
# A tibble: 350 x 3
# Groups:   country_of_origin [36]
   country_of_origin owner                              mean_score
   <chr>             <chr>                                   <dbl>
 1 Ethiopia          metad plc                                89.8
 2 Guatemala         grounds for health admin                 89.8
 3 Ethiopia          yidnekachew dabessa                      89  
 4 Brazil            ji-ae ahn                                88.8
 5 Peru              hugo valdivia                            88.8
 6 Ethiopia          diamond enterprise plc                   88.2
 7 Ethiopia          mohammed lalo                            88.1
 8 Indonesia         grounds for health admin                 87.4
 9 United States     cqi q coffee sample representative       87.3
10 Mexico            roberto licona franco                    87.2
# … with 340 more rows
min(country_mean_score$mean_score) # 68.33
[1] 68.33
max(country_mean_score$mean_score) # 89.7767
[1] 89.77667

How do the top 8 coffee owners by country compare against each other in terms of the ten scoring criteria?

# plot profile for top 8 owners

top_owners_data <- sensory_by_country%>% 
  group_by(country_of_origin, owner) %>% 
  summarise_at(.vars = vars(total_cup_points:cupper_points),
               .funs = c(mean = "mean")) %>% 
  ungroup() %>% 
  mutate(country_owner = str_c(country_of_origin, owner, sep = ","),
         country_owner_fct = factor(country_owner, 
                                    levels =c("Ethiopia,metad plc", 
                                             "Guatemala,grounds for health admin", 
                                             "Ethiopia,yidnekachew dabessa",
                                             "Brazil,ji-ae ahn",
                                             "Peru,hugo valdivia",
                                             "Ethiopia,diamond enterprise plc",
                                             "Ethiopia,mohammed lalo",
                                             "Indonesia,grounds for health admin"))) %>% 
  group_by(country_owner_fct) %>% 
  arrange(desc(total_cup_points_mean)) %>% 
  ungroup() %>% 
  slice_max(total_cup_points_mean, n = 8) %>% 
  pivot_longer(cols = c(aroma_mean:cupper_points_mean),
               names_to = "parameters",
               values_to = "score") %>% 

  ggplot(aes(x = fct_rev(factor(parameters)), y = score, label = round(score, 1))) +
  geom_point(stat = "identity", aes(col = factor(parameters)), size = 8) +
  geom_text(col = "black", size = 4) +
  facet_wrap(country_owner_fct~., scales = "free_y", ncol = 4) +
  coord_flip() +
  theme_few() +
  theme(legend.position = "none")
  
top_owners_data

The scores for clean_cup, sweetness, uniformity is a perfect 10 for all 8 owners. Slight differences were observed for mean scores for cupper_points, aftertaste and body. These were probably the distinguishing parameters.

Variety

The first few sections above looked mainly at highly scored coffee. Would there be any differenced in scoring profile, if I were to look at different varieties of coffee?

variety_count <- coffee_ratings %>% 
  count(variety) %>% 
  arrange(desc(n)) # 30 observations

head(variety_count, 8) # NA: 226, Other: 226
# A tibble: 8 x 2
  variety            n
  <chr>          <int>
1 Caturra          256
2 Bourbon          226
3 <NA>             226
4 Typica           211
5 Other            110
6 Catuai            74
7 Hawaiian Kona     44
8 Yellow Bourbon    35
tail(variety_count)
# A tibble: 6 x 2
  variety                 n
  <chr>               <int>
1 Ethiopian Heirlooms     1
2 Marigojipe              1
3 Moka Peaberry           1
4 Pache Comun             1
5 Sulawesi                1
6 Sumatra Lintong         1
data_variety <- coffee_ratings %>% 
  select(total_cup_points, species, owner, country_of_origin, processing_method,
         variety, aroma:cupper_points, color) %>% 
  filter(variety %in% c("Caturra", "Bourbon", "Typica", "Catuai", 
                        "Hawaiian Kona", "Yellow Bourbon")) %>% 
  group_by(variety)

glimpse(data_variety)
Rows: 846
Columns: 17
Groups: variety [6]
$ total_cup_points  <dbl> 89.75, 87.17, 86.92, 86.67, 86.42, 86.33,…
$ species           <chr> "Arabica", "Arabica", "Arabica", "Arabica…
$ owner             <chr> "grounds for health admin", "the coffee s…
$ country_of_origin <chr> "Guatemala", "Costa Rica", "Brazil", "Hon…
$ processing_method <chr> NA, "Washed / Wet", "Natural / Dry", NA, …
$ variety           <chr> "Bourbon", "Caturra", "Bourbon", "Caturra…
$ aroma             <dbl> 8.42, 8.08, 8.50, 8.17, 8.50, 8.17, 8.08,…
$ flavor            <dbl> 8.50, 8.25, 8.50, 8.08, 8.17, 7.83, 8.17,…
$ aftertaste        <dbl> 8.42, 8.00, 8.00, 8.08, 8.00, 8.00, 8.00,…
$ acidity           <dbl> 8.42, 8.17, 8.00, 8.00, 7.75, 8.08, 7.92,…
$ body              <dbl> 8.33, 8.00, 8.00, 8.08, 8.00, 7.83, 7.92,…
$ balance           <dbl> 8.42, 8.33, 8.00, 8.00, 8.00, 8.00, 7.83,…
$ uniformity        <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 1…
$ clean_cup         <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 1…
$ sweetness         <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 1…
$ cupper_points     <dbl> 9.25, 8.33, 7.92, 8.25, 8.00, 8.42, 8.33,…
$ color             <chr> NA, "Green", "Green", "Green", "Green", "…
# Top 6 coffee by number of datapoints

data_variety %>% 
  count(species) # all arabica
# A tibble: 6 x 3
# Groups:   variety [6]
  variety        species     n
  <chr>          <chr>   <int>
1 Bourbon        Arabica   226
2 Catuai         Arabica    74
3 Caturra        Arabica   256
4 Hawaiian Kona  Arabica    44
5 Typica         Arabica   211
6 Yellow Bourbon Arabica    35
data_variety %>% 
  count(processing_method) # quite varied
# A tibble: 30 x 3
# Groups:   variety [6]
   variety processing_method             n
   <chr>   <chr>                     <int>
 1 Bourbon Natural / Dry                38
 2 Bourbon Other                         2
 3 Bourbon Pulped natural / honey        2
 4 Bourbon Semi-washed / Semi-pulped    11
 5 Bourbon Washed / Wet                170
 6 Bourbon <NA>                          3
 7 Catuai  Natural / Dry                18
 8 Catuai  Pulped natural / honey        2
 9 Catuai  Semi-washed / Semi-pulped     6
10 Catuai  Washed / Wet                 48
# … with 20 more rows
data_variety %>% 
  ungroup() %>% 
  count(country_of_origin) %>% 
  arrange(desc(n))
# A tibble: 25 x 2
   country_of_origin          n
   <chr>                  <int>
 1 Mexico                   195
 2 Guatemala                157
 3 Colombia                 132
 4 Brazil                    92
 5 Taiwan                    66
 6 Costa Rica                44
 7 Honduras                  44
 8 United States (Hawaii)    44
 9 El Salvador               13
10 Nicaragua                 13
# … with 15 more rows
data_variety %>% 
  ungroup() %>% 
  group_by(variety) %>% 
  skim()
Table 3: Data summary
Name Piped data
Number of rows 846
Number of columns 17
_______________________
Column type frequency:
character 5
numeric 11
________________________
Group variables variety

Variable type: character

skim_variable variety n_missing complete_rate min max empty n_unique whitespace
species Bourbon 0 1.00 7 7 0 1 0
species Catuai 0 1.00 7 7 0 1 0
species Caturra 0 1.00 7 7 0 1 0
species Hawaiian Kona 0 1.00 7 7 0 1 0
species Typica 0 1.00 7 7 0 1 0
species Yellow Bourbon 0 1.00 7 7 0 1 0
owner Bourbon 0 1.00 4 50 0 44 0
owner Catuai 2 0.97 5 41 0 29 0
owner Caturra 5 0.98 4 45 0 45 0
owner Hawaiian Kona 0 1.00 15 32 0 2 0
owner Typica 0 1.00 8 47 0 83 0
owner Yellow Bourbon 0 1.00 8 25 0 6 0
country_of_origin Bourbon 0 1.00 6 28 0 11 0
country_of_origin Catuai 0 1.00 6 10 0 8 0
country_of_origin Caturra 0 1.00 4 10 0 13 0
country_of_origin Hawaiian Kona 0 1.00 22 22 0 1 0
country_of_origin Typica 0 1.00 4 11 0 9 0
country_of_origin Yellow Bourbon 0 1.00 6 6 0 2 0
processing_method Bourbon 3 0.99 5 25 0 5 0
processing_method Catuai 0 1.00 12 25 0 4 0
processing_method Caturra 7 0.97 5 25 0 5 0
processing_method Hawaiian Kona 0 1.00 12 13 0 2 0
processing_method Typica 3 0.99 5 25 0 5 0
processing_method Yellow Bourbon 3 0.91 5 25 0 5 0
color Bourbon 12 0.95 4 12 0 4 0
color Catuai 2 0.97 4 12 0 4 0
color Caturra 21 0.92 4 12 0 4 0
color Hawaiian Kona 6 0.86 5 12 0 3 0
color Typica 27 0.87 4 12 0 4 0
color Yellow Bourbon 3 0.91 4 12 0 4 0

Variable type: numeric

skim_variable variety n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
total_cup_points Bourbon 0 1 81.95 2.54 68.33 81.00 82.33 83.40 89.75 ▁▁▂▇▁
total_cup_points Catuai 0 1 81.30 3.91 59.83 81.17 81.88 83.06 85.83 ▁▁▁▁▇
total_cup_points Caturra 0 1 82.44 5.59 0.00 82.00 83.12 83.77 87.17 ▁▁▁▁▇
total_cup_points Hawaiian Kona 0 1 81.58 3.10 72.58 80.31 82.62 83.33 86.25 ▁▁▃▇▃
total_cup_points Typica 0 1 81.02 2.59 67.92 79.79 81.50 82.67 85.33 ▁▁▁▇▇
total_cup_points Yellow Bourbon 0 1 82.43 1.58 78.00 81.54 82.42 83.16 86.17 ▁▃▇▅▁
aroma Bourbon 0 1 7.56 0.32 6.17 7.42 7.58 7.67 8.50 ▁▁▆▇▁
aroma Catuai 0 1 7.49 0.30 6.67 7.33 7.50 7.67 8.50 ▁▃▇▂▁
aroma Caturra 0 1 7.58 0.56 0.00 7.50 7.67 7.75 8.25 ▁▁▁▁▇
aroma Hawaiian Kona 0 1 7.51 0.24 6.92 7.33 7.50 7.67 8.08 ▁▅▇▃▂
aroma Typica 0 1 7.47 0.28 6.67 7.25 7.50 7.67 8.17 ▁▅▇▇▂
aroma Yellow Bourbon 0 1 7.50 0.33 6.92 7.25 7.42 7.62 8.42 ▂▇▃▂▂
flavor Bourbon 0 1 7.50 0.36 6.08 7.33 7.50 7.67 8.50 ▁▁▇▇▁
flavor Catuai 0 1 7.43 0.34 6.17 7.33 7.50 7.58 8.00 ▁▁▂▇▃
flavor Caturra 0 1 7.53 0.54 0.00 7.42 7.58 7.75 8.33 ▁▁▁▁▇
flavor Hawaiian Kona 0 1 7.53 0.29 6.92 7.33 7.50 7.75 8.17 ▃▆▆▇▁
flavor Typica 0 1 7.38 0.34 6.33 7.17 7.42 7.58 8.17 ▁▃▇▇▂
flavor Yellow Bourbon 0 1 7.54 0.24 7.00 7.38 7.58 7.67 8.00 ▂▃▇▃▃
aftertaste Bourbon 0 1 7.32 0.36 6.17 7.17 7.33 7.50 8.42 ▁▂▇▂▁
aftertaste Catuai 0 1 7.31 0.35 6.17 7.17 7.33 7.50 8.00 ▁▁▅▇▂
aftertaste Caturra 0 1 7.44 0.54 0.00 7.33 7.50 7.67 8.08 ▁▁▁▁▇
aftertaste Hawaiian Kona 0 1 7.47 0.31 6.83 7.33 7.50 7.69 8.00 ▃▂▇▇▅
aftertaste Typica 0 1 7.28 0.33 6.17 7.08 7.33 7.50 8.00 ▁▂▇▇▃
aftertaste Yellow Bourbon 0 1 7.40 0.27 6.83 7.25 7.42 7.58 8.00 ▂▃▇▅▁
acidity Bourbon 0 1 7.55 0.27 6.83 7.42 7.54 7.67 8.42 ▁▅▇▂▁
acidity Catuai 0 1 7.49 0.32 6.50 7.33 7.50 7.67 8.33 ▁▂▇▃▁
acidity Caturra 0 1 7.52 0.57 0.00 7.33 7.58 7.75 8.25 ▁▁▁▁▇
acidity Hawaiian Kona 0 1 7.59 0.27 6.92 7.40 7.58 7.83 8.00 ▁▅▂▇▇
acidity Typica 0 1 7.40 0.27 6.67 7.25 7.42 7.58 8.33 ▂▇▇▃▁
acidity Yellow Bourbon 0 1 7.47 0.23 6.92 7.33 7.50 7.67 8.00 ▂▇▆▇▁
body Bourbon 0 1 7.50 0.27 6.33 7.33 7.50 7.67 8.33 ▁▁▇▆▁
body Catuai 0 1 7.41 0.28 6.50 7.27 7.42 7.58 7.92 ▁▂▇▇▅
body Caturra 0 1 7.54 0.54 0.00 7.48 7.58 7.75 8.17 ▁▁▁▁▇
body Hawaiian Kona 0 1 7.61 0.26 6.92 7.42 7.67 7.83 8.08 ▁▂▇▇▅
body Typica 0 1 7.40 0.25 6.75 7.25 7.42 7.50 8.33 ▁▇▇▂▁
body Yellow Bourbon 0 1 7.57 0.27 6.92 7.42 7.50 7.71 8.33 ▁▅▇▁▁
balance Bourbon 0 1 7.47 0.32 6.50 7.33 7.50 7.67 8.42 ▁▃▇▅▁
balance Catuai 0 1 7.40 0.37 6.17 7.25 7.42 7.67 8.00 ▁▁▃▇▆
balance Caturra 0 1 7.57 0.58 0.00 7.42 7.58 7.75 8.58 ▁▁▁▁▇
balance Hawaiian Kona 0 1 7.64 0.34 6.83 7.42 7.67 7.92 8.25 ▁▃▇▅▅
balance Typica 0 1 7.35 0.31 6.58 7.17 7.33 7.50 8.25 ▁▃▇▂▁
balance Yellow Bourbon 0 1 7.57 0.24 7.17 7.42 7.50 7.67 8.17 ▅▇▇▃▂
uniformity Bourbon 0 1 9.87 0.39 8.00 10.00 10.00 10.00 10.00 ▁▁▁▁▇
uniformity Catuai 0 1 9.85 0.51 8.00 10.00 10.00 10.00 10.00 ▁▁▁▁▇
uniformity Caturra 0 1 9.89 0.72 0.00 10.00 10.00 10.00 10.00 ▁▁▁▁▇
uniformity Hawaiian Kona 0 1 9.47 0.81 6.67 9.33 10.00 10.00 10.00 ▁▁▁▅▇
uniformity Typica 0 1 9.75 0.60 6.00 10.00 10.00 10.00 10.00 ▁▁▁▁▇
uniformity Yellow Bourbon 0 1 9.83 0.59 6.67 10.00 10.00 10.00 10.00 ▁▁▁▁▇
clean_cup Bourbon 0 1 9.85 0.80 0.00 10.00 10.00 10.00 10.00 ▁▁▁▁▇
clean_cup Catuai 0 1 9.77 1.10 1.33 10.00 10.00 10.00 10.00 ▁▁▁▁▇
clean_cup Caturra 0 1 9.89 0.79 0.00 10.00 10.00 10.00 10.00 ▁▁▁▁▇
clean_cup Hawaiian Kona 0 1 9.53 0.94 6.67 9.33 10.00 10.00 10.00 ▁▁▁▂▇
clean_cup Typica 0 1 9.76 0.89 2.67 10.00 10.00 10.00 10.00 ▁▁▁▁▇
clean_cup Yellow Bourbon 0 1 9.96 0.16 9.33 10.00 10.00 10.00 10.00 ▁▁▁▁▇
sweetness Bourbon 0 1 9.91 0.35 6.67 10.00 10.00 10.00 10.00 ▁▁▁▁▇
sweetness Catuai 0 1 9.75 1.13 1.33 10.00 10.00 10.00 10.00 ▁▁▁▁▇
sweetness Caturra 0 1 9.92 0.71 0.00 10.00 10.00 10.00 10.00 ▁▁▁▁▇
sweetness Hawaiian Kona 0 1 9.67 0.75 6.67 9.33 10.00 10.00 10.00 ▁▁▁▂▇
sweetness Typica 0 1 9.94 0.36 6.00 10.00 10.00 10.00 10.00 ▁▁▁▁▇
sweetness Yellow Bourbon 0 1 9.98 0.11 9.33 10.00 10.00 10.00 10.00 ▁▁▁▁▇
cupper_points Bourbon 0 1 7.43 0.41 6.00 7.25 7.50 7.67 9.25 ▁▃▇▁▁
cupper_points Catuai 0 1 7.41 0.36 6.33 7.27 7.42 7.58 8.17 ▁▁▇▆▂
cupper_points Caturra 0 1 7.55 0.57 0.00 7.42 7.58 7.75 8.50 ▁▁▁▁▇
cupper_points Hawaiian Kona 0 1 7.56 0.35 6.92 7.33 7.50 7.83 8.33 ▅▇▇▆▂
cupper_points Typica 0 1 7.30 0.39 5.25 7.00 7.33 7.58 8.17 ▁▁▃▇▃
cupper_points Yellow Bourbon 0 1 7.60 0.31 7.00 7.38 7.58 7.75 8.25 ▅▇▇▃▃
data_variety %>% 
  select(variety, total_cup_points) %>% 
  filter(total_cup_points != 0) %>% 
  ggplot(aes(fct_reorder(variety, total_cup_points), total_cup_points)) +
  geom_boxplot(aes(col = variety), show.legend = F) +
  labs(title = "Comparison of Total Cup Points across top 6 varieties \n(by count)",
       subtitle = "Caturra has the higest mean Total Cup Score. Catuai had a wider distribution of scores.",
       x = NULL,
       y = "Total Cup Points",
       caption = "Source: Coffee Quality Institute") +
  geom_jitter(aes(col = variety), alpha = 0.2, show.legend = F) +
  scale_color_few() +
  coord_flip()  +
  theme_few()

dot_plot_variety <- data_variety %>% 
  filter(total_cup_points != 0) %>% 
  select(variety, aroma:cupper_points) %>% 
  group_by(variety) %>% 
  summarise(across(c(aroma:cupper_points), mean)) %>% 
  pivot_longer(cols = c(aroma:cupper_points),
               names_to = "parameters",
               values_to = "score") %>% 
  ggplot(aes(x = fct_reorder(factor(variety), score), y = score, label = round(score, 1))) +
  geom_point(stat = "identity", aes(col = factor(variety)), size = 8) +
  geom_text(col = "black", size = 4) +
  facet_wrap(parameters~., scales = "free", ncol = 4) +
  labs(title = "Breakdown of scoring criteria for top 5 coffee (by count)",
       subtitle = "Scores were quite close for all categories, within +/- 0.2. 
Main areas of differences were in balance, clean_cup, cupper_points, sweetness, uniformity",
       caption = "Source: Coffee Quality Institute",
       x = "Variety",
       y = "Score") +
  coord_flip() +
  theme_few() +
  theme(legend.position = "none",
        axis.title = element_text(face = "bold"))

dot_plot_variety

Canturra had an edge over Hawaiian Kona for aroma, clean_cup, sweetness and uniformity, resulting in higher mean total_cup_points. What is Canturra coffee? It’s actually a mutated type of Bourbon coffee that is known for great flavor.

As mentioned at the beginning, most of the coffee had very high scores in this dataset. Hence, the plots only show a snapshot of the flavor profiles of the scored coffee, but not all the coffee.

Main Learning Pointers

I am really glad to have found this #tidytuesday hashtag, which allows me to practice on readily available datasets and understand how different people in the community approach exploratory data analysis! I am really amazed that there is a dedicated package for loading the dataset with convenience, and this dataset even comes with a data dictionary to understand what each variable means. The R community is really committed to sharing and becoming better, together.

The process of EDA is about getting to know your dataset, through asking questions, which are to be answered by carrying out data transformations and creating data visualizations. One question often leads to another, and EDA is a repetitive process until you finish getting to know your data. There were several aspects that I did not look at, such as the effect of altitude, and the grading dates. I may have concentrated too much on the sensory aspect of coffee since that was the more familiar aspect to me, and should have also looked at geographical region and coffee varieties. As an initial learning exercise, I sharpened my focus and concentrated on the effect of species, variety, processing methods, country/owners.

As the total cup points is a summation of the scores for attributes such as aroma, flavor, etc, I think it is hard to do classification based on these scores. I would prefer to have physicochemical data as well so that differentiation is more objective and to better countercheck the sensorial data. However, this may be a personal bias as I work in the analytical chemistry field. :)

I think coffee is really complex. You can have a poorer grade (Robusta), but the roasting process plays a very important role in flavor development. You can have a very good variety, but the processing method may spoil/enhance its flavor profile. You can have a very good farm/owner, but maybe the year of harvest was particularly good or bad. Hence, it is really important to consider all (both familar and unfamilar) aspects when carrying out data analysis, and this is one area I need to improve on.

Coding wise, I got a chance to practice ggplots, data transformation, filtering and selecting rows and columns, as well as calculating means efficiently by using summarise(across, var, mean). I also managed to create new classifications using ifelse, and used fct_reorder to make my plot better. I like to use theme_clean and scale_color_few for my plots, making aesthetically pleasant plots are a breeze as compared to using Microsoft Excel.

References

https://perfectdailygrind.com/2016/07/washed-natural-honey-coffee-processing-101/ https://www.baristainstitute.com/blog/jori-korhonen/january-2020/coffee-processing-methods-drying-washing-or-honey https://www.coffeechemistry.com/cupping-fundamentals https://www.data-to-viz.com/caveat/spider.html https://www.javapresse.com/blogs/buying-coffee/beginners-guide-coffee-varieties

Citation

For attribution, please cite this work as

lruolin (2021, Feb. 9). pRactice corner: Tidy Tuesday on Coffee Ratings Dataset. Retrieved from https://lruolin.github.io/myBlog/posts/20210209_tidy_tuesday_coffee_ratings/

BibTeX citation

@misc{lruolin2021tidy,
  author = {lruolin, },
  title = {pRactice corner: Tidy Tuesday on Coffee Ratings Dataset},
  url = {https://lruolin.github.io/myBlog/posts/20210209_tidy_tuesday_coffee_ratings/},
  year = {2021}
}