An introduction to dplyr and ggplot2: The most popular drink brand at the party

I currently work as a data scientist at NYC Data Science Academy. Last night, we had a fantastic open house party to introduce our 12-Week Data Science bootcamp.

It was a good chance to learn what data science is and what you will learn in the bootcamp, and to make friends. For me, it was a good chance to eat free food and get free drinks! During the event, a question suddenly occurred to me:

What is the most popular drink brand at the Open House Party?

To answer this question, I took three steps. Together they make a good introduction to two R packages (“ggplot2” and “dplyr”).

Step 1: Collect data

Data = read.csv("OpenHouseParty.csv", header = TRUE, stringsAsFactors = TRUE)
Data$Brand


[1] Samuel Adams Beer Angry Orchard Corona Extra
[4] Angry Orchard Samuel Adams Beer Angry Orchard
[7] Angry Orchard Angry Orchard Angry Orchard
[10] Corona Extra Corona Extra Blue Moon
[13] Blue Moon Angry Orchard Angry Orchard
[16] Angry Orchard
4 Levels: Angry Orchard Blue Moon … Samuel Adams Beer
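
The CSV file itself is not included in this post, so here is a minimal sketch that rebuilds the same 16 responses by hand from the printout above. The vector name brands is my own, and storing the column as a factor called Brand is an assumption based on the output shown.

```r
# Hypothetical stand-in for OpenHouseParty.csv: the 16 responses from the
# printout, typed in by hand so the later steps can run without the file.
brands = c(
  "Samuel Adams Beer", "Angry Orchard", "Corona Extra",
  "Angry Orchard", "Samuel Adams Beer", "Angry Orchard",
  "Angry Orchard", "Angry Orchard", "Angry Orchard",
  "Corona Extra", "Corona Extra", "Blue Moon",
  "Blue Moon", "Angry Orchard", "Angry Orchard",
  "Angry Orchard"
)

# Store Brand as a factor, matching the "4 Levels" line in the printout.
Data = data.frame(Brand = factor(brands))

nrow(Data)          # number of responses
levels(Data$Brand)  # the distinct brands, in alphabetical order
```

With the data frame built this way, the remaining steps work the same as if it had been read from the file.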


Step 2: Group data by Brand

library(dplyr)
NewData = Data %>% group_by(Brand) %>% summarise(Count = n())

Source: local data frame [4 x 2]

              Brand Count
             (fctr) (int)
1     Angry Orchard     9
2         Blue Moon     2
3      Corona Extra     3
4 Samuel Adams Beer     2
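
A table like this can be produced in dplyr by grouping on Brand and counting the rows in each group. The sketch below is one way to do it, not necessarily the exact code I ran at the party. It types the data in by hand (an assumption standing in for the CSV, with the rows in a different order but the same counts), and the name Ranked and the extra arrange() step are additions for illustration.

```r
library(dplyr)

# Hand-typed copy of the 16 responses (an assumption standing in for the CSV).
Data = data.frame(Brand = factor(rep(
  c("Angry Orchard", "Blue Moon", "Corona Extra", "Samuel Adams Beer"),
  times = c(9, 2, 3, 2)
)))

# Count the rows per brand, then sort so the most popular brand comes first.
Ranked = Data %>%
  group_by(Brand) %>%
  summarise(Count = n()) %>%
  arrange(desc(Count))

# The first row of the sorted table is the most popular brand.
as.character(Ranked$Brand[1])
```

Sorting with arrange(desc(Count)) is optional, but it puts the answer to the question in the first row.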

Step 3: Create a bar chart
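
The plotting code and the chart did not survive in this copy of the post, so here is a minimal ggplot2 sketch of a bar chart of the counts. The counts are copied by hand from the Step 2 table, and the choices of geom_col(), the bar ordering, and the axis labels are assumptions rather than the original code.

```r
library(ggplot2)

# Counts copied from the Step 2 table (typed in by hand for this sketch).
NewData = data.frame(
  Brand = c("Angry Orchard", "Blue Moon", "Corona Extra", "Samuel Adams Beer"),
  Count = c(9, 2, 3, 2)
)

# One bar per brand, ordered from most to least popular.
p = ggplot(NewData, aes(x = reorder(Brand, -Count), y = Count)) +
  geom_col() +
  labs(x = "Brand", y = "Count")

print(p)
```

geom_col() draws bar heights directly from the Count column, which suits data that has already been summarised. With the raw Data frame, geom_bar() on Brand alone would do the counting itself.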




So my conclusion is that I will strongly recommend that my coworkers order more Angry Orchard Hard Cider for the next open house event, even though my sample size (16 drinks) is relatively small.



