An introduction to dplyr and ggplot2: The most popular drink brand at the party

I currently work as a data scientist at NYC Data Science Academy. Last night, we had a fantastic open house party to introduce our 12-Week Data Science bootcamp.

It’s a good chance to learn what data science is and what you will learn in the bootcamp, and to make friends. For me, it’s a good chance to eat free food and get free drinks! During the event, a question suddenly occurred to me:

What is the most popular drink brand at the open house party?

To answer this question, I followed three steps, which also make a good introduction to two R packages, “ggplot2” and “dplyr”.

Step 1: Collect data



Data = read.csv("OpenHouseParty.csv", header = TRUE)


[1] Samuel Adams Beer Angry Orchard Corona Extra
[4] Angry Orchard Samuel Adams Beer Angry Orchard
[7] Angry Orchard Angry Orchard Angry Orchard
[10] Corona Extra Corona Extra Blue Moon
[13] Blue Moon Angry Orchard Angry Orchard
[16] Angry Orchard
4 Levels: Angry Orchard Blue Moon … Samuel Adams Beer
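The 16 values printed above are enough to rebuild the data set without the CSV file. The following is a minimal sketch, assuming the column is named Brand (as the Step 2 output suggests) and is stored as a factor. Note that in R 4.0 and later, read.csv no longer converts strings to factors by default, so stringsAsFactors = TRUE is set explicitly here to match the output shown.

```r
# Brands transcribed from the printed output above (16 observations).
brands <- c("Samuel Adams Beer", "Angry Orchard", "Corona Extra",
            "Angry Orchard", "Samuel Adams Beer", "Angry Orchard",
            "Angry Orchard", "Angry Orchard", "Angry Orchard",
            "Corona Extra", "Corona Extra", "Blue Moon",
            "Blue Moon", "Angry Orchard", "Angry Orchard",
            "Angry Orchard")

# Assumption: a single column called Brand, stored as a factor.
Data <- data.frame(Brand = brands, stringsAsFactors = TRUE)

# Inspect the structure and a quick tally per brand.
str(Data)
table(Data$Brand)
```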


Step 2: Group data by Brand


NewData = Data %>%
Source: local data frame [4 x 2]

              Brand Count
             (fctr) (int)
1     Angry Orchard     9
2         Blue Moon     2
3      Corona Extra     3
4 Samuel Adams Beer     2
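The pipeline above is cut off after the first %>%. One way to produce a table like the one shown is to group by Brand and count the rows in each group. This is a sketch under that assumption, not necessarily the original code; the data frame is rebuilt here from the counts in the table so the example runs on its own.

```r
library(dplyr)

# Rebuild the raw data from the counts shown in the table above.
Data <- data.frame(
  Brand = factor(rep(
    c("Angry Orchard", "Blue Moon", "Corona Extra", "Samuel Adams Beer"),
    times = c(9, 2, 3, 2)
  ))
)

# Assumed completion of the pipeline: one row per brand with its count.
NewData <- Data %>%
  group_by(Brand) %>%
  summarise(Count = n())

NewData
```

A shorter equivalent would be count(Data, Brand, name = "Count"), which performs the grouping and counting in one call.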

Step 3: Create a bar chart
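A minimal sketch of how such a bar chart could be drawn with ggplot2. The counts are copied from the table in Step 2; the ordering of the bars, the axis labels, and the title are assumptions for illustration rather than the original code.

```r
library(ggplot2)

# Counts per brand, copied from the Step 2 table.
NewData <- data.frame(
  Brand = c("Angry Orchard", "Blue Moon", "Corona Extra", "Samuel Adams Beer"),
  Count = c(9L, 2L, 3L, 2L)
)

# geom_col() draws bars whose heights are the values in Count.
# reorder() sorts the bars from most to least popular (an assumption).
p <- ggplot(NewData, aes(x = reorder(Brand, -Count), y = Count)) +
  geom_col() +
  labs(x = "Brand", y = "Count", title = "Drinks at the open house party")

print(p)
```

On ggplot2 versions older than 2.2.0, geom_bar(stat = "identity") gives the same result as geom_col().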




So my conclusion is that I will strongly recommend my coworkers order more Angry Orchard Hard Cider for the next open house event, even though my sample size is relatively small.


