An introduction to dplyr and ggplot2: The most popular drink brand at the party

I currently work as a data scientist at NYC Data Science Academy. Last night, we had a fantastic open house party to introduce our 12-week Data Science bootcamp.

It’s a good chance to learn what data science is and what you will learn in the bootcamp, and to make friends. For me, it’s also a good chance to eat free food and get free drinks! During the event, a question suddenly occurred to me:

What is the most popular drink brand at the Open House Party?

To answer this question, I took three steps, which also make a good introduction to two R packages (“dplyr” and “ggplot2”).

Step 1: Collect data

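# In R 4.0 and later, stringsAsFactors = TRUE is needed to read Brand as a factor
Data = read.csv("OpenHouseParty.csv", header = TRUE, stringsAsFactors = TRUE)

Data$Brand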

[1] Samuel Adams Beer Angry Orchard Corona Extra
[4] Angry Orchard Samuel Adams Beer Angry Orchard
[7] Angry Orchard Angry Orchard Angry Orchard
[10] Corona Extra Corona Extra Blue Moon
[13] Blue Moon Angry Orchard Angry Orchard
[16] Angry Orchard
4 Levels: Angry Orchard Blue Moon … Samuel Adams Beer
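
If you don't have OpenHouseParty.csv at hand, the same data can be rebuilt by typing in the values from the printout above. This is only a minimal sketch: it assumes the file holds a single column named Brand (the name used in Step 2), and `brands` is just an illustrative variable name.

```r
# The 16 observations, in the order they are printed above
brands <- c("Samuel Adams Beer", "Angry Orchard", "Corona Extra",
            "Angry Orchard", "Samuel Adams Beer", "Angry Orchard",
            "Angry Orchard", "Angry Orchard", "Angry Orchard",
            "Corona Extra", "Corona Extra", "Blue Moon",
            "Blue Moon", "Angry Orchard", "Angry Orchard",
            "Angry Orchard")

# Assumption: the CSV has one column called Brand, stored as a factor
Data <- data.frame(Brand = factor(brands))

nrow(Data)          # number of drinks recorded
levels(Data$Brand)  # the distinct brands
```

With `Data` built this way, the remaining steps should run without the CSV file.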

 

Step 2: Group data by Brand

library(dplyr)
# Group the rows by brand, then count the rows in each group
NewData = Data %>%
  group_by(Brand) %>%
  summarise(Count = n())
NewData
Source: local data frame [4 x 2]

              Brand Count
             (fctr) (int)
1     Angry Orchard     9
2         Blue Moon     2
3      Corona Extra     3
4 Samuel Adams Beer     2
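
Grouping and then summarising is the general pattern, but for plain counting dplyr also offers count() as a shortcut. Here is a minimal sketch of that alternative, assuming a data frame with a Brand column. The data is rebuilt inline from the counts reported in this post so the snippet runs on its own, and `Sorted` is an illustrative name of my own.

```r
library(dplyr)

# Rebuild the data from the counts in this post (row order does not matter here)
Data <- data.frame(
  Brand = rep(c("Angry Orchard", "Blue Moon", "Corona Extra", "Samuel Adams Beer"),
              times = c(9, 2, 3, 2))
)

# count() groups by Brand and tallies the rows in one call;
# name = "Count" sets the name of the tally column
Sorted <- Data %>%
  count(Brand, name = "Count") %>%
  arrange(desc(Count))   # most popular brand first

Sorted
```

Sorting with arrange(desc(Count)) puts the answer to the question in the first row.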

Step 3: Create a bar chart

library(ggplot2)
# stat = "identity" uses the Count column as the bar height
ggplot(NewData, aes(x = Brand, y = Count, fill = Brand)) + geom_bar(stat = "identity")

[Figure: Rplot01.png, bar chart of Count by Brand]
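
Bars are usually easier to compare when they are sorted by height. The sketch below shows one way to do that, using reorder() to order the brands by count and geom_col(), which is ggplot2's shorthand for geom_bar(stat = "identity"). The summary table is typed in from Step 2 so the snippet is self-contained; the axis labels and the name `p` are my own choices, not part of the original analysis.

```r
library(ggplot2)

# Summary table from Step 2, entered by hand
NewData <- data.frame(
  Brand = c("Angry Orchard", "Blue Moon", "Corona Extra", "Samuel Adams Beer"),
  Count = c(9, 2, 3, 2)
)

# reorder(Brand, -Count) sorts the bars from the largest count to the smallest
p <- ggplot(NewData, aes(x = reorder(Brand, -Count), y = Count, fill = Brand)) +
  geom_col() +
  labs(x = "Brand", y = "Number of drinks")

print(p)
```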

So my conclusion is that I will strongly recommend that my coworkers order more Angry Orchard Hard Cider for the next open house event, even though my sample of 16 drinks is relatively small.

 
