I currently work as a data scientist at NYC Data Science Academy. Last night, we had a fantastic open house party to introduce our 12-Week Data Science bootcamp.
It’s a good chance to learn what data science is, find out what you will learn in the bootcamp, and make friends. To me, it’s also a good chance to eat free food and get free drinks! During the event, a question suddenly came to mind:
What is the most popular drink brand at the open house party?
To answer this question, I took three steps. They also make a good introduction to two R packages (“ggplot2” and “dplyr”).
Step 1: Collect data
Data = read.csv("OpenHouseParty.csv", header = TRUE)
 Samuel Adams Beer Angry Orchard Corona Extra
 Angry Orchard Samuel Adams Beer Angry Orchard
 Angry Orchard Angry Orchard Angry Orchard
 Corona Extra Corona Extra Blue Moon
 Blue Moon Angry Orchard Angry Orchard
 Angry Orchard
4 Levels: Angry Orchard Blue Moon … Samuel Adams Beer
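The CSV file is not included in the post, so here is a minimal sketch that rebuilds the 16 observations printed above and round-trips them through a temporary file. The column name `Brand` is an assumption (borrowed from the Step 2 heading), and the temporary file only stands in for OpenHouseParty.csv. Note that on R 4.0.0 or later, `read.csv()` no longer converts strings to factors by default, so `stringsAsFactors = TRUE` is needed to get the "4 Levels" factor output shown above.

```r
# Assumption: the CSV has a single column named Brand (the post does not show it).
brands <- c(
  "Samuel Adams Beer", "Angry Orchard", "Corona Extra",
  "Angry Orchard", "Samuel Adams Beer", "Angry Orchard",
  "Angry Orchard", "Angry Orchard", "Angry Orchard",
  "Corona Extra", "Corona Extra", "Blue Moon",
  "Blue Moon", "Angry Orchard", "Angry Orchard",
  "Angry Orchard"
)

# Write the values to a temporary file that stands in for OpenHouseParty.csv.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(Brand = brands), csv_path, row.names = FALSE)

# Read the file back, asking explicitly for factors.
Data <- read.csv(csv_path, header = TRUE, stringsAsFactors = TRUE)

Data$Brand           # prints the factor with its levels
nlevels(Data$Brand)  # number of distinct brands
```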
Step 2: Group data by Brand
NewData = Data %>%
Source: local data frame [4 x 2]
1 Angry Orchard 9
2 Blue Moon 2
3 Corona Extra 3
4 Samuel Adams Beer 2
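The rest of the pipeline is not shown above, so the following is only a sketch of how the grouping step could look with dplyr, not necessarily the code that produced this output. The column names `Brand` and `Count` are assumptions, and the data frame is rebuilt from the counts in the table so the example runs on its own.

```r
library(dplyr)

# Assumption: one row per observed drink, in a column named Brand.
Data <- data.frame(
  Brand = rep(
    c("Angry Orchard", "Blue Moon", "Corona Extra", "Samuel Adams Beer"),
    times = c(9, 2, 3, 2)
  )
)

# Group the rows by brand and count how many rows fall in each group.
NewData <- Data %>%
  group_by(Brand) %>%
  summarise(Count = n())

NewData
```

For a plain tally like this, `count(Data, Brand)` is a shorter equivalent; it names the count column `n` by default.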
Step 3: Create a bar chart
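The plotting code is not shown here, so this is a minimal ggplot2 sketch under a few assumptions: the summary table has columns named `Brand` and `Count`, and the bars are ordered from most to least popular. The axis labels and title are illustrative choices.

```r
library(ggplot2)

# Summary table from Step 2 (column names Brand and Count are assumptions).
NewData <- data.frame(
  Brand = c("Angry Orchard", "Blue Moon", "Corona Extra", "Samuel Adams Beer"),
  Count = c(9, 2, 3, 2)
)

# geom_col() draws bars whose heights come straight from the Count column;
# reorder() sorts the brands so the tallest bar comes first.
p <- ggplot(NewData, aes(x = reorder(Brand, -Count), y = Count)) +
  geom_col() +
  labs(x = "Brand", y = "Count", title = "Drink brands at the open house party")

print(p)
```

`geom_col()` fits here because the counts are already computed. With the raw one-row-per-drink data, `geom_bar()` would do the counting itself.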
So my conclusion is that I will strongly recommend that my coworkers order more Angry Orchard Hard Cider for the next open house event, even though my sample size is relatively small.