Part 3.
Select city and restaurant. Check how many cities the top_cool data set contains.
table(top_cool$business_city)

summary(top_cool$business_city)
   Length     Class      Mode
      373 character character
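Note that summary() on a character column only reports the vector length (373 reviews here), not the number of distinct cities. A direct count of the cities, assuming top_cool is loaded, is:
length(unique(top_cool$business_city))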
I imagine I am a tourist visiting a new town and I would like to know which is the best restaurant around. For example, I extract the city of Phoenix from the dataset.
top_cool_phoenix <- italian %>%
  select(reviewer_name, reviewer_cool, reviewer_useful, business_stars, stars, business_name, business_city, text) %>%
  filter(reviewer_cool >= 1000, reviewer_useful >= 1000, business_stars >= 4, business_city == "Phoenix")
top_cool_phoenix
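A printed tibble shows only the first rows plus a row count; to read off the exact number of results directly, use:
nrow(top_cool_phoenix)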

There are 218 results (the 10 rows printed plus 208 more), and we tabulate the restaurant names in top_cool_phoenix to see which appear most frequently.
table(top_cool_phoenix$business_name)
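The table is alphabetical; sorting the counts puts the most-reviewed restaurants first (same data, easier to scan):
sort(table(top_cool_phoenix$business_name), decreasing = TRUE)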

Some of these restaurants have a lot of reviews. I would like to know more details about “The Parlor”, which has 26 reviews. This could be my pick if I am in that area.

top_cool_parlor <- subset(top_cool_phoenix, business_name == "The Parlor")
top_cool_parlor
write.csv(top_cool_parlor, "top_cool_parlor.csv")
Check the mean and median of the review stars and compare them to the business rating (4 stars).
mean(top_cool_parlor$stars)
[1] 4.076923
median(top_cool_parlor$stars)
[1] 4
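To see the full distribution behind the mean and median, the 26 individual ratings can be tabulated (assuming top_cool_parlor is still in memory):
table(top_cool_parlor$stars)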
I check the comments, to see whether this will be my pick.
First, I can read all the comments in the text column, to see whether they match the business_stars variable, which is at least 4 stars. At the very least, I will get an idea of what to expect.

As an alternative, I can go more in depth and identify frequent keywords.
I create a text file with the text from the text column of top_cool_parlor.
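A minimal way to create that file, assuming top_cool_parlor is still in memory (the file name parlor_text.txt is just an illustrative choice), is:
# Write each review on its own line
writeLines(top_cool_parlor$text, "parlor_text.txt")
Then I load the packages: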
library("tm")
library("SnowballC")
library("wordcloud")
library("RColorBrewer")
and I choose the text file I have just created:
text <- readLines(file.choose())
Now I load the data as a corpus:
docs <- Corpus(VectorSource(text))
inspect(docs)
Now come the text transformations and the removal of characters I do not need.
We clean the text, keeping in mind that we can come back later and add words under “Remove your own stop words” (I run this part a few times, to make sure I remove the words that are not relevant).
The R code below can be used to clean the text:
# Convert the text to lower case
docs <- tm_map(docs, content_transformer(tolower))
# Remove numbers
docs <- tm_map(docs, removeNumbers)
# Remove common English stopwords
docs <- tm_map(docs, removeWords, stopwords("english"))
# Remove your own stop words
docs <- tm_map(docs, removeWords, c("parlor", "just", "can", "get", "one", "also", "two", "will", "since", "think", "even", "made"))
# Remove punctuation
docs <- tm_map(docs, removePunctuation)
# Eliminate extra white spaces
docs <- tm_map(docs, stripWhitespace)
# Text stemming (optional, kept commented out)
# docs <- tm_map(docs, stemDocument)
# Now we get the top 30 words
dtm <- TermDocumentMatrix(docs)
m <- as.matrix(dtm)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)
head(d, 30)
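As a side note, as.matrix() densifies the whole term-document matrix, which is fine for a small corpus like this one but can be memory-hungry on larger ones. A sparse alternative, using the slam package that tm already depends on, would be:
# Sum term frequencies without converting the sparse matrix to a dense one
library(slam)
v <- sort(slam::row_sums(dtm), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)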

Generate the word cloud:
set.seed(1234)
wordcloud(words = d$word, freq = d$freq, scale = c(1.5, 0.25), min.freq = 1,
          max.words = 40, random.order = FALSE, rot.per = 0.100,
          colors = brewer.pal(8, "Dark2"))

Since we have the list of the most common words, we can find the associations, to see what customers say about pizza, what is good, what is great, and so on… Of course, you can pick other words if you like. It looks like the pizza is good, and they have great cheese.
findAssocs(dtm, terms = "pizza", corlimit = 0.3)
findAssocs(dtm, terms = "good", corlimit = 0.3)
findAssocs(dtm, terms = "great", corlimit = 0.3)
findAssocs(dtm, terms = "love", corlimit = 0.3)
findAssocs(dtm, terms = "place", corlimit = 0.3)
findAssocs(dtm, terms = "menu", corlimit = 0.3)
findAssocs(dtm, terms = "salad", corlimit = 0.3)
findAssocs(dtm, terms = "service", corlimit = 0.3)
findAssocs(dtm, terms = "cheese", corlimit = 0.3)
findAssocs(dtm, terms = "like", corlimit = 0.3)
findAssocs(dtm, terms = "time", corlimit = 0.3)
findAssocs(dtm, terms = "bar", corlimit = 0.3)
findAssocs(dtm, terms = "well", corlimit = 0.3)
findAssocs(dtm, terms = "cool", corlimit = 0.3)
Here are some of these, but if you run the code you will see all of them.
$pizza
mushrooms mozzarella inch carolyn fellow heavily
0.74 0.72 0.69 0.63 0.63 0.63
literally melts mouth soft taste added
0.63 0.63 0.63 0.63 0.63 0.62
last eight places times went sausage
0.57 0.56 0.56 0.55 0.55 0.50
smoked finished crust food delicious thing
0.50 0.50 0.49 0.48 0.48 0.45
piece applewood artwork blend chains create
0.45 0.44 0.44 0.44 0.44 0.44
favorite husband leftover magnificent medium omg
0.44 0.44 0.44 0.44 0.44 0.44
possible schreiners sized subway super tastier
0.44 0.44 0.44 0.44 0.44 0.44
third tiling twelve update vintage visited
0.44 0.44 0.44 0.44 0.44 0.44
write yelper figured cheesy crunchy extra
0.44 0.43 0.43 0.43 0.43 0.43
moving got wild time decided great
0.43 0.41 0.40 0.39 0.38 0.38
funghi many checked flavors pizzas half
0.38 0.38 0.37 0.37 0.36 0.34
happy cheeses course toppings amount way
0.34 0.34 0.34 0.34 0.34 0.32
restaurant loved like topped bruschetta bathrooms
0.31 0.31 0.31 0.30 0.30 0.30
discovered oregano phoenix pictures affordable center
0.30 0.30 0.30 0.30 0.30 0.30
$great
way figured inch crust sausage
0.49 0.48 0.48 0.45 0.41
price job food pizza added
0.41 0.41 0.40 0.38 0.37
start plenty checked thin eight
0.36 0.36 0.35 0.35 0.35
places light metro ways savory
0.35 0.35 0.35 0.35 0.35
pizzas coming applewood artwork blend
0.34 0.34 0.34 0.34 0.34
chains create favorite husband leftover
0.34 0.34 0.34 0.34 0.34
magnificent medium omg possible schreiners
0.34 0.34 0.34 0.34 0.34
sized subway super tastier third
0.34 0.34 0.34 0.34 0.34
tiling twelve update vintage visited
0.34 0.34 0.34 0.34 0.34
write awesome cant dang heartbeat
0.34 0.34 0.34 0.34 0.34
mentioned nontraditonal accessibility appetites built
0.34 0.34 0.34 0.34 0.34
char closes combining diners finish
0.34 0.34 0.34 0.34 0.34
gold golden gorgonzola heat imagine
0.34 0.34 0.34 0.34 0.34
leeks limit oven pancetta potato
0.34 0.34 0.34 0.34 0.34
sort sultry waves wind woodburning
0.34 0.34 0.34 0.34 0.34
yang yin yukon aint beef
0.34 0.34 0.34 0.34 0.34
belated betteryea bianco birthday blushed
0.34 0.34 0.34 0.34 0.34
board bomb bookmarks bout bruchetta
0.34 0.34 0.34 0.34 0.34
buck clown coal codensation consider
0.34 0.34 0.34 0.34 0.34
daters deck door establishment firm
0.34 0.34 0.34 0.34 0.34
gonna gooey gotta grabbed grubbed
0.34 0.34 0.34 0.34 0.34
hold introduced japanese jumped kat
0.34 0.34 0.34 0.34 0.34
kit manhatten mins miscarried moozarella
0.34 0.34 0.34 0.34 0.34
none nuts ooey pbr peppers
0.34 0.34 0.34 0.34 0.34
platter pork posted project pure
0.34 0.34 0.34 0.34 0.34
ribbon rip roll rolled sansbooth
0.34 0.34 0.34 0.34 0.34
sapriccio scarce signature snatch soldier
0.34 0.34 0.34 0.34 0.34
spreaded talking thankfully tonight topping
0.34 0.34 0.34 0.34 0.34
toppins virginbitch waitress worse yup
0.34 0.34 0.34 0.34 0.34
time actually got cute like
0.33 0.33 0.33 0.32 0.32
We plot the most frequent words:
barplot(d[1:10, ]$freq, las = 2, names.arg = d[1:10, ]$word,
        col = "lightblue", main = "Most frequent words",
        ylab = "Word frequencies")

Summary
According to my analysis, the group of reviewers I have named “top_cool” is a subgroup of efficient reviewers. They have provided many comments about other restaurants, and their ratings are mostly aligned with the average stars shown on the website.
The restaurant I have chosen had the highest number of visits from expert reviewers. The analysis of the most frequent words shows that pizza is often mentioned by customers, who associate it with words such as “soft”, “delicious”, “artwork”, “favorite”, “magnificent”, and “cheesy”. If I visited this restaurant, I would order a pizza, probably with mushroom (funghi) or sausage toppings. Calamari seems like another great option (see the $love word associations). Regarding the place and the atmosphere, clients seem to find the décor awesome. The menu is described as “sophisticated”, and words such as “fresh” and “salads” appear. In fact, checking the word “salad” we find a lot of detail: chickpeas, veggies, avocado, tomatoes, addictive, cabbage, freshest, refreshingly, robust, and other ingredients. A salad therefore seems like a good choice at this restaurant. The comments regarding service are mixed and probably deserve further investigation, particularly around reservations, even if service was generally appreciated. Customers also think the bar and the bartenders are cool, but sometimes the area is too crowded.
We have chosen “The Parlor” because of its number of reviews, but clients could also pick other restaurants from the list, depending on the reviews and on the geographical area where they are staying.
This is the map showing the Italian restaurants selected by the “top cool” category of reviewers. You can save this file as a web page.
If you want to try, you can generate the map as follows.
First, create the dataset with latitude and longitude:
phoenix_map <- italian %>%
  select(ID, business_latitude, business_longitude, reviewer_name, reviewer_cool, reviewer_useful, business_stars, stars, business_name, business_city, text) %>%
  filter(reviewer_cool >= 1000, reviewer_useful >= 1000, business_stars >= 4, business_city == "Phoenix")
View(phoenix_map)
summary(phoenix_map)
phoenix_map
write.csv(phoenix_map, "phoenix_map.csv")
Install the “leaflet” package:
install.packages("leaflet")
library(leaflet)
# Pull out only business name, latitude, and longitude.
phoen <- phoenix_map %>%
  select(business_name, business_latitude, business_longitude)
View(phoen)
summary(phoen)
Let’s take a look to check this is what we need.
phoen
write.csv(phoen, "phoen.csv")
We change the column names.
names(phoen)[1] <- "business_name"
names(phoen)[2] <- "latitude"
names(phoen)[3] <- "longitude"
phoen
head(phoen)
write.csv(phoen, "phoen.csv")
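As an aside, the three names()<- assignments above can be replaced with a single dplyr rename() applied to the original columns (an alternative to those lines, not something to run after them):
# Rename the coordinate columns in one step
phoen <- dplyr::rename(phoen, latitude = business_latitude, longitude = business_longitude)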
library(dplyr)
library(leaflet)
k <- leaflet::leaflet(phoen) %>%
  addTiles() %>% # Add default OpenStreetMap map tiles
  addMarkers(~longitude, ~latitude, popup = ~business_name) # phoen has no ID column, so the popup shows the restaurant name
k
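Many reviews point at the same restaurant, so markers stack at identical coordinates; leaflet’s clusterOptions argument to addMarkers groups overlapping markers, an optional refinement:
k <- leaflet(phoen) %>%
  addTiles() %>%
  addMarkers(~longitude, ~latitude, popup = ~business_name,
             clusterOptions = markerClusterOptions()) # cluster nearby markers
k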
If you save the map as a web file, you obtain the following.

Next: Part 4, stars and time series.