Filter city, restaurant, and text analysis.

Part 3.

Select city and restaurant. Check how many cities the top_cool data set contains.

table(top_cool$business_city)
summary(top_cool$business_city)

 Length     Class      Mode 
      373 character character 
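
Because business_city is a character column, summary() only reports its length and type, not the number of distinct cities. The following is a minimal sketch of two ways to get that count, assuming a small made-up data frame (toy_reviews is a hypothetical stand-in for top_cool, and its values are invented):

```r
# Toy stand-in for top_cool; the cities and counts are invented for illustration.
toy_reviews <- data.frame(
  business_city = c("Phoenix", "Scottsdale", "Phoenix", "Tempe", "Phoenix"),
  stringsAsFactors = FALSE
)

# Number of distinct cities
n_cities <- length(unique(toy_reviews$business_city))
n_cities

# Reviews per city, most frequent first
city_counts <- sort(table(toy_reviews$business_city), decreasing = TRUE)
city_counts
```

The same two lines should work on top_cool$business_city. Sorting the table also makes it easier to spot the cities with the most reviews.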

Imagine I am a tourist visiting a new town and I would like to know which restaurant nearby is the best. As an example, I extract the reviews for the city of Phoenix from the dataset.

top_cool_phoenix <- italian %>%
  select(reviewer_name, reviewer_cool, reviewer_useful, business_stars, stars, business_name, business_city, text) %>%
  filter(reviewer_cool >= 1000, reviewer_useful >= 1000, business_stars >= 4, business_city == "Phoenix")
top_cool_phoenix

There are 218 results (208 + 10). Next, we tabulate the restaurant names to see which Phoenix restaurants appear most often in the top_cool_phoenix dataset.

table(top_cool_phoenix$business_name)

Some of these restaurants have a lot of reviews. I would like to know more details about “The Parlor” which has 26 reviews. This could be my pick in case I am around that area.

top_cool_parlor <- subset(top_cool_phoenix, business_name == "The Parlor")
top_cool_parlor
write.csv(top_cool_parlor, "top_cool_parlor.csv")

Check the mean and median of the reviewers' stars and compare them with the business stars (4).

mean(top_cool_parlor$stars)
[1] 4.076923
median(top_cool_parlor$stars)
[1] 4
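
The mean and median hide how the ratings are spread. As a rough sketch, a frequency table and the share of reviews at or above the business rating give a fuller picture. The ratings below are invented, and toy_stars is a hypothetical stand-in for top_cool_parlor$stars:

```r
# Invented ratings standing in for top_cool_parlor$stars
toy_stars <- c(5, 4, 4, 3, 5, 4, 2, 5)
business_stars <- 4

# How many reviews gave each number of stars
star_counts <- table(toy_stars)
star_counts

# Share of reviews at or above the business rating
share_at_or_above <- mean(toy_stars >= business_stars)
share_at_or_above
```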

I check the comments to see whether this will be my pick.
First, I can read all the comments in the text column to see whether they match the business_stars variable, which is at least 4 stars. At the very least, I will get an idea of what to expect.

As an alternative, I can go more in depth and identify frequent keywords.

I create a text file with the text from the text column of top_cool_parlor, then I load the packages:

library("tm")
library("SnowballC")
library("wordcloud")
library("RColorBrewer")

and I choose the text file I have just created:

text <- readLines(file.choose())
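
The text file mentioned above can be created directly from R. The following is a minimal sketch, assuming one review per line. The data frame toy_parlor and the file name parlor_reviews.txt are hypothetical, and the review texts are invented:

```r
# Toy stand-in for top_cool_parlor; the review texts are invented.
toy_parlor <- data.frame(
  text = c("Great pizza and a cool bar.", "Loved the salad,\nservice was slow."),
  stringsAsFactors = FALSE
)

# Replace line breaks inside a review so that each review stays on one line
clean_text <- gsub("[\r\n]+", " ", toy_parlor$text)

# Write one review per line; the file name is a hypothetical choice
out_file <- file.path(tempdir(), "parlor_reviews.txt")
writeLines(clean_text, out_file)

# Reading the file back gives one element per review
text_check <- readLines(out_file)
length(text_check)
```

An alternative that skips the file entirely would be to pass the column straight to VectorSource(), for example VectorSource(top_cool_parlor$text).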

Now I load the data as a corpus

docs <- Corpus(VectorSource(text))
inspect(docs)

Next comes the text transformation, which removes the characters I do not need.

We clean the text, keeping in mind that we can come back later and add words under “Remove your own stop word” (I ran this part a few times to make sure I removed the words that are not relevant).

The R code below can be used to clean your text:

# Convert the text to lower case
docs <- tm_map(docs, content_transformer(tolower))
# Remove numbers
docs <- tm_map(docs, removeNumbers)
# Remove english common stopwords
docs <- tm_map(docs, removeWords, stopwords("english"))
# Remove your own stop word
docs <- tm_map(docs, removeWords, c("parlor", "just", "can", "get", "one", "also", "two", "will", "since", "think", "even", "made"))
# Remove punctuation
docs <- tm_map(docs, removePunctuation)
# Eliminate extra white spaces
docs <- tm_map(docs, stripWhitespace)
# Text stemming
#docs <- tm_map(docs, stemDocument)

#Now we get the top 30 words

dtm <- TermDocumentMatrix(docs)
m <- as.matrix(dtm)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)
head(d, 30)
Frequent words
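
To make the matrix steps above less opaque, here is a small base R sketch of the same idea: count each word per document, sum across documents, and sort. It does not use tm, the three documents are invented, and all the toy_ names are hypothetical:

```r
# Three invented, already-cleaned "documents"
toy_docs <- c("pizza good pizza", "good salad", "pizza great")

# Split each document into words
words <- strsplit(toy_docs, " ", fixed = TRUE)
terms <- sort(unique(unlist(words)))

# Term-by-document count matrix: one row per term, one column per document
m_toy <- sapply(words, function(w) table(factor(w, levels = terms)))

# Same steps as in the text: total per term, sorted, as a data frame
v_toy <- sort(rowSums(m_toy), decreasing = TRUE)
d_toy <- data.frame(word = names(v_toy), freq = v_toy)
d_toy
```

One difference to keep in mind: with its default settings, TermDocumentMatrix() in tm drops words shorter than three characters, so very short words will not appear in d.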

Generate the Word cloud

set.seed(1234)
wordcloud(words = d$word, freq = d$freq, scale=c(1.5,0.25), min.freq = 1,
          max.words=40, random.order=FALSE, rot.per=0.100, 
          colors=brewer.pal(8, "Dark2"))

Since we have the list of the most common words, we can look for their associations to see what customers say about pizza, what is good, what is great, and so on. Of course, you can pick other words if you like. It looks like the pizza is good, and they have great cheese.

findAssocs(dtm, terms = "pizza", corlimit = 0.3)
findAssocs(dtm, terms = "good", corlimit = 0.3)
findAssocs(dtm, terms = "great", corlimit = 0.3)
findAssocs(dtm, terms = "love", corlimit = 0.3)
findAssocs(dtm, terms = "place", corlimit = 0.3)
findAssocs(dtm, terms = "menu", corlimit = 0.3)
findAssocs(dtm, terms = "salad", corlimit = 0.3)
findAssocs(dtm, terms = "service", corlimit = 0.3)
findAssocs(dtm, terms = "cheese", corlimit = 0.3)
findAssocs(dtm, terms = "like", corlimit = 0.3)
findAssocs(dtm, terms = "time", corlimit = 0.3)
findAssocs(dtm, terms = "bar", corlimit = 0.3)
findAssocs(dtm, terms = "well", corlimit = 0.3)
findAssocs(dtm, terms = "cool", corlimit = 0.3)
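
As a rough illustration of what these association scores mean, the sketch below computes the correlation between the per-document counts of one term and those of the other terms, then keeps the values at or above a limit. It uses base R only, and the counts in m_toy are invented:

```r
# Invented term-by-document counts: rows are terms, columns are reviews
m_toy <- rbind(
  pizza     = c(2, 0, 1, 3),
  mushrooms = c(1, 0, 0, 2),
  service   = c(0, 2, 1, 0)
)

# Correlation of "pizza" with every other term across the reviews
assoc <- cor(t(m_toy))["pizza", ]
assoc <- assoc[names(assoc) != "pizza"]

# Keep associations at or above the limit, strongest first (like corlimit = 0.3)
strong <- sort(assoc[assoc >= 0.3], decreasing = TRUE)
strong
```

In this toy example only "mushrooms" passes the limit, because it tends to appear in the same reviews as "pizza", while "service" appears in the others. As far as I know, findAssocs() also accepts a character vector of terms, so the calls above could probably be combined into one, for example findAssocs(dtm, terms = c("pizza", "good", "great"), corlimit = 0.3).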

Here are some of the results; if you run the code, you will see all of them.

$pizza
  mushrooms  mozzarella        inch     carolyn      fellow     heavily
       0.74        0.72        0.69        0.63        0.63        0.63
  literally       melts       mouth        soft       taste       added
       0.63        0.63        0.63        0.63        0.63        0.62
       last       eight      places       times        went     sausage
       0.57        0.56        0.56        0.55        0.55        0.50
     smoked    finished       crust        food   delicious       thing
       0.50        0.50        0.49        0.48        0.48        0.45
      piece   applewood     artwork       blend      chains      create
       0.45        0.44        0.44        0.44        0.44        0.44
   favorite     husband    leftover magnificent      medium         omg
       0.44        0.44        0.44        0.44        0.44        0.44
   possible  schreiners       sized      subway       super     tastier
       0.44        0.44        0.44        0.44        0.44        0.44
      third      tiling      twelve      update     vintage     visited
       0.44        0.44        0.44        0.44        0.44        0.44
      write      yelper     figured      cheesy     crunchy       extra
       0.44        0.43        0.43        0.43        0.43        0.43
     moving         got        wild        time     decided       great
       0.43        0.41        0.40        0.39        0.38        0.38
     funghi        many     checked     flavors      pizzas        half
       0.38        0.38        0.37        0.37        0.36        0.34
      happy     cheeses      course    toppings      amount         way
       0.34        0.34        0.34        0.34        0.34        0.32
 restaurant       loved        like      topped  bruschetta   bathrooms
       0.31        0.31        0.31        0.30        0.30        0.30
 discovered     oregano     phoenix    pictures  affordable      center
       0.30        0.30        0.30        0.30        0.30        0.30

$great
          way       figured          inch         crust       sausage
         0.49          0.48          0.48          0.45          0.41
        price           job          food         pizza         added
         0.41          0.41          0.40          0.38          0.37
        start        plenty       checked          thin         eight
         0.36          0.36          0.35          0.35          0.35
       places         light         metro          ways        savory
         0.35          0.35          0.35          0.35          0.35
       pizzas        coming     applewood       artwork         blend
         0.34          0.34          0.34          0.34          0.34
       chains        create      favorite       husband      leftover
         0.34          0.34          0.34          0.34          0.34
  magnificent        medium           omg      possible    schreiners
         0.34          0.34          0.34          0.34          0.34
        sized        subway         super       tastier         third
         0.34          0.34          0.34          0.34          0.34
       tiling        twelve        update       vintage       visited
         0.34          0.34          0.34          0.34          0.34
        write       awesome          cant          dang     heartbeat
         0.34          0.34          0.34          0.34          0.34
    mentioned nontraditonal accessibility     appetites         built
         0.34          0.34          0.34          0.34          0.34
         char        closes     combining        diners        finish
         0.34          0.34          0.34          0.34          0.34
         gold        golden    gorgonzola          heat       imagine
         0.34          0.34          0.34          0.34          0.34
        leeks         limit          oven      pancetta        potato
         0.34          0.34          0.34          0.34          0.34
         sort        sultry         waves          wind   woodburning
         0.34          0.34          0.34          0.34          0.34
         yang           yin         yukon          aint          beef
         0.34          0.34          0.34          0.34          0.34
      belated     betteryea        bianco      birthday       blushed
         0.34          0.34          0.34          0.34          0.34
        board          bomb     bookmarks          bout     bruchetta
         0.34          0.34          0.34          0.34          0.34
         buck         clown          coal   codensation      consider
         0.34          0.34          0.34          0.34          0.34
       daters          deck          door establishment          firm
         0.34          0.34          0.34          0.34          0.34
        gonna         gooey         gotta       grabbed       grubbed
         0.34          0.34          0.34          0.34          0.34
         hold    introduced      japanese        jumped           kat
         0.34          0.34          0.34          0.34          0.34
          kit     manhatten          mins    miscarried    moozarella
         0.34          0.34          0.34          0.34          0.34
         none          nuts          ooey           pbr       peppers
         0.34          0.34          0.34          0.34          0.34
      platter          pork        posted       project          pure
         0.34          0.34          0.34          0.34          0.34
       ribbon           rip          roll        rolled     sansbooth
         0.34          0.34          0.34          0.34          0.34
    sapriccio        scarce     signature        snatch       soldier
         0.34          0.34          0.34          0.34          0.34
     spreaded       talking    thankfully       tonight       topping
         0.34          0.34          0.34          0.34          0.34
      toppins   virginbitch      waitress         worse           yup
         0.34          0.34          0.34          0.34          0.34
         time      actually           got          cute          like
         0.33          0.33          0.33          0.32          0.32

We plot the most frequent words

barplot(d[1:10,]$freq, las = 2, names.arg = d[1:10,]$word,
        col ="lightblue", main ="Most frequent words",
        ylab = "Word frequencies")

Summary

According to my analysis, the group of reviewers I have named “top_cool” is a subgroup of efficient reviewers. They have written many comments about other restaurants, and their ratings are mostly aligned with the average stars shown on the website.

The restaurant I have chosen had the highest number of reviews from these expert reviewers. The analysis of the most frequent words shows that pizza is often mentioned by customers, who associate it with words such as “soft”, “delicious”, “artwork”, “favorite”, “magnificent”, and “cheesy”. If I were to visit this restaurant, I would order a pizza, probably with mushroom (funghi) or sausage toppings. Calamari seems like another great option (see the $love word associations).

Regarding the place and the atmosphere, customers seem to think the décor is awesome. The menu is described as “sophisticated”, and words such as “fresh” and “salads” are present. In fact, checking the word “salad” reveals a lot of detail: chickpeas, veggies, avocado, tomatoes, addictive, cabbage, freshest, refreshingly, robust, and other ingredients. A salad therefore seems like a good choice at this restaurant.

The comments about service are mixed and probably require further investigation, particularly regarding reservations, even if service was generally appreciated. Customers also think that the bar and the bartenders are cool, but the area is sometimes too crowded.

We have chosen “The Parlor” because of the number of reviews, but clients could also pick other restaurants from the list, depending on reviews and the geographical area where they stay.

This map shows the Italian restaurants in Phoenix reviewed by the top_cool category of reviewers. You can save the map as a web page.

If you want to try, you can generate the map as follows.

Create the dataset with latitude and longitude

phoenix_map<- italian %>% 
  select(ID, business_latitude, business_longitude, reviewer_name, reviewer_cool, reviewer_useful, business_stars, stars, business_name, business_city, text) %>% 
  filter(reviewer_cool >= 1000, reviewer_useful >= 1000, business_stars >=4, business_city == "Phoenix")
View(phoenix_map)
summary(phoenix_map)
phoenix_map
write.csv(phoenix_map, "phoenix_map.csv")
# Install the "leaflet" package (needed only once), then load it
install.packages("leaflet")
library(leaflet)

#Pull out only business name, latitude, and longitude.
phoen<- phoenix_map %>% 
  select(business_name, business_latitude, business_longitude) 
View(phoen)
summary(phoen)

Let’s take a look to check that this is what we need.

phoen
write.csv(phoen, "phoen.csv")

We change the column names.

names(phoen)[1] <- "business_name"
names(phoen)[2] <- "latitude"
names(phoen)[3] <- "longitude"
phoen
head(phoen)
write.csv(phoen, "phoen.csv")
library(dplyr)
library(leaflet)
k <- leaflet::leaflet(phoen) %>%
  addTiles() %>%  # Add default OpenStreetMap map tiles
  # phoen has no ID column, so use the business name as the popup text
  addMarkers(~longitude, ~latitude, popup = ~business_name)
k
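
Two details may be worth handling before sharing the map. The data has one row per review, so a restaurant with many reviews gets many markers on the same spot. The map can also be written to an HTML file from code instead of being exported by hand. The following is a minimal sketch, assuming the htmlwidgets package is installed. The data frame toy_phoen, its names and coordinates, and the output file name are all invented for illustration:

```r
library(leaflet)
library(htmlwidgets)

# Toy stand-in for phoen; names and coordinates are invented.
toy_phoen <- data.frame(
  business_name = c("Restaurant A", "Restaurant A", "Restaurant B"),
  latitude  = c(33.45, 33.45, 33.51),
  longitude = c(-112.07, -112.07, -112.03),
  stringsAsFactors = FALSE
)

# Keep one row per restaurant so each place gets a single marker
toy_places <- unique(toy_phoen)

toy_map <- leaflet(toy_places) %>%
  addTiles() %>%
  addMarkers(~longitude, ~latitude, popup = ~business_name)

# Write the map to an HTML page; the file name is a hypothetical choice
map_file <- file.path(tempdir(), "phoenix_map.html")
saveWidget(toy_map, file = map_file, selfcontained = FALSE)
```

With selfcontained = FALSE the supporting files are written to a folder next to the HTML page. Setting it to TRUE should produce a single file, but that option may require pandoc to be available.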

If you save the map as a web page, you obtain the following.

Next: Part 4, stars and time series.
