Editor’s note: Minerva will be speaking on the Ai+ Training platform on December 14th. Be sure to check out the session, “Web Scraping & Social Media Mining for Text Analysis & NLP” for 10% off now.
The city-state of Singapore is a vibrant and culturally diverse economic powerhouse in Southeast Asia. It has an excellent social scene, in particular a diverse and dynamic food scene, which is the result of its multicultural heritage, its colonial legacy, and its status as a major financial hub that attracts professionals from across the globe. Opening a restaurant in Singapore can therefore be both rewarding and challenging. With one of the highest population densities in the world driving steep real estate prices, the choice of restaurant type and location can make or break the business.
The crux of the business problem was to identify the most suitable type of restaurant a food entrepreneur could open in Singapore and to select the optimal location for it. The primary target audience for this report is restaurateurs and food entrepreneurs who are interested in either opening a new restaurant in Singapore or expanding their business. The secondary target audience is investors looking to invest in Singapore’s booming hospitality sector. To support the needs of this audience, I first identified the most common social venues in Singapore, since the market for these is arguably already crowded. I then examined the selected restaurant type (Thai restaurants) by looking at its distribution across the different neighbourhoods of Singapore, its average ratings, and its spatial distribution. This assessment will help the audience understand the existing Thai restaurants (competitor analysis) and select the most suitable location for opening a new Thai restaurant in Singapore.
Singapore is divided into 5 planning regions: Central, North, East, West, and North-East. These in turn are divided into several neighbourhoods. I obtained the names of the planning regions and their neighbourhoods from Wikipedia using the Beautiful Soup package for Python. I then plotted the distribution of neighbourhoods within the different planning regions.
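As a rough illustration of the scraping step, the sketch below parses a small inline HTML table with Beautiful Soup. The table markup, the column order, and the helper name `parse_region_table` are assumptions made for this example; the actual Wikipedia page is structured differently and would first have to be downloaded.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page. The real Wikipedia
# markup is different and much larger.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Neighbourhood</th><th>Region</th></tr>
  <tr><td>Bishan</td><td>Central</td></tr>
  <tr><td>Bedok</td><td>East</td></tr>
  <tr><td>Woodlands</td><td>North</td></tr>
</table>
"""


def parse_region_table(html):
    """Return (neighbourhood, region) pairs from the first wikitable."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # The header row uses <th>, so it yields no <td> cells and is skipped.
        if len(cells) == 2:
            rows.append((cells[0], cells[1]))
    return rows


pairs = parse_region_table(SAMPLE_HTML)
```

The resulting list of pairs can be loaded into a table and counted per region to produce a plot like the one described here.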
Number of Neighbourhoods In Each of Singapore’s Planning Regions
The Central region has the highest number of neighbourhoods (20+), while the East has the fewest. I geocoded these neighbourhoods using the GeoPy package. Next, I used the Foursquare API to obtain the details of the venues in each neighbourhood, namely Venue, Venue Latitude, Venue Longitude, and Venue Category. For the Thai restaurants, I also obtained ratings, tips, and likes.
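The sketch below shows one way the venue fields listed here could be pulled out of a JSON response. It assumes a payload shaped like the legacy Foursquare "explore" response (`response` → `groups` → `items` → `venue`); that shape, the helper name `flatten_explore_response`, and the sample payload are assumptions for illustration, not output from the API.

```python
def flatten_explore_response(neighbourhood, payload):
    """Turn one explore-style JSON payload into flat venue rows."""
    rows = []
    groups = payload.get("response", {}).get("groups", [])
    for group in groups:
        for item in group.get("items", []):
            venue = item["venue"]
            categories = venue.get("categories", [])
            rows.append({
                "Neighbourhood": neighbourhood,
                "Venue": venue["name"],
                "Venue Latitude": venue["location"]["lat"],
                "Venue Longitude": venue["location"]["lng"],
                # Some venues may have no category attached.
                "Venue Category": categories[0]["name"] if categories else None,
            })
    return rows


# Hand-written placeholder payload, not real API output.
sample_payload = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {
                    "name": "Example Thai Kitchen",
                    "location": {"lat": 1.32, "lng": 103.93},
                    "categories": [{"name": "Thai Restaurant"}],
                }},
                {"venue": {
                    "name": "Example Venue Without Category",
                    "location": {"lat": 1.33, "lng": 103.94},
                    "categories": [],
                }},
            ]
        }]
    }
}

rows = flatten_explore_response("Bedok", sample_payload)
```

Calling the helper once per neighbourhood and concatenating the rows gives a single venue table with one row per venue.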
My analysis depends on two major inputs: details of Singapore’s planning regions (along with their geolocations) and details of social venues obtained from the Foursquare API. I obtained details of the Foursquare venues corresponding to the different neighbourhoods and carried out exploratory data analysis (EDA) to identify the most common social venues in Singapore and their spatial concentration. As a part of the EDA, I extracted the reviews of the most popular existing Thai restaurants in Singapore as a way of scoping out locations to avoid. One-hot encoding can be performed on the resulting dataset and used to find the most common venue categories in each neighbourhood. Clustering can then be performed on the dataset; here, the K-means clustering technique has been used, with the silhouette score used to find the optimal number of clusters. The resulting clusters can be analyzed to find the major venue categories in each cluster, and this information can be used to suggest suitable locations to business owners based on the category.
Exploratory data analysis revealed that the most common venue types in Singapore are hotels and Chinese restaurants. An examination of their spatial distribution revealed that while the hotels are clustered in Central Singapore, the Chinese restaurants, apart from being concentrated in Central Singapore, are also distributed across the rest of the city-state.
Given that hotels and Chinese restaurants are already so common, and the market for them is arguably crowded, Thai restaurants are a good option for a new venture. Next, we perform one-hot encoding on the filtered data to obtain the venue categories in each neighbourhood. From this, we extract the top 5 most common venues for each of the neighbourhoods.
Top 5 Most Common Venues For Each of the Neighbourhoods
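A minimal sketch of the one-hot encoding and "most common venues" steps, using pandas. The venue rows are made-up toy data, and the column names and the helper `top_categories` are assumptions for this example, not the author's code.

```python
import pandas as pd

# Toy venue table with made-up rows; the real table comes from Foursquare.
venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "B", "B", "B"],
    "Venue Category": ["Hotel", "Hotel", "Thai Restaurant",
                       "Chinese Restaurant", "Chinese Restaurant", "Coffee Shop"],
})

# One-hot encode the categories, then average per neighbourhood so each
# cell holds the share of that neighbourhood's venues in a category.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighbourhood", venues["Neighbourhood"])
freq = onehot.groupby("Neighbourhood").mean()


def top_categories(row, n):
    """Return the n category names with the highest frequency in a row."""
    return row.sort_values(ascending=False).head(n).index.tolist()


TOP_N = 2  # the article uses 5; 2 keeps the toy output small
top = {name: top_categories(row, TOP_N) for name, row in freq.iterrows()}
```

The per-neighbourhood frequency table (`freq` here) is also a natural input for the clustering step, since every neighbourhood becomes a numeric vector of equal length.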
This dataset can be used as input for the clustering algorithm. Here, K-means is used: an unsupervised machine learning technique that partitions the given data into K clusters. For an optimal result, we need to select the best value for K, and the silhouette score is used for this. Values of K from 2 to 10 were considered; for each value, K-means clustering was performed on the dataset, and the silhouette score was calculated and plotted on a line plot, as shown in the figure.
Silhouette Score for different Number of Clusters
From the plot, we can see that a K value of 6 provides the best score. This value of K is used for the K-means clustering.
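The selection of K by silhouette score can be sketched with scikit-learn as follows. The input here is synthetic (three well-separated blobs) and only stands in for the real neighbourhood feature matrix; the variable names and the random seeds are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the neighbourhood feature matrix: three tight,
# well-separated blobs in 2-D. The real input would be the one-hot
# encoded venue frequencies per neighbourhood.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centres])

scores = {}
for k in range(2, 11):  # K = 2 .. 10, the range considered in the article
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The K with the highest mean silhouette score is taken as the best choice.
best_k = max(scores, key=scores.get)
```

On this toy data the best K is 3 by construction; on the real Foursquare data the article's plot pointed to 6. Plotting `scores` against K gives the kind of line plot referred to here.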
Additionally, I obtained further details on the Thai restaurants, such as Likes and Tips, with a view to identifying the neighbourhoods with popular Thai restaurants. The results of the K-means clustering showed the spatial distribution of the clusters of Thai restaurants, including the areas where the clusters are most concentrated.
Clusters of Thai Restaurants (Singapore)
Cluster 5 had just one Thai restaurant, while clusters 1 and 2 had the highest numbers of Thai restaurants.
I also obtained the ratings and likes data for the Thai restaurants in the different regions and neighbourhoods. The most popular Thai restaurants were located in the Central and East planning regions.
Average Ratings of Thai Restaurants By Planning Region
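A per-region summary of this kind can be computed with a pandas group-by, as in the sketch below. The region labels, ratings, and column names are made-up placeholders chosen for illustration and do not reflect the article's results.

```python
import pandas as pd

# Made-up placeholder rows; real ratings would come from Foursquare.
thai = pd.DataFrame({
    "Region": ["Region A", "Region A", "Region B", "Region C"],
    "Restaurant": ["Thai 1", "Thai 2", "Thai 3", "Thai 4"],
    "Rating": [8.0, 7.0, 7.0, 6.0],
})

# Count restaurants and average their ratings per region, then sort so
# the highest-rated regions come first.
summary = (
    thai.groupby("Region")
        .agg(restaurants=("Rating", "size"), avg_rating=("Rating", "mean"))
        .sort_values("avg_rating", ascending=False)
)
```

Reading the restaurant count and the average rating side by side helps separate regions with many strong competitors from regions with few or weak ones.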
While clusters 1 and 2 have the largest numbers of restaurants, clusters 3 and 4 have Thai restaurants among their most frequent venues. Cluster 5 has the lowest number of restaurants. Areas such as Holland Park, which has the lowest number of Thai restaurants, can therefore be considered viable for opening a Thai restaurant owing to lower competition. Planning areas such as Changi and Serangoon Road, which have a high concentration of Thai restaurants, are likely to present stiffer competition for a new Thai establishment. Additionally, the Central and East planning regions have the most popular Thai restaurants, so these too should be avoided. While I only considered Foursquare data, other datasets, such as those relating to property prices and transport links, can be included to better pinpoint optimal localities.
More on Minerva’s session, “Web Scraping & Social Media Mining for Text Analysis & NLP”: This course provides a foundation to carry out PRACTICAL, real-life social media mining. By taking this course, you are taking an important step forward in your data science journey to become an expert in harnessing the power of social media for deriving insights and identifying trends. This course will help you gain fluency in the different aspects of text analysis and NLP by working through real-life examples of cryptocurrency tweets, Wall Street Bets Reddit posts, restaurant reviews, and financial news using a powerful cloud-based Python environment called Google Colab.
About the author/Ai+ Training Speaker: Minerva Singh, PhD
I joined the Center for Environmental Policy (CEP), Imperial College London as a Research Fellow in 2018. Before that, I completed a PhD from the University of Cambridge in 2017 where I focussed on implementing data science techniques for quantifying the impact of forest loss on tropical ecosystems. I hold an MPhil (School of Geography and Environment) and an MSc (Department of Engineering) from Oxford University. I have nearly 10 years’ experience in conducting academic research at the interface of tropical ecology, data science, earth observation (EO), and artificial intelligence (AI) and published 14 first-author peer-reviewed papers in international journals since 2013 including PLoS One. I am also a best selling course-instructor on the online MOOC platform Udemy where I provide online teaching to more than 71,000 students from across the world on machine learning, earth observation, and deep learning-related topics.
Article originally posted here. Reposted with permission.