{"id":271,"date":"2022-04-23T17:30:00","date_gmt":"2022-04-23T09:30:00","guid":{"rendered":"https:\/\/philip.twinight.co\/portfolio\/?p=271"},"modified":"2024-03-05T17:49:35","modified_gmt":"2024-03-05T09:49:35","slug":"data-exploration-on-airbnb-hk","status":"publish","type":"post","link":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/","title":{"rendered":"Data Exploration on Airbnb HK"},"content":{"rendered":"\n<p>This is an individual project of SDSC3014 &#8211; Introduction to Sharing Economy. I did the project in my year 2 2021\/22 Semester B.<\/p>\n\n\n\n<p><strong>Presentation Slides:<\/strong><\/p>\n\n\n\n<style>\n    .ppv_container.de_6a04ebb708f0a {\n        width: 100%;\n        height: 700px;\n        display: flex;\n        flex-direction: column;\n        position: relative;\n    }\n\n    @media (max-width: 991px) {\n        .ppv_container.de_6a04ebb708f0a {\n            width: 100%;\n            height: 700px;\n        }\n    }\n\n    @media (max-width: 767px) {\n        .ppv_container.de_6a04ebb708f0a {\n            width: 100%;\n            height: 700px;\n        }\n    }\n\n    .de_6a04ebb708f0a.document-preview {\n        width: 100%;\n        flex: 1;\n        display: flex;\n        flex-direction: column;\n    }\n\n    .de_6a04ebb708f0aiframe {\n        width: 100%;\n        height: 100%;\n        flex: 1;\n        border: none;\n    }\n\n    .ppv-loading {\n        width: inherit;\n        position: absolute;\n        top: 50%;\n        left: 0;\n        font-family: sans-serif;\n        color: #666;\n        z-index: 1;\n        display: flex;\n        justify-content: center;\n    }\n<\/style>\n<div class=\"ppv_container de_6a04ebb708f0a\">\n\n        <div class=\"ppv-loading\">PDF Loading...<\/div>\n    <div class=\"document-preview\" style=\"height: inherit; width: inherit;\">\n        <iframe style=\"width: 100%; height: 100%;\" 
src=\"\/\/docs.google.com\/gview?embedded=true&#038;url=https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/final-present-sdsc3014.pdf\" frameborder=\"0\"><\/iframe>\n    <\/div>\n    <\/div>\n\n\n\n\n\n<p>Course Instructor: Prof.\u00a0KE Qing<\/p>\n\n\n\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_82_2 counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle Table of Content\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-1'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/#1_Introduction\" >1. 
Introduction<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-1'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/#2_Method\" >2. Method<\/a><ul class='ez-toc-list-level-2' ><li class='ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/#21_Data_Scraping\" >2.1 Data Scraping<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/#22_Data_Import\" >2.2 Data Import<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/#23_Data_Cleaning\" >2.3 Data Cleaning<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-1'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/#3_Result\" >3. Result<\/a><ul class='ez-toc-list-level-2' ><li class='ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/#31_Data_Visualization\" >3.1 Data Visualization<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/#32_Text_Sentiment_Analysis\" >3.2 Text Sentiment Analysis<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-1'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/#4_Limitation\" >4. 
Limitation<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-1'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/#5_Future_research_and_Conclusion\" >5. Future research and Conclusion<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-1'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/#6_References\" >6. References<\/a><\/li><\/ul><\/nav><\/div>\n<h1 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"1_Introduction\"><\/span><a id=\"post-271-_Toc919487765\"><\/a><a id=\"post-271-_Toc961097119\"><\/a><a id=\"post-271-_Toc1975674503\"><\/a><a id=\"post-271-_Toc101627398\"><\/a>1. Introduction<span class=\"ez-toc-section-end\"><\/span><\/h1>\n\n\n\n<p>Although I have finished the SDSC3014 course, as a data science student I should keep improving by reviewing what I have learnt. I therefore took this project as a chance to combine the techniques from the tutorials into a data exploration of Airbnb in Hong Kong, and to become familiar with several useful libraries along the way.<\/p>\n\n\n\n<p>Airbnb is one of the giants among accommodation-sharing platforms. Its rise has revolutionized the traditional property management industry, and it is often cited as one of the best examples of a sharing-economy business model.<\/p>\n\n\n\n<p>This report analyzes Airbnb in Hong Kong. I use the listings data provided by Inside Airbnb, pre-process it, and then extract insights from it. In addition, I scrape tweets tagged #Airbnb from Twitter and conduct a sentiment analysis on the resulting dataset.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"2_Method\"><\/span>2. 
Method<span class=\"ez-toc-section-end\"><\/span><\/h1>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"21_Data_Scraping\"><\/span><a id=\"post-271-_Toc101627400\"><\/a>2.1 Data Scraping<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=2132120485  fetchpriority=\"high\" loading=\"eager\" decoding=\"async\" width=\"1398\" height=\"688\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-1.png\" alt=\"\" class=\"wp-image-275\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:1398\/h:688\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-1.png 1398w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:148\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-1.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:1024\/h:504\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-1.png 1024w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:378\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-1.png 768w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<p>As mentioned in the introduction, I scraped more than 5,000 tweets from Twitter using Facepager 4.4. 
It is the dataset that I will use to do the text analysis.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"22_Data_Import\"><\/span><a id=\"post-271-_Toc1002525859\"><\/a><a id=\"post-271-_Toc241230117\"><\/a><a id=\"post-271-_Toc101627401\"><\/a>2.2 Data Import<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=1435213566  fetchpriority=\"high\" loading=\"eager\" decoding=\"async\" width=\"317\" height=\"329\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-2.png\" alt=\"\" class=\"wp-image-276\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:317\/h:329\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-2.png 317w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:289\/h:300\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-2.png 289w\" sizes=\"auto, (max-width: 317px) 100vw, 317px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=327716934  loading=\"lazy\" decoding=\"async\" width=\"487\" height=\"24\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-3.png\" alt=\"\" class=\"wp-image-277\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:487\/h:24\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-3.png 487w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:15\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-3.png 300w\" sizes=\"auto, (max-width: 487px) 100vw, 487px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=104472898  loading=\"lazy\" 
decoding=\"async\" width=\"313\" height=\"22\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-4.png\" alt=\"\" class=\"wp-image-278\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:313\/h:22\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-4.png 313w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:21\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-4.png 300w\" sizes=\"auto, (max-width: 313px) 100vw, 313px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=1937114813  loading=\"lazy\" decoding=\"async\" width=\"307\" height=\"19\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-5.png\" alt=\"\" class=\"wp-image-279\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:307\/h:19\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-5.png 307w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:19\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-5.png 300w\" sizes=\"auto, (max-width: 307px) 100vw, 307px\" \/><\/figure>\n\n\n\n<p>First, I import all the libraries needed later. Pandas, GeoPandas and NumPy handle the data manipulation; Folium, Seaborn, Matplotlib, LinearColormap and WordCloud handle the data visualization; and NLTK and TextBlob handle the natural language processing. There are four main datasets in total. 
One is from the tweets.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=2129893023  loading=\"lazy\" decoding=\"async\" width=\"1353\" height=\"506\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-6.png\" alt=\"\" class=\"wp-image-280\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:1353\/h:506\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-6.png 1353w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:112\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-6.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:1024\/h:383\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-6.png 1024w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:287\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-6.png 768w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<p>The other three datasets all come from Insideairbnb.com. I use them for the data visualization part and for text analytics as well. The first two are used to find insights by plotting different graphs and to run text analytics on the Airbnb listing names and descriptions. 
The Geojson data file is used for plotting folium map.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=1155353634  loading=\"lazy\" decoding=\"async\" width=\"1000\" height=\"527\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-7.png\" alt=\"\" class=\"wp-image-281\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:1000\/h:527\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-7.png 1000w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:158\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-7.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:405\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-7.png 768w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<p>In more detail, dataset 1 is the summary information and metrics for listings of Airbnb in Hong Kong (Used for visualizations).<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=1980449878  loading=\"lazy\" decoding=\"async\" width=\"997\" height=\"567\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-8.png\" alt=\"\" class=\"wp-image-282\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:997\/h:567\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-8.png 997w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:171\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-8.png 300w, 
https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:437\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-8.png 768w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<p>For dataset 2, I focus only on two columns, name and description, which are used to generate word clouds and for text analytics.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"23_Data_Cleaning\"><\/span><a id=\"post-271-_Toc101627402\"><\/a>2.3 Data Cleaning<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=372884251  loading=\"lazy\" decoding=\"async\" width=\"800\" height=\"442\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-9.png\" alt=\"\" class=\"wp-image-283\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:800\/h:442\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-9.png 800w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:166\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-9.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:424\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-9.png 768w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=1153415845  loading=\"lazy\" decoding=\"async\" width=\"796\" height=\"368\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-10.png\" alt=\"\" class=\"wp-image-284\" 
srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:796\/h:368\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-10.png 796w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:139\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-10.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:355\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-10.png 768w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<p>For the visualization dataset, I removed the columns that are full of Null values and dropped some further columns, as I am not going to use most of them.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=947856792  loading=\"lazy\" decoding=\"async\" width=\"503\" height=\"730\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-11.png\" alt=\"\" class=\"wp-image-285\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:503\/h:730\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-11.png 503w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:207\/h:300\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-11.png 207w\" sizes=\"auto, (max-width: 503px) 100vw, 503px\" \/><\/figure>\n\n\n\n<p>I also built a new dataframe that contains only neighborhood information, which makes it easier to find out later which neighborhoods have the higher average Airbnb listing prices in Hong Kong. 
And I sorted it for later plotting use as well.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=39390932  loading=\"lazy\" decoding=\"async\" width=\"997\" height=\"437\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-12.png\" alt=\"\" class=\"wp-image-286\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:997\/h:437\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-12.png 997w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:131\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-12.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:337\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-12.png 768w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=947993625  loading=\"lazy\" decoding=\"async\" width=\"348\" height=\"388\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-13.png\" alt=\"\" class=\"wp-image-287\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:348\/h:388\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-13.png 348w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:269\/h:300\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-13.png 269w\" sizes=\"auto, (max-width: 348px) 100vw, 348px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=1196978042  loading=\"lazy\" decoding=\"async\" width=\"519\" height=\"294\" 
src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-14.png\" alt=\"\" class=\"wp-image-288\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:519\/h:294\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-14.png 519w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:170\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-14.png 300w\" sizes=\"auto, (max-width: 519px) 100vw, 519px\" \/><\/figure>\n\n\n\n<p>For the last dataset, the Twitter one, I dropped the first row and any rows containing Null values. I extracted only the text column, which holds the tweet content, and cleaned the text by removing RT, punctuation, etc. with a lambda function. RT is the retweet marker generated by Twitter.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"3_Result\"><\/span><a id=\"post-271-_Toc101627403\"><\/a>3. 
Result<span class=\"ez-toc-section-end\"><\/span><\/h1>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"31_Data_Visualization\"><\/span><a id=\"post-271-_Toc1733441444\"><\/a><a id=\"post-271-_Toc1563838205\"><\/a><a id=\"post-271-_Toc101627404\"><\/a>3.1 Data Visualization<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=1315047581  loading=\"lazy\" decoding=\"async\" width=\"994\" height=\"412\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-15.png\" alt=\"\" class=\"wp-image-289\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:994\/h:412\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-15.png 994w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:124\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-15.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:318\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-15.png 768w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<p>In this part, I share some interesting facts I found in the dataset using Seaborn. First, I tried to plot the Airbnb price count, but the result does not look good. 
There are clearly some outliers with very high prices, so it is better to take the logarithm of the prices to avoid such a skewed plot.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=1410172167  loading=\"lazy\" decoding=\"async\" width=\"995\" height=\"425\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-16.png\" alt=\"\" class=\"wp-image-290\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:995\/h:425\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-16.png 995w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:128\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-16.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:328\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-16.png 768w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<p>Now the distribution looks much closer to a normal distribution.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=151287100  loading=\"lazy\" decoding=\"async\" width=\"1004\" height=\"612\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-17.png\" alt=\"\" class=\"wp-image-291\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:1004\/h:612\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-17.png 1004w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:183\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-17.png 300w, 
https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:468\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-17.png 768w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<p>I also thought of a second way to deal with this problem: simply remove the outliers using the standard deviation, which is 2189 here. After removing the outliers, we can conclude that Airbnb prices in Hong Kong mostly fall in the range of about 200 to 800 HKD.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=828705159  loading=\"lazy\" decoding=\"async\" width=\"769\" height=\"421\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-18.png\" alt=\"\" class=\"wp-image-292\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:769\/h:421\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-18.png 769w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:164\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-18.png 300w\" sizes=\"auto, (max-width: 769px) 100vw, 769px\" \/><\/figure>\n\n\n\n<p>I also tried to find out whether there is any relationship between the number of beds and price, but the sample size for listings with many beds (10-16) is clearly too small, so we can&#8217;t really draw a conclusion from this graph.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=1285833752  loading=\"lazy\" decoding=\"async\" width=\"1378\" height=\"421\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-19.png\" alt=\"\" class=\"wp-image-293\" 
srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:1378\/h:421\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-19.png 1378w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:92\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-19.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:1024\/h:313\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-19.png 1024w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:235\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-19.png 768w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<p>This graph shows that most Airbnb listings in Hong Kong offer either an entire apartment or a private room; only a few offer a shared room or a hotel room.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=552252631  loading=\"lazy\" decoding=\"async\" width=\"1114\" height=\"562\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-20.png\" alt=\"\" class=\"wp-image-294\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:1114\/h:562\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-20.png 1114w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:151\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-20.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:1024\/h:517\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-20.png 1024w, 
https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:387\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-20.png 768w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<p>This catplot shows that shared rooms are relatively cheap and entire apartments are the most expensive type of listing, which is entirely reasonable in Hong Kong.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=1435468735  loading=\"lazy\" decoding=\"async\" width=\"1378\" height=\"421\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-21.png\" alt=\"\" class=\"wp-image-295\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:1378\/h:421\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-21.png 1378w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:92\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-21.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:1024\/h:313\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-21.png 1024w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:235\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-21.png 768w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<p>This countplot shows that most of the Airbnb listings are in Yau Tsim Mong and Wan Chai. 
Yau Tsim Mong district has over 2,000 available listings on Airbnb.<a id=\"post-271-_Toc936017656\"><\/a><a id=\"post-271-_Toc1353339171\"><\/a><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=93943123  loading=\"lazy\" decoding=\"async\" width=\"1064\" height=\"600\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-22.gif\" alt=\"\" class=\"wp-image-296\"\/><\/figure>\n\n\n\n<p>Next, I want to apply some skills from the tutorial lessons using the Folium and GeoPandas libraries. Specifically, I generate a map from the dataframe prepared earlier to display the mean listing price in each neighborhood. This is the original Folium map, captured directly from the tutorial 2 Jupyter Notebook; nothing has been edited yet.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=1195375363  loading=\"lazy\" decoding=\"async\" width=\"908\" height=\"406\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-23.png\" alt=\"\" class=\"wp-image-297\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:908\/h:406\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-23.png 908w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:134\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-23.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:343\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-23.png 768w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<p>Here, I have adapted the original code by Professor Qing to fit my needs. 
I used a linear colormap here so that the higher the mean price in a neighborhood, the deeper its color.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=1076010998  loading=\"lazy\" decoding=\"async\" width=\"1068\" height=\"606\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-24.gif\" alt=\"\" class=\"wp-image-298\"\/><\/figure>\n\n\n\n<p>This is the result. We can clearly see that Tuen Mun and Tsuen Wan are the two districts with the highest average listing prices in Hong Kong.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=1595663744  loading=\"lazy\" decoding=\"async\" width=\"533\" height=\"88\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-25.png\" alt=\"\" class=\"wp-image-299\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:533\/h:88\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-25.png 533w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:50\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-25.png 300w\" sizes=\"auto, (max-width: 533px) 100vw, 533px\" \/><\/figure>\n\n\n\n<p>Next, I want to find out how many listings there are in each neighborhood and plot them directly on the Folium map. I used a plugin called FastMarkerCluster (FMC) for this. FMC allows us to plot the number of listings interactively on the map.<\/p>\n\n\n\n<p>This is the result; the Airbnb listings are plotted as dynamic bubbles. 
Clearly, Yau Tsim Mong in Kowloon comes out on top, as this area has the highest number of bubbles in Hong Kong.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"32_Text_Sentiment_Analysis\"><\/span><a id=\"post-271-_Toc101627405\"><\/a>3.2 Text Sentiment Analysis<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=1158541871  loading=\"lazy\" decoding=\"async\" width=\"590\" height=\"541\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/textblob-simplified-text-processing-textblob-0-.png\" alt=\"TextBlob: Simplified Text Processing \u2014 TextBlob 0.16.0 documentation\" class=\"wp-image-300\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:590\/h:541\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/textblob-simplified-text-processing-textblob-0-.png 590w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:275\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/textblob-simplified-text-processing-textblob-0-.png 300w\" sizes=\"auto, (max-width: 590px) 100vw, 590px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=40944429  loading=\"lazy\" decoding=\"async\" width=\"316\" height=\"344\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/nlppython-nltk-clay-technology-w.png\" alt=\"[NLP][Python] NLTK, a classic tool for English natural language processing - Clay-Technology World\" class=\"wp-image-301\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:316\/h:344\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/nlppython-nltk-clay-technology-w.png 316w, 
https:\/\/mlcznkdztmb6.i.optimole.com\/w:276\/h:300\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/nlppython-nltk-clay-technology-w.png 276w\" sizes=\"auto, (max-width: 316px) 100vw, 316px\" \/><\/figure>\n\n\n\n<p>You may have heard of TextBlob before; like NLTK, it is well known in NLP. It is built on top of NLTK and lets us process text in a few lines of code. Both libraries make it easy to generate simple sentiment results without training any model.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=1061581543  loading=\"lazy\" decoding=\"async\" width=\"763\" height=\"686\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-29.png\" alt=\"\" class=\"wp-image-302\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:763\/h:686\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-29.png 763w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:270\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-29.png 300w\" sizes=\"auto, (max-width: 763px) 100vw, 763px\" \/><\/figure>\n\n\n\n<p>I generate the sentiment result using the polarity score from TextBlob. 
The polarity score lies between -1 and 1, where -1 indicates a negative sentiment and 1 indicates a positive sentiment.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=868484815  loading=\"lazy\" decoding=\"async\" width=\"432\" height=\"323\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-30.png\" alt=\"\" class=\"wp-image-303\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:432\/h:323\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-30.png 432w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:224\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-30.png 300w\" sizes=\"auto, (max-width: 432px) 100vw, 432px\" \/><\/figure>\n\n\n\n<p>From this graph, we can see that most of the tweets have a neutral or positive sentiment.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=2015588525  loading=\"lazy\" decoding=\"async\" width=\"876\" height=\"173\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-31.png\" alt=\"\" class=\"wp-image-304\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:876\/h:173\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-31.png 876w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:59\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-31.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:152\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-31.png 768w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<p>I also used the code from 
tutorial 10 to extract the positive and negative adjectives from the tweets I scraped.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=14862674  loading=\"lazy\" decoding=\"async\" width=\"362\" height=\"199\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-32.png\" alt=\"\" class=\"wp-image-305\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:362\/h:199\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-32.png 362w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:165\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-32.png 300w\" sizes=\"auto, (max-width: 362px) 100vw, 362px\" \/><\/figure>\n\n\n\n<p>This is the wordcloud generated from the tweets dataset; I discuss it further in the Limitation section.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=4931167  loading=\"lazy\" decoding=\"async\" width=\"1178\" height=\"497\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-33.png\" alt=\"\" class=\"wp-image-306\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:1178\/h:497\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-33.png 1178w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:127\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-33.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:1024\/h:432\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-33.png 1024w, 
https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:324\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-33.png 768w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<p>I also plotted a bar chart to show which words appear most often in the names of Airbnb listings. They include room, studio, and mtr.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=402186287  loading=\"lazy\" decoding=\"async\" width=\"1378\" height=\"554\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-34.png\" alt=\"\" class=\"wp-image-307\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:1378\/h:554\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-34.png 1378w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:121\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-34.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:1024\/h:412\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-34.png 1024w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:309\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-271-34.png 768w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<p>To build a more successful and better-looking wordcloud, I used the descriptions of the Airbnb listings as the input data. We can clearly see that space, apartment, walk, and Hong Kong are some of the most used words. 
This is the Hong Kong version of the Airbnb wordcloud wallpaper that I built.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"4_Limitation\"><\/span><a id=\"post-271-_Toc101627406\"><\/a>4. 
Limitation<span class=\"ez-toc-section-end\"><\/span><\/h1>\n\n\n\n<p>From the failed wordcloud built from the scraped tweets in Part 3.2, I conclude that the experiment failed mainly because of the low quality of the input. This is simply a case of garbage in, garbage out. Most of the tweets tagged #Airbnb that I scraped from Twitter are spam or written in other languages, so NLTK can\u2019t process them properly. I tried several Python translation libraries, but all of them failed in the end because of the large amount of data to be translated; their servers simply disconnected during translation.<\/p>\n\n\n\n<p>The second limitation is that I had to use Twitter instead of Facebook. Airbnb Hong Kong is not very active on Facebook, and the interactions there are in Cantonese, which would be even harder to translate. Apart from that, the Facebook Graph API has many restrictions, and I failed to get the data I wanted from it. So I used a third-party tool called Facepager to crawl Twitter instead of Facebook, as Twitter&#8217;s API is more open than Facebook&#8217;s, since Facebook holds more private content.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"5_Future_research_and_Conclusion\"><\/span><a id=\"post-271-_Toc101627407\"><\/a>5. Future research and Conclusion<span class=\"ez-toc-section-end\"><\/span><\/h1>\n\n\n\n<p>In future study, I may learn more about and gain more experience in web crawling, as it is still a brand-new topic for me. I have also thought of a way to solve the translation problem: I could translate the text outside Python so that I do not have to rely on the translation servers behind those libraries\/packages. 
Another solution is to scrape more tweets that contain the word \u2018Airbnb\u2019 instead of the hashtag \u2018#Airbnb\u2019, then keep only the English content so that the dataset is more readable to NLTK\/TextBlob.<\/p>\n\n\n\n<p>Last but not least, the purpose and learning objective of this project is to become more familiar with the useful libraries we have learnt and to apply most of the skills from the tutorials of this course to complete the data exploration task.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"6_References\"><\/span><a id=\"post-271-_Toc101627408\"><\/a>6. References<span class=\"ez-toc-section-end\"><\/span><\/h1>\n\n\n\n<p>Tumer Kabadayi, E., Cavdar Aksoy, N., Yazici, N., &amp; Kocak Alan, A. (2021). Airbnb as a sharing economy-enabled digital service platform: The power of motivational factors and the moderating role of experience. Tourism Economics.<\/p>\n\n\n\n<p><a href=\"https:\/\/doi.org\/10.1177\/13548166211044606\">https:\/\/doi.org\/10.1177\/13548166211044606<\/a><\/p>\n\n\n\n<p>Shahul ES. (2021, December 3). Sentiment analysis in Python: TextBlob vs Vader sentiment vs flair vs building it from scratch. neptune.ai.<\/p>\n\n\n\n<p><a href=\"https:\/\/neptune.ai\/blog\/sentiment-analysis-python-textblob-vs-vader-vs-flair\">https:\/\/neptune.ai\/blog\/sentiment-analysis-python-textblob-vs-vader-vs-flair<\/a><\/p>\n\n\n\n<p>Jayson DeLancey. (2020, May 29). NLTK and machine learning for sentiment analysis. CodeProject.<\/p>\n\n\n\n<p><a href=\"https:\/\/www.codeproject.com\/Articles\/5269448\/NLTK-and-Machine-Learning-for-Sentiment-Analysis\">https:\/\/www.codeproject.com\/Articles\/5269448\/NLTK-and-Machine-Learning-for-Sentiment-Analysis<\/a><\/p>\n\n\n\n<p>Akash. (2021, October 9). Making natural language processing easy with TextBlob. 
Analytics Vidhya.<\/p>\n\n\n\n<p><a href=\"https:\/\/www.analyticsvidhya.com\/blog\/2021\/10\/making-natural-language-processing-easy-with-textblob\/\">https:\/\/www.analyticsvidhya.com\/blog\/2021\/10\/making-natural-language-processing-easy-with-textblob\/<\/a><\/p>\n\n\n\n<p>SumedhKadam. (2021, July 5). Generating word cloud in Python. GeeksforGeeks. <a href=\"https:\/\/www.geeksforgeeks.org\/generating-word-cloud-python\/\">https:\/\/www.geeksforgeeks.org\/generating-word-cloud-python\/<\/a><\/p>\n\n\n\n<p>Zaxliu. (2015, November 3). How to display Chinese in matplotlib plot. Stack Overflow. <a href=\"https:\/\/stackoverflow.com\/questions\/21307832\/how-to-display-chinese-in-matplotlib-plot\">https:\/\/stackoverflow.com\/questions\/21307832\/how-to-display-chinese-in-matplotlib-plot<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is an individual project of SDSC3014 &#8211; Introduction to Sharing Economy. I did the project in my year 2 2021\/22 Semester B. Presentation Slides: Course Instructor: Prof.\u00a0KE Qing 1. 
&hellip; <a href=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/\" class=\"more-link\"><span>Continue reading<span class=\"screen-reader-text\">Data Exploration on Airbnb HK<\/span><\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":315,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[73,3],"tags":[25,13,30,29,21],"class_list":["post-271","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-analysis","category-proj","tag-2021-22-semester-b","tag-data-science","tag-introduction-to-sharing-economy","tag-sdsc3014","tag-year-2"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.5 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Data Exploration on Airbnb HK - Philip\u2019s Data Science Diary<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/\" \/>\n<meta property=\"og:locale\" content=\"en_GB\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Data Exploration on Airbnb HK - Philip\u2019s Data Science Diary\" \/>\n<meta property=\"og:description\" content=\"This is an individual project of SDSC3014 &#8211; Introduction to Sharing Economy. I did the project in my year 2 2021\/22 Semester B. Presentation Slides: Course Instructor: Prof.\u00a0KE Qing 1. 
&hellip; Continue readingData Exploration on Airbnb HK\" \/>\n<meta property=\"og:url\" content=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/\" \/>\n<meta property=\"og:site_name\" content=\"Philip\u2019s Data Science Diary\" \/>\n<meta property=\"article:published_time\" content=\"2022-04-23T09:30:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-03-05T09:49:35+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2022\/04\/Data-Exploration-on-Airbnb-HK.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1920\" \/>\n\t<meta property=\"og:image:height\" content=\"1080\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Philip\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Philip\" \/>\n\t<meta name=\"twitter:label2\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"15 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2022\\\/04\\\/23\\\/data-exploration-on-airbnb-hk\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2022\\\/04\\\/23\\\/data-exploration-on-airbnb-hk\\\/\"},\"author\":{\"name\":\"Philip\",\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/#\\\/schema\\\/person\\\/ef4f7cedd9b3bde11e126c4dbe1f8414\"},\"headline\":\"Data Exploration on Airbnb 
HK\",\"datePublished\":\"2022-04-23T09:30:00+00:00\",\"dateModified\":\"2024-03-05T09:49:35+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2022\\\/04\\\/23\\\/data-exploration-on-airbnb-hk\\\/\"},\"wordCount\":1733,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/#\\\/schema\\\/person\\\/ef4f7cedd9b3bde11e126c4dbe1f8414\"},\"image\":{\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2022\\\/04\\\/23\\\/data-exploration-on-airbnb-hk\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\/\\/philip.twinight.co\\/portfolio\\/wp-content\\/uploads\\/2022\\/04\\/Data-Exploration-on-Airbnb-HK.png\",\"keywords\":[\"2021\\\/22 Semester B\",\"Data Science\",\"Introduction to Sharing Economy\",\"SDSC3014\",\"Year 2\"],\"articleSection\":[\"Data Analysis\",\"Projects\"],\"inLanguage\":\"en-GB\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2022\\\/04\\\/23\\\/data-exploration-on-airbnb-hk\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2022\\\/04\\\/23\\\/data-exploration-on-airbnb-hk\\\/\",\"url\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2022\\\/04\\\/23\\\/data-exploration-on-airbnb-hk\\\/\",\"name\":\"Data Exploration on Airbnb HK - Philip\u2019s Data Science 
Diary\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2022\\\/04\\\/23\\\/data-exploration-on-airbnb-hk\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2022\\\/04\\\/23\\\/data-exploration-on-airbnb-hk\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\/\\/philip.twinight.co\\/portfolio\\/wp-content\\/uploads\\/2022\\/04\\/Data-Exploration-on-Airbnb-HK.png\",\"datePublished\":\"2022-04-23T09:30:00+00:00\",\"dateModified\":\"2024-03-05T09:49:35+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2022\\\/04\\\/23\\\/data-exploration-on-airbnb-hk\\\/#breadcrumb\"},\"inLanguage\":\"en-GB\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2022\\\/04\\\/23\\\/data-exploration-on-airbnb-hk\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2022\\\/04\\\/23\\\/data-exploration-on-airbnb-hk\\\/#primaryimage\",\"url\":\"https:\\/\\/philip.twinight.co\\/portfolio\\/wp-content\\/uploads\\/2022\\/04\\/Data-Exploration-on-Airbnb-HK.png\",\"contentUrl\":\"https:\\/\\/philip.twinight.co\\/portfolio\\/wp-content\\/uploads\\/2022\\/04\\/Data-Exploration-on-Airbnb-HK.png\",\"width\":1920,\"height\":1080},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2022\\\/04\\\/23\\\/data-exploration-on-airbnb-hk\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"\u9996\u9801\",\"item\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Data Exploration on Airbnb 
HK\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/#website\",\"url\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/\",\"name\":\"Philip\u2019s University Data Science Journey\",\"description\":\"Navigating Data Science: From Classroom to Career\",\"publisher\":{\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/#\\\/schema\\\/person\\\/ef4f7cedd9b3bde11e126c4dbe1f8414\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-GB\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/#\\\/schema\\\/person\\\/ef4f7cedd9b3bde11e126c4dbe1f8414\",\"name\":\"Philip\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"https:\\/\\/philip.twinight.co\\/portfolio\\/wp-content\\/uploads\\/2024\\/03\\/favicon.png\",\"url\":\"https:\\/\\/philip.twinight.co\\/portfolio\\/wp-content\\/uploads\\/2024\\/03\\/favicon.png\",\"contentUrl\":\"https:\\/\\/philip.twinight.co\\/portfolio\\/wp-content\\/uploads\\/2024\\/03\\/favicon.png\",\"width\":16,\"height\":16,\"caption\":\"Philip\"},\"logo\":{\"@id\":\"https:\\/\\/philip.twinight.co\\/portfolio\\/wp-content\\/uploads\\/2024\\/03\\/favicon.png\"},\"description\":\"Data Scientist &amp; Systems Engineer. Graduated from City University of Hong Kong. Previously founded Twinight Limited as CTO, developing AI investment analytics and automated trading solutions. 
Currently working as a Test and Integration Engineer on a Vessel Traffic Service (VTS) system in the maritime industry since December 2024.\",\"sameAs\":[\"https:\\\/\\\/philip.twinight.co\\\/portfolio\"],\"url\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/author\\\/philip\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Data Exploration on Airbnb HK - Philip\u2019s Data Science Diary","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/","og_locale":"en_GB","og_type":"article","og_title":"Data Exploration on Airbnb HK - Philip\u2019s Data Science Diary","og_description":"This is an individual project of SDSC3014 &#8211; Introduction to Sharing Economy. I did the project in my year 2 2021\/22 Semester B. Presentation Slides: Course Instructor: Prof.\u00a0KE Qing 1. 
&hellip; Continue readingData Exploration on Airbnb HK","og_url":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/","og_site_name":"Philip\u2019s Data Science Diary","article_published_time":"2022-04-23T09:30:00+00:00","article_modified_time":"2024-03-05T09:49:35+00:00","og_image":[{"width":1920,"height":1080,"url":"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2022\/04\/Data-Exploration-on-Airbnb-HK.png","type":"image\/png"}],"author":"Philip","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Philip","Estimated reading time":"15 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/#article","isPartOf":{"@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/"},"author":{"name":"Philip","@id":"https:\/\/philip.twinight.co\/portfolio\/#\/schema\/person\/ef4f7cedd9b3bde11e126c4dbe1f8414"},"headline":"Data Exploration on Airbnb HK","datePublished":"2022-04-23T09:30:00+00:00","dateModified":"2024-03-05T09:49:35+00:00","mainEntityOfPage":{"@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/"},"wordCount":1733,"commentCount":0,"publisher":{"@id":"https:\/\/philip.twinight.co\/portfolio\/#\/schema\/person\/ef4f7cedd9b3bde11e126c4dbe1f8414"},"image":{"@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/#primaryimage"},"thumbnailUrl":"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2022\/04\/Data-Exploration-on-Airbnb-HK.png","keywords":["2021\/22 Semester B","Data Science","Introduction to Sharing Economy","SDSC3014","Year 
2"],"articleSection":["Data Analysis","Projects"],"inLanguage":"en-GB","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/","url":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/","name":"Data Exploration on Airbnb HK - Philip\u2019s Data Science Diary","isPartOf":{"@id":"https:\/\/philip.twinight.co\/portfolio\/#website"},"primaryImageOfPage":{"@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/#primaryimage"},"image":{"@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/#primaryimage"},"thumbnailUrl":"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2022\/04\/Data-Exploration-on-Airbnb-HK.png","datePublished":"2022-04-23T09:30:00+00:00","dateModified":"2024-03-05T09:49:35+00:00","breadcrumb":{"@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/#breadcrumb"},"inLanguage":"en-GB","potentialAction":[{"@type":"ReadAction","target":["https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/"]}]},{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/#primaryimage","url":"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2022\/04\/Data-Exploration-on-Airbnb-HK.png","contentUrl":"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-con
tent\/uploads\/2022\/04\/Data-Exploration-on-Airbnb-HK.png","width":1920,"height":1080},{"@type":"BreadcrumbList","@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2022\/04\/23\/data-exploration-on-airbnb-hk\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"\u9996\u9801","item":"https:\/\/philip.twinight.co\/portfolio\/"},{"@type":"ListItem","position":2,"name":"Data Exploration on Airbnb HK"}]},{"@type":"WebSite","@id":"https:\/\/philip.twinight.co\/portfolio\/#website","url":"https:\/\/philip.twinight.co\/portfolio\/","name":"Philip\u2019s University Data Science Journey","description":"Navigating Data Science: From Classroom to Career","publisher":{"@id":"https:\/\/philip.twinight.co\/portfolio\/#\/schema\/person\/ef4f7cedd9b3bde11e126c4dbe1f8414"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/philip.twinight.co\/portfolio\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-GB"},{"@type":["Person","Organization"],"@id":"https:\/\/philip.twinight.co\/portfolio\/#\/schema\/person\/ef4f7cedd9b3bde11e126c4dbe1f8414","name":"Philip","image":{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/favicon.png","url":"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/favicon.png","contentUrl":"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/favicon.png","width":16,"height":16,"caption":"Philip"},"logo":{"@id":"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-conte
nt\/uploads\/2024\/03\/favicon.png"},"description":"Data Scientist &amp; Systems Engineer. Graduated from City University of Hong Kong. Previously founded Twinight Limited as CTO, developing AI investment analytics and automated trading solutions. Currently working as a Test and Integration Engineer on a Vessel Traffic Service (VTS) system in the maritime industry since December 2024.","sameAs":["https:\/\/philip.twinight.co\/portfolio"],"url":"https:\/\/philip.twinight.co\/portfolio\/index.php\/author\/philip\/"}]}},"_links":{"self":[{"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/posts\/271","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/comments?post=271"}],"version-history":[{"count":5,"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/posts\/271\/revisions"}],"predecessor-version":[{"id":311,"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/posts\/271\/revisions\/311"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/media\/315"}],"wp:attachment":[{"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/media?parent=271"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/categories?post=271"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/tags?post=271"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}