{"id":385,"date":"2023-04-16T22:17:00","date_gmt":"2023-04-16T14:17:00","guid":{"rendered":"https:\/\/philip.twinight.co\/portfolio\/?p=385"},"modified":"2024-03-06T23:06:28","modified_gmt":"2024-03-06T15:06:28","slug":"car-price-prediction","status":"publish","type":"post","link":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2023\/04\/16\/car-price-prediction\/","title":{"rendered":"Predictive Analysis of Car Prices in the US Market: A Comparative Study of Regression Models"},"content":{"rendered":"\n<p>This is the group project of SDSC2102 \u2013 Statistical Methods and Data Analysis. I did the project in my year 3 2022\/23 Semester B.<\/p>\n\n\n\n<p><strong>Presentation Slides:<\/strong><\/p>\n\n\n<div class=\"ose-google-docs ose-uid-475fc18086be38ac8cc775b12d093314 ose-embedpress-responsive\" style=\"width:600px; height:550px; max-height:550px; max-width:100%; display:inline-block;\" data-embed-type=\"GoogleDocs\"><iframe loading=\"lazy\" allowFullScreen=\"true\" src=\"https:\/\/docs.google.com\/presentation\/d\/e\/2PACX-1vRWnvbmd9hXVUjZrSCxvEQb66mQlPF6BdudxO1NzWHVyWLBfxsabd10yXPzhyd4uQ\/embed?start=false&#038;loop=false&#038;delayms=3000\" frameborder=\"0\" width=\"600\" height=\"550\" allowfullscreen=\"true\" mozallowfullscreen=\"true\" webkitallowfullscreen=\"true\"><\/iframe><\/div>\n\n\n\n<p>Course Instructor: Prof. 
ZENG Li<\/p>\n\n\n\n<h1 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"1_Introduction_Background_and_Problem_Formulation\"><\/span><a id=\"post-385-_j4dptvaibvz0\"><\/a>1. Introduction (Background and Problem Formulation)<span class=\"ez-toc-section-end\"><\/span><\/h1>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"11_Background\"><\/span><a id=\"post-385-_di5k2nli5d87\"><\/a>1.1 Background<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>A Chinese automobile company planned to penetrate the US market by building a factory within the country to produce cars locally. In order to achieve higher competitiveness against its US and European counterparts, the company intends to adapt its car designs and business strategies to meet specific price levels.  
<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"12_Objectives\"><\/span><a id=\"post-385-_i4a4qjaki73e\"><\/a>1.2 Objectives<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>One of the key objectives is to predict reasonable prices to make their products more appealing to consumers in the US market. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"13_Target\"><\/span><a id=\"post-385-_xdmj0mge5n4\"><\/a>1.3 Target<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>The primary goal of this analysis is to understand the factors influencing car pricing in the US market. By identifying these factors, the company can make informed decisions about car designs and pricing strategies. Additionally, this study aims to compare the performance of three regression models, including Linear Regression, Random Forest Regression, and KNN Regression, to determine the best model for predicting suitable car prices for the car company. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"14_Methodology\"><\/span><a id=\"post-385-_yjrr4la4ovz5\"><\/a>1.4 Methodology<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>The investigation will employ three modeling techniques: Linear Regression, Random Forest Regression, and KNN Regression, to identify the factors influencing car pricing in the US market and predict reasonable prices for the company&#8217;s products. Kaggle, Tableau, and Python will be the primary tools for this data analysis and interpretation. <\/p>\n\n\n\n<h1 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"2_Data_Preprocessing\"><\/span><a id=\"post-385-_mkg9zbthaz3n\"><\/a>2. 
Data Preprocessing<span class=\"ez-toc-section-end\"><\/span><\/h1>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"21_Importing_Dataset_and_Handling_Missing_Values\"><\/span><a id=\"post-385-_if8kj151133a\"><\/a>2.1 Importing Dataset and Handling Missing Values<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>To begin the data preprocessing, we imported the dataset using pandas and checked for missing values. The dataset contained none. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"22_Data_Cleaning\"><\/span><a id=\"post-385-_53bm8pbcw1cc\"><\/a>2.2 Data Cleaning<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>In this step, we renamed the column \u201cCarName\u201d to \u201cCompanyName\u201d and corrected the typos in the company names. We then checked for duplicate entries and found none. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"23_Data_Visualization\"><\/span><a id=\"post-385-_jnp9mrc65cme\"><\/a>2.3 Data Visualization<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>We created distribution and box plots for car prices to understand the data better. Most car prices are around $10,000 (Appendix 1). Next, we plotted a correlation heatmap for 14 numerical variables, including \u201cprice,\u201d to identify the top 10 attributes with the highest correlation (Appendix 2). Variables like \u201cenginesize\u201d correlate positively with price (0.87), while \u201ccitympg\u201d and \u201cprice\u201d are negatively correlated. 
<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"24_Feature_Selection\"><\/span><a id=\"post-385-_jcxgpy4pectm\"><\/a>2.4 Feature Selection<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Based on the correlation heatmap, we selected six features for further analysis. Four features correlate positively with price (\u201cenginesize,\u201d \u201ccurbweight,\u201d \u201chorsepower,\u201d \u201ccarwidth\u201d), and two features correlate negatively with price (\u201ccitympg,\u201d \u201chighwaympg\u201d). We then plotted the variation of car prices against these six selected features for better visualization (Appendix 3). <\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"25_Handling_Categorical_Variables\"><\/span><a id=\"post-385-_pywrhekkuty\"><\/a>2.5 Handling Categorical Variables<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Further plots of the dataset showed that it contains many categorical variables (Appendix 4), which cannot be used directly for modeling. To address this, we one-hot encoded the categorical columns into numerical dummy columns using the pandas <em>get_dummies<\/em> function. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"26_Creating_New_Variable_and_Classifying_the_Car_Companies\"><\/span><a id=\"post-385-_9o4v0na9bte0\"><\/a>2.6 Creating New Variable and Classifying the Car Companies<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>We created a new variable, \u201cfueleconomy,\u201d to represent the fuel efficiency of each car. We also grouped the car companies based on the average prices of each company, categorizing them as \u201cBudget,\u201d \u201cMedium,\u201d or \u201cHighend\u201d. 
<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"27_Finalizing_the_Dataset_for_Prediction\"><\/span><a id=\"post-385-_hadodqcun67f\"><\/a>2.7 Finalizing the Dataset for Prediction<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Finally, we created a dataset for prediction, which includes only the important variables identified in the previous steps. At the end of the preprocessing steps, we have a clean and structured dataset ready for later modeling and prediction tasks (Appendix 5). <\/p>\n\n\n\n<h1 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"3_Modeling\"><\/span><a id=\"post-385-_h6vm2hclmj44\"><\/a>3. Modeling<span class=\"ez-toc-section-end\"><\/span><\/h1>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"31_Evaluation_Metrics\"><\/span><a id=\"post-385-_rbxwaarbxjis\"><\/a>3.1 Evaluation Metrics<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>The metrics we used to evaluate the performance of our models are R-squared (R\u00b2), Adjusted R-squared, Mean Squared Error (MSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). <\/p>\n\n\n\n<p>The R-squared (R\u00b2) metric quantifies the amount of variation in the dependent variable that is accounted for by the independent variables. The R\u00b2 value ranges from 0 to 1, with 1 indicating a perfect fit. Generally, a higher R\u00b2 value is more favorable, although other factors such as model complexity and sample size should be taken into account as well. <\/p>\n\n\n\n<p>The Adj R\u00b2 is a modified form of R\u00b2 that adjusts for the number of independent variables in the model. The Adj R\u00b2 ranges from -\u221e to 1, with 1 indicating a perfect fit. 
In general, R\u00b2 does not decrease when a new variable is added to the model. The Adj R\u00b2, however, increases only when the newly added independent variables improve the model fit more than would be expected by chance alone, and it decreases when they do not improve the fit sufficiently. Hence, Adj R\u00b2 is a better metric than R\u00b2 for comparing models with different numbers of independent variables. <\/p>\n\n\n\n<p>The Mean Squared Error (MSE) assesses the average squared difference between the observed and predicted car prices. The MSE is a risk function that is expressed in the squared units of the dependent variable. MSE measures the quality of an estimator. <\/p>\n\n\n\n<p>The Mean Absolute Error (MAE) is a metric that calculates the average absolute difference between the observed and predicted car prices. The MAE is measured in the same units as the dependent variable. <\/p>\n\n\n\n<p>Mean Absolute Percentage Error (MAPE) measures the average percentage difference between the observed and predicted car prices. MAPE expresses the magnitude of the error relative to the actual value as a percentage. <\/p>\n\n\n\n<p>Lower MSE, MAE, and MAPE values indicate a better model, since they mean that the model&#8217;s car price predictions are closer to the actual car prices. 
<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"32_Linear_Multilinear_Regression\"><\/span><a id=\"post-385-_wbyfq3sw1isp\"><\/a>3.2 Linear &amp; Multilinear Regression<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"321_Linear_Regression_Model\"><\/span><a id=\"post-385-_c49a6lish988\"><\/a>3.2.1 Linear Regression Model<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>Ordinary least squares (OLS) Linear Regression is a method that fits a linear model by determining coefficients w = (w1, \u2026, wp) in order to minimize the discrepancy between the actual target values in the dataset and the values predicted by the linear equation. Specifically, it minimizes the sum of squared residuals, where each residual is the difference between an observed and a predicted target value. Using the linear regression model from <em>sklearn.linear_model.LinearRegression<\/em> offers several benefits, such as parameter optimization and building predictive multiple linear models. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"322_Hypothesis_Testing\"><\/span><a id=\"post-385-_kc2yyogdzwss\"><\/a>3.2.2 Hypothesis Testing<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>To examine the relationship between the dependent variable (Price) and the independent variables (Factors), a hypothesis test was conducted. The significance level was set at 0.05, with the null hypothesis (H0) stating that there is no relationship between Price (Y) and Factors (X) and the alternative hypothesis (H1) asserting that a relationship exists between Y and X. If the p-value is less than 0.05, the null hypothesis is rejected, leading to the conclusion that a relationship exists between Y and X. Additionally, the R-squared value is assessed to determine the strength of the effect of Factors (X) on Price (Y) within the context of this report. 
<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"323_Results_of_Linear_Regression\"><\/span><a id=\"post-385-_yss7rju6rpq4\"><\/a>3.2.3 Results of Linear Regression<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>At the outset, we established linear regression models with all independent variables (X), such as \u201cenginesize,\u201d \u201chorsepower,\u201d \u201ccurbweight,\u201d and other independent variables, with price as the dependent variable (Y). The resulting graph provides an overview of the outcomes (Appendix 6). <\/p>\n\n\n\n<p>To conduct the analysis, we first identified the factors with a p-value greater than 0.05, which indicates no statistically significant relationship with price. As illustrated in Appendix 7, variables from \u201ccarheight\u201d to \u201ccompression ratio\u201d show no significant association with the price. Consequently, we excluded them and focused solely on the remaining factors with a p-value less than 0.05. As depicted in Appendix 8, variables ranging from \u201cenginesize\u201d to rear-wheel drive (RWD) correlate with price. Appendix 9 demonstrates that \u201cfour\u201d car doors and \u201cfueleconomy\u201d were negatively correlated with price, while Appendix 10 reveals that the remaining factors correlate positively with price. After comparing the R-squared values, the linear regression model identified nine attributes with the highest absolute correlation with price. The p-values for all nine variables are less than 0.05, indicating that each of them is statistically significant in predicting car prices. This suggests that including these variables in the model is appropriate, as they each contribute to the model&#8217;s ability to predict car prices. 
<\/p>\n\n\n\n<p>The variables with the highest R-squared values and lowest p-values are \u201cenginesize,\u201d \u201ccurbweight,\u201d \u201chorsepower,\u201d and \u201chighend.\u201d According to this model, these variables are likely the most important predictors of car price. Other variables such as \u201ccarwidth,\u201d \u201cfour,\u201d \u201cfueleconomy,\u201d \u201ccarlength,\u201d and \u201cRWD\u201d have weaker relationships with car prices but are still statistically significant. Overall, this model appears to predict car prices effectively by combining statistically significant variables. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"324_Results_of_Multiple_Linear_Regression\"><\/span><a id=\"post-385-_7v9q94ivtmzy\"><\/a>3.2.4 Results of Multiple Linear Regression<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>Multiple linear regression is a technique for analyzing the linear relationship between two or more independent variables and one continuous dependent variable. The equation takes the form <strong>y = b<sub>0<\/sub> + b<sub>1<\/sub>x<sub>1<\/sub> + b<sub>2<\/sub>x<sub>2<\/sub> + \u2026 + b<sub>n<\/sub>x<sub>n<\/sub><\/strong>, where y is the dependent variable, x<sub>1<\/sub>, x<sub>2<\/sub>, \u2026, x<sub>n<\/sub> are the independent variables, b<sub>0<\/sub> is the intercept, and b<sub>1<\/sub>, b<sub>2<\/sub>, \u2026, b<sub>n<\/sub> are the coefficients that represent the slope for each independent variable. This model combines the nine significant variables found during the individual linear regressions into one equation to predict car prices (the response variable). Model fitting shows that \u201cHighend\u201d has the largest coefficient, indicating that it has the greatest impact in this car price prediction model. <\/p>\n\n\n\n<p>Appendix 13 shows the scatter plot of the actual prices against the prices predicted by this model. 
Overall, the results in Appendix 12 indicate that the multiple linear regression model shows good predictive abilities, with some variations in performance across the training, cross-validation, and testing sets. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"33_Random_Forest_Regression\"><\/span><a id=\"post-385-_xfmvz51c2emu\"><\/a>3.3 Random Forest Regression<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>A random forest acts as a composite predictor, employing multiple decision trees, each trained on distinct data subsets, to boost prediction accuracy and combat overfitting. When bootstrap is set to True (default), the max_samples parameter dictates the size of each subset; otherwise, the model uses the entire dataset to build each tree.    <br>Before obtaining the results, the model parameters were optimized using GridSearchCV with a 5-fold cross-validation method. The current model employs 61 estimators and requires a minimum of 6 samples to split a node. With 61 estimators, the model incorporates 61 separate decision tree regressors, each trained on a different bootstrapped sample. All other hyperparameters remain at the model&#8217;s default values.<br>The final prediction of the model is the average of the predictions from the individual trees. The study divided the dataset using a 5-fold cross-validation method, creating separate training and testing sets. We employed an out-of-bag (OOB) approach for predictions, which involved making predictions on data points not used in constructing the individual decision trees within the random forest model.   <br>Meanwhile, the other general evaluation metrics of the Random Forest Regression model are presented in Appendix 14. The testing set shows a good result with the adjusted R\u00b2 value of 0.8861. 
Overall, these results indicate that the random forest model demonstrates strong predictive capabilities and generalizes well to new data compared to Multiple Linear Regression.  <\/p>\n\n\n\n<p>Appendix 15 compares the actual prices with the prices predicted by the model. Most data points are below $25,000. We believe that the model would perform better with more data on higher-priced cars, especially those priced above $30,000. <\/p>\n\n\n\n<p>Appendix 16 shows the feature importance computed by Gini Importance. The most impactful variable is \u201cenginesize,\u201d which makes sense as the engine is the most important part of the car. A car cannot run without an engine, and capabilities such as speed and lifespan depend largely on the engine size. <\/p>\n\n\n\n<p>Lastly, Appendix 17 shows the tree plot, with the display limited to depth 2. The full tree on the left has a max depth of 9 and 67 nodes in total, and the full tree on the right has a max depth of 10 and 69 nodes in total. Appendix 18 shows histograms of the number of nodes, the max depth, and the number of leaves. These distributions are taken over the individual regression trees inside the random forest, each built from a bootstrapped sample. The results show that bootstrapping plays a major role in diversifying the trees built from the training dataset, which helps with the prediction. 
<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"34_K-Nearest_Neighbor_Regression\"><\/span><a id=\"post-385-_3ejq6xrn3igo\"><\/a>3.4 K-Nearest Neighbor Regression<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>K-Nearest Neighbor Regression is a type of instance-based learning that predicts the numerical value of a new data point from its \u2018k\u2019 nearest neighbors. We used the <em>KNeighborsRegressor<\/em> class from the <em>sklearn<\/em> library. A new point is assigned a value based on how closely it resembles the points in the training set. In this model, <em>n_neighbors<\/em> = 4 is used as a parameter, chosen by comparing R\u00b2 across different values of k. Appendix 20 depicts the scatterplot of actual and predicted prices using KNN regression.    <br>Appendix 19 describes the general evaluation metrics used in the KNN Regression model. Overall, these results indicate that this model can predict the car price effectively, with only slight variations in performance between training, testing, and cross-validation sets. Additionally, the accuracy of KNN regression is higher than that of the multiple linear regression model. Appendix 21 visualizes the contour plot of KNN for the 2 most significant attributes, engine size and curb weight, with warmer colors indicating a higher price. <\/p>\n\n\n\n<h1 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"4_Interpretation_and_Reflection\"><\/span><a id=\"post-385-_z6rvrto59v41\"><\/a>4. Interpretation and Reflection<span class=\"ez-toc-section-end\"><\/span><\/h1>\n\n\n\n<p>The analysis compared the performance of three different regression models: Multiple Linear Regression, Random Forest Regression, and K-Nearest Neighbor Regression. All models showed slight overfitting, as shown by the notably higher accuracy on the training set compared to the testing and validation sets. 
This problem might be caused by the small sample size and lack of diversity in the data, leading to difficulties in generalization and adaptability to new data. Based on the Adjusted R\u00b2 scores and the mean absolute error (MAE) values, the Random Forest Regression model emerged as the best-performing model, with an Adjusted R\u00b2 of 0.8861 and an MAE of 1795.45. This indicates that the Random Forest model is better at explaining the variance in the data and has a lower average absolute difference between predicted and actual values compared to the other models.    <br>Furthermore, the Random Forest Regression model provided feature importance scores, which revealed the top three most influential features as Engine Size (0.578), Curb Weight (0.192), and HorsePower (0.031). These scores demonstrate the relative contribution of each feature to the prediction across all decision trees in the forest. Consequently, it can be concluded that the Random Forest Regression model not only performs better in terms of prediction accuracy but also offers insights into the most significant features driving the predictions.   <br>Throughout this project, we successfully predicted car prices using various regression models, achieving a high level of accuracy. This analysis provides valuable insights for car companies, dealerships, and buyers in determining the fair market value of current car prices. It also assists the Chinese car company in understanding the factors that influence car pricing, enabling them to become more competitive in the US market.    <br>In conclusion, this study showcases the practical application of the statistical methods learned in the SDSC2102 course. By working with a real-world dataset, we were able to demonstrate the power of these techniques in solving complex problems and generating valuable insights for businesses and consumers alike. 
<\/p>\n\n\n\n<h1 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Appendix\"><\/span><a id=\"post-385-_z7lxv1hwla3f\"><\/a>Appendix<span class=\"ez-toc-section-end\"><\/span><\/h1>\n\n\n\n<p><strong>Python code used throughout this project &#8211; <\/strong><a href=\"https:\/\/portland-my.sharepoint.com\/:u:\/g\/personal\/ahansviar2-c_my_cityu_edu_hk\/Efi7cOKvvcFOnKcXrAy4QI4BUU6fbzHXll0TO7uEkMOwfg?e=5qZMon\"><strong>2102 project.ipynb<\/strong><\/a>\n<\/p>\n\n\n\n<p><strong>Appendix 1 &#8211; Distribution and box plot for car price<\/strong>\n<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=290892063  fetchpriority=\"high\" loading=\"eager\" decoding=\"async\" width=\"1656\" height=\"699\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-2.png\" alt=\"\" class=\"wp-image-387\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:1656\/h:699\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-2.png 1656w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:127\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-2.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:1024\/h:432\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-2.png 1024w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:324\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-2.png 768w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:1536\/h:648\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-2.png 1536w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<p><strong>Appendix 2 &#8211; Correlation heatmap for most of the 
variables<\/strong>\n<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=225472595  fetchpriority=\"high\" loading=\"eager\" decoding=\"async\" width=\"1286\" height=\"771\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-3.png\" alt=\"\" class=\"wp-image-388\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:1286\/h:771\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-3.png 1286w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:180\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-3.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:1024\/h:614\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-3.png 1024w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:460\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-3.png 768w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<p><strong>Appendix 3 &#8211; The variation of car price vs selected features<\/strong>\n<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=1996231596  loading=\"lazy\" decoding=\"async\" width=\"855\" height=\"911\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-4.png\" alt=\"\" class=\"wp-image-389\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:855\/h:911\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-4.png 855w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:282\/h:300\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-4.png 
282w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:818\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-4.png 768w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<p><strong>Appendix 4 &#8211; Other graphs made with categorical items only<\/strong>\n<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=105268601  loading=\"lazy\" decoding=\"async\" width=\"2008\" height=\"616\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-5.png\" alt=\"\" class=\"wp-image-390\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:1920\/h:589\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-5.png 2008w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:92\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-5.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:1024\/h:314\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-5.png 1024w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:236\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-5.png 768w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:1536\/h:471\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-5.png 1536w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<p><strong>Appendix 5 &#8211; Categorical columns encoded with the one-hot encoding method<\/strong>\n<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=1216516428  loading=\"lazy\" decoding=\"async\" width=\"990\" height=\"169\" 
src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-6.png\" alt=\"\" class=\"wp-image-391\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:990\/h:169\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-6.png 990w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:51\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-6.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:131\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-6.png 768w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<p><strong>Appendix 6 &#8211; Overall results of the regression model<\/strong>\n<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=1355815528  loading=\"lazy\" decoding=\"async\" width=\"1178\" height=\"722\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-7.png\" alt=\"\" class=\"wp-image-392\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:1178\/h:722\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-7.png 1178w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:184\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-7.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:1024\/h:628\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-7.png 1024w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:471\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-7.png 768w\" 
sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<p><strong>Appendix 7 &#8211; Linear regression with no correlation factors<\/strong>\n<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=916969435  loading=\"lazy\" decoding=\"async\" width=\"1852\" height=\"1112\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-8.png\" alt=\"\" class=\"wp-image-393\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:1798\/h:1080\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-8.png 1852w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:180\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-8.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:1024\/h:615\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-8.png 1024w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:461\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-8.png 768w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:1536\/h:922\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-8.png 1536w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<p><strong>Appendix 8 &#8211; Linear regression of correlated factors with price<\/strong>\n<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=532194169  loading=\"lazy\" decoding=\"async\" width=\"873\" height=\"504\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-9.png\" alt=\"\" class=\"wp-image-394\" 
srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:873\/h:504\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-9.png 873w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:173\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-9.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:443\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-9.png 768w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<p><strong>Appendix 9 &#8211; Linear regression of factors with negative correlation<\/strong>\n<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=772626547  loading=\"lazy\" decoding=\"async\" width=\"1458\" height=\"950\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-10.png\" alt=\"\" class=\"wp-image-395\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:1458\/h:950\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-10.png 1458w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:195\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-10.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:1024\/h:667\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-10.png 1024w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:500\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-10.png 768w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<p><strong>Appendix 10 &#8211; Linear regression of factors with positive correlation<\/strong>\n<\/p>\n\n\n\n<figure 
class=\"wp-block-image\"><img data-opt-id=660669908  loading=\"lazy\" decoding=\"async\" width=\"1472\" height=\"982\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-11.png\" alt=\"\" class=\"wp-image-396\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:1472\/h:982\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-11.png 1472w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:200\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-11.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:1024\/h:683\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-11.png 1024w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:512\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-11.png 768w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<p><strong>Appendix 11 &#8211; 9 Attributes with The Highest Absolute Correlation with Price<\/strong>\n<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=712428666  loading=\"lazy\" decoding=\"async\" width=\"627\" height=\"448\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-12.png\" alt=\"\" class=\"wp-image-397\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:627\/h:448\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-12.png 627w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:214\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-12.png 300w\" sizes=\"auto, (max-width: 627px) 100vw, 
627px\" \/><\/figure>\n\n\n\n<p><strong>Appendix 12 &#8211; Multiple Linear Regression Evaluation Metrics<\/strong>\n<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th><p>\n  Evaluation Metrics\n<\/p><\/th><th><p>\n  Training Set\n<\/p><\/th><th><p>\n  Testing Set\n<\/p><\/th><th><p>\n  5-Fold Cross Validation Set\n<\/p><\/th><\/tr><tr><th><p>\n  R\u00b2\n<\/p><\/th><th><p>\n  0.9303\n<\/p><\/th><th><p>\n  0.8448\n<\/p><\/th><th><p>\n  0.9096\n<\/p><\/th><\/tr><tr><th><p>\n  Adjusted R\u00b2 \n<\/p><\/th><th><p>\n  0.9262\n<\/p><\/th><th><p>\n  0.8358\n<\/p><\/th><th><p>\n  0.9043\n<\/p><\/th><\/tr><tr><th><p>\n  Mean Square Error (MSE)\n<\/p><\/th><th><p>\n  4,181,836.2920\n<\/p><\/th><th><p>\n  12,011,794.4727\n<\/p><\/th><th><p>\n  5,252,029.2859\n<\/p><\/th><\/tr><tr><th><p>\n  Mean Absolute Error (MAE)\n<\/p><\/th><th><p>\n  1,475.8022\n<\/p><\/th><th><p>\n  2,387.4288\n<\/p><\/th><th><p>\n  1,642.2066\n<\/p><\/th><\/tr><tr><th><p>\n  Mean Absolute Percentage Error (MAPE)\n<\/p><\/th><th><p>\n  11.0414%\n<\/p><\/th><th><p>\n  16.6957%\n<\/p><\/th><th><p>\n  12.1913%\n<\/p><\/th><\/tr><\/thead><\/table><\/figure>\n\n\n\n<p><strong>Appendix 13 &#8211; Multiple Linear Regression Scatter Plot of Actual vs Predicted Prices <\/strong>\n<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=126290575  loading=\"lazy\" decoding=\"async\" width=\"790\" height=\"475\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-13.png\" alt=\"\" class=\"wp-image-398\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:790\/h:475\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-13.png 790w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:180\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-13.png 300w, 
https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:462\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-13.png 768w\" sizes=\"auto, (max-width: 790px) 100vw, 790px\" \/><\/figure>\n\n\n\n<p><strong>Appendix 14 &#8211; Random Forest Regression Evaluation Metrics<\/strong>\n<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th><p>\n  Evaluation Metrics\n<\/p><\/th><th><p>\n  Training Set\n<\/p><\/th><th><p>\n  Out-of-Bag\n<\/p><\/th><th><p>\n  5-Fold Cross Validation Set\n<\/p><\/th><th><p>\n  Testing Set\n<\/p><\/th><\/tr><tr><th><p>\n  R\u00b2\n<\/p><\/th><th><p>\n  0.9808\n<\/p><\/th><th><p>\n  0.9365\n<\/p><\/th><th><p>\n  0.9198\n<\/p><\/th><th><p>\n  0.9117\n<\/p><\/th><\/tr><tr><th><p>\n  Adjusted R\u00b2\n<\/p><\/th><th><p>\n  0.9797\n<\/p><\/th><th><p>\n  0.9328\n<\/p><\/th><th><p>\n  0.9181\n<\/p><\/th><th><p>\n  0.8861\n<\/p><\/th><\/tr><tr><th><p>\n  Mean Square Error (MSE)\n<\/p><\/th><th><p>\n  1,150,714.4524\n<\/p><\/th><th><p>\n  3,756,219.2979\n<\/p><\/th><th><p>\n  4,566,369.2124\n<\/p><\/th><th><p>\n  6,835,687.8587\n<\/p><\/th><\/tr><tr><th><p>\n  Mean Absolute Error (MAE)\n<\/p><\/th><th><p>\n  765.6409\n<\/p><\/th><th><p>\n  1,421.0788\n<\/p><\/th><th><p>\n  1,509.2703\n<\/p><\/th><th><p>\n  1,795.4514\n<\/p><\/th><\/tr><tr><th><p>\n  Mean Absolute Percentage Error (MAPE)\n<\/p><\/th><th><p>\n  5.7060%\n<\/p><\/th><th><p>\n  10.7757%\n<\/p><\/th><th><p>\n  11.3621%\n<\/p><\/th><th><p>\n  14.0521%\n<\/p><\/th><\/tr><\/thead><\/table><\/figure>\n\n\n\n<p><strong>Appendix 15 &#8211; Random Forest Scatter Plot of Actual vs Predicted Prices <\/strong>\n<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=14465090  loading=\"lazy\" decoding=\"async\" width=\"666\" height=\"550\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-14.png\" alt=\"\" 
class=\"wp-image-399\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:666\/h:550\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-14.png 666w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:248\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-14.png 300w\" sizes=\"auto, (max-width: 666px) 100vw, 666px\" \/><\/figure>\n\n\n\n<p><strong>Appendix 16 &#8211; Random Forest Feature Importance<\/strong>\n<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=789479826  loading=\"lazy\" decoding=\"async\" width=\"837\" height=\"837\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-15.png\" alt=\"\" class=\"wp-image-400\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:837\/h:837\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-15.png 837w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:300\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-15.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:150\/h:150\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-15.png 150w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:768\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-15.png 768w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:640\/h:640\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-15.png 640w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:90\/h:90\/q:mauto\/f:best\/ig:avif\/dpr:2\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-15.png 90w, 
https:\/\/mlcznkdztmb6.i.optimole.com\/w:50\/h:50\/q:mauto\/f:best\/ig:avif\/dpr:2\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-15.png 50w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<p><strong>Appendix 17 &#8211; Random Forest Individual Decision Tree Visualization<\/strong>\n<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=1595370560  loading=\"lazy\" decoding=\"async\" width=\"1957\" height=\"1560\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-16.png\" alt=\"\" class=\"wp-image-401\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:1354\/h:1080\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-16.png 1957w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:239\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-16.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:1024\/h:816\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-16.png 1024w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:612\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-16.png 768w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:1355\/h:1080\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-16.png 1536w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=1831781275  loading=\"lazy\" decoding=\"async\" width=\"1957\" height=\"1560\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-17.png\" 
alt=\"\" class=\"wp-image-402\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:1354\/h:1080\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-17.png 1957w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:239\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-17.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:1024\/h:816\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-17.png 1024w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:612\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-17.png 768w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:1355\/h:1080\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-17.png 1536w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<p><strong>Appendix 18 &#8211; Random Forest Bootstrapping Statistics<\/strong>\n<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=24546052  loading=\"lazy\" decoding=\"async\" width=\"566\" height=\"457\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-18.png\" alt=\"\" class=\"wp-image-403\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:566\/h:457\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-18.png 566w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:242\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-18.png 300w\" sizes=\"auto, (max-width: 566px) 100vw, 566px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=197342699  loading=\"lazy\" decoding=\"async\" width=\"566\" 
height=\"457\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-19.png\" alt=\"\" class=\"wp-image-404\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:566\/h:457\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-19.png 566w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:242\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-19.png 300w\" sizes=\"auto, (max-width: 566px) 100vw, 566px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=1822204105  loading=\"lazy\" decoding=\"async\" width=\"566\" height=\"457\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-20.png\" alt=\"\" class=\"wp-image-405\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:566\/h:457\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-20.png 566w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:242\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-20.png 300w\" sizes=\"auto, (max-width: 566px) 100vw, 566px\" \/><\/figure>\n\n\n\n<p><strong>Appendix 19 &#8211; K-Nearest Neighbor Regression Evaluation Metrics<\/strong>\n<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th><p>\n  Evaluation Metrics\n<\/p><\/th><th><p>\n  Training Set\n<\/p><\/th><th><p>\n  Testing Set\n<\/p><\/th><th><p>\n  5-Fold Cross Validation Set\n<\/p><\/th><\/tr><tr><th><p>\n  R\u00b2\n<\/p><\/th><th><p>\n  0.9481\n<\/p><\/th><th><p>\n  0.8898\n<\/p><\/th><th><p>\n  0.8985\n<\/p><\/th><\/tr><tr><th><p>\n  R\u00b2 Adjusted\n<\/p><\/th><th><p>\n  0.9451\n<\/p><\/th><th><p>\n  
0.8833\n<\/p><\/th><th><p>\n  0.8925\n<\/p><\/th><\/tr><tr><th><p>\n  Mean Square Error (MSE)\n<\/p><\/th><th><p>\n  3,115,841.6219\n<\/p><\/th><th><p>\n  8,528,717.5061\n<\/p><\/th><th><p>\n  6,207,117.3859\n<\/p><\/th><\/tr><tr><th><p>\n  Mean Absolute Error (MAE)\n<\/p><\/th><th><p>\n  1,205.7317\n<\/p><\/th><th><p>\n  1,991.5976\n<\/p><\/th><th><p>\n  1,684.7972\n<\/p><\/th><\/tr><tr><th><p>\n  Mean Absolute Percentage Error (MAPE)\n<\/p><\/th><th><p>\n  9.2461%\n<\/p><\/th><th><p>\n  14.4534%\n<\/p><\/th><th><p>\n  12.4250%\n<\/p><\/th><\/tr><\/thead><\/table><\/figure>\n\n\n\n<p><strong>Appendix 20 &#8211; KNN Regression Scatter Plot of Actual vs Predicted Values<\/strong>\n<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=114644212  loading=\"lazy\" decoding=\"async\" width=\"867\" height=\"473\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-21.png\" alt=\"\" class=\"wp-image-406\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:867\/h:473\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-21.png 867w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:164\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-21.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:419\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-21.png 768w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n\n<p><strong>Appendix 21 &#8211; KNN Contour Plot<\/strong>\n<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-opt-id=2096524878  loading=\"lazy\" decoding=\"async\" width=\"846\" height=\"550\" 
src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-22.png\" alt=\"\" class=\"wp-image-407\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:846\/h:550\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-22.png 846w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:195\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-22.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:499\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/word-image-385-22.png 768w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>This is the group project of SDSC2102 \u2013 Statistical Methods and Data Analysis. I did the project in my year 3 2022\/23 Semester B. Presentation Slides: Course Instructor: Prof. 
ZENG &hellip; <a href=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2023\/04\/16\/car-price-prediction\/\" class=\"more-link\"><span>Continue reading<span class=\"screen-reader-text\">Predictive Analysis of Car Prices in the US Market: A Comparative Study of Regression Models<\/span><\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":410,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[72,3],"tags":[39,13,38,37,34],"class_list":["post-385","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-machine-learning","category-proj","tag-2022-23-semester-b","tag-data-science","tag-sdsc2102","tag-statistical-methods-and-data-analysis","tag-year-3"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.5 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Predictive Analysis of Car Prices in the US Market: A Comparative Study of Regression Models - Philip\u2019s Data Science Diary<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2023\/04\/16\/car-price-prediction\/\" \/>\n<meta property=\"og:locale\" content=\"en_GB\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Predictive Analysis of Car Prices in the US Market: A Comparative Study of Regression Models - Philip\u2019s Data Science Diary\" \/>\n<meta property=\"og:description\" content=\"This is the group project of SDSC2102 \u2013 Statistical Methods and Data Analysis. I did the project in my year 3 2022\/23 Semester B. Presentation Slides: Course Instructor: Prof. 
ZENG &hellip; Continue readingPredictive Analysis of Car Prices in the US Market: A Comparative Study of Regression Models\" \/>\n<meta property=\"og:url\" content=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2023\/04\/16\/car-price-prediction\/\" \/>\n<meta property=\"og:site_name\" content=\"Philip\u2019s Data Science Diary\" \/>\n<meta property=\"article:published_time\" content=\"2023-04-16T14:17:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-03-06T15:06:28+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2023\/04\/Comparative-Analysis-of-Car-Price-Prediction-Models.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1920\" \/>\n\t<meta property=\"og:image:height\" content=\"1080\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Philip\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Philip\" \/>\n\t<meta name=\"twitter:label2\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"17 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2023\\\/04\\\/16\\\/car-price-prediction\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2023\\\/04\\\/16\\\/car-price-prediction\\\/\"},\"author\":{\"name\":\"Philip\",\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/#\\\/schema\\\/person\\\/ef4f7cedd9b3bde11e126c4dbe1f8414\"},\"headline\":\"Predictive Analysis of Car Prices in the US Market: A Comparative Study of Regression 
Models\",\"datePublished\":\"2023-04-16T14:17:00+00:00\",\"dateModified\":\"2024-03-06T15:06:28+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2023\\\/04\\\/16\\\/car-price-prediction\\\/\"},\"wordCount\":2760,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/#\\\/schema\\\/person\\\/ef4f7cedd9b3bde11e126c4dbe1f8414\"},\"image\":{\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2023\\\/04\\\/16\\\/car-price-prediction\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\/\\/philip.twinight.co\\/portfolio\\/wp-content\\/uploads\\/2023\\/04\\/Comparative-Analysis-of-Car-Price-Prediction-Models.png\",\"keywords\":[\"2022\\\/23 Semester B\",\"Data Science\",\"SDSC2102\",\"Statistical Methods and Data Analysis\",\"Year 3\"],\"articleSection\":[\"Machine Learning\",\"Projects\"],\"inLanguage\":\"en-GB\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2023\\\/04\\\/16\\\/car-price-prediction\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2023\\\/04\\\/16\\\/car-price-prediction\\\/\",\"url\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2023\\\/04\\\/16\\\/car-price-prediction\\\/\",\"name\":\"Predictive Analysis of Car Prices in the US Market: A Comparative Study of Regression Models - Philip\u2019s Data Science 
Diary\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2023\\\/04\\\/16\\\/car-price-prediction\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2023\\\/04\\\/16\\\/car-price-prediction\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\/\\/philip.twinight.co\\/portfolio\\/wp-content\\/uploads\\/2023\\/04\\/Comparative-Analysis-of-Car-Price-Prediction-Models.png\",\"datePublished\":\"2023-04-16T14:17:00+00:00\",\"dateModified\":\"2024-03-06T15:06:28+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2023\\\/04\\\/16\\\/car-price-prediction\\\/#breadcrumb\"},\"inLanguage\":\"en-GB\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2023\\\/04\\\/16\\\/car-price-prediction\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2023\\\/04\\\/16\\\/car-price-prediction\\\/#primaryimage\",\"url\":\"https:\\/\\/philip.twinight.co\\/portfolio\\/wp-content\\/uploads\\/2023\\/04\\/Comparative-Analysis-of-Car-Price-Prediction-Models.png\",\"contentUrl\":\"https:\\/\\/philip.twinight.co\\/portfolio\\/wp-content\\/uploads\\/2023\\/04\\/Comparative-Analysis-of-Car-Price-Prediction-Models.png\",\"width\":1920,\"height\":1080},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2023\\\/04\\\/16\\\/car-price-prediction\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"\u9996\u9801\",\"item\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Predictive Analysis of Car Prices in the US Market: A Comparative Study of Regression 
Models\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/#website\",\"url\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/\",\"name\":\"Philip\u2019s University Data Science Journey\",\"description\":\"Navigating Data Science: From Classroom to Career\",\"publisher\":{\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/#\\\/schema\\\/person\\\/ef4f7cedd9b3bde11e126c4dbe1f8414\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-GB\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/#\\\/schema\\\/person\\\/ef4f7cedd9b3bde11e126c4dbe1f8414\",\"name\":\"Philip\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"https:\\/\\/philip.twinight.co\\/portfolio\\/wp-content\\/uploads\\/2024\\/03\\/favicon.png\",\"url\":\"https:\\/\\/philip.twinight.co\\/portfolio\\/wp-content\\/uploads\\/2024\\/03\\/favicon.png\",\"contentUrl\":\"https:\\/\\/philip.twinight.co\\/portfolio\\/wp-content\\/uploads\\/2024\\/03\\/favicon.png\",\"width\":16,\"height\":16,\"caption\":\"Philip\"},\"logo\":{\"@id\":\"https:\\/\\/philip.twinight.co\\/portfolio\\/wp-content\\/uploads\\/2024\\/03\\/favicon.png\"},\"description\":\"Data Scientist &amp; Systems Engineer. Graduated from City University of Hong Kong. Previously founded Twinight Limited as CTO, developing AI investment analytics and automated trading solutions. 
Currently working as a Test and Integration Engineer on a Vessel Traffic Service (VTS) system in the maritime industry since December 2024.\",\"sameAs\":[\"https:\\\/\\\/philip.twinight.co\\\/portfolio\"],\"url\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/author\\\/philip\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Predictive Analysis of Car Prices in the US Market: A Comparative Study of Regression Models - Philip\u2019s Data Science Diary","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2023\/04\/16\/car-price-prediction\/","og_locale":"en_GB","og_type":"article","og_title":"Predictive Analysis of Car Prices in the US Market: A Comparative Study of Regression Models - Philip\u2019s Data Science Diary","og_description":"This is the group project of SDSC2102 \u2013 Statistical Methods and Data Analysis. I did the project in my year 3 2022\/23 Semester B. Presentation Slides: Course Instructor: Prof. 
ZENG &hellip; Continue readingPredictive Analysis of Car Prices in the US Market: A Comparative Study of Regression Models","og_url":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2023\/04\/16\/car-price-prediction\/","og_site_name":"Philip\u2019s Data Science Diary","article_published_time":"2023-04-16T14:17:00+00:00","article_modified_time":"2024-03-06T15:06:28+00:00","og_image":[{"width":1920,"height":1080,"url":"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2023\/04\/Comparative-Analysis-of-Car-Price-Prediction-Models.png","type":"image\/png"}],"author":"Philip","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Philip","Estimated reading time":"17 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2023\/04\/16\/car-price-prediction\/#article","isPartOf":{"@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2023\/04\/16\/car-price-prediction\/"},"author":{"name":"Philip","@id":"https:\/\/philip.twinight.co\/portfolio\/#\/schema\/person\/ef4f7cedd9b3bde11e126c4dbe1f8414"},"headline":"Predictive Analysis of Car Prices in the US Market: A Comparative Study of Regression 
Models","datePublished":"2023-04-16T14:17:00+00:00","dateModified":"2024-03-06T15:06:28+00:00","mainEntityOfPage":{"@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2023\/04\/16\/car-price-prediction\/"},"wordCount":2760,"commentCount":0,"publisher":{"@id":"https:\/\/philip.twinight.co\/portfolio\/#\/schema\/person\/ef4f7cedd9b3bde11e126c4dbe1f8414"},"image":{"@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2023\/04\/16\/car-price-prediction\/#primaryimage"},"thumbnailUrl":"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2023\/04\/Comparative-Analysis-of-Car-Price-Prediction-Models.png","keywords":["2022\/23 Semester B","Data Science","SDSC2102","Statistical Methods and Data Analysis","Year 3"],"articleSection":["Machine Learning","Projects"],"inLanguage":"en-GB","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/philip.twinight.co\/portfolio\/index.php\/2023\/04\/16\/car-price-prediction\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2023\/04\/16\/car-price-prediction\/","url":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2023\/04\/16\/car-price-prediction\/","name":"Predictive Analysis of Car Prices in the US Market: A Comparative Study of Regression Models - Philip\u2019s Data Science 
Diary","isPartOf":{"@id":"https:\/\/philip.twinight.co\/portfolio\/#website"},"primaryImageOfPage":{"@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2023\/04\/16\/car-price-prediction\/#primaryimage"},"image":{"@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2023\/04\/16\/car-price-prediction\/#primaryimage"},"thumbnailUrl":"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2023\/04\/Comparative-Analysis-of-Car-Price-Prediction-Models.png","datePublished":"2023-04-16T14:17:00+00:00","dateModified":"2024-03-06T15:06:28+00:00","breadcrumb":{"@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2023\/04\/16\/car-price-prediction\/#breadcrumb"},"inLanguage":"en-GB","potentialAction":[{"@type":"ReadAction","target":["https:\/\/philip.twinight.co\/portfolio\/index.php\/2023\/04\/16\/car-price-prediction\/"]}]},{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2023\/04\/16\/car-price-prediction\/#primaryimage","url":"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2023\/04\/Comparative-Analysis-of-Car-Price-Prediction-Models.png","contentUrl":"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2023\/04\/Comparative-Analysis-of-Car-Price-Prediction-Models.png","width":1920,"height":1080},{"@type":"BreadcrumbList","@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2023\/04\/16\/car-price-prediction\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"\u9996\u9801","item":"https:\/\/philip.twinight.co\/portfolio\/"},{"@type":"ListItem","position":2,"name":"Predictive Analysis of Car Prices in the US Market: A Comparative Study of Regression 
Models"}]},{"@type":"WebSite","@id":"https:\/\/philip.twinight.co\/portfolio\/#website","url":"https:\/\/philip.twinight.co\/portfolio\/","name":"Philip\u2019s University Data Science Journey","description":"Navigating Data Science: From Classroom to Career","publisher":{"@id":"https:\/\/philip.twinight.co\/portfolio\/#\/schema\/person\/ef4f7cedd9b3bde11e126c4dbe1f8414"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/philip.twinight.co\/portfolio\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-GB"},{"@type":["Person","Organization"],"@id":"https:\/\/philip.twinight.co\/portfolio\/#\/schema\/person\/ef4f7cedd9b3bde11e126c4dbe1f8414","name":"Philip","image":{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/favicon.png","url":"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/favicon.png","contentUrl":"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/favicon.png","width":16,"height":16,"caption":"Philip"},"logo":{"@id":"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/favicon.png"},"description":"Data Scientist &amp; Systems Engineer. Graduated from City University of Hong Kong. Previously founded Twinight Limited as CTO, developing AI investment analytics and automated trading solutions. 
Currently working as a Test and Integration Engineer on a Vessel Traffic Service (VTS) system in the maritime industry since December 2024.","sameAs":["https:\/\/philip.twinight.co\/portfolio"],"url":"https:\/\/philip.twinight.co\/portfolio\/index.php\/author\/philip\/"}]}},"_links":{"self":[{"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/posts\/385","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/comments?post=385"}],"version-history":[{"count":2,"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/posts\/385\/revisions"}],"predecessor-version":[{"id":411,"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/posts\/385\/revisions\/411"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/media\/410"}],"wp:attachment":[{"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/media?parent=385"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/categories?post=385"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/tags?post=385"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}