{"id":173,"date":"2021-11-28T17:22:00","date_gmt":"2021-11-28T09:22:00","guid":{"rendered":"https:\/\/philip.twinight.co\/portfolio\/?p=173"},"modified":"2024-03-06T09:57:48","modified_gmt":"2024-03-06T01:57:48","slug":"credit-card-fraud-detection-analysis-a-machine-learning-approach","status":"publish","type":"post","link":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/","title":{"rendered":"Credit Card Fraud Detection Analysis: A Machine Learning Approach"},"content":{"rendered":"\n<p>This is an individual project of SDSC2001 &#8211; Python for Data Science. I did the project in my year 2 2021\/22 Semester A.<\/p>\n\n\n\n<p>Course Instructor: Professor LI Xinyue<\/p>\n\n\n\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_82_2 counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle Table of Content\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><ul class='ez-toc-list-level-4' ><li class='ez-toc-heading-level-4'><ul class='ez-toc-list-level-4' ><li class='ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/#Context\" >Context<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/#Content\" >Content<\/a><\/li><\/ul><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/#Module_1_Data_Exploration\" >Module 1: Data Exploration<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/#Module_2_Data_Visualization\" >Module 2: Data Visualization<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/#Module_3_Dimension_Reduction\" >Module 3: Dimension Reduction<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/#Module_4_Classification\" >Module 4: Classification<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/#41_Training_with_RandomForestClassifier\" >4.1 Training with RandomForestClassifier<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/#42_Training_with_LogisticRegression\" >4.2 Training with LogisticRegression<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/#43_Training_with_Support_Vector_Machine\" >4.3 Training with Support Vector Machine<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/#Module_5_Summary\" >Module 5: Summary<\/a><\/li><\/ul><\/nav><\/div>\n<h4 class=\"wp-block-heading\" id=\"Context\"><span class=\"ez-toc-section\" id=\"Context\"><\/span>Context<a href=\"http:\/\/localhost:8888\/notebooks\/Downloads\/project.ipynb#Context\"><\/a><span class=\"ez-toc-section-end\"><\/span><\/h4>\n\n\n\n<p>Credit card companies aim to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"Content\"><span class=\"ez-toc-section\" id=\"Content\"><\/span>Content<a href=\"http:\/\/localhost:8888\/notebooks\/Downloads\/project.ipynb#Content\"><\/a><span class=\"ez-toc-section-end\"><\/span><\/h4>\n\n\n\n<p>The dataset contains transactions made by credit cards in September 2013 by european cardholders. Transactions occurred in two days, where we have 492 frauds out of 284,807 transactions. The dataset is highly unbalanced, as the positive class (frauds) account for 0.172% of all transactions.<\/p>\n\n\n\n<p>It contains numerical input variables V1-V28 which are the result of a Principal Component Analysis (PCA) transformation, as original features are not provided due to confidentiality issues. Features that have not been transformed with PCA are &#8216;Time&#8217; and &#8216;Amount&#8217;. &#8216;Time&#8217; contains the seconds elapsed between each transaction and the first transaction in the dataset. &#8216;Amount&#8217; denotes the transaction Amount. &#8216;Class&#8217; is the response variable (labelled outcome) and it takes value 1 in case of fraud and 0 otherwise.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Module_1_Data_Exploration\"><\/span>Module 1: Data Exploration<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nIn &#x5B;1]: \n# define all library that I may need to use \nimport matplotlib.pyplot as plt \nimport numpy as np \nimport pandas as pd \nimport seaborn as sns \nimport warnings \nwarnings.simplefilter(action='ignore', category=FutureWarning) \npd.options.mode.chained_assignment = None  # remove warning \nfrom collections import Counter\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nIn &#x5B;2]:\n# load the csv file into a data frame and show the first 5 rows in order to have a quick look on the data\ndf = pd.read_csv ('creditcard_train.csv')\ndf.head()\n<\/pre><\/div>\n\n\n<p><br>Out[2]:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th><\/th><th>Time<\/th><th>V1<\/th><th>V2<\/th><th>V3<\/th><th>V4<\/th><th>V5<\/th><th>V6<\/th><th>V7<\/th><th>V8<\/th><th>V9<\/th><th>&#8230;<\/th><th>V21<\/th><th>V22<\/th><th>V23<\/th><th>V24<\/th><th>V25<\/th><th>V26<\/th><th>V27<\/th><th>V28<\/th><th>Amount<\/th><th>Class<\/th><\/tr><\/thead><tbody><tr><th>0<\/th><td>0.0<\/td><td>-1.359807<\/td><td>-0.072781<\/td><td>2.536347<\/td><td>1.378155<\/td><td>-0.338321<\/td><td>0.462388<\/td><td>0.239599<\/td><td>0.098698<\/td><td>0.363787<\/td><td>&#8230;<\/td><td>-0.018307<\/td><td>0.277838<\/td><td>-0.110474<\/td><td>0.066928<\/td><td>0.128539<\/td><td>-0.189115<\/td><td>0.133558<\/td><td>-0.021053<\/td><td>149.62<\/td><td>0<\/td><\/tr><tr><th>1<\/th><td>0.0<\/td><td>1.191857<\/td><td>0.266151<\/td><td>0.166480<\/td><td>0.448154<\/td><td>0.060018<\/td><td>-0.082361<\/td><td>-0.078803<\/td><td>0.085102<\/td><td>-0.255425<\/td><td>&#8230;<\/td><td>-0.225775<\/td><td>-0.638672<\/td><td>0.101288<\/td><td>-0.339846<\/td><td>0.167170<\/td><td>0.125895<\/td><td>-0.008983<\/td><td>0.014724<\/td><td>2.69<\/td><td>0<\/td><\/tr><tr><th>2<\/th><td>1.0<\/td><td>-1.358354<\/td><td>-1.340163<\/td><td>1.773209<\/td><td>0.379780<\/td><td>-0.503198<\/td><td>1.800499<\/td><td>0.791461<\/td><td>0.247676<\/td><td>-1.514654<\/td><td>&#8230;<\/td><td>0.247998<\/td><td>0.771679<\/td><td>0.909412<\/td><td>-0.689281<\/td><td>-0.327642<\/td><td>-0.139097<\/td><td>-0.055353<\/td><td>-0.059752<\/td><td>378.66<\/td><td>0<\/td><\/tr><tr><th>3<\/th><td>1.0<\/td><td>-0.966272<\/td><td>-0.185226<\/td><td>1.792993<\/td><td>-0.863291<\/td><td>-0.010309<\/td><td>1.247203<\/td><td>0.237609<\/td><td>0.377436<\/td><td>-1.387024<\/td><td>&#8230;<\/td><td>-0.108300<\/td><td>0.005274<\/td><td>-0.190321<\/td><td>-1.175575<\/td><td>0.647376<\/td><td>-0.221929<\/td><td>0.062723<\/td><td>0.061458<\/td><td>123.50<\/td><td>0<\/td><\/tr><tr><th>4<\/th><td>2.0<\/td><td>-1.158233<\/td><td>0.877737<\/td><td>1.548718<\/td><td>0.403034<\/td><td>-0.407193<\/td><td>0.095921<\/td><td>0.592941<\/td><td>-0.270533<\/td><td>0.817739<\/td><td>&#8230;<\/td><td>-0.009431<\/td><td>0.798278<\/td><td>-0.137458<\/td><td>0.141267<\/td><td>-0.206010<\/td><td>0.502292<\/td><td>0.219422<\/td><td>0.215153<\/td><td>69.99<\/td><td>0<\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\">5 rows \u00d7 31 columns<\/figcaption><\/figure>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nIn &#x5B;3]:\n# find out the total number of rows and columns in the file\nprint(df.shape)\n# it seems there are some data missing, lets try to solve it!\n<\/pre><\/div>\n\n\n<p><br>Out[3]: <span style=\"font-size: 14px; color: initial;\">(284657, 31)<\/span><\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nIn &#x5B;4]:\n# So first, we have to find out where and how many are the missing data\ndf.isnull().sum()\n<\/pre><\/div>\n\n\n<p>Out[4]:Time 0 V1 0 V2 0 V3 0 V4 0 V5 0 V6 0 V7 0 V8 0 V9 0 V10 0 V11 0 V12 0 V13 0 V14 0 V15 0 V16 0 V17 0 V18 0 V19 0 V20 0 V21 0 V22 278 V23 520 V24 0 V25 0 V26 0 V27 0 V28 0 Amount 0 Class 0 dtype: int64<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nIn &#x5B;5]:\nmissing_col = &#x5B;'V22','V23']\n#Using mean and looping to impute the missing values\nfor i in missing_col:\n    df.loc&#x5B;df.loc&#x5B;:,i].isnull(),i] = df.loc&#x5B;:,i].mean()\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nIn &#x5B;6]:\n#After filling missing data, we have to detect the outliers and remove them, except the result\n\u200b\nQ1 = df.iloc&#x5B;:,:-1].quantile(0.25)\nQ3 = df.iloc&#x5B;:,:-1].quantile(0.75)\nIQR = Q3 - Q1\n\u200b\ndata = df&#x5B;~((df &lt; (Q1 - 2.5 * IQR)) | (df &gt; (Q3 + 2.5 * IQR))).any(axis=1)]\n\u200b\nprint(data.shape)\n<\/pre><\/div>\n\n\n<p>(213174, 31)<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nIn &#x5B;7]:\n#I want to see some basic info of the data for later data visualization\n#like mean of time..., in order to see is there any insight\ndata.describe().T\n<\/pre><\/div>\n\n\n<p><br>Out[7]:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th><\/th><th>count<\/th><th>mean<\/th><th>std<\/th><th>min<\/th><th>25%<\/th><th>50%<\/th><th>75%<\/th><th>max<\/th><\/tr><\/thead><tbody><tr><th>Time<\/th><td>213174.0<\/td><td>95141.575281<\/td><td>47383.880288<\/td><td>0.000000<\/td><td>54623.500000<\/td><td>84469.000000<\/td><td>139655.000000<\/td><td>172792.000000<\/td><\/tr><tr><th>V1<\/th><td>213174.0<\/td><td>0.496082<\/td><td>1.276026<\/td><td>-5.536010<\/td><td>-0.618926<\/td><td>0.999862<\/td><td>1.718921<\/td><td>2.454930<\/td><\/tr><tr><th>V2<\/th><td>213174.0<\/td><td>0.079463<\/td><td>0.827586<\/td><td>-4.098951<\/td><td>-0.463757<\/td><td>0.068991<\/td><td>0.703425<\/td><td>3.808020<\/td><\/tr><tr><th>V3<\/th><td>213174.0<\/td><td>0.164240<\/td><td>1.260900<\/td><td>-4.846779<\/td><td>-0.679702<\/td><td>0.286612<\/td><td>1.088969<\/td><td>4.079168<\/td><\/tr><tr><th>V4<\/th><td>213174.0<\/td><td>0.001365<\/td><td>1.283098<\/td><td>-4.826127<\/td><td>-0.784687<\/td><td>0.031907<\/td><td>0.703920<\/td><td>4.720074<\/td><\/tr><tr><th>V5<\/th><td>213174.0<\/td><td>-0.024155<\/td><td>0.910768<\/td><td>-3.252559<\/td><td>-0.621899<\/td><td>-0.070961<\/td><td>0.492316<\/td><td>3.870056<\/td><\/tr><tr><th>V6<\/th><td>213174.0<\/td><td>-0.205544<\/td><td>0.897459<\/td><td>-3.666150<\/td><td>-0.786134<\/td><td>-0.351098<\/td><td>0.189226<\/td><td>3.315460<\/td><\/tr><tr><th>V7<\/th><td>213174.0<\/td><td>0.011357<\/td><td>0.712423<\/td><td>-3.180199<\/td><td>-0.494948<\/td><td>0.044931<\/td><td>0.513259<\/td><td>3.008217<\/td><\/tr><tr><th>V8<\/th><td>213174.0<\/td><td>0.076681<\/td><td>0.381320<\/td><td>-1.548042<\/td><td>-0.170941<\/td><td>0.022833<\/td><td>0.266917<\/td><td>1.666501<\/td><\/tr><tr><th>V9<\/th><td>213174.0<\/td><td>-0.041198<\/td><td>0.975211<\/td><td>-3.672923<\/td><td>-0.621874<\/td><td>-0.062938<\/td><td>0.535013<\/td><td>3.695086<\/td><\/tr><tr><th>V10<\/th><td>213174.0<\/td><td>-0.047962<\/td><td>0.738765<\/td><td>-2.999375<\/td><td>-0.502612<\/td><td>-0.104048<\/td><td>0.342025<\/td><td>2.926167<\/td><\/tr><tr><th>V11<\/th><td>213174.0<\/td><td>0.002681<\/td><td>0.991865<\/td><td>-3.241392<\/td><td>-0.772031<\/td><td>-0.006187<\/td><td>0.757923<\/td><td>3.531399<\/td><\/tr><tr><th>V12<\/th><td>213174.0<\/td><td>0.044023<\/td><td>0.856469<\/td><td>-2.964042<\/td><td>-0.369406<\/td><td>0.161963<\/td><td>0.625702<\/td><td>2.601809<\/td><\/tr><tr><th>V13<\/th><td>213174.0<\/td><td>-0.008020<\/td><td>1.012035<\/td><td>-3.888606<\/td><td>-0.686500<\/td><td>-0.010238<\/td><td>0.683467<\/td><td>3.904562<\/td><\/tr><tr><th>V14<\/th><td>213174.0<\/td><td>0.018659<\/td><td>0.743759<\/td><td>-2.720881<\/td><td>-0.397002<\/td><td>0.048117<\/td><td>0.453591<\/td><td>2.788031<\/td><\/tr><tr><th>V15<\/th><td>213174.0<\/td><td>-0.012136<\/td><td>0.889419<\/td><td>-3.657525<\/td><td>-0.576125<\/td><td>0.036738<\/td><td>0.622986<\/td><td>3.601890<\/td><\/tr><tr><th>V16<\/th><td>213174.0<\/td><td>0.017735<\/td><td>0.779686<\/td><td>-2.944460<\/td><td>-0.430518<\/td><td>0.076966<\/td><td>0.500815<\/td><td>2.686354<\/td><\/tr><tr><th>V17<\/th><td>213174.0<\/td><td>-0.037677<\/td><td>0.629111<\/td><td>-2.311921<\/td><td>-0.487001<\/td><td>-0.093053<\/td><td>0.345882<\/td><td>2.606673<\/td><\/tr><tr><th>V18<\/th><td>213174.0<\/td><td>-0.021883<\/td><td>0.791908<\/td><td>-2.997391<\/td><td>-0.505811<\/td><td>-0.029341<\/td><td>0.460010<\/td><td>2.997719<\/td><\/tr><tr><th>V19<\/th><td>213174.0<\/td><td>0.002176<\/td><td>0.733705<\/td><td>-2.743833<\/td><td>-0.409821<\/td><td>0.016457<\/td><td>0.437332<\/td><td>2.744196<\/td><\/tr><tr><th>V20<\/th><td>213174.0<\/td><td>-0.068025<\/td><td>0.229977<\/td><td>-1.073139<\/td><td>-0.206210<\/td><td>-0.084503<\/td><td>0.064950<\/td><td>0.993129<\/td><\/tr><tr><th>V21<\/th><td>213174.0<\/td><td>-0.025548<\/td><td>0.259462<\/td><td>-1.251701<\/td><td>-0.221372<\/td><td>-0.038575<\/td><td>0.155363<\/td><td>1.222562<\/td><\/tr><tr><th>V22<\/th><td>213174.0<\/td><td>0.003638<\/td><td>0.674379<\/td><td>-2.659080<\/td><td>-0.542021<\/td><td>0.009103<\/td><td>0.513472<\/td><td>2.471164<\/td><\/tr><tr><th>V23<\/th><td>213174.0<\/td><td>-0.000756<\/td><td>0.215954<\/td><td>-0.933712<\/td><td>-0.133954<\/td><td>-0.006191<\/td><td>0.129913<\/td><td>0.921028<\/td><\/tr><tr><th>V24<\/th><td>213174.0<\/td><td>-0.026047<\/td><td>0.575898<\/td><td>-2.337548<\/td><td>-0.364409<\/td><td>0.028063<\/td><td>0.401640<\/td><td>1.307137<\/td><\/tr><tr><th>V25<\/th><td>213174.0<\/td><td>0.003608<\/td><td>0.462133<\/td><td>-1.986743<\/td><td>-0.305288<\/td><td>0.019155<\/td><td>0.341251<\/td><td>1.966419<\/td><\/tr><tr><th>V26<\/th><td>213174.0<\/td><td>-0.001525<\/td><td>0.461604<\/td><td>-1.641329<\/td><td>-0.316247<\/td><td>-0.038658<\/td><td>0.222019<\/td><td>1.660394<\/td><\/tr><tr><th>V27<\/th><td>213174.0<\/td><td>0.021914<\/td><td>0.145059<\/td><td>-0.475451<\/td><td>-0.055080<\/td><td>0.004869<\/td><td>0.078575<\/td><td>0.495576<\/td><\/tr><tr><th>V28<\/th><td>213174.0<\/td><td>0.012159<\/td><td>0.102029<\/td><td>-0.380841<\/td><td>-0.045706<\/td><td>0.009546<\/td><td>0.056236<\/td><td>0.405955<\/td><\/tr><tr><th>Amount<\/th><td>213174.0<\/td><td>41.932054<\/td><td>53.786545<\/td><td>0.000000<\/td><td>4.990000<\/td><td>18.910000<\/td><td>57.200000<\/td><td>256.000000<\/td><\/tr><tr><th>Class<\/th><td>213174.0<\/td><td>0.000145<\/td><td>0.012058<\/td><td>0.000000<\/td><td>0.000000<\/td><td>0.000000<\/td><td>0.000000<\/td><td>1.000000<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Module_2_Data_Visualization\"><\/span>Module 2: Data Visualization<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nIn &#x5B;8]:\n#First graph, I have to see hows the distributions of all variable\nfig,ax=plt.subplots(5,6,figsize=&#x5B;16,9])\ncolumns=data.columns\nfor idx,ax in enumerate(ax.flat):\n    sns.kdeplot(data.loc&#x5B;:,columns&#x5B;idx]]&#x5B;:500],ax=ax)\nplt.tight_layout()\n#As the data has removed the outliers, but after plotting all the graph, \n#we can see that it is not Normalized and Standardized\n#so we may consider to apply normalization into the data\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-image size-large\"><img data-opt-id=898481986  fetchpriority=\"high\" loading=\"eager\" decoding=\"async\" width=\"1024\" height=\"573\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:1024\/h:573\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-4.png\" alt=\"\" class=\"wp-image-175\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:1024\/h:573\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-4.png 1024w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:168\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-4.png 300w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:768\/h:430\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-4.png 768w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:1144\/h:640\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-4.png 1144w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><\/figure>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nIn &#x5B;9]:\n#applying Normalization from sklearn library by using minmaxscaler\nfrom sklearn.preprocessing import MinMaxScaler\nscaler = MinMaxScaler()\nscaler.fit(data.iloc&#x5B;:,:-1])\ndata.iloc&#x5B;:,:-1]=scaler.transform(data.iloc&#x5B;:,:-1])\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nIn &#x5B;10]:\n#Second graph, we have to see are there any features ('Variables') have correlation with our object ('Class')\nplt.figure(figsize=(8, 12))\nheatmap = sns.heatmap(data.corr()&#x5B;&#x5B;'Class']].sort_values(by='Class', ascending=False), vmin=-1, vmax=1, annot=True, cmap='BrBG')\nheatmap.set_title('Features correlating with Class', fontdict={'fontsize':18}, pad=12)\nplt.show()\n#After plotting the heatmap, we can concluded all the corrleation are very low.\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-image size-full\"><img data-opt-id=713961199  fetchpriority=\"high\" loading=\"eager\" decoding=\"async\" width=\"510\" height=\"709\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-5.png\" alt=\"\" class=\"wp-image-176\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:510\/h:709\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-5.png 510w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:216\/h:300\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-5.png 216w\" sizes=\"auto, (max-width: 510px) 100vw, 510px\" \/><\/figure>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nIn &#x5B;11]:\n#As the info we got right now aren't very useful, we still can't make any assumption\n#then we can only try to visualize the highest correlation between the two variables and class to see is there any useful info\ncmap = sns.cubehelix_palette(rot=-.2, as_cmap=True) \ng=sns.relplot(\n    data=data,\n    x=&quot;V4&quot;, y=&quot;V17&quot;,\n    hue=&quot;Class&quot;, size=&quot;V4&quot;,\n)\ng.set(xscale=&quot;log&quot;, yscale=&quot;log&quot;)\ng.ax.xaxis.grid(True, &quot;minor&quot;, linewidth=.25)\ng.ax.yaxis.grid(True, &quot;minor&quot;, linewidth=.25)\ng.despine(left=True, bottom=True)\nplt.show()\n#After ploting the scatterplot,\n#it seems that there is a serious problem of unbalanced sample size which greatly affect my assumption\n#also we can conclude that it cannot be divided by the variables(V4 V17) alone class(0 1) category\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-image size-full\"><img data-opt-id=1977592555  loading=\"lazy\" decoding=\"async\" width=\"428\" height=\"370\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-6.png\" alt=\"\" class=\"wp-image-178\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:428\/h:370\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-6.png 428w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:259\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-6.png 300w\" sizes=\"auto, (max-width: 428px) 100vw, 428px\" \/><\/figure>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nIn &#x5B;12]:\n#Further investigating of the problem of unbalanced sample size by plotting the number of normal and fraudulent cases out\nfig,ax=plt.subplots(figsize=&#x5B;10,6])\nbar=data.Class.value_counts()\nsns.barplot(x=bar.index,y=bar.values\/len(data),ax=ax)\nplt.title('The number of normal and fraudulent cases')\nplt.xlabel('Class')\nplt.ylabel('Percentage')\nfor y,x  in enumerate(bar.values\/len(data)):\n    plt.text(y,x,s=bar&#x5B;y],va='bottom',ha='center')\nplt.show()\n#As now, we can conclude the sample size are highly unbalanced,\n#I have done a similar analysis on p2p debit and credit risk, the ratio is 1:49\n#But this dataset, only have 0.014% positive sample size (fraudulent) \n#it is way more less than the given percentage 0.172%\n#maybe it is my problem that some important data are being removed during the process of data cleaning\n#Currently, I am pessimistic about this model\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-image size-full\"><img data-opt-id=357272511  loading=\"lazy\" decoding=\"async\" width=\"609\" height=\"387\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-8.png\" alt=\"\" class=\"wp-image-181\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:609\/h:387\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-8.png 609w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:191\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-8.png 300w\" sizes=\"auto, (max-width: 609px) 100vw, 609px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Module_3_Dimension_Reduction\"><\/span>Module 3: Dimension Reduction<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nIn &#x5B;13]:\n#We will use principal component analysis from sklearn library to achieve dimension reduction\nfrom sklearn.decomposition import PCA\n\u200b\npca=PCA(n_components=2).fit(data.iloc&#x5B;:,:-1]) #remove 'Class' column and compress the data into 2d\nfeatures_pca=pca.transform(data.iloc&#x5B;:,:-1])\n\nIn &#x5B;14]:\n#have some basic view with the principal component\nfeatures_pca\n<\/pre><\/div>\n\n\n<p><br>Out[14]:array([[ 6.65811147e-01, 1.89550269e-01], [ 4.93164830e-01, -3.89688202e-04], [ 5.96507876e-01, 1.64345532e-01], &#8230;, [-3.12209498e-01, -2.19958181e-01], [-5.22941908e-01, 6.27902788e-02], [-4.04661011e-01, 2.29348807e-01]])<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nIn &#x5B;15]:\nsns.relplot(x=features_pca&#x5B;:,0],y=features_pca&#x5B;:,1],hue=data.Class,col=data.Class)\nplt.show()\n#As the positive and negative sample size are unbalanced, it is difficult to show them on single graph\n#so we have to divide them into two\n#Now, we can conclude that this is a linearly inseparable problem.\n#It is very difficult for us to find a decision boundary to separate the categories of 0 and 1.\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-image size-full\"><img data-opt-id=26217915  loading=\"lazy\" decoding=\"async\" width=\"758\" height=\"352\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-9.png\" alt=\"\" class=\"wp-image-183\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:758\/h:352\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-9.png 758w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:139\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-9.png 300w\" sizes=\"auto, (max-width: 758px) 100vw, 758px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Module_4_Classification\"><\/span>Module 4: Classification<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nIn &#x5B;16]:\n###pick 3 classification methods, and methods not in the below list can also be used\nfrom sklearn.linear_model import LogisticRegression\nfrom sklearn.discriminant_analysis import LinearDiscriminantAnalysis\nfrom sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis\nfrom sklearn.neighbors import KNeighborsClassifier\nfrom sklearn.tree import DecisionTreeClassifier\nfrom sklearn.svm import SVC\nfrom sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier\nfrom sklearn.naive_bayes import GaussianNB\nfrom sklearn.gaussian_process import GaussianProcessClassifier\nfrom sklearn.gaussian_process.kernels import RBF\nIn &#x5B;17]:\n#Before doing classification and modelling, \n#we have to do some preparation on our data first in order to solve the undersampling problem.\n#First, we filter out all the positive sample.\npos=data.loc&#x5B;data.Class==1]\npos.head()\n<\/pre><\/div>\n\n\n<p><br>Out[17]:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th><\/th><th>Time<\/th><th>V1<\/th><th>V2<\/th><th>V3<\/th><th>V4<\/th><th>V5<\/th><th>V6<\/th><th>V7<\/th><th>V8<\/th><th>V9<\/th><th>&#8230;<\/th><th>V21<\/th><th>V22<\/th><th>V23<\/th><th>V24<\/th><th>V25<\/th><th>V26<\/th><th>V27<\/th><th>V28<\/th><th>Amount<\/th><th>Class<\/th><\/tr><\/thead><tbody><tr><th>10471<\/th><td>0.099466<\/td><td>0.828987<\/td><td>0.632028<\/td><td>0.587217<\/td><td>0.837651<\/td><td>0.481326<\/td><td>0.493320<\/td><td>0.510181<\/td><td>0.478192<\/td><td>0.615223<\/td><td>&#8230;<\/td><td>0.334705<\/td><td>0.362210<\/td><td>0.545266<\/td><td>0.687345<\/td><td>0.591153<\/td><td>0.461089<\/td><td>0.502971<\/td><td>0.552996<\/td><td>0.014805<\/td><td>1<\/td><\/tr><tr><th>10484<\/th><td>0.099657<\/td><td>0.841677<\/td><td>0.637569<\/td><td>0.552223<\/td><td>0.822335<\/td><td>0.514553<\/td><td>0.494404<\/td><td>0.522550<\/td><td>0.447245<\/td><td>0.618447<\/td><td>&#8230;<\/td><td>0.302743<\/td><td>0.314153<\/td><td>0.472821<\/td><td>0.547883<\/td><td>0.639483<\/td><td>0.467340<\/td><td>0.486505<\/td><td>0.547359<\/td><td>0.014805<\/td><td>1<\/td><\/tr><tr><th>14319<\/th><td>0.147148<\/td><td>0.833612<\/td><td>0.661327<\/td><td>0.435114<\/td><td>0.785843<\/td><td>0.595657<\/td><td>0.490264<\/td><td>0.564990<\/td><td>0.519826<\/td><td>0.343319<\/td><td>&#8230;<\/td><td>0.438500<\/td><td>0.416578<\/td><td>0.442831<\/td><td>0.478909<\/td><td>0.634311<\/td><td>0.510243<\/td><td>0.505249<\/td><td>0.564889<\/td><td>0.014687<\/td><td>1<\/td><\/tr><tr><th>27339<\/th><td>0.199784<\/td><td>0.828093<\/td><td>0.571061<\/td><td>0.639674<\/td><td>0.769624<\/td><td>0.455947<\/td><td>0.605805<\/td><td>0.493959<\/td><td>0.550985<\/td><td>0.407074<\/td><td>&#8230;<\/td><td>0.441471<\/td><td>0.458833<\/td><td>0.532329<\/td><td>0.644585<\/td><td>0.597468<\/td><td>0.464918<\/td><td>0.511272<\/td><td>0.497461<\/td><td>0.005938<\/td><td>1<\/td><\/tr><tr><th>50497<\/th><td>0.257720<\/td><td>0.663387<\/td><td>0.563346<\/td><td>0.763948<\/td><td>0.374027<\/td><td>0.360987<\/td><td>0.429760<\/td><td>0.523447<\/td><td>0.480594<\/td><td>0.650837<\/td><td>&#8230;<\/td><td>0.595075<\/td><td>0.696105<\/td><td>0.349038<\/td><td>0.765157<\/td><td>0.550207<\/td><td>0.336032<\/td><td>0.616497<\/td><td>0.534346<\/td><td>0.003906<\/td><td>1<\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\">5 rows \u00d7 31 columns<\/figcaption><\/figure>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nIn &#x5B;18]:\n#We have to generate some random sample for later classification use\nnp.random.seed(1234)\n\u200b\ndata=data.sample(frac=1) #Return a random sample of data.\nneg=data.loc&#x5B;data.Class==0]&#x5B;:len(pos)] #Select negative samples with the same number of positive samples\nneg.head()\n<\/pre><\/div>\n\n\n<p><br>Out[18]:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th><\/th><th>Time<\/th><th>V1<\/th><th>V2<\/th><th>V3<\/th><th>V4<\/th><th>V5<\/th><th>V6<\/th><th>V7<\/th><th>V8<\/th><th>V9<\/th><th>&#8230;<\/th><th>V21<\/th><th>V22<\/th><th>V23<\/th><th>V24<\/th><th>V25<\/th><th>V26<\/th><th>V27<\/th><th>V28<\/th><th>Amount<\/th><th>Class<\/th><\/tr><\/thead><tbody><tr><th>126992<\/th><td>0.452561<\/td><td>0.862580<\/td><td>0.427405<\/td><td>0.549750<\/td><td>0.420335<\/td><td>0.417202<\/td><td>0.659959<\/td><td>0.391111<\/td><td>0.538233<\/td><td>0.410159<\/td><td>&#8230;<\/td><td>0.308037<\/td><td>0.388702<\/td><td>0.434289<\/td><td>0.294569<\/td><td>0.606618<\/td><td>0.844428<\/td><td>0.468568<\/td><td>0.462663<\/td><td>0.058594<\/td><td>0<\/td><\/tr><tr><th>183343<\/th><td>0.728060<\/td><td>0.903462<\/td><td>0.402517<\/td><td>0.606008<\/td><td>0.665632<\/td><td>0.291794<\/td><td>0.659313<\/td><td>0.302441<\/td><td>0.626403<\/td><td>0.740388<\/td><td>&#8230;<\/td><td>0.609877<\/td><td>0.702927<\/td><td>0.524760<\/td><td>0.534089<\/td><td>0.440702<\/td><td>0.338447<\/td><td>0.593475<\/td><td>0.461779<\/td><td>0.343711<\/td><td>0<\/td><\/tr><tr><th>64620<\/th><td>0.296617<\/td><td>0.844377<\/td><td>0.522574<\/td><td>0.650733<\/td><td>0.624634<\/td><td>0.390661<\/td><td>0.568212<\/td><td>0.434091<\/td><td>0.504582<\/td><td>0.602396<\/td><td>&#8230;<\/td><td>0.464229<\/td><td>0.512938<\/td><td>0.436808<\/td><td>0.525002<\/td><td>0.646065<\/td><td>0.387843<\/td><td>0.571280<\/td><td>0.525252<\/td><td>0.039023<\/td><td>0<\/td><\/tr><tr><th>38807<\/th><td>0.229021<\/td><td>0.819414<\/td><td>0.525336<\/td><td>0.581708<\/td><td>0.656817<\/td><td>0.442502<\/td><td>0.508198<\/td><td>0.557331<\/td><td>0.445715<\/td><td>0.507953<\/td><td>&#8230;<\/td><td>0.490752<\/td><td>0.521458<\/td><td>0.423099<\/td><td>0.679017<\/td><td>0.674914<\/td><td>0.398378<\/td><td>0.524619<\/td><td>0.525879<\/td><td>0.344023<\/td><td>0<\/td><\/tr><tr><th>112818<\/th><td>0.421570<\/td><td>0.618444<\/td><td>0.673766<\/td><td>0.644609<\/td><td>0.540979<\/td><td>0.414852<\/td><td>0.398353<\/td><td>0.639453<\/td><td>0.526851<\/td><td>0.347534<\/td><td>&#8230;<\/td><td>0.390898<\/td><td>0.315202<\/td><td>0.699672<\/td><td>0.748187<\/td><td>0.304074<\/td><td>0.486842<\/td><td>0.461561<\/td><td>0.617672<\/td><td>0.241602<\/td><td>0<\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\">5 rows \u00d7 31 columns<\/figcaption><\/figure>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nIn &#x5B;19]:\n#Concatenate the positive and negative data and read the test set\ntrain=pd.concat(&#x5B;pos,neg])\ntest=pd.read_csv ('creditcard_test.csv')\ntest.head()\n<\/pre><\/div>\n\n\n<p><br>Out[19]:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th><\/th><th>Time<\/th><th>V1<\/th><th>V2<\/th><th>V3<\/th><th>V4<\/th><th>V5<\/th><th>V6<\/th><th>V7<\/th><th>V8<\/th><th>V9<\/th><th>&#8230;<\/th><th>V21<\/th><th>V22<\/th><th>V23<\/th><th>V24<\/th><th>V25<\/th><th>V26<\/th><th>V27<\/th><th>V28<\/th><th>Amount<\/th><th>Class<\/th><\/tr><\/thead><tbody><tr><th>0<\/th><td>40086<\/td><td>1.083693<\/td><td>1.179501<\/td><td>-1.346150<\/td><td>1.998824<\/td><td>0.818034<\/td><td>-0.771419<\/td><td>0.230307<\/td><td>0.093683<\/td><td>-0.167594<\/td><td>&#8230;<\/td><td>-0.312000<\/td><td>-0.639700<\/td><td>-0.120249<\/td><td>-0.180218<\/td><td>0.609283<\/td><td>-0.339524<\/td><td>0.096701<\/td><td>0.114972<\/td><td>1.00<\/td><td>1<\/td><\/tr><tr><th>1<\/th><td>93860<\/td><td>-10.850282<\/td><td>6.727466<\/td><td>-16.760583<\/td><td>8.425832<\/td><td>-10.252697<\/td><td>-4.192171<\/td><td>-14.077086<\/td><td>7.168288<\/td><td>-3.683242<\/td><td>&#8230;<\/td><td>2.541637<\/td><td>0.135535<\/td><td>-1.023967<\/td><td>0.406265<\/td><td>0.106593<\/td><td>-0.026232<\/td><td>-1.464630<\/td><td>-0.411682<\/td><td>78.00<\/td><td>1<\/td><\/tr><tr><th>2<\/th><td>14152<\/td><td>-4.710529<\/td><td>8.636214<\/td><td>-15.496222<\/td><td>10.313349<\/td><td>-4.351341<\/td><td>-3.322689<\/td><td>-10.788373<\/td><td>5.060381<\/td><td>-5.689311<\/td><td>&#8230;<\/td><td>1.990545<\/td><td>0.223785<\/td><td>0.554408<\/td><td>-1.204042<\/td><td>-0.450685<\/td><td>0.641836<\/td><td>1.605958<\/td><td>0.721644<\/td><td>1.00<\/td><td>1<\/td><\/tr><tr><th>3<\/th><td>27219<\/td><td>-25.266355<\/td><td>14.323254<\/td><td>-26.823673<\/td><td>6.349248<\/td><td>-18.664251<\/td><td>-4.647403<\/td><td>-17.971212<\/td><td>16.633103<\/td><td>-3.768351<\/td><td>&#8230;<\/td><td>1.780701<\/td><td>-1.861318<\/td><td>-1.188167<\/td><td>0.156667<\/td><td>1.768192<\/td><td>-0.219916<\/td><td>1.411855<\/td><td>0.414656<\/td><td>99.99<\/td><td>1<\/td><\/tr><tr><th>4<\/th><td>84204<\/td><td>-1.927453<\/td><td>1.827621<\/td><td>-7.019495<\/td><td>5.348303<\/td><td>-2.739188<\/td><td>-2.107219<\/td><td>-5.015848<\/td><td>1.205868<\/td><td>-4.382713<\/td><td>&#8230;<\/td><td>1.376938<\/td><td>-0.792017<\/td><td>-0.771414<\/td><td>-0.379574<\/td><td>0.718717<\/td><td>1.111151<\/td><td>1.277707<\/td><td>0.819081<\/td><td>512.25<\/td><td>1<\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\">5 rows \u00d7 31 columns<\/figcaption><\/figure>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nIn &#x5B;20]:\n#Perform the same data transformation on the test set\ntest.iloc&#x5B;:,:-1]=scaler.transform(test.iloc&#x5B;:,:-1])\ntest.head()\n<\/pre><\/div>\n\n\n<p><br>Out[20]:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th><\/th><th>Time<\/th><th>V1<\/th><th>V2<\/th><th>V3<\/th><th>V4<\/th><th>V5<\/th><th>V6<\/th><th>V7<\/th><th>V8<\/th><th>V9<\/th><th>&#8230;<\/th><th>V21<\/th><th>V22<\/th><th>V23<\/th><th>V24<\/th><th>V25<\/th><th>V26<\/th><th>V27<\/th><th>V28<\/th><th>Amount<\/th><th>Class<\/th><\/tr><\/thead><tbody><tr><th>0<\/th><td>0.231990<\/td><td>0.828401<\/td><td>0.667569<\/td><td>0.392186<\/td><td>0.714939<\/td><td>0.571503<\/td><td>0.414622<\/td><td>0.551111<\/td><td>0.510718<\/td><td>0.475750<\/td><td>&#8230;<\/td><td>0.379791<\/td><td>0.393623<\/td><td>0.438586<\/td><td>0.591911<\/td><td>0.656696<\/td><td>0.394280<\/td><td>0.589224<\/td><td>0.630167<\/td><td>0.003906<\/td><td>1<\/td><\/tr><tr><th>1<\/th><td>0.543196<\/td><td>-0.665037<\/td><td>1.369224<\/td><td>-1.334738<\/td><td>1.388192<\/td><td>-0.982804<\/td><td>-0.075344<\/td><td>-1.760852<\/td><td>2.711530<\/td><td>-0.001401<\/td><td>&#8230;<\/td><td>1.533118<\/td><td>0.544733<\/td><td>-0.048662<\/td><td>0.752826<\/td><td>0.529535<\/td><td>0.489168<\/td><td>-1.018694<\/td><td>-0.039198<\/td><td>0.304688<\/td><td>1<\/td><\/tr><tr><th>2<\/th><td>0.081902<\/td><td>0.103302<\/td><td>1.610625<\/td><td>-1.193088<\/td><td>1.585916<\/td><td>-0.154267<\/td><td>0.049195<\/td><td>-1.229422<\/td><td>2.055789<\/td><td>-0.273668<\/td><td>&#8230;<\/td><td>1.310389<\/td><td>0.561935<\/td><td>0.802334<\/td><td>0.311003<\/td><td>0.388564<\/td><td>0.691507<\/td><td>2.143513<\/td><td>1.401232<\/td><td>0.003906<\/td><td>1<\/td><\/tr><tr><th>3<\/th><td>0.157525<\/td><td>-2.469089<\/td><td>2.329869<\/td><td>-2.462136<\/td><td>1.170662<\/td><td>-2.163769<\/td><td>-0.140548<\/td><td>-2.390113<\/td><td>5.655904<\/td><td>-0.012952<\/td><td>&#8230;<\/td><td>1.225578<\/td><td>0.155502<\/td><td>-0.137192<\/td><td>0.684343<\/td><td>0.949856<\/td><td>0.430506<\/td><td>1.943618<\/td><td>1.011059<\/td><td>0.390586<\/td><td>1<\/td><\/tr><tr><th>4<\/th><td>0.487314<\/td><td>0.451581<\/td><td>0.749538<\/td><td>-0.243416<\/td><td>1.065809<\/td><td>0.072076<\/td><td>0.223291<\/td><td>-0.296627<\/td><td>0.856703<\/td><td>-0.096334<\/td><td>&#8230;<\/td><td>1.062393<\/td><td>0.363933<\/td><td>0.087504<\/td><td>0.537213<\/td><td>0.684379<\/td><td>0.833650<\/td><td>1.805468<\/td><td>1.525073<\/td><td>2.000977<\/td><td>1<\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\">5 rows \u00d7 31 columns<\/figcaption><\/figure>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nIn &#x5B;21]:\n#import cross-validation from sklearn library\nfrom sklearn.model_selection import GridSearchCV\nfrom sklearn.metrics import plot_confusion_matrix,classification_report\nIn &#x5B;22]:\n#train our data\ntrain_y=train.pop('Class')\ntrain_x=train\ntest_y=test.pop('Class')\ntest_x=test\nIn &#x5B;23]:\n#build a function that uses for training and evaluating different models, using 5-fold cross-validation\ndef train_and_evaluate(model,params):\n    gs=GridSearchCV(model(random_state=1234),\n                 param_grid=params, cv=5).fit(train_x,train_y)\n    print('Train score :',gs.best_score_)\n    print('Test score :',gs.score(test_x,test_y))\n    plot_confusion_matrix(gs, test_x, test_y)\n    plt.show()\n    print(classification_report(test_y,gs.predict(test_x)))\n    return gs\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"41_Training_with_RandomForestClassifier\"><\/span>4.1 Training with RandomForestClassifier<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nIn &#x5B;24]:\nm1=train_and_evaluate(RandomForestClassifier,\n                   params={'n_estimators':np.arange(10,40,5),\n                          'max_features':np.arange(10,20,1)})\n<\/pre><\/div>\n\n\n<p>Train score : 0.8397435897435898 Test score : 0.78<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-opt-id=969066098  loading=\"lazy\" decoding=\"async\" width=\"306\" height=\"263\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-10.png\" alt=\"\" class=\"wp-image-196\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:306\/h:263\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-10.png 306w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:258\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-10.png 300w\" sizes=\"auto, (max-width: 306px) 100vw, 306px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-opt-id=487373498  loading=\"lazy\" decoding=\"async\" width=\"369\" height=\"175\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-11.png\" alt=\"\" class=\"wp-image-197\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:369\/h:175\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-11.png 369w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:142\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-11.png 300w\" sizes=\"auto, (max-width: 369px) 100vw, 369px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"42_Training_with_LogisticRegression\"><\/span>4.2 Training with LogisticRegression<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nIn &#x5B;25]:\nm2=train_and_evaluate(LogisticRegression,\n                   params={\n                          'C':np.linspace(0.1,1,10)}\n                           )\n<\/pre><\/div>\n\n\n<p>Train score : 0.7371794871794872 Test score : 0.7333333333333333<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-opt-id=1633554621  loading=\"lazy\" decoding=\"async\" width=\"306\" height=\"262\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-12.png\" alt=\"\" class=\"wp-image-200\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:306\/h:262\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-12.png 306w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:257\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-12.png 300w\" sizes=\"auto, (max-width: 306px) 100vw, 306px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-opt-id=260759593  loading=\"lazy\" decoding=\"async\" width=\"362\" height=\"174\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-13.png\" alt=\"\" class=\"wp-image-202\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:362\/h:174\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-13.png 362w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:144\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-13.png 300w\" sizes=\"auto, (max-width: 362px) 100vw, 362px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"43_Training_with_Support_Vector_Machine\"><\/span>4.3 Training with Support Vector Machine<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nIn &#x5B;26]:\nm3=train_and_evaluate(SVC,\n                   params={\n                          'C':np.arange(1,10,1),\n                          'gamma':np.linspace(0.01,0.1,10)\n                   }\n                           )\n<\/pre><\/div>\n\n\n<p>Train score : 0.7717948717948718 Test score : 0.66<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-opt-id=443853535  loading=\"lazy\" decoding=\"async\" width=\"306\" height=\"266\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-14.png\" alt=\"\" class=\"wp-image-204\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:306\/h:266\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-14.png 306w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:261\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-14.png 300w\" sizes=\"auto, (max-width: 306px) 100vw, 306px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-opt-id=1747958952  loading=\"lazy\" decoding=\"async\" width=\"375\" height=\"176\" src=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-15.png\" alt=\"\" class=\"wp-image-206\" srcset=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:375\/h:176\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-15.png 375w, https:\/\/mlcznkdztmb6.i.optimole.com\/w:300\/h:141\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/image-15.png 300w\" sizes=\"auto, (max-width: 375px) 100vw, 375px\" \/><\/figure>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nIn &#x5B;27]:\n#After we have realised that the best model among these three should be random forest,\n#we sort out most relevant features ranking\npd.DataFrame({'Feature':train_x.columns,\n             'Value':m1.best_estimator_.feature_importances_}).sort_values(by='Value',ascending=False)\n<\/pre><\/div>\n\n\n<p><br>Out[27]:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th><\/th><th>Feature<\/th><th>Value<\/th><\/tr><\/thead><tbody><tr><th>14<\/th><td>V14<\/td><td>0.218701<\/td><\/tr><tr><th>17<\/th><td>V17<\/td><td>0.125888<\/td><\/tr><tr><th>4<\/th><td>V4<\/td><td>0.111668<\/td><\/tr><tr><th>19<\/th><td>V19<\/td><td>0.084166<\/td><\/tr><tr><th>16<\/th><td>V16<\/td><td>0.065130<\/td><\/tr><tr><th>29<\/th><td>Amount<\/td><td>0.052899<\/td><\/tr><tr><th>15<\/th><td>V15<\/td><td>0.038870<\/td><\/tr><tr><th>13<\/th><td>V13<\/td><td>0.038309<\/td><\/tr><tr><th>27<\/th><td>V27<\/td><td>0.034972<\/td><\/tr><tr><th>2<\/th><td>V2<\/td><td>0.027462<\/td><\/tr><tr><th>28<\/th><td>V28<\/td><td>0.022460<\/td><\/tr><tr><th>8<\/th><td>V8<\/td><td>0.022411<\/td><\/tr><tr><th>23<\/th><td>V23<\/td><td>0.018893<\/td><\/tr><tr><th>20<\/th><td>V20<\/td><td>0.018022<\/td><\/tr><tr><th>6<\/th><td>V6<\/td><td>0.016508<\/td><\/tr><tr><th>5<\/th><td>V5<\/td><td>0.012932<\/td><\/tr><tr><th>0<\/th><td>Time<\/td><td>0.012430<\/td><\/tr><tr><th>18<\/th><td>V18<\/td><td>0.011545<\/td><\/tr><tr><th>26<\/th><td>V26<\/td><td>0.009688<\/td><\/tr><tr><th>9<\/th><td>V9<\/td><td>0.009620<\/td><\/tr><tr><th>10<\/th><td>V10<\/td><td>0.009097<\/td><\/tr><tr><th>22<\/th><td>V22<\/td><td>0.007824<\/td><\/tr><tr><th>11<\/th><td>V11<\/td><td>0.007375<\/td><\/tr><tr><th>12<\/th><td>V12<\/td><td>0.006643<\/td><\/tr><tr><th>7<\/th><td>V7<\/td><td>0.004063<\/td><\/tr><tr><th>1<\/th><td>V1<\/td><td>0.003455<\/td><\/tr><tr><th>25<\/th><td>V25<\/td><td>0.003063<\/td><\/tr><tr><th>21<\/th><td>V21<\/td><td>0.002532<\/td><\/tr><tr><th>24<\/th><td>V24<\/td><td>0.001935<\/td><\/tr><tr><th>3<\/th><td>V3<\/td><td>0.001440<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Module_5_Summary\"><\/span>Module 5: Summary<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>I am going to summarize my findings and draw conclusions by using Q and A format. So that this chance of experience can be better recorded.<\/p>\n\n\n\n<p>Q:&nbsp;<strong>What have you done and what have you learned?<\/strong><br>A: So basically to conclude what I have done, I have to understand the property of the given data first so that I can start building up the plan to analyze it which is all the stuff you can see in module 1. Then, I will go deeper into the data like visualizing different variables with the target value by plotting graph. Although it may not really work just like this time, at least it still gives me some insight like the problem is linearly inseparable so that I can solve it with different method. In the process of doing all of this, you can see that I keep adjusting the dataset so that it fits my requirement to finish certain tasks. For example, without removing outliers, the accuracy of the model must be affected so all the steps I have done are effective and useful for the entire project. In the dimension reduction and classification, I have applied the skills I learnt from letures and tutorials. Apart from that, I use quite a lot of time on reading docs of some function and the library. Except from the offical docs, I found stackerflow and geeksforgeeks provided me many useful resources as well. This experience can certainly conslidate my coding skills and logic flow.<\/p>\n\n\n\n<p>Q:&nbsp;<strong>What is the biggest difficuly in this project and how did you solve it?<\/strong><br>A: Undoubtedly, I would say module 2 is the most challenging parts for me. As there isn&#8217;t many instructions on this module which gives me more freedom to interpret it, the dataset itself doesn&#8217;t give me any hints of finding the relation between feautres(variables) and objects(class). So I stucked in this part for quite long after plotting many different graphs, but I turned out understand that not always there is a big relationship between a single variable and a target. You have to know that sometimes finding them doesn&#8217;t have great relation is also a part of exploration and analyzing result. There is no fixed solution for a problem, data only gives you insight on how you can use it to prove your view.<\/p>\n\n\n\n<p>Q:&nbsp;<strong>What do you think of annotation, how does it helps?<\/strong><br>A: I have annotate each part steps by steps in order to let the readers understand what is that part doing. It helps me to debug as I can find the problem of code quickly with those annotation.<\/p>\n\n\n\n<p>Q:&nbsp;<strong>How you finish solving this project?<\/strong><br>A: My basic flow of solving quesion and doing analysis is writing all steps onto a paper first. During the progress, I have to further brainstorm the possibility of doing wrong or missing important factors, so if there are any hints that can be followed, then I will strictly follow the given procedures so that I won&#8217;t digress from the topic and don&#8217;t know what I am doing.<\/p>\n\n\n\n<p>Q:&nbsp;<strong>What are the main results?<\/strong><br>A: As I stated above, Random forest is currently the best performing model among the models I have tested, but the probability of classifying positive samples on the test set is the same as logistic regression, which is only 65%.<\/p>\n\n\n\n<p>Q:&nbsp;<strong>From the result, what have you discovered?<\/strong><br>A: In my opinion, this kind of fraud detection is obviously more important to correctly classify a positive sample than to classify a negative sample. It is because if you judge a negative sample as a positive sample, it may cause economic losses for the client and the company.<\/p>\n\n\n\n<p>Q:&nbsp;<strong>What advice you can give to the credit card company?<\/strong><br>A: If you want to avoid this situation, you need to increase the weight of the positive samples, but after doing so, more negative samples will be classified into positive samples, thus rejecting more customers to apply for credit cards. And there will also be a lose of profit. As a result, in order to find a balance, it really depends on how your company choose between risks and profits.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is an individual project of SDSC2001 &#8211; Python for Data Science. I did the project in my year 2 2021\/22 Semester A. Course Instructor: Professor LI Xinyue Context Credit &hellip; <a href=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/\" class=\"more-link\"><span>Continue reading<span class=\"screen-reader-text\">Credit Card Fraud Detection Analysis: A Machine Learning Approach<\/span><\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":327,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[72,3],"tags":[20,13,19,18,21],"class_list":["post-173","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-machine-learning","category-proj","tag-2021-22-semester-a","tag-data-science","tag-python-for-data-science","tag-sdsc2001","tag-year-2"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.5 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Credit Card Fraud Detection Analysis: A Machine Learning Approach - Philip\u2019s Data Science Diary<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/\" \/>\n<meta property=\"og:locale\" content=\"en_GB\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Credit Card Fraud Detection Analysis: A Machine Learning Approach - Philip\u2019s Data Science Diary\" \/>\n<meta property=\"og:description\" content=\"This is an individual project of SDSC2001 &#8211; Python for Data Science. I did the project in my year 2 2021\/22 Semester A. Course Instructor: Professor LI Xinyue Context Credit &hellip; Continue readingCredit Card Fraud Detection Analysis: A Machine Learning Approach\" \/>\n<meta property=\"og:url\" content=\"https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/\" \/>\n<meta property=\"og:site_name\" content=\"Philip\u2019s Data Science Diary\" \/>\n<meta property=\"article:published_time\" content=\"2021-11-28T09:22:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-03-06T01:57:48+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2021\/11\/Credit-Card-Fraud-Detection-Analysis.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1920\" \/>\n\t<meta property=\"og:image:height\" content=\"1080\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Philip\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Philip\" \/>\n\t<meta name=\"twitter:label2\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"12 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2021\\\/11\\\/28\\\/credit-card-fraud-detection-analysis-a-machine-learning-approach\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2021\\\/11\\\/28\\\/credit-card-fraud-detection-analysis-a-machine-learning-approach\\\/\"},\"author\":{\"name\":\"Philip\",\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/#\\\/schema\\\/person\\\/ef4f7cedd9b3bde11e126c4dbe1f8414\"},\"headline\":\"Credit Card Fraud Detection Analysis: A Machine Learning Approach\",\"datePublished\":\"2021-11-28T09:22:00+00:00\",\"dateModified\":\"2024-03-06T01:57:48+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2021\\\/11\\\/28\\\/credit-card-fraud-detection-analysis-a-machine-learning-approach\\\/\"},\"wordCount\":1341,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/#\\\/schema\\\/person\\\/ef4f7cedd9b3bde11e126c4dbe1f8414\"},\"image\":{\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2021\\\/11\\\/28\\\/credit-card-fraud-detection-analysis-a-machine-learning-approach\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\/\\/philip.twinight.co\\/portfolio\\/wp-content\\/uploads\\/2021\\/11\\/Credit-Card-Fraud-Detection-Analysis.png\",\"keywords\":[\"2021\\\/22 Semester A\",\"Data Science\",\"Python for Data Science\",\"SDSC2001\",\"Year 2\"],\"articleSection\":[\"Machine Learning\",\"Projects\"],\"inLanguage\":\"en-GB\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2021\\\/11\\\/28\\\/credit-card-fraud-detection-analysis-a-machine-learning-approach\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2021\\\/11\\\/28\\\/credit-card-fraud-detection-analysis-a-machine-learning-approach\\\/\",\"url\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2021\\\/11\\\/28\\\/credit-card-fraud-detection-analysis-a-machine-learning-approach\\\/\",\"name\":\"Credit Card Fraud Detection Analysis: A Machine Learning Approach - Philip\u2019s Data Science Diary\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2021\\\/11\\\/28\\\/credit-card-fraud-detection-analysis-a-machine-learning-approach\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2021\\\/11\\\/28\\\/credit-card-fraud-detection-analysis-a-machine-learning-approach\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\/\\/philip.twinight.co\\/portfolio\\/wp-content\\/uploads\\/2021\\/11\\/Credit-Card-Fraud-Detection-Analysis.png\",\"datePublished\":\"2021-11-28T09:22:00+00:00\",\"dateModified\":\"2024-03-06T01:57:48+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2021\\\/11\\\/28\\\/credit-card-fraud-detection-analysis-a-machine-learning-approach\\\/#breadcrumb\"},\"inLanguage\":\"en-GB\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2021\\\/11\\\/28\\\/credit-card-fraud-detection-analysis-a-machine-learning-approach\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2021\\\/11\\\/28\\\/credit-card-fraud-detection-analysis-a-machine-learning-approach\\\/#primaryimage\",\"url\":\"https:\\/\\/philip.twinight.co\\/portfolio\\/wp-content\\/uploads\\/2021\\/11\\/Credit-Card-Fraud-Detection-Analysis.png\",\"contentUrl\":\"https:\\/\\/philip.twinight.co\\/portfolio\\/wp-content\\/uploads\\/2021\\/11\\/Credit-Card-Fraud-Detection-Analysis.png\",\"width\":1920,\"height\":1080},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/2021\\\/11\\\/28\\\/credit-card-fraud-detection-analysis-a-machine-learning-approach\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"\u9996\u9801\",\"item\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Credit Card Fraud Detection Analysis: A Machine Learning Approach\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/#website\",\"url\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/\",\"name\":\"Philip\u2019s University Data Science Journey\",\"description\":\"Navigating Data Science: From Classroom to Career\",\"publisher\":{\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/#\\\/schema\\\/person\\\/ef4f7cedd9b3bde11e126c4dbe1f8414\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-GB\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/#\\\/schema\\\/person\\\/ef4f7cedd9b3bde11e126c4dbe1f8414\",\"name\":\"Philip\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"https:\\/\\/philip.twinight.co\\/portfolio\\/wp-content\\/uploads\\/2024\\/03\\/favicon.png\",\"url\":\"https:\\/\\/philip.twinight.co\\/portfolio\\/wp-content\\/uploads\\/2024\\/03\\/favicon.png\",\"contentUrl\":\"https:\\/\\/philip.twinight.co\\/portfolio\\/wp-content\\/uploads\\/2024\\/03\\/favicon.png\",\"width\":16,\"height\":16,\"caption\":\"Philip\"},\"logo\":{\"@id\":\"https:\\/\\/philip.twinight.co\\/portfolio\\/wp-content\\/uploads\\/2024\\/03\\/favicon.png\"},\"description\":\"Data Scientist &amp; Systems Engineer. Graduated from City University of Hong Kong. Previously founded Twinight Limited as CTO, developing AI investment analytics and automated trading solutions. Currently working as a Test and Integration Engineer on a Vessel Traffic Service (VTS) system in the maritime industry since December 2024.\",\"sameAs\":[\"https:\\\/\\\/philip.twinight.co\\\/portfolio\"],\"url\":\"https:\\\/\\\/philip.twinight.co\\\/portfolio\\\/index.php\\\/author\\\/philip\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Credit Card Fraud Detection Analysis: A Machine Learning Approach - Philip\u2019s Data Science Diary","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/","og_locale":"en_GB","og_type":"article","og_title":"Credit Card Fraud Detection Analysis: A Machine Learning Approach - Philip\u2019s Data Science Diary","og_description":"This is an individual project of SDSC2001 &#8211; Python for Data Science. I did the project in my year 2 2021\/22 Semester A. Course Instructor: Professor LI Xinyue Context Credit &hellip; Continue readingCredit Card Fraud Detection Analysis: A Machine Learning Approach","og_url":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/","og_site_name":"Philip\u2019s Data Science Diary","article_published_time":"2021-11-28T09:22:00+00:00","article_modified_time":"2024-03-06T01:57:48+00:00","og_image":[{"width":1920,"height":1080,"url":"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2021\/11\/Credit-Card-Fraud-Detection-Analysis.png","type":"image\/png"}],"author":"Philip","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Philip","Estimated reading time":"12 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/#article","isPartOf":{"@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/"},"author":{"name":"Philip","@id":"https:\/\/philip.twinight.co\/portfolio\/#\/schema\/person\/ef4f7cedd9b3bde11e126c4dbe1f8414"},"headline":"Credit Card Fraud Detection Analysis: A Machine Learning Approach","datePublished":"2021-11-28T09:22:00+00:00","dateModified":"2024-03-06T01:57:48+00:00","mainEntityOfPage":{"@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/"},"wordCount":1341,"commentCount":0,"publisher":{"@id":"https:\/\/philip.twinight.co\/portfolio\/#\/schema\/person\/ef4f7cedd9b3bde11e126c4dbe1f8414"},"image":{"@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/#primaryimage"},"thumbnailUrl":"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2021\/11\/Credit-Card-Fraud-Detection-Analysis.png","keywords":["2021\/22 Semester A","Data Science","Python for Data Science","SDSC2001","Year 2"],"articleSection":["Machine Learning","Projects"],"inLanguage":"en-GB","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/","url":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/","name":"Credit Card Fraud Detection Analysis: A Machine Learning Approach - Philip\u2019s Data Science Diary","isPartOf":{"@id":"https:\/\/philip.twinight.co\/portfolio\/#website"},"primaryImageOfPage":{"@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/#primaryimage"},"image":{"@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/#primaryimage"},"thumbnailUrl":"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2021\/11\/Credit-Card-Fraud-Detection-Analysis.png","datePublished":"2021-11-28T09:22:00+00:00","dateModified":"2024-03-06T01:57:48+00:00","breadcrumb":{"@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/#breadcrumb"},"inLanguage":"en-GB","potentialAction":[{"@type":"ReadAction","target":["https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/"]}]},{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/#primaryimage","url":"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2021\/11\/Credit-Card-Fraud-Detection-Analysis.png","contentUrl":"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2021\/11\/Credit-Card-Fraud-Detection-Analysis.png","width":1920,"height":1080},{"@type":"BreadcrumbList","@id":"https:\/\/philip.twinight.co\/portfolio\/index.php\/2021\/11\/28\/credit-card-fraud-detection-analysis-a-machine-learning-approach\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"\u9996\u9801","item":"https:\/\/philip.twinight.co\/portfolio\/"},{"@type":"ListItem","position":2,"name":"Credit Card Fraud Detection Analysis: A Machine Learning Approach"}]},{"@type":"WebSite","@id":"https:\/\/philip.twinight.co\/portfolio\/#website","url":"https:\/\/philip.twinight.co\/portfolio\/","name":"Philip\u2019s University Data Science Journey","description":"Navigating Data Science: From Classroom to Career","publisher":{"@id":"https:\/\/philip.twinight.co\/portfolio\/#\/schema\/person\/ef4f7cedd9b3bde11e126c4dbe1f8414"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/philip.twinight.co\/portfolio\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-GB"},{"@type":["Person","Organization"],"@id":"https:\/\/philip.twinight.co\/portfolio\/#\/schema\/person\/ef4f7cedd9b3bde11e126c4dbe1f8414","name":"Philip","image":{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/favicon.png","url":"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/favicon.png","contentUrl":"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/favicon.png","width":16,"height":16,"caption":"Philip"},"logo":{"@id":"https:\/\/mlcznkdztmb6.i.optimole.com\/w:auto\/h:auto\/q:mauto\/f:best\/ig:avif\/https:\/\/philip.twinight.co\/portfolio\/wp-content\/uploads\/2024\/03\/favicon.png"},"description":"Data Scientist &amp; Systems Engineer. Graduated from City University of Hong Kong. Previously founded Twinight Limited as CTO, developing AI investment analytics and automated trading solutions. Currently working as a Test and Integration Engineer on a Vessel Traffic Service (VTS) system in the maritime industry since December 2024.","sameAs":["https:\/\/philip.twinight.co\/portfolio"],"url":"https:\/\/philip.twinight.co\/portfolio\/index.php\/author\/philip\/"}]}},"_links":{"self":[{"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/posts\/173","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/comments?post=173"}],"version-history":[{"count":17,"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/posts\/173\/revisions"}],"predecessor-version":[{"id":343,"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/posts\/173\/revisions\/343"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/media\/327"}],"wp:attachment":[{"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/media?parent=173"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/categories?post=173"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/philip.twinight.co\/portfolio\/index.php\/wp-json\/wp\/v2\/tags?post=173"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}