BREAST CANCER CLASSIFICATION ANALYSIS USING RANDOM FOREST ALGORITHM

  • Henny Wahyu Sulistyo, Universitas Muhammadiyah Jember
  • Hardian Oktavianto, Universitas Muhammadiyah Jember
Keywords: classification, breast cancer, naive Bayes

Abstract

Breast cancer is considered the most common disease among women worldwide. Cancer consists of abnormal cells that have the potential to spread from the affected site to other parts of the body. Machine learning offers a significant advantage over pathologists, and Random Forest is one of the machine learning methods used to solve classification problems. It is a composite tree method derived from the classification and regression tree method and based on the decision tree technique, so it is able to handle non-linear problems. In this study, we analyze the application of the random forest algorithm to the classification of breast cancer. With k-fold cross validation, accuracy ranged from 95% to 96%; with percentage split, it ranged from 94% to 97%. Precision ranged from 96% to 97% with k-fold cross validation and from 94% to 98% with percentage split. Recall ranged from 96% to 97% with k-fold cross validation and from 94% to 98% with percentage split. On average, accuracy and precision were higher with percentage split, while recall was higher with k-fold cross validation. Partitioning the data with either k-fold cross validation or percentage split yields nearly the same accuracy, precision, and recall values, so there is no significant difference between the two ways of dividing the data into training and test sets.
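
The two data-partitioning schemes and the three reported metrics can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the helper names (`k_fold_indices`, `percentage_split_indices`, `classification_metrics`), the shuffling seed, and the toy labels are all assumptions made for the example. A classifier such as a random forest would be trained on each set of training indices, and its predictions on the test indices would be passed to `classification_metrics`.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross validation.

    Every sample lands in exactly one test fold. The seed is an
    assumption made so the example is reproducible.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


def percentage_split_indices(n, train_fraction, seed=0):
    """Return one (train, test) partition, e.g. 70% train and 30% test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * train_fraction))
    return idx[:cut], idx[cut:]


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy labels (hypothetical, not data from the study).
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]
print(classification_metrics(y_true, y_pred))
```

With k-fold cross validation the metrics are computed once per fold and then summarized, whereas a percentage split yields a single estimate per chosen training fraction; the lowest and highest values quoted in the abstract presumably come from varying k and the split percentage, although the abstract does not list the specific settings.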

Published
2020-02-17
Section
Articles