Hybrid Medical Data Mining Model for Identifying Tumor Severity in Breast Cancer Diagnosis

Document Type: Research Paper


1 School of Industrial Engineering, Iran University of Science & Technology, Tehran, Iran.

2 Department of Medical Informatics, Breast Cancer Research Center, Motamed Cancer Institute, ACECR, Tehran, Iran


Purpose: This study proposes a methodology for detecting tumor severity using data mining of databases relating to breast imaging modalities. In doing so, it proposes creating a software application that can serve as an efficient decision-making support system for medical practitioners, especially those in areas where there is a shortage of modern medical diagnostic devices or specialized practitioners, such as in developing countries.
Method: We investigated the data of approximately 3754 screened women, using “BI-RADS” categories as a quality assessment tool to screen, measure, and identify the size and location of lesions, determine the number of lymph nodes, collect biopsy samples, and determine final diagnoses, prognoses, and age, all of which were available from the screening registry.
Result: Each algorithm was applied to Invasive Ductal Carcinoma lesions with BI-RADS values 4 and 5, and the following accuracy was obtained: CART: 84.71%. To get the best result, four optimum clusters based on tumor size were used to construct simple rules with significant confidence.
Conclusion: This study presents a hybrid approach (a combination of k-means with GRI and a CART decision tree) to better assess breast cancer data sets.
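
The clustering-then-rules idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tumor sizes and outcome labels below are made-up toy values, the function names `kmeans_1d` and `rule_confidence` are hypothetical, and the rule step uses the standard association-rule confidence rather than the GRI algorithm itself. The CART step is not reproduced here.

```python
def assign(values, centroids):
    """Label each value with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
            for v in values]


def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on scalar values (e.g. tumor size).

    Centroids are initialised deterministically at evenly spaced
    order statistics, which is an assumption of this sketch.
    """
    ordered = sorted(values)
    n = len(ordered)
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        labels = assign(values, centroids)
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return assign(values, centroids), centroids


def rule_confidence(labels, outcomes, cluster, outcome):
    """confidence(cluster -> outcome) = count(cluster and outcome) / count(cluster)."""
    in_cluster = [o for lab, o in zip(labels, outcomes) if lab == cluster]
    if not in_cluster:
        return 0.0
    return in_cluster.count(outcome) / len(in_cluster)


# Toy data, invented for illustration only: tumor sizes (mm) and diagnoses.
sizes = [5, 6, 7, 14, 15, 16, 28, 30, 32, 50, 52, 55]
outcomes = ["benign", "benign", "benign", "benign", "benign", "malignant",
            "benign", "malignant", "malignant", "malignant", "malignant", "malignant"]

# Four clusters on tumor size, as in the abstract.
labels, centroids = kmeans_1d(sizes, k=4)

for cluster in range(4):
    conf = rule_confidence(labels, outcomes, cluster, "malignant")
    print(f"size cluster {cluster} (centroid {centroids[cluster]:.1f} mm) "
          f"-> malignant, confidence {conf:.2f}")
```

In a design like this, each cluster label becomes a simple categorical antecedent, so the resulting rules stay short and can be ranked by confidence.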

