A New Hybrid Data Mining Technique to Forecast the Greenhouse Gases Emissions

Document Type: Research Paper


Department of Industrial Engineering, Semnan University, Semnan, Iran.


The expansion of industrial activity and the excessive growth of cities have increased the atmospheric concentration of greenhouse gases, including carbon dioxide (CO2). Most CO2 emissions come from energy consumption and fuel combustion, especially of fossil fuels. Data mining techniques that predict CO2 emissions accurately are therefore valuable for choosing preventive measures and appropriate policies. Most studies in this field do not compare different techniques and features, and examine only the effect of economic factors and fossil fuel consumption on CO2 emissions. This study aims to identify a combination of significant features and to select the best technique for predicting CO2 emissions. To this end, a large dataset containing a variety of features was obtained from the IEA database. A new hybrid method for predicting CO2 emissions was developed, and its results were compared with those of several data mining techniques: ANN, KNN, GLE, Linear-AS, and regression. A combination of significant features and the best-performing techniques were also identified. The results show that the proposed hybrid technique, which combines K-Means, Linear-AS, and Discriminant Analysis, is the most accurate of the techniques compared.
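The abstract names the three components of the hybrid technique but not how they are chained. One plausible reading, sketched below under that assumption, is: cluster the observations with K-Means, fit a separate linear model inside each cluster, and use discriminant analysis to decide which cluster's model applies to a new observation. This is an illustration only, not the authors' implementation. Ordinary least squares stands in for Linear-AS, a single feature is used to keep the code short, the data are synthetic, and all function names are hypothetical.

```python
from math import log
from statistics import mean

def kmeans_1d(xs, k, iters=50):
    """Lloyd's algorithm on one feature; centers start at evenly spaced quantiles."""
    srt = sorted(xs)
    centers = [srt[(2 * i + 1) * len(srt) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: (x - centers[c]) ** 2) for x in xs]
        members = [[x for x, l in zip(xs, labels) if l == c] for c in range(k)]
        new = [mean(m) if m else centers[c] for c, m in enumerate(members)]
        if new == centers:  # converged: assignments no longer move the centers
            break
        centers = new
    return labels

def fit_lda_1d(xs, labels, k):
    """1-D linear discriminant analysis: class means, priors, pooled variance."""
    groups = [[x for x, l in zip(xs, labels) if l == c] for c in range(k)]
    means = [mean(g) for g in groups]
    priors = [len(g) / len(xs) for g in groups]
    pooled = sum((x - means[l]) ** 2 for x, l in zip(xs, labels)) / (len(xs) - k)
    return means, priors, pooled

def lda_predict(x, lda):
    """Return the class with the largest linear discriminant score."""
    means, priors, var = lda
    scores = [x * m / var - m * m / (2 * var) + log(p) for m, p in zip(means, priors)]
    return max(range(len(scores)), key=scores.__getitem__)

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def fit_hybrid(xs, ys, k):
    """Cluster, train the discriminant on cluster labels, fit one line per cluster."""
    labels = kmeans_1d(xs, k)
    lda = fit_lda_1d(xs, labels, k)
    lines = [fit_line([x for x, l in zip(xs, labels) if l == c],
                      [y for y, l in zip(ys, labels) if l == c]) for c in range(k)]
    return lda, lines

def hybrid_predict(x, model):
    """Route x to a cluster with the discriminant, then apply that cluster's line."""
    lda, lines = model
    slope, intercept = lines[lda_predict(x, lda)]
    return slope * x + intercept

# Synthetic data (not from the IEA dataset): two regimes with different linear trends.
xs = [float(x) for x in range(1, 11)] + [float(x) for x in range(50, 60)]
ys = [2.0 * x + 1.0 for x in xs[:10]] + [0.5 * x + 30.0 for x in xs[10:]]

model = fit_hybrid(xs, ys, k=2)
print(hybrid_predict(4.0, model))   # 9.0
print(hybrid_predict(55.0, model))  # 57.5
```

A single global regression line would fit neither regime in this toy data, which is the motivation for clustering first. The sketch does not reproduce the paper's feature selection or accuracy comparison.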

