Machine learning: a bibliometric analysis

Authors

DOI:

https://doi.org/10.5585/2023.24056

Keywords:

machine learning, Big Data analysis, bibliometric analysis, prediction.

Abstract

Objective: Present an overview of scientific articles published in the last ten years on the topic of machine learning (ML), with an emphasis on predictive algorithms.

Method/approach: Bibliometric analysis, supported by the PRISMA protocol, evaluating authors, universities and countries in terms of productivity, citations and thematic focus, using a sample of 773 articles from the Scopus and Web of Science databases published from 2013 to May 2023.

Originality/value: The literature lacks studies that consolidate articles related to ML and Big Data. This research helps fill that gap and supports the design of future actions and research.

Main results: The following were identified in the ML bibliometric corpus: most cited authors with the greatest number of publications, most productive countries and universities, journals with the greatest number of publications and citations, areas of knowledge with the greatest number of publications, and the most prestigious articles. In the ML themes and domains, the following were identified: main co-occurrences of keywords, emerging themes (grouped into five clusters), and word clouds by title and abstract. Studies on the impact of data acquisition and predictive analysis represent opportunities for future research.
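As an illustration of the keyword co-occurrence analysis mentioned above, the following is a minimal sketch, not the authors' actual procedure or tooling. The function name `keyword_cooccurrence` and the sample keyword lists are hypothetical and are not drawn from the 773-article corpus; the sketch only assumes that each record carries a list of author keywords.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(records):
    """Count how often each unordered keyword pair appears in the same record."""
    counts = Counter()
    for keywords in records:
        # Normalise case and whitespace, and drop duplicates within one record.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        # Sorting first makes each pair appear in one canonical order.
        counts.update(combinations(unique, 2))
    return counts


# Hypothetical author-keyword lists, for illustration only.
sample_records = [
    ["Machine learning", "Big Data", "Prediction"],
    ["machine learning", "big data"],
    ["Big Data", "Bibliometric analysis"],
]

pair_counts = keyword_cooccurrence(sample_records)
for pair, n in pair_counts.most_common(3):
    print(pair, n)
```

Pairs with high counts would be candidates for the strongest links in a co-occurrence network; grouping them into clusters would require a separate community-detection step that this sketch does not attempt.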

Theoretical/methodological contributions: The PRISMA protocol enabled the identification of articles and relevant quantitative and qualitative analyses of them, consolidating scientific knowledge on the topic.

Social/managerial contributions: The study makes it easier for company managers and researchers to understand the maturity of research on ML and Big Data and to assess the feasibility of investing in these technologies to obtain competitive advantages.

Author Biographies

Emerson Martins, CEETEPS – State Center for Technological Education Paula Souza / São Paulo (SP) - Brazil

Master in Management and Technology in Production Systems (CEETEPS) and Researcher at the IT Strategic Management Research Group (CEETEPS/CNPq)

Napoleão Verardi Galegale, CEETEPS – State Center for Technological Education Paula Souza / São Paulo (SP) – Brazil

PhD in Controllership and Accounting (FEA/USP), Master in Production Engineering (POLI/USP), Professor and Researcher at UPEP/CEETEPS and FEA/PUC-SP, leader of the IT Strategic Management Research Group (CEETEPS/CNPq) and Business Consultant

Published

06.10.2023

How to Cite

Martins, E., & Galegale, N. V. (2023). Machine learning: a bibliometric analysis. International Journal of Innovation, 11(3), e24056. https://doi.org/10.5585/2023.24056

Section

Articles