Introduction
The globe is being inundated with data at a rate of 7 ZB per year, primarily from Internet of Things (IoT) devices (Jamali, Bahrami, Heidari, Allahverdizadeh, & Norouzi, 2020). These data are dispersed across numerous devices, making it difficult to extract meaningful relationships from them; conventional storage and processing systems cannot keep up with this velocity. Companies best equipped to make real-time business decisions using big-data solutions are expected to prosper, whereas those unable to adapt to and exploit this change will progressively find themselves at a competitive disadvantage in the market and may collapse (Waga, 2013). In government, industry, and research, the demand to evaluate enormous quantities of data is growing, and data analysis is now considered the fourth paradigm of research (Hey, Tansley, & Tolle, 2009; Wang & Li, 2021). Users require tools to examine these data quickly and simply (Wang et al., 2017). Cloud computing has been widely adopted in the information technology (IT) sector, thanks to mature cloud-computing business models, middleware technologies, and well-cultivated ecosystems (Wang, Ma, Yan, Chang, & Zomaya, 2018). Cloud computing is a platform that allows end-users worldwide to access a shared pool of resources on demand over the internet; physical servers in huge geo-distributed data centers host these shared resource pools (Chaudhary, Aujla, Kumar, & Rodrigues, 2018).
In addition, an effective way to mitigate computational overhead is to offload computing tasks to more powerful devices in the cloud, edge, or fog (Heidari, Jabraeil Jamali, Jafari Navimipour, & Akbarpour, 2020; Song, Cui, Li, Qiu, & Buyya, 2014).
Research motivation
The use of cloud technology for storing, processing, and analyzing big data is increasing, but it also brings problems and challenges. Several studies have addressed this area. To motivate this study, we review some of them and their results to identify the weaknesses of previous articles; our goal is to fill those gaps. Table 1 provides information about these studies.
As this overview shows, several reviews have been conducted in this area. However, as Table 2 shows, no systematic review has been provided. We therefore examine the management of big modern data across a group of applications in a systematic review.
Cloud computing is an IT infrastructure that divides computing resources into service tiers and delivers them on demand. The innovation is most visible at the service-model level, and commercial value is achieved through fundamental operating features, including application hosting, resource leasing, and service outsourcing (Heidari & Navimipour, 2021; Sun, 2021). Increased communication, efficiency, and resource management are advantages of cloud computing, while big-data management (BDM) can fundamentally simplify internal and external connections. Adaptability, competitiveness, cost savings, and increased efficiency and profitability are the critical financial consequences of cloud computing. The most significant element in growing competitiveness in the creative industries is improved innovative capability. This research examines the impact of cloud computing on BDM in businesses.
It serves as a benchmark for how IT management may shape the interaction between cloud-computing and BDM deployment in the digital creative industries' innovative capabilities. It also adds to the literature on performance enhancement in the digital creative sectors, which may be cited in work on data management, cloud-computing deployment, innovation, and management capabilities.
Therefore, in this systematic review, the following questions will be answered:
(1) In what areas can cloud technology for modern management be used? This question will be answered in Sections ‘Research methodology and data statistics’ and ‘BDM in cloud.’
(2) What are the problems with using the cloud to manage big data? The answer to this question is in Sections ‘BDM in cloud’ and ‘Results and discussion.’
(3) What can researchers do to improve the use of cloud technology for BDM? This question will also be answered in Sections ‘Results and discussion’ and ‘Challenges.’
The remainder of the article is organized as follows. The second section ‘Background’ presents the study background. The third section ‘Research methodology and data statistics’ describes the research method. The fourth section ‘BDM in cloud’ presents an overview of articles in the selected groups. The fifth section ‘Results and discussion’ outlines the results and discussion, and the sixth section ‘Challenges’ restates some of the problems and challenges in BDM. The seventh section ‘Future directions’ is a guide to the future work of researchers. Eventually, the conclusions of the study are presented in the last section.
Background
Electronic gadgets and computer-based (distributed) services are becoming progressively embedded in people's daily lives. Hence, to grow their income or enhance their services, businesses must analyze massive volumes of data (Amato & Moscato, 2016). Parallel database servers and cloud server technologies are two ways of managing large quantities of data. Parallel database servers have been a huge success in both academia and industry since the early 1990s; thanks to them, several applications that deal with huge amounts of data have met their performance and resource-accessibility goals. Nevertheless, a parallel database server is costly for a business: it requires acquiring an expensive server and having high-level in-house talent to manage databases and servers (Hameurlain & Morvan, 2015). Because of the services it provides, cloud computing can serve as a foundation for a variety of technologies. Cloud computing is a new generation of services aimed at providing access to information, applications, and data from any location at any time. Besides, Stergiou, Psannis, and Gupta (2020) introduced and detailed a new cloud-based system structure that relies on a unique federated learning scenario known as the integrated federated model (InFeMo). Their model included all cloud models with a federated learning scenario, together with additional technologies that might be used in tandem.
Big data and cloud computing have a close relationship. Big data in the cloud is a next-generation data-intensive platform that aims to provide rapid analytics on a flexible and scalable architecture. Cloud computing provides the large computing capacity and infrastructure that allow storing and processing large data volumes, often known as big data. The emergence of big data has, in turn, accelerated the growth of cloud computing: the cloud's distributed storage aids in managing big data, while its parallel processing aids in collecting and analyzing it (Agarwal & Srivastava, 2019).
Disk failures frequently cause outages in cloud-based services. Most of these failures are caused by electro-mechanical issues, which are nearly always visible in the data used to monitor hard drives. Present procedures are reactive, which harms the customer experience, and published disk-failure-prediction models are either outdated or barely 50–60% accurate (Pinheiro, Weber, & Barroso, 2007). Because tens of millions of hard disk drives are already deployed in cloud systems, proactively detecting problems and taking corrective action can provide considerable advantages (Ganguly, Consul, Khan, Bussone, Richards, & Miguel, 2016). The existence of duplicated data is another critical issue in cloud computing. Data duplication is the storage of the same data multiple times, which wastes storage space: even though the cloud has huge memory, duplicated information wastes it and makes data processing more difficult. As a result, deduplication has become more important in cloud data processing. Deduplication seeks to reduce storage costs, making the cloud more profitable, but it remains a key challenge when it comes to governing encrypted information (Aslam & Swaraj, 2019).
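To make the deduplication idea concrete, the following minimal Python sketch stores each data block under its content hash, so an identical block is kept only once. The class and method names are illustrative and are not drawn from the cited work.

```python
import hashlib

class DedupStore:
    """Illustrative content-addressed store: identical blocks are kept once."""

    def __init__(self):
        self.blocks = {}   # fingerprint -> block bytes
        self.index = []    # per-write list of fingerprints

    def put(self, data: bytes) -> str:
        # Fingerprint the block; identical content maps to the same key,
        # so a duplicate payload is never stored a second time.
        fp = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(fp, data)
        self.index.append(fp)
        return fp

    def get(self, fp: str) -> bytes:
        return self.blocks[fp]

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())
```

Writing the same 1 KB block twice therefore consumes only 1 KB of storage; only the small fingerprint index grows. This also hints at why encrypted data complicates deduplication: encrypting identical blocks under different keys yields different ciphertexts and thus different fingerprints.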
Research methodology and data statistics
Using a systematic literature review technique to discover, choose, and analyze a particular field of study has recently received much attention (Esmailiyan, Amerizadeh, Vahdat, Ghodsi, Doewes, & Sundram, 2021; Vahdat & Shahidi, 2020). The systematic literature review approach is used to perform this survey because it:
(1) comprehensively covers the selected fields,
(2) reviews relevant research, and
(3) prevents accidental or intentional omission of important research work to achieve a desired result.
This process yields a set of studies that sufficiently represents the field of research. Nowadays, investigators utilize a variety of systematic literature review techniques (Petersen, Vakkalanka, & Kuzniarz, 2015; Vahdat, 2021). Figure 1 illustrates the systematic literature review method used in this paper. We first extracted candidate articles using keywords. Then, by reviewing titles and abstracts, we removed unsuitable articles such as review articles, irrelevant articles, and duplicates. Finally, 23 articles were analyzed; the list of these articles is given in Table 3.
In this article, we used keywords such as ‘big data management,’ ‘big data management AND cloud,’ ‘big data management AND cloud AND organization,’ ‘big data management AND cloud AND education,’ ‘big data management AND cloud AND healthcare,’ ‘big data management AND cloud AND business,’ ‘big data management AND cloud AND smart city,’ etc.
We systematically searched databases including Scopus, Springer Online Journal Collection, Google Scholar, ACM Digital Library, IEEE Xplore, WoS, and ScienceDirect. Figure 2 illustrates the articles obtained from these databases over the last 15 years, by publication year.
According to Figure 2, researchers' attention to the use of cloud technology for managing data from different applications, such as organizations and offices, has increased in recent years. It is also useful to know the contribution of well-known publishers to these published articles; Figure 3 shows the contribution of each.
According to Figure 3, the articles of interest have not necessarily been published in well-known venues; other publications contribute a larger share of this research. Therefore, regardless of venue, we examine any research relevant to our study's subject. By reviewing the 220 articles found and screening their titles and abstracts, the grouping shown in Table 3 was selected.
BDM in cloud
Big data is created in different organizations and departments and must be stored, processed, and analyzed. Certainly, cloud technology helps improve the management of these data. This section considers studies of several applications; articles are studied in the groups shown in Figure 4.
Smart city
Big-data and cloud-computing analytics are critical components of smart city construction (Zhuang, Zhu, Huang, & Pan, 2021). They may help communities become more dependable, safe, healthy, and informed while also creating massive data for the public and commercial sectors. Because smart cities create massive volumes of streaming data from sensors and other devices, preserving and analyzing this real-time data generally necessitates substantial computing power. Most smart city solutions combine basic technologies such as computers, databases, storage, and data warehouses with modern technologies such as big-data analytics, artificial intelligence, real-time streaming data, machine learning (ML), and the IoT (Maroli, Narwane, & Gardas, 2021; Suresh et al., 2021).
Of the 23 articles analyzed in this review, the six related to the smart city are covered in this section.
Sinaeepourfard, Krogstie, and Petersen (2018) created a hierarchical distributed data management infrastructure for a zero-emission community center in Norway. They first described a hierarchical distributed architecture capable of organizing the whole data life cycle, from creation to consumption. They then demonstrated that each cross-tier of the infrastructure (from IoT devices to cloud technologies) could handle various types of acquired data (recent, real-time, and historical). They showed that fog-to-cloud data management (from distributed to centralized) has great potential to handle all data life stages with respect to data life cycle concepts, and they applied different smart city scenarios to demonstrate their proposed big-data architecture for smart cities. Also, in Gupta and Godavarti (2020), IoT data management utilized cloud and big-data technology to build a system that can manage the vast and rapidly expanding amount of data created by IoT devices. Its goal was to provide a more secure, scalable, fault-tolerant, and cost-effective environment for analyzing large data using cloud-computing services. The suggested technique presents a paradigm to effectively manage data supplied by IoT devices through REST Application Programming Interfaces (APIs). The outcomes show how the REST API works across all nodes in a cluster using JavaScript Object Notation (JSON) requests: the model was fed a request with a matching JSON payload, the transactions were added to the registered nodes without re-adding the payload, and a fresh batch was produced with all of the devices' readings. While retrieving the findings, the contents of the complete batch and all systems were retrieved, indicating the efficacy of the planned work.
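The batch-per-request idea above can be sketched in a few lines of Python. This is an illustrative stand-in, not the cited system: the class, field names, and batching rule are assumptions, and a real deployment would receive the payload through an HTTP POST endpoint rather than a direct call.

```python
import json

class IoTBatcher:
    """Illustrative handler for JSON device readings: each incoming request
    payload becomes one batch, grouped by device identifier."""

    def __init__(self):
        self.batches = []  # each batch: {device_id: [readings]}

    def handle_request(self, payload: str) -> dict:
        # A REST endpoint would receive this JSON body over HTTP;
        # here we parse it directly to keep the sketch self-contained.
        readings = json.loads(payload)
        batch = {}
        for r in readings:
            batch.setdefault(r["device_id"], []).append(r["value"])
        self.batches.append(batch)
        return batch

    def retrieve_all(self) -> list:
        # Returns the full contents of every batch, as in the retrieval step.
        return self.batches
```

For example, a payload carrying two readings from device `d1` and one from `d2` yields a single batch with both devices' readings grouped, ready for cluster-wide distribution.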
Baek, Vu, Liu, Huang, and Xiang (2014) unveiled the Smart-Frame, a generic framework for managing large data sets in smart grids using cloud-computing technology. Their fundamental concept was three hierarchical levels of cloud-computing centers for handling information: top, regional, and end-user. The top cloud level provided a worldwide perspective of the architecture, while each regional cloud center was responsible for processing and maintaining regional data. They also proposed a solution based on identity-based cryptography and identity-based proxy re-encryption; hence, their suggested framework offers not only scalability and flexibility but also security characteristics, and they created a proof-of-concept using a basic identity-based data-confidentiality management system. Additionally, Kaseb, Mohan, and Lu (2015) demonstrated a system that employed the suggested resource manager to analyze large data from worldwide network cameras for video and image analysis. Investigations confirmed that using a resource manager can reduce costs by 13%. Four analytic programs were employed throughout the studies, each representing a distinct CPU and memory workload. The tests also revealed that certain cloud instances were more cost-effective for certain analytic procedures. Using multiple analytic programs at varying frame rates, one study evaluated data streams from 1026 cameras concurrently for 6 hr, examining 5.5 million pictures totaling 260 GB of data. Besides, Park, Kim, Jeong, and Lee (2014) developed and tested two-phase group categorization across a range of mobile-device distributions. Previous investigations that used arbitrary cut-off thresholds were ineffective in mobile cloud systems, which have a high level of instability.
The recommended approach created a two-phase grouping by merging groups from entropy-based grouping with a measure of group similarity. Even when the distribution of mobile devices varies, the algorithm correctly produces two-phase groups, according to the testing outcomes. In sustaining reliable massive data processing and managing dependable resources, their algorithm beat standard grouping approaches.
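As a rough sketch of the entropy computation such a grouping might use, the following Python snippet scores each mobile device by the Shannon entropy of its recent availability history and splits the devices accordingly. The grouping criterion and threshold here are our assumptions for illustration, not the authors' algorithm.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy (in bits) of a label distribution, e.g. device states."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def entropy_group(devices, threshold=0.5):
    """Phase 1 of a hypothetical two-phase grouping: split devices into
    'stable' and 'unstable' by the entropy of their recent availability."""
    stable, unstable = [], []
    for name, history in devices.items():
        (stable if shannon_entropy(history) <= threshold else unstable).append(name)
    return stable, unstable
```

A device that is always up has entropy 0 and lands in the stable group, whereas one that alternates between up and down has entropy 1 bit and is treated as unstable; a second phase could then refine these groups by similarity.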
Munir, Wei, Ullah, Hussain, Arshid, and Tariq (2020) described a cloud-computing-based smart grid system that incorporates a big-data strategy. The architecture comprises four levels: data source, storage/processing, transmission, and analysis. A case study used a data set from three cities in Pakistan and two cloud-based data centers. According to the research, high load on data centers and network latency may impair overall efficiency by delaying reaction time; they argued that a local data center might help minimize data load and network delay. For both customers and service suppliers, the presented paradigm may be useful in achieving sustainability, reliability, and cost-effectiveness in the power grid.
To conclude and summarize the articles related to the smart city, Table 4 provides some details and features of these studies.
Healthcare
The world's population is growing and expects more effective treatments and a higher overall quality of life, which puts more strain on healthcare (Simpson, Farr-Wharton, & Reddy, 2020). As a result, healthcare continues to be one of the world's most pressing social and economic issues, requiring newer and more developed solutions from technology and science (Aceto, Persico, & Pescapé, 2020; Chiuchisan, Costin, & Geman, 2014; Omanović-Mikličanin, Maksimović, & Vujović, 2015). The following five articles provide solutions to these challenges.
In Thanigaivasan, Narayanan, Iyengar, and Ch (2018), a heart disease data set was used for analysis. The data set was used in several tests to assess the performance of classification algorithms, and the support vector machine (SVM) was shown to outperform the others. For huge data, however, SVM was found to have a long processing time, so the large-scale data set was classified using parallel SVM-based categorization, which substantially decreased processing time while properly classifying the data. Besides, Celesti, Fazio, Romano, and Villari (2016) discussed an open archival information system-based hospital information system that can manage large amounts of data in a cloud-computing environment. They explored two alternative implementations of the archival storage sub-component, based on MySQL and MongoDB respectively, with regard to the Health Level 7 (HL7) v3 standard. Studies demonstrated that MongoDB was an excellent candidate for implementing an archival storage sub-component capable of handling large amounts of data. In reality, while SQL is the most widely used technology for archival storage in hospital information systems worldwide, it cannot meet the new difficulties posed by cloud-based hospital information systems and big health data; compared with MySQL, MongoDB makes it easier to retain HL7 documents with minimal processing work.
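The data-parallel pattern behind parallel SVM classification can be illustrated without the SVM itself: partition the training data, fit one sub-model per partition concurrently, and combine predictions by majority vote. The sketch below uses a trivial nearest-centroid stand-in on one-dimensional features so it stays self-contained; it is our illustration of the general pattern, not the cited method.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

def train_centroids(chunk):
    """Train a tiny stand-in classifier on one partition: per-class mean."""
    by_class = {}
    for x, y in chunk:
        by_class.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_class.items()}

def predict(models, x):
    """Majority vote over the sub-models' nearest-centroid predictions."""
    votes = [min(m, key=lambda y: abs(m[y] - x)) for m in models]
    return max(set(votes), key=votes.count)

def parallel_fit(data, n_parts=4):
    # Interleaved partitioning; each partition is trained concurrently.
    chunks = [data[i::n_parts] for i in range(n_parts)]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(train_centroids, chunks))
```

Because each partition is a fraction of the full data set and partitions are processed concurrently, wall-clock training time drops roughly with the number of workers, which is the effect the cited study reports for parallel SVM.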
Sreekanth, Rao, and Nanduri (2015) looked at how MongoDB may handle and analyze large data in electronic health record systems in the cloud, and they explored creating an electronic health record system using MongoDB, a NoSQL database. Because electronic health records are projected to grow in popularity, a NoSQL-based system is essential: document-based JSON files can be used to create electronic healthcare-record systems, and NoSQL-based systems outperform SQL-based ones. Additionally, Shan, Chao, Zhang, and Tian (2017) discussed the meanings of big data and cloud computing and the state of health management studies at home and abroad. They explained the data methodology and essential technologies before covering the monitoring-data transfer procedures, and they highlighted a novel pattern that employs a cloud-based warning data platform as a carrier to provide all types of early-warning services to hospitals, communities, families, and other subscribers in the health management system. Furthermore, Das et al. (2017) created a global and local cloud confederation architecture, dubbed FnF, for serving consumers' heterogeneous large healthcare data-processing demands. FnF uses fuzzy logic to select appropriate target cloud data center(s), trading off between user-application Quality of Service (QoS) and cloud-provider profit, and it improves its decision accuracy by using multiple linear regression to estimate the resource needs of massive data-processing tasks. Numerical and empirical assessments validated the suggested FnF model, and simulation outcomes demonstrated its efficacy and efficiency compared with modern techniques.
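To show what a document-based JSON health record looks like, the sketch below defines a hypothetical patient document and a tiny in-memory store that mimics the document-database model. The field names are illustrative, not taken from the HL7 standard or the cited systems, and the store is a stand-in for MongoDB, not its API.

```python
import json

# Hypothetical JSON document for one patient record; nested encounters live
# inside the document rather than in separate relational tables.
record = {
    "patient_id": "p-001",
    "name": "Jane Doe",
    "encounters": [
        {"date": "2021-03-01", "diagnosis": "hypertension", "bp": "150/95"},
        {"date": "2021-06-12", "diagnosis": "follow-up", "bp": "135/85"},
    ],
}

class DocumentStore:
    """Tiny in-memory stand-in for a document database such as MongoDB."""

    def __init__(self):
        self.docs = []

    def insert_one(self, doc):
        # Documents are stored as JSON text, mirroring a document DB's model.
        self.docs.append(json.dumps(doc))

    def find(self, **criteria):
        out = []
        for raw in self.docs:
            doc = json.loads(raw)
            if all(doc.get(k) == v for k, v in criteria.items()):
                out.append(doc)
        return out
```

The point of the document model is visible here: the whole patient history travels as one self-describing JSON document, so no join across encounter tables is needed to retrieve it, which is why minimal processing work is required to retain such records.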
Everything obtained in this section is summarized in Table 5, which lists some features of the articles.
Accounting
In the big-data sector, cloud computing and large accounting data are combined to produce a cloud-accounting application framework that emphasizes spatial accessibility, security, and distribution, changing the state of accounting data. Confronted with a tidal wave of economic expansion, administrative agencies will begin to use cloud accounting, which shows considerable promise in these sectors (Li, 2021; Nosratabadi, Mosavi, Shamshirband, Kazimieras Zavadskas, Rakotonirainy, & Chau, 2019). The following four articles relate to this section and explain some challenges and benefits of this data management in the cloud.
The growth of agricultural firms is inextricably linked to the growth of the local agricultural sector. Nonetheless, the utilization of cloud accounting in agricultural businesses is limited, and its application in comprehensive budget management is constrained, hindering agricultural businesses and the economy from docking efficiently (Yan & Nanyun, 2020). Yan and Nanyun argued that agricultural businesses could benefit from big-data and cloud-accounting platforms by developing a more information-based comprehensive budget management system, which would help them strengthen their core competitiveness. Besides, Li (2019) discussed the importance of cloud computing and big data in management accounting and the possibilities and difficulties that management accounting education faces in the big-data era; accordingly, they discussed how to efficiently integrate management accounting and cloud-accounting systems, based on extensive teaching experience, to support the fast growth of management accounting education. Also, Zuo (2017) discussed the impacts of cloud accounting and big data on an enterprise's overall budget management, developing a framework for a complete budget management system that optimizes budgeting, budget enforcement, budget modification, and budget evaluation, leading to rational resource allocation. Big data provides more extensive and accurate data assistance, with new opportunities and directions for comprehensive budget management. Zuo illustrated the impact of big data on comprehensive budget management and proposed a systematic framework covering strategic goals, budgeting, budget enforcement, and budget evaluation to attain an appropriate allocation of company resources.
Yang (2018) put forward methods to resolve the dilemma of data standards from three angles: principles of standard data formulation, formulation ideas, and specific recommendations. From seven aspects of technical means and management methods, he put forward ideas for resolving security dilemmas. Enterprises should therefore strengthen their application of cloud-accounting technology to meet development needs in the era of big data and promote better enterprise development. The results of the articles analyzed in this section are summarized in Table 6.
Education
Learning has shifted from a single conventional instruction style to a composite model of classroom teaching and network learning (Anshari, Alas, & Guan, 2016). Due to the growth of network technology, learners' study time has become more diversified and dispersed, and the conventional teaching approach fails to fulfill their various learning demands. In light of this, online education based on online training platforms, with features such as autonomy, customization, and interaction, has emerged as a necessary component of modern training (Wang & Zhao, 2021). The four articles in this group are analyzed as follows.
Jain (2020) offered cloud data security techniques and strategies to assure protection by minimizing risks and hazards, addressing data security, network security, and privacy preservation for cloud-computing security concerns. The suggested technique allows academic institutions to retain data in the cloud safely and efficiently, proposing an encryption- and compression-based solution to the challenge of massive data security. The experiments revealed that the suggested approaches outperform other systems in terms of efficiency and accuracy. Besides, Chen and Dou (2020) concentrated on the informatization of university education and teaching management in the big-data and cloud-computing era. They first examined the current status of university education and teaching management informatization, then analyzed and constructed an information management system using parameter setting and a collaborative filtering algorithm, and finally described and addressed it from various perspectives. The proposed measures for implementing university education and teaching management informatization in the cloud-computing and big-data era sought to provide reference material for related studies.
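A compress-then-encrypt pipeline of the kind described can be sketched as follows. The ordering is the instructive part: ciphertext is effectively incompressible, so compression must come first. The XOR keystream here is a deliberately toy construction for illustration only; it is not the cited scheme, and a real system would use an audited cipher such as AES-GCM from a vetted library.

```python
import hashlib
import zlib

def _keystream(key: bytes, n: int) -> bytes:
    """Toy keystream from chained SHA-256 hashing; NOT cryptographically
    vetted -- for illustration of the pipeline shape only."""
    out, block = b"", key
    while len(out) < n:
        block = hashlib.sha256(block).digest()
        out += block
    return out[:n]

def protect(data: bytes, key: bytes) -> bytes:
    # Compress first: encrypting first would destroy the redundancy
    # that compression exploits.
    compressed = zlib.compress(data)
    ks = _keystream(key, len(compressed))
    return bytes(a ^ b for a, b in zip(compressed, ks))

def recover(blob: bytes, key: bytes) -> bytes:
    ks = _keystream(key, len(blob))
    return zlib.decompress(bytes(a ^ b for a, b in zip(blob, ks)))
```

On redundant institutional records the stored blob is both unreadable without the key and substantially smaller than the original, which is the efficiency-plus-security combination the cited approach targets.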
Zhang, Fang, Yin, and Yu (2018) created a university physical education (P.E.) cloud-platform management system based on big-data analysis methodology and blockchain technology, which had a positive influence on the current state of university P.E. and the quality of teaching. The management system integrated traditional sports health data analysis, education management, and big-data analysis for the first time, and it sought to apply blockchain technology to increase data security, reliability, and reuse (Dehghani et al., 2020, 2021). Additionally, Xiaona (2021) examined the relevance of informatization in education and teaching management in the big-data and cloud-computing era and offered strategies for building information systems for education and teaching management. In summary, advancing educational and instructional management information can greatly improve student efficiency, effectiveness, and inner capability. To accomplish information management, it is important to incorporate innovative instructional techniques and models, along with an innovative training system, environment, and philosophy, into management. Universities and colleges should set clear goals aligned with their own advancement in education-teaching information, create a sound information management system, continuously improve the level of education-teaching management informatization, establish scientific solutions, and cultivate high-quality talent for society in the big-data and cloud-computing era.
The characteristics of the articles analyzed in this section are described in Table 7.
Business
In small and medium businesses, cloud-based architecture adds a whole new dimension to data and insight sharing (Kars-Unluoglu & Kevill, 2021; Xiang, Zhang, & Worthington, 2018). Small and medium businesses may lack the resources or the desire to run their own big-data architecture (Lan & Unhelkar, 2015); the cloud platform helps them manage the data they generate. The following articles illustrate this point (Chen, Gao, & Ma, 2021a; Chen & Sivakumar, 2021).
Ionescu and Andronie (2021) aimed to explain and illustrate the difficulties regarding financial consequences resulting from BDM and cloud computing's effect in the digital world. They employed a combination of qualitative and quantitative investigation to identify the benefits of using BDM with a direct favorable influence on corporate performance. Their research examined the financial implications of cloud-computing and digital solutions for businesses in the digital era and the impact of cloud technology usage on business growth. There are several benefits to integrating cloud computing and big data, the most significant being increased company efficiency and an improved global economy. Additionally, Terrazas, Ferry, and Ratchev (2019) demonstrated a new big-data strategy and analytics architecture for the cloud administration and analysis of machine-produced data. It combined open-source technology with elastic computing to create a system that can be adapted to and deployed on a variety of cloud-computing platforms. The outcome is a distributable, versatile, and scalable solution that allows for easy incorporation of technologies adaptable to various manufacturing settings and cloud-computing suppliers. It allowed for easier deployment, lower infrastructure costs, and on-demand access to a nearly limitless pool of storage, computing, and network resources.
Huang, Guo, Xie, and Meng (Reference Huang, Guo, Xie and Meng2015) merged e-commerce with conventional business models utilizing network technology, database technology, cloud-computing technology, and marketing management technology to create an integrated cloud services platform for advanced livestock marketing management that meets the actual necessities of contemporary livestock marketing management. The platform combines e-commerce and conventional business models to supply outsourcing services for livestock enterprises, such as customer relationship management, e-commerce, inventory management, and more. It assists livestock enterprises in selling products and enhancing production management levels by incorporating e-commerce and conventional business models. It benefits the livestock sector by promoting the transformation from traditional to contemporary models, improving management levels, increasing competitiveness, and promoting economic advantages. Furthermore, Wang and Zhao (Reference Wang and Zhao2016) provided experimental research on leveraging big data in cloud computing to optimize business processes. The study focused on a large-scale Chinese private company that aspires to be a worldwide player in the manufacturing business. The investigation was based on real data obtained from the collaborating partner. The fundamental outcomes of their study were as follows: the attempts to use big data differed according to the operating levels, and adopting cloud-computing solutions in the Chinese private sector was exploratory due to some constraints. The outcomes revealed the current cloud-computing and big-data deployments in Chinese private enterprises.
Four articles are reviewed in this section; the most important points of this study are summarized in Table 8.
Results and discussion
In the previous section, 23 articles were studied across the smart city, education, health, and business domains. The results of the studies were expressed in tables. According to the articles in the previous section, managing a large amount of big data, which may involve data selection, monitoring, deployment, and analysis, is unquestionably difficult (Wang, Wang, & Li, Reference Wang, Wang and Li2021). More crucially, real-time data processing is generally necessary with the smart grid. Any delay in the system might have significant consequences, which must be prevented as much as feasible (Baek et al., Reference Baek, Vu, Liu, Huang and Xiang2014).
Some frameworks, databases, and other research information were also extracted. Some studies have used regression, machine learning (ML), or fuzzy logic to manage big data (Liu, Zhang, & Lu, Reference Liu, Zhang and Lu2020; Zhong, Fang, Liu, Yuan, Zhang, & Lu, Reference Zhong, Fang, Liu, Yuan, Zhang and Lu2021). In addition to the proposed frameworks, the MapReduce and Hadoop frameworks are often used. MapReduce is a distributed computing framework for dealing with huge unstructured data collections. To put it another way, MapReduce splits input files into pieces and processes them in stages. Hadoop, an open-source implementation of MapReduce, was also presented. Massive data sets may be processed with Hadoop clusters. Large data sets may be processed utilizing the MapReduce architecture and 'cloud' resources. Cloud computing provides a wide range of applications and decreases IT expenses, resulting in a substantial increase in efficiency. Remote control or software virtualization is a basic and general description of cloud computing (Roshandeh, Poormirzaee, & Ansari, Reference Roshandeh, Poormirzaee and Ansari2014). The variety of big-data analytics platforms available in the cloud makes decision-making difficult for solution architects, software developers, and infrastructure managers (Puthal, Nepal, Ranjan, & Chen, Reference Puthal, Nepal, Ranjan and Chen2016).
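The split/map/shuffle/reduce flow described above can be sketched in a few lines of plain Python. This is a toy single-machine illustration of the programming model, not Hadoop's actual API:

```python
from collections import defaultdict

def map_phase(chunk):
    # Map: emit (word, 1) pairs for every word in one input split
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(mapped_pairs):
    # Shuffle: group intermediate values by key
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values for each key
    return {key: sum(values) for key, values in groups.items()}

def mapreduce_wordcount(chunks):
    mapped = []
    for chunk in chunks:  # on a real cluster, map tasks run in parallel
        mapped.extend(map_phase(chunk))
    return reduce_phase(shuffle(mapped))

counts = mapreduce_wordcount(["big data big cloud", "cloud data"])
# counts == {"big": 2, "data": 2, "cloud": 2}
```

In Hadoop, each `chunk` would be a file split distributed across cluster nodes, and the shuffle would move intermediate pairs over the network between map and reduce tasks.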
The MongoDB database is also used. Like other NoSQL databases, MongoDB can flexibly hold unstructured data and quickly retrieve large amounts of data (Celesti et al., Reference Celesti, Fazio, Romano and Villari2016). Although SQL has numerous benefits, such as transaction safety, NoSQL systems may often be built at a fraction of the cost of SQL systems (Sreekanth, Rao, & Nanduri, Reference Sreekanth, Rao and Nanduri2015). Table 9 illustrates the summary of the articles.
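MongoDB's flexibility comes from its schema-less document model: records in the same collection need not share the same fields. The idea can be sketched with a minimal in-memory stand-in (illustrative Python only; this is not the pymongo API, and the collection and field names are hypothetical):

```python
class DocumentCollection:
    """Toy stand-in for a NoSQL document collection (no fixed schema)."""

    def __init__(self):
        self._docs = []

    def insert(self, doc):
        # No schema validation: each document may carry its own fields
        self._docs.append(dict(doc))

    def find(self, **criteria):
        # Return documents whose fields match all given criteria
        return [d for d in self._docs
                if all(d.get(k) == v for k, v in criteria.items())]

media = DocumentCollection()
media.insert({"type": "video", "codec": "h264", "duration": 120})
media.insert({"type": "text", "lang": "en"})          # different fields, same collection
media.insert({"type": "text", "lang": "zh", "tags": ["iot"]})

text_docs = media.find(type="text")
# text_docs contains the two text documents
```

A relational table would force every row into one fixed set of columns; here, heterogeneous records coexist and are still queryable, which is what makes the document model convenient for unstructured big data.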
Challenges
Because cloud computing provides the platform, software, and infrastructure as a service (Ayala, Vega, & Vargas-Lombardo, Reference Ayala, Vega and Vargas-Lombardo2013; Shahid, Ashraf, Ghani, Ghayyur, Shamshirband, & Salwana, Reference Shahid, Ashraf, Ghani, Ghayyur, Shamshirband and Salwana2020) and hosts apps through computer resources, platforms, or the internet, it faces a number of problems. Compromises in service quality, security, privacy, virtualization, scalability, integrity, and data debugging are among the concerns (Cheng, Shojafar, Alazab, Tafazolli, & Liu, Reference Cheng, Shojafar, Alazab, Tafazolli and Liu2021; Zhang, Chen, & Susilo, Reference Zhang, Chen and Susilo2020). Distributed storage and data management systems have never before had to cope with data volumes and processing throughput on the scale that the rise of big data demands. Cloud storage systems are still in their infancy and are continuously evolving (Chen, Liu, Xiang, & Sood, Reference Chen, Liu, Xiang and Sood2021b). Until now, they have mostly concentrated on the requirements of commercial applications to deliver basic functionality dependably and securely (Shen, Zhang, Wang, Guo, & Susilo, Reference Shen, Zhang, Wang, Guo and Susilo2021). Implementing data-intensive applications in the cloud at scale necessitates addressing the following issues.
• In many situations, streaming data transfers might be unreliable. On a daily basis, data sources create terabytes to petabytes of data (Hu, Wen, Chua, & Li, Reference Hu, Wen, Chua and Li2014). Real-time computing has become a big issue due to the collected volume (Puthal et al., Reference Puthal, Nepal, Ranjan and Chen2016).
• Data staging is one of the most critical issues that must be addressed because data from sensors, mobile phones, and social networking sites are diverse. They lack any specific structure. In other words, the data available for analysis are sometimes unstructured, such as video and text, necessitating extra work in cleaning and converting them for processing, which makes the process sluggish and inefficient (Agarwal & Srivastava, Reference Agarwal and Srivastava2019).
• Although cloud computing brings convenience to businesses and individuals due to its structural qualities, it also unavoidably brings security threats from the computer network environment, endangering the security of archived information resources (Shamshirband, Fathi, Chronopoulos, Montieri, Palumbo, & Pescapè, Reference Shamshirband, Fathi, Chronopoulos, Montieri, Palumbo and Pescapè2020; Sun, Reference Sun2021).
• Manufacturing has become much easier because of the advent of sensor technologies that allow machines to communicate and gather data (Lee, Lapira, Bagheri, & Kao, Reference Lee, Lapira, Bagheri and Kao2013). As a result, the industrial information system faces a huge problem in figuring out how to utilize and organize massive data to help make better decisions (Li, Song, & Huang, Reference Li, Song and Huang2016).
• One of the present data management problems is to deliver a service with no data loss and minimal throughput latency. Nevertheless, even after activating and incorporating a cloud management system, servicing all data streams and transactions remains a challenge (Hussien & Sulaiman, Reference Hussien and Sulaiman2016).
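The data-staging challenge above, heterogeneous and unstructured inputs that must be cleaned and converted before processing, can be illustrated with a small normalization step. The field names (`temp_c`, `temperature`, `source`) are hypothetical examples, not from any cited system:

```python
def normalize(record):
    """Convert one heterogeneous raw record into a uniform shape.

    Raw records might come from sensors ({'temp_c': ...}), phones
    ({'temperature': ...}), or be malformed; invalid input is dropped.
    """
    if not isinstance(record, dict):
        return None  # unparseable input (e.g., corrupted text line)
    value = record.get("temp_c", record.get("temperature"))
    if value is None:
        return None  # record lacks the measurement we need
    try:
        return {"temperature_c": float(value),
                "source": record.get("source", "unknown")}
    except (TypeError, ValueError):
        return None  # measurement present but not numeric

raw = [{"temp_c": "21.5", "source": "sensor-1"},
       {"temperature": 19, "source": "phone"},
       "corrupted line",
       {"humidity": 40}]
staged = [r for r in (normalize(x) for x in raw) if r is not None]
# staged keeps only the two valid temperature records, in one uniform shape
```

Even this trivial staging step shows where the cost goes: every record must be inspected, mapped to a common schema, and possibly discarded before any analytics can run, which is exactly why unstructured inputs slow the pipeline down.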
Future directions
Although much research has been done on the BDM of modern cloud systems, several issues remain to be addressed. The following are important suggestions for the future:
• The urge to close the gap between data gathering and business action is growing. For instance, a shop could want to base next week's promotions on this week's information. Online shops would like to act on data even more rapidly. Available methods rely on log-based streaming, log shipping, and other extract, transform, and load (ETL) approaches. However, this discipline is still in its early stages of growth (Chaudhuri, Reference Chaudhuri2012).
• Despite the fact that enforcing service level agreements (SLAs) is a difficult undertaking, numerous academics have worked to build systems that might ensure that various services' QoS needs are met. In cloud computing, many approaches to handling SLA violations have been presented. Even though resource allocation management is utilized to select appropriate resources for provider profit, cloud client demands, and cloud-hosted big-data analytic applications, it has not been properly examined (Sahal, Khafagy, & Omara, Reference Sahal, Khafagy and Omara2016).
• Scholarly data are a massive data repository that is constantly updated and contains a wide range of data. Hence, it is sometimes referred to as 'big scholarly data.' The analysis and display of these data may be used to create various applications (Hu et al., Reference Hu, Wang, She, Zhang, Huang, Cui and Wang2021). Difficulties and limits occur at every level of the data analytics procedure, particularly with regard to big scholarly data platforms. Specific elements of such platforms are undergoing study and must be combined in order to build a comprehensive system (Khan, Shakil, & Alam, Reference Khan, Shakil and Alam2016).
• As the popularity of cloud-computing settings grows, so do the safety concerns that arise from this technology's adoption. As a result, it is necessary to invest in understanding the loopholes, problems, and components that are vulnerable to attacks in cloud computing and in developing a platform and architecture that is less vulnerable to attacks (Jain, Reference Jain2020).
• Due to the abundance of wearable gadgets, smart sensors, smartphones, and other connected devices (Yi, Reference Yi2021), fog/IoT will become the most researched subject in the coming decade (Heidari et al., Reference Heidari, Jabraeil Jamali, Jafari Navimipour and Akbarpour2020). As a result, data processing applications will most likely be deployed in a distributed manner. Nevertheless, sending all of the data to cloud data centers for processing is inefficient. It might result in unnecessary network, transmission, or bandwidth overhead across the system and increased data center energy usage. Hence, energy-efficient software solutions that can handle and analyze data at the fog/edge level must be created to minimize energy usage and improve the performance of time-critical applications (Yang et al., Reference Yang, Ghadamyari, Khorramdel, Alizadeh, Pirouzi, Milani and Ghadimi2021). Additionally, multi-tiered resource management across the fog nodes, cloud data center, and mobile devices will aid in meeting SLA requirements (Bagheri, Nurmanova, Abedinia, Naderi, Ghadimi, & Naderi, Reference Bagheri, Nurmanova, Abedinia, Naderi, Ghadimi and Naderi2018; Islam & Buyya, Reference Islam and Buyya2019).
• Because the entire data set cannot be transferred or processed, new techniques for filtering big data for processing must be created. Analyzing various data types attracts a wide range of studies (Anuradha & Bhuvaneshwari, Reference Anuradha and Bhuvaneshwari2014).
• The creation of a benchmark suite aimed at determining the highest throughput through configuration optimization would be a promising future study topic (Ullah, Awan, & Sikander Hayat Khiyal, Reference Ullah, Awan and Sikander Hayat Khiyal2018).
Conclusion
Companies are confronting issues such as optimizing resource allocations, controlling cost, managing rapid storage growth, coping with dynamic concurrency requests, and the lack of underlying infrastructures that can dynamically allocate the computing and storage resources that big data needs. As a result, the best answer is for a company to adopt new technology. Cloud computing is one such domain that has a major influence on how large data are handled, deployed, and consumed. This work examines modern management built on cloud and big-data technology to produce a system capable of handling the vast and rapidly expanding variety of data-producing devices. Papers were thoroughly reviewed in this study. For example, adopting the data deduplication idea in the cloud has enabled users to reduce big-data memory needs, efficiently lowering storage costs. Cloud computing is a prospective computing utility paradigm for delivering IT services at lower user cost. On the other hand, cloud computing is insecure. Attackers may penetrate the SaaS layer of cloud computing, exposing sensitive data and opening the door to new forms of hazardous attack. In addition, cloud-based big-data analytics has become a prominent study subject, posing new problems across the data processing life cycle, from data collection through integration and analytics to data security and privacy. These problems necessitate a novel system structure for collecting, transmitting, storing, and processing data at large scale, complete with data privacy and security safeguards. Nevertheless, some progress has been made in this area. This paper attempts to create a more secure, scalable, fault-tolerant, and cost-effective environment for analyzing big data in companies using cloud-computing services. English sources are used in this study; there may be other valuable resources in other languages that are not listed here.
In addition to the keywords we searched for, there may be other useful articles that were not captured by our selected keywords. In conducting this study, we have tried to use all sources fairly and without bias. However, it is natural that some sources were inadvertently left out or, given the diversity of research in this field, could not be referred to in this study. There may also be valuable resources in languages other than English that are not covered in this review.
Finally, the abbreviations used in the article are described in Table 10.
Acknowledgements
The authors acknowledge the scientific research project of the Jilin Provincial Department of Education (JJKH20210377SK).
Conflict of interest
The authors declare no conflict of interest.
Data availability statement
All data are reported in the paper.