TechRepublic

Cyber security: challenges for society - literature review

Cyber security is the activity of protecting information and information systems (networks, computers, databases, data centers and applications) with appropriate procedural and technological security measures. Firewalls, antivirus software and other technological solutions for safeguarding personal data and computer networks are essential but not sufficient to ensure security. As the authors' nation is rapidly building its cyber-infrastructure, it is equally important that the population is educated to work properly with this infrastructure. Cyber-ethics, cyber-safety and cyber-security issues need to be integrated into the educational process beginning at an early age.



A systematic literature review of how cybersecurity-related behavior has been assessed

Information and Computer Security

ISSN: 2056-4961

Article publication date: 20 April 2023

Issue publication date: 30 October 2023

Purpose

Cybersecurity attacks on critical infrastructures, businesses and nations are rising and have reached the interest of mainstream media and the public's consciousness. Despite this increased awareness, humans are still considered the weakest link in the defense against an unknown attacker. Whether the behavior of a member of an organization is naïve, unintentional or intentional, the resulting incident can have a considerable impact. A security policy with guidelines for best practices and rules should guide the behavior of the organization's members. However, this is often not the case. This paper aims to provide answers to how cybersecurity-related behavior is assessed.

Design/methodology/approach

Research questions were formulated, and a systematic literature review (SLR) was performed by following the recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement. The SLR initially identified 2,153 articles, and the paper reviews and reports on 26 articles.

Findings

The assessment of cybersecurity-related behavior can be classified into three components, namely, data collection, measurement scale and analysis. The findings show that subjective measurements from self-assessment questionnaires are the most frequently used method. Measurement scales are often composed based on existing literature and adapted by the researchers. Partial least squares analysis is the most frequently used analysis technique. Even though useful insight and noteworthy findings regarding possible differences between manager and employee behavior have appeared in some publications, conclusive answers to whether such differences exist cannot be drawn.

Research limitations/implications

Research gaps have been identified, that indicate areas of interest for future work. These include the development and employment of methods for reducing subjectivity in the assessment of cybersecurity-related behavior.

Originality/value

To the best of the authors’ knowledge, this is the first SLR on how cybersecurity-related behavior can be assessed. The SLR analyzes relevant publications and identifies current practices as well as their shortcomings, and outlines gaps that future research may bridge.

  • Cybersecurity
  • Human behavior
  • Assessment process

Kannelønning, K. and Katsikas, S.K. (2023), "A systematic literature review of how cybersecurity-related behavior has been assessed", Information and Computer Security, Vol. 31 No. 4, pp. 463-477. https://doi.org/10.1108/ICS-08-2022-0139

Emerald Publishing Limited

Copyright © 2023, Kristian Kannelønning and Sokratis K. Katsikas.

Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial & non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode

1. Introduction

The importance of information systems (IS) security has increased as the number of unwanted incidents has continued to rise over the last decades. Organizations can take several avenues or paths to secure their IS. Technical solutions such as whitelisting, firewalls and antivirus software enhance security, but research has shown that when people within the organization do not follow policies and guidelines, these technical safeguards are in vain.

1.1 Aims of the paper

Of the 26 articles included in this review, 10 used some variation of the phrase "humans are the weakest link in cybersecurity" in either the abstract or introduction. These articles collectively cite a significant number of previous works making the same claim. One might agree with Kruger et al. (2020) that it is common knowledge that humans are the weakest link in information security.

Given the premise that humans are the weakest link and the acknowledgment that technology cannot be the single solution for security ( McCormac et al. , 2017 ), research should investigate how organizations can assess the cybersecurity-related behavior of their employees. Identifying, evaluating and summarizing the methods and findings of all relevant literature resources addressing the issue, thereby systematizing the available knowledge and making it more accessible to researchers, while also identifying relevant research gaps, are the aims of this systematic literature review (SLR).

1.2 Background

Recent years have shown that cyberattacks are a global issue, such as the extensive power outage causing a blackout across Argentina and Uruguay in 2019 (Kilskar, 2020). In January 2018, nearly 3 million medical records, belonging to roughly 50% of the Norwegian population, were compromised by a cyberattack. Threats range from viruses, worms, trojan horses, denial of service, botnets and man-in-the-middle attacks to zero-day exploits (Pirbhulal et al., 2021). These threats carry technical terms with a distinctive flair and uniqueness that are hard to comprehend for employees without a technical background. Moreover, most information security issues are complicated, and fully understanding them requires advanced technical knowledge.

An information security policy should include the following elements (ISO, 2022):

  • definition of information security;
  • information security objectives or the framework for setting information security objectives;
  • principles to guide all activities relating to information security;
  • commitment to satisfy applicable requirements related to information security;
  • commitment to continual improvement of the information security management system;
  • assignment of responsibilities for information security management to defined roles; and
  • procedures for handling exemptions and exceptions.

The extent to which an employee is aware of and complies with information security policy defines the extent of their information security awareness (ISA). ISA is critical in mitigating the risks associated with cybersecurity and is defined by two components, namely, understanding and compliance. Compliance is the employees' commitment to follow best-practice rules defined by the organization (Reeves et al., 2020). Ajzen (1991) defines a person's intention to comply as the individual's motivation to perform a described behavior. The intention to comply captures the motivational factors that influence behavior. As a general rule, the stronger the intention, i.e. the willingness to exert effort to perform a behavior, the more likely it is to be performed.

Several frameworks or theories can be applied to research human behavior. For cybersecurity, behavior can be viewed through lenses and theories borrowed from disciplines such as criminology (e.g. deterrence theory), psychology (e.g. theory of planned behavior) and health psychology (e.g. protection motivation theory) ( Moody et al. , 2018 ; Herath and Rao, 2009 ). The most commonly used models in the context of cybersecurity are the general deterrence theory, the theory of planned behavior and the protection motivation theory ( Alassaf and Alkhalifah, 2021 ).

Staff’s attitude and awareness can pose a security problem. In those settings, it is relevant to consider why the situation exists and what can be done about it. In many cases, a key reason will be the limited extent to which security is understood, accepted and practiced across the organization ( Furnell and Thomson, 2009 ). As a mitigating step toward compliance, decision-makers will need guidance on achieving compliance and discouraging misuse when developing information security policies ( Sommestad et al. , 2014 ). Therefore, the ability to assess behavior is a prerequisite for decision-makers in their quest to develop the organizations’ information security policies. The development and responsibility for implementing policies lie within the purview of management ( Höne and Eloff, 2002 ). Accordingly, understanding the differences in cybersecurity-related behavior between management and employees will benefit the development of more secure organizations.

1.3 Structure of the paper

The rest of this paper is organized as follows: Section 2 describes the methodology for conducting the SLR; the research questions; the record search process; and the assessment criteria. In Section 3, the results and the findings are presented. A discussion of the findings is presented in Section 4. Section 5 summarizes our conclusions and outlines directions for future research.

2. Methodology

This section discusses the fundamental stages of conducting an SLR. The SLR constructs are obtained by following the recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement (Page et al., 2021), along with Fink (2019) and Weidt and Silva (2016).

The foremost step is to investigate whether a similar review has already been conducted. Searching for and studying other reviews helps refine both research questions and search strings. The search did not discover any similar reviews. Keywords, search strings and research questions were collected and categorized in a literature index tool and used to optimize search strings and verify that this review's chosen research questions are relevant and valuable to the body of knowledge.

A research review is explicit about the research questions, search strategy, inclusion and exclusion criteria, data extraction method and steps taken for analysis. Research reviews are, unlike subjective reviews, comprehensible and easily reproducible ( Fink, 2019 ). The remainder of this section elaborates on the components of the performed SLR.

2.1 Research questions

RQ1. How is cybersecurity-related behavior assessed?

RQ2. Are there differences between manager and employee behavior in a cybersecurity context?

2.2 Record searching process

Various search strings were used in this SLR, depending on the database. The keywords were kept unchanged, but because each database's syntax differs, the search strings have minor differences. This study includes the following databases: Scopus, IEEE, Springer, Engineering Village, ScienceDirect and ACM. The keywords used (exact and stemmed words), in database-specific syntax, were: cyber, security, information, policy, compliance, measure and behavior. As an example, the following is the search used in Scopus: TITLE-ABS-KEY ((information AND security AND policy OR information AND security AND compliance OR policy AND compliance) AND (information AND security AND behavior)) AND PUBYEAR > 2001. To increase the precision of the searches, title, abstract and keywords were used as limiters in all the databases.
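The paper reports only the final query strings; as an illustration, a hypothetical helper like the one below could assemble the Scopus string from the reported keyword blocks (the function name and structure are assumptions, not part of the study):

```python
# Illustrative sketch (not from the paper): assembling the reported
# Scopus search string from its two keyword blocks.

def scopus_query(year_after: int = 2001) -> str:
    policy_block = ("information AND security AND policy"
                    " OR information AND security AND compliance"
                    " OR policy AND compliance")
    behavior_block = "information AND security AND behavior"
    # Title, abstract and keywords serve as limiters (TITLE-ABS-KEY).
    return (f"TITLE-ABS-KEY (({policy_block}) AND ({behavior_block}))"
            f" AND PUBYEAR > {year_after}")

print(scopus_query())
```

The same keyword blocks would be re-rendered in each database's own syntax, which is why the review describes "minor differences" between the strings.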

2.3 Assessment criteria

The exclusion criteria were:

  • studies from organization reports, guidelines and technical opinion reports;
  • reviews, editorials and testimonials, as using secondary data (data from other reviews, etc.) would make this review a tertiary one; and
  • nonresearch literature.

The inclusion criteria were:

  • written in English;
  • published in 2001-2022;
  • original studies using theoretical or empirical data; and
  • studies published in journals, conference proceedings and books/book sections.

2.4 Analysis of included articles

The results presented in this review are based on the abstraction of data from the articles. The descriptive synthesized results are based on the reviewers' experience and the quality and content of the available literature (Fink, 2019). All results are based on an abstraction of data, except for those in Section 3.3.4, where the NVIVO software was used to uncover the most frequently used words in a text compiled from the analysis sections of every article in the review.

3. Results

3.1 Identification, screening, eligibility and inclusion mechanism

This research returned 2,153 records. The first step before any analysis is to remove duplicates; after doing so, a total of 1,611 unique records remained. Following the recommendation of Weidt and Silva (2016), the first analysis step is screening by title and abstract. A total of 1,517 records were found to be irrelevant for this review, leaving 94 articles for additional screening. The (optional) second screening, depending on the number of articles, involves an analysis of each article's introduction and conclusion. For this study, an analysis of the method section was also included in the second screening step. This narrowed the number down to 28, of which another two articles were excluded because of the lack of empirical data and irrelevance to the topic being reviewed, leaving a total of 26 articles for complete text analysis. Figure 1, adapted from Page et al. (2021), depicts the screening process.
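The screening funnel reported above reduces to simple arithmetic, which the following sketch verifies:

```python
# PRISMA-style screening funnel, using the counts reported in Section 3.1.
identified = 2_153                        # records returned by the searches
unique = 1_611                            # after duplicate removal
duplicates_removed = identified - unique  # duplicates discarded
after_title_abstract = unique - 1_517     # articles left after first screening
after_second_screening = 28               # intro/conclusion/method check
included = after_second_screening - 2     # articles reviewed in full text

assert duplicates_removed == 542
assert after_title_abstract == 94
assert included == 26
```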

3.2 Trend and classification of included studies

Of the 26 selected articles, 19 were published in journals and the remaining 7 in conference proceedings, or 73% and 27%, respectively (see Figure 2). The figure also demonstrates the increased interest in the subject in the past two years.
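The reported venue split can be checked directly:

```python
# Publication-venue split of the 26 included articles (Section 3.2).
journals, conferences = 19, 7
total = journals + conferences

assert total == 26
assert round(100 * journals / total) == 73      # journal share, %
assert round(100 * conferences / total) == 27   # conference share, %
```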

3.3 Findings

3.3.1 How is cybersecurity-related behavior assessed?

Of the selected 26 articles in this review, 24 or 92% provide insight into how cybersecurity-related behavior is assessed. A three-step process emerges as the way to assess such behavior: First, information from subjects needs to be collected. This is referred to as data collection . Second, a measurement scale is deployed to ensure that the data collected is relevant and encompasses the research topic. The final step is the data analysis.

3.3.2 Data collection.

Two forms of data can be collected: qualitative and quantitative. Both types of data can be subjective or objective; neither is exclusive to the other. The most common way to collect subjective data is a questionnaire with questions whose answers fit a five- or seven-point Likert scale. Within a survey, questions may be asked that are subjective, biased or misleading when viewed alone, but the results can easily be used quantitatively (O'Brien, 1999). Given the ubiquity of qualitative data, the interest in quantifying it, assigning "good" numerical values and making the data susceptible to more meaningful analysis, has been a research topic since the first quantification methods began to appear around 1940 (Young, 1981).
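As a minimal, hypothetical illustration of using Likert-scale answers quantitatively (the scale labels and responses below are invented, not taken from any reviewed study):

```python
# Hypothetical example: mapping five-point Likert answers to numbers
# so subjective responses can be analyzed quantitatively.
SCALE = {"strongly disagree": 1, "disagree": 2, "neutral": 3,
         "agree": 4, "strongly agree": 5}

responses = ["agree", "strongly agree", "neutral", "agree"]
scores = [SCALE[r] for r in responses]
avg = sum(scores) / len(scores)
print(avg)  # 4.0
```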

Subjective data can lead to inaccurate or skewed results. In contrast, objective data are free from the subject’s opinions. This can be, for example, the number of attacks prevented or the number of employees clicking the link in a phishing campaign ( Black et al. , 2008 ).

The SLR revealed six types of data collection methods, namely, self-assessment questionnaire (SAQ); interview; vignette; experiment with vignettes; affective computing and sentiment analysis; and clicking data from a phishing campaign. An overview of all articles and the data collection method used in each is presented in Table 1 .

The most prominent form of data collection is self-assessment (SA). This subjective data collection method is defined by Boekaerts (1991) as a form of appraisal that compares one’s behavioral outcomes to an internal or external standard. In total, 22 of the 24 articles used SA as the primary data collection method. The most common way to collect data is through a questionnaire (SAQ). A total of 17 or 71% of the articles used an SAQ as their sole method for data collection.

Of the remaining five articles with results stemming from subjective data, two used vignettes in combination with a regular SAQ. Vignettes are hypothetical scenarios that the subject reads and forms an opinion on based on the information given. Barlow et al. (2013) performed a factorial survey method (FSM) experiment with vignettes by inserting randomly manipulated elements into the scenario sentences instead of static text. Both regular questionnaires and vignettes use the same Likert scale.

The average number of respondents in the included papers is n = 356, with 52% males and 48% females. The most common way to deploy the SAQ is through online web platforms, e.g. a by-invitation-only webpage at a market research company. Pen and paper were used only twice. Market research companies and management distribution are the two most used recruitment strategies, accounting for 73% of the papers, or 84% when articles that did not specify a recruitment strategy are excluded.

Two studies used interviews to collect information: one used interviews with an SAQ, and the other used interviews as the sole input. Interviews provide in-depth information and are suitable for uncovering the “how” and “why” of critical events as well as the insights reflecting the participants’ relativist perspectives ( Yin, 2018 ).

Only two studies used objective, quantitative data: Kruger et al. (2020) used affective computing and sentiment analysis. With the help of a deep learning neural network, the study accurately classified opinions as positive, neutral or negative based on facial expressions. Jalali et al. (2020) used a phishing campaign in conjunction with an SAQ to investigate whether there were any differences between intention to comply and actual compliance.

3.3.3 Measurement scale.

A measurement scale ensures that the collected data encompass a topic or subject and do not miss any crucial facets. The role of a measurement scale is to ensure that the data collected is holistic and reproducible. Researchers can use predefined scales developed by others or self-developed ones. Those of the reviewed articles that use the latter form of scale are often not fully transparent about the content of the scale.

This SLR shows that 13 of the 22 articles that used a measurement scale used an unspecified scale. The most frequently (in seven papers) used specified scale is the Human Aspect of Information Security Questionnaire (HAIS-Q), developed by Parsons et al. (2014) . When used in conjunction with other scales, HAIS-Q is often the most prominent.

Several pitfalls exist and must be considered when researchers select their measurement scale. When developing an unspecified scale, found to be the most deployed alternative in this SLR, length, wording, familiarity with the topic, natural sequence of time and questions in a logical order are some of the aspects researchers should be mindful of (Fink, 2015). The length of the questionnaire is especially significant: how much time must respondents spend answering the survey? Another critical element when designing a measurement scale instead of using an existing one is validity and reliability; proper pilot testing is required when choosing not to use an already-validated survey (Fink, 2015).

The HAIS-Q is designed to measure information security awareness related to information security in the workplace ( McCormac et al. , 2017 ). The Knowledge, Attitude and Behavior (KAB) model is at the center of HAIS-Q. The hypothesis is that when computer users gain more knowledge, their attitude toward policies will improve, translating into more risk-averse behavior ( Pollini et al. , 2021 ). The HAIS-Q comprises 63 questions covering 7 focus areas (internet use, email use, social networking site use, password management, incident reporting, information handling and mobile computing). Each focus area is divided into equal parts for KAB, resulting in 21 questions for each KAB element divided by the seven focus areas. For a detailed overview of the other scales used in conjunction with HAIS-Q, see the last column in Table 1 .
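The arithmetic of the HAIS-Q layout described above can be sketched and checked as follows:

```python
# HAIS-Q structure as described: 7 focus areas, each split equally
# across the Knowledge, Attitude and Behavior (KAB) elements.
focus_areas = ["internet use", "email use", "social networking site use",
               "password management", "incident reporting",
               "information handling", "mobile computing"]
kab_elements = ["knowledge", "attitude", "behavior"]
questions_per_cell = 3   # 63 questions / (7 areas * 3 KAB elements)

total_questions = len(focus_areas) * len(kab_elements) * questions_per_cell
per_kab_element = total_questions // len(kab_elements)

assert len(focus_areas) == 7
assert total_questions == 63
assert per_kab_element == 21
```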

The KAB model that underpins the HAIS-Q has been criticized by researchers when used in, e.g. health and climate research. Both Parsons et al. (2014) and McCormac et al. (2016) cite McGuire (1969), who suggests that the problem is not with the model itself but with how it is applied. Parsons et al. (2014) highlight essential differences between environmental and health studies and the field of information security: much ambiguity and unclear or contradictory information exist in the two former fields, while most organizations have an information security policy, either written or informal, indicating what is expected from employees (Parsons et al., 2014). Barlow et al. (2013) advocate using scenarios instead of direct questions, as in the HAIS-Q, because it is difficult to assess actual deviant behavior by observation or direct questioning.

Another critique of the HAIS-Q concerns the length of the questionnaire: with 63 questions, respondents might lose interest, be inattentive to the questions and sometimes give false answers (Velki et al., 2019). In contrast, Parsons et al. (2017) show that the HAIS-Q is a reliable and validated measurement scale that accommodates some of the concerns raised by Fink (2015).

Pollini et al. (2021) caution that a questionnaire considers only the individual level and may not capture a holistic and accurate measurement of the organization. Therefore, in their study, HAIS-Q questionnaires were deployed at the individual level, and interviews were used to assess the organizational level.

3.3.4 Analysis.

To uncover how the included articles had analyzed their results, NVIVO, a qualitative data analysis software, was used to identify the most frequently used words in each article. A cumulative document compiled from each article's analysis section was analyzed in NVIVO. All articles use some form of validation and statistical verification of the collected data. The word count provides both a structured presentation and an unbiased account of how often keywords affiliated with the technical part of the analysis are used. The result from NVIVO shows that partial least squares (PLS) is the most frequently used method. Herman Wold first introduced PLS in 1975; it can be preferable in cases where constructs are measured primarily by formative indicators, e.g. managerial research, or when the sample size is small (Haenlein and Kaplan, 2004). This result is also in line with the finding in Kurowski (2019): "Most of policy compliance research uses partial least squares, regression modeling or correlation analyses."
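The NVIVO step is, at its core, a word-frequency count over the compiled analysis sections; a minimal sketch of the same idea using only the standard library (the sample text below is invented for illustration):

```python
# Word-frequency count over a compiled "analysis" text, mimicking the
# NVIVO step described above. The sample text is invented.
import re
from collections import Counter

compiled_text = """
Partial least squares (PLS) analysis was used to validate the model.
PLS path modelling confirmed the hypothesised relations. Regression
and correlation analyses supported the PLS results.
"""

words = re.findall(r"[a-z]+", compiled_text.lower())
stopwords = {"the", "and", "was", "to", "of"}
freq = Counter(w for w in words if w not in stopwords and len(w) > 2)
print(freq.most_common(3))
```

In this toy text, "pls" surfaces as the top term, mirroring how PLS emerged as the dominant analysis keyword across the reviewed articles.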

3.3.4.1 Are there differences between manager and employee intention and behavior in a cybersecurity context?

Only five articles, or 19%, provide insight into the second research question. However, none provides a clear-cut response to this research question. There is a consensus in all five articles that organizational culture is a cornerstone for security and policy-compliant behavior ( Reeves et al. , 2020 ; Hwang et al. , 2017 ; Alzahrani, 2021 ; Parsons et al. , 2015 ; Li et al. , 2019 ).

Among the articles, there is also broad agreement that peers' behavior, i.e. the influence peers have on our behavior, is vital for a positive cybersecurity outcome (Li et al., 2019; Alzahrani, 2021; Hwang et al., 2017). Peer- and policy-compliant behavior can only be achieved when the organization has a positive cybersecurity culture. The development of organizational culture often comes from top management; hence, the development and continued improvement of culture will be assigned to management (Li et al., 2019; Reeves et al., 2020). One interesting finding in the context of developing or harnessing a security culture is that managers were found to have much lower information security awareness; Reeves et al. (2020) therefore recommend that future training be targeted at management. This small paradox is at least something to dwell on, given that culture is built from the top.

All the articles provide reasons for noncompliance in their findings. In a hectic environment, employee workload has been shown to negatively impact compliance ( Jalali et al. , 2020 ). Connected to workload are work goals. Security will draw the shortest straw when goals and security do not align. If security is viewed as a hindrance, noncompliant behavior will arise ( Reeves et al. , 2020 ; Hwang et al. , 2017 ; Alzahrani, 2021 ; Parsons et al. , 2015 ). Also, when employees lack knowledge or have not been given sufficient information about the organization’s security policies, compliant behavior will be impacted ( Hwang et al. , 2017 ; Alzahrani, 2021 ; Parsons et al. , 2015 ; Li et al. , 2019 ).

4. Discussion

The findings of this SLR show a clear preponderance of subjective data collected to measure cybersecurity-related behavior. Over 90% of the included articles use subjective data to measure behavior, and only one article relies solely on objective measurements. The availability and ease of use of subjective methods might be the reason: an interview can be done without much cost or planning, whereas objective methods require more resources, e.g. a phishing campaign.

However, the use of subjective data can lead to biased responses from the subjects. This bias can be problematic. According to Kurowski (2019), "For instance, survey reports of church attendance and rates of exercise are found to be double the actual frequency when self-reported." Almost all articles address the issue of biased measurement. Many refer to Podsakoff et al. (2003) and the recommendation therein to assure respondents that their identity will be kept anonymous; anonymization appears to be an accepted way for several researchers to remove the risk of bias. However, as Kurowski (2019) finds, bias does exist in today's research. In his paper, two questionnaires were used to test for biased responses: one using standard, straightforward compliance questions and one using vignettes (see Table 1). Kurowski (2019) found that generic questionnaires may capture biased policy compliance measures. If an individual reports policy compliance on a literature-based scale, it may mean any of the following: the individual is indeed compliant; the individual does not know the policy and does not act compliantly; or the individual thinks they are compliant with the policy because they behave securely, but does not know the policy. This does not imply that existing research fails to measure policy compliance entirely, but it fails to measure it reliably (Kurowski, 2019).

Jalali et al. (2020) included both objective and subjective measurements. They compared the employees' intention to comply with their actual compliance by examining whether the employees had clicked the link in the phishing campaign. They found no significant relationship between the intention to comply and the actual behavior. This result is not in line with previous studies that used self-reported data, a method that leaves room for socially desirable answers (Podsakoff et al., 2003) and in which previous answers can influence later ones (Jalali, 2014).

Even the HAIS-Q, the single most used questionnaire, employed seven times in this SLR, is not free from biased responses. Even though the questionnaire was validated and tested by Parsons et al. (2017), McCormac et al. (2017), investigating it for biased responses, showed that social desirability bias can be present. Further research is therefore needed to exclude biased responses from the HAIS-Q.

5. Conclusion

This SLR, which started with 2,153 records that were reduced through several analysis steps to 26 articles, provides insights into the predefined research questions.

When excluding all preparatory work before a study is performed, the assessment of behavior can be classified into three components: data collection, measurement scale and, lastly, analysis. This research found that, in the context of cybersecurity, subjective data are collected to a much larger extent than objective data, with the online SAQ as the most prominent way to collect data. Measurement scales are often composed based on existing literature and adapted by the researchers. The most commonly used questionnaire is the HAIS-Q, developed by Parsons et al. (2014). Finally, an analysis is performed to test for internal and external validity of the collected data. PLS analysis is the most frequently used technique in the selected articles. Although a clear path to assessing behavior is uncovered, the proposed self-assessment method can produce biased data. Thus, future research should address the problem of objectively assessing cybersecurity-related behavior and the factors affecting it.

The second research question, i.e. whether there exist differences between manager and employee behavior, was not conclusively answered. Of the relatively small number of articles, several provide insights and noteworthy findings but not conclusive answers to this research question. In light of the significance of the matter for improving the cybersecurity culture in an organization, this constitutes another interesting research gap.

Future research should bridge the above research gaps, and studies should include employees and management from the same organization. This will require more planning and coordination than simply deploying a questionnaire online. Extra effort in anonymizing personal data must be in place because subjects come from the same organization. The uncertainty surrounding anonymization and the risk of biased responses concerning anonymization must be mitigated. This can be achieved by, e.g. using a hybrid method consisting of objective and subjective data collection, such as self-assessment questionnaires combined with phishing campaigns. Future research should collect holistic data within a market, country, segment or similar, as research into compliance is context-dependent (Jalali et al., 2020).

The SLR screening process

Trend and classification of included studies

Overview of reviewed articles

Ajzen , I. ( 1991 ), “ The theory of planned behavior ”, Organizational Behavior and Human Decision Processes , Vol. 50 No. 2 , pp. 179 - 211 .

Alassaf , M. and Alkhalifah , A. ( 2021 ), “ Exploring the influence of direct and indirect factors on information security policy compliance: a systematic literature review ”, IEEE Access , Vol. 9 , pp. 162687 - 162705 .

Al-Omari , A. , El-Gayar , O. and Deokar , A. ( 2012 ), “ Security policy compliance: user acceptance perspective ”, 2012 45th Hawaii International Conference on System Sciences , IEEE , pp. 3317 - 3326 .

Alzahrani , L. ( 2021 ), “ Factors impacting users’ compliance with information security policies: an empirical study ”, International Journal of Advanced Computer Science and Applications , Vol. 12 No. 10 .

Ameen , N. , Tarhini , A. , Shah , M.H. , Madichie , N. , Paul , J. and Choudrie , J. ( 2021 ), “ Keeping customers’ data secure: a cross-cultural study of cybersecurity compliance among the Gen-Mobile workforce ”, Computers in Human Behavior , Vol. 114 , p. 106531 , doi: 10.1016/j.chb.2020.106531 .

Barlow , J.B. , Warkentin , M. , Ormond , D. and Dennis , A.R. ( 2013 ), “ Don’t make excuses! Discouraging neutralization to reduce IT policy violation ”, Computers and Security , Vol. 39 , pp. 145 - 159 , doi: 10.1016/j.cose.2013.05.006 .

Black , P.E. , Scarfone , K. and Souppaya , M. ( 2008 ), “ Cyber security metrics and measures ”, Wiley Handbook of Science and Technology for Homeland Security , Wiley , NH , pp. 1 - 15 .

Boekaerts , M. ( 1991 ), “ Subjective competence, appraisals and self-assessment ”, Learning and Instruction , Vol. 1 No. 1 , pp. 1 - 17 , doi: 10.1016/0959-4752(91)90016-2 .

Chen , Y. , Galletta , D.F. , Lowry , P.B. , Luo , X. , Moody , G.D. and Willison , R. ( 2021a ), “ Understanding inconsistent employee compliance with information security policies through the lens of the extended parallel process model ”, Information Systems Research , Vol. 32 No. 3 , pp. 1043 - 1065 , doi: 10.1287/isre.2021.1014 .

Chen , Y. , Xia , W. and Cousins , K. ( 2021b ), “ Voluntary and instrumental information security policy compliance: an integrated view of prosocial motivation, self-regulation and deterrence ”, Computers and Security , Vol. 113 , p. 102568 , doi: 10.1016/j.cose.2021.102568 .

Cindana , A. and Ruldeviyani , Y. ( 2018 ), “ Measuring information security awareness on employee using HAIS-Q: case study at XYZ firm ”, 2018 International Conference on Advanced Computer Science and Information Systems (ICACSIS) , pp. 289 - 294 .

Fink , A. ( 2015 ), How to Conduct Surveys: A Step-by-Step Guide , Sage Publications , London .

Fink , A. ( 2019 ), Conducting Research Literature Reviews: From the Internet to Paper , Sage Publications , London .

Furnell , S. and Thomson , K.L. ( 2009 ), “ From culture to disobedience: recognising the varying user acceptance of IT security ”, Computer Fraud and Security , Vol. 2009 No. 2 , pp. 5 - 10 , doi: 10.1016/S1361-3723(09)70019-3 .

Gangire , Y. , Da Veiga , A. and Herselman , M. ( 2020 ), “ Information security behavior: development of a measurement instrument based on the self-determination theory ”, International Symposium on Human Aspects of Information Security and Assurance , Springer , Cham , pp. 144 - 157 .

Goo , J. , Yim , M. and Kim , D.J. ( 2014 ), “ A path to successful management of employee security compliance: an empirical study of information security climate ”, IEEE Transactions on Professional Communication , Vol. 57 No. 4 , pp. 286 - 308 , doi: 10.1109/TPC.2014.2374011 .

Guhr , N. , Lebek , B. and Breitner , M.H. ( 2018 ), “ The impact of leadership on employees’ intended information security behaviour: an examination of the full-range leadership theory ”, Information Systems Journal , Vol. 29 No. 2 , pp. 340 - 362 , doi: 10.1111/isj.12202 .

Haenlein , M. and Kaplan , A.M. ( 2004 ), “ A beginner’s guide to partial least squares analysis ”, Understanding Statistics , Vol. 3 No. 4 , pp. 283 - 297 .

Herath , T. and Rao , H.R. ( 2009 ), “ Protection motivation and deterrence: a framework for security policy compliance in organisations ”, European Journal of Information Systems , Vol. 18 No. 2 , pp. 106 - 125 .

Höne , K. and Eloff , J.H.P. ( 2002 ), “ Information security policy – what do international information security standards say ?”, Computers and Security , Vol. 21 No. 5 , pp. 402 - 409 , doi: 10.1016/S0167-4048(02)00504-7 .

Hwang , I. , Kim , D. , Kim , T. and Kim , S. ( 2017 ), “ Why not comply with information security? An empirical approach for the causes of non-compliance ”, Online Information Review , Vol. 41 No. 1 , pp. 2 - 18 .

International Organization for Standardization ( 2022 ), “ ISO/IEC 27002:2022, information security, cybersecurity and privacy protection – information security controls ”.

Jalali , M.S. ( 2014 ), “ How individuals weigh their previous estimates to make a new estimate in the presence or absence of social influence ”, International Social Computing, Behavioral-Cultural Modeling and Prediction , Springer , Cham , pp. 67 - 74 .

Jalali , M.S. , Bruckes , M. , Westmattelmann , D. and Schewe , G. ( 2020 ), “ Why employees (still) click on phishing links: investigation in hospitals ”, Journal of Medical Internet Research , Vol. 22 No. 1 , p. E16775 , doi: 10.2196/16775 .

Kilskar , S.S. ( 2020 ), “ Socio-technical perspectives on cyber security and definitions of digital transformation – a literature review ”, Proceedings of the 30th European Safety and Reliability Conference and the 15th Probabilistic Safety Assessment and Management Conference , Venice .

Kruger , H. , Du Toit , T. , Drevin , L. and Maree , N. ( 2020 ), “ Acquiring sentiment towards information security policies through affective computing ”, 2020 2nd International Multidisciplinary Information Technology and Engineering Conference (IMITEC) , 25-27 Nov. 2020 , pp. 1 - 6 .

Kurowski , S. ( 2019 ), “ Response biases in policy compliance research ”, Information and Computer Security , Vol. 28 No. 3 , pp. 445 - 465 , doi: 10.1108/ICS-02-2019-0025 .

Li , L. , He , W. , Xu , L. , Ash , I. , Anwar , M. and Yuan , X. ( 2019 ), “ Investigating the impact of cybersecurity policy awareness on employees’ cybersecurity behavior ”, International Journal of Information Management , Vol. 45 , pp. 13 - 24 , doi: 10.1016/j.ijinfomgt.2018.10.017 .

Liu , C. , Wang , N. and Liang , H. ( 2020 ), “ Motivating information security policy compliance: the critical role of supervisor-subordinate guanxi and organizational commitment ”, International Journal of Information Management , Vol. 54 , p. 102152 , doi: 10.1016/j.ijinfomgt.2020.102152 .

McCormac , A. , Calic , D. , Butavicius , M.A. , Parsons , K. , Zwaans , T. and Pattinson , M.R. ( 2017 ), “ A reliable measure of information security awareness and the identification of bias in responses ”, Australasian Journal of Information Systems , Vol. 21 .

McCormac , A. , Zwaans , T. , Parsons , K. , Calic , D. , Butavicius , M. and Pattinson , M. ( 2016 ), “ Individual differences and information security awareness ”, Computers in Human Behavior , Vol. 69 , pp. 151 - 156 , doi: 10.1016/j.chb.2016.11.065 .

McGuire , W. ( 1969 ), “ The nature of attitudes and attitude change ”, The Handbook of Social Psychology , Vol. 3 , Addison-Wesley , Reading .

Merhi , M. and Ahluwalia , P. ( 2019 ), “ Examining the impact of deterrence factors and norms on resistance to information systems security ”, Computers in Human Behavior , Vol. 92 , pp. 37 - 46 , doi: 10.1016/j.chb.2018.10.031 .

Moody , G.D. , Siponen , M. and Pahnila , S. ( 2018 ), “ Toward a unified model of information security policy compliance ”, MIS Quarterly , Vol. 42 No. 1 .

Niemimaa , M. , Laaksonen , A.E. and Harnesk , D. ( 2013 ), “ Interpreting information security policy outcomes: a frames of reference perspective ”, 2013 46th Hawaii International Conference on System Sciences , IEEE , pp. 4541 - 4550 .

O’Brien , D.P. ( 1999 ), “ Quantitative vs Subjective ”, Business Measurements for Safety Performance , CRC Press , Boca Raton , p. 51 .

Page , M. , McKenzie , J. , Bossuyt , P. , Boutron , I. , Hoffmann , T. , Mulrow , C. , Shamseer , L. , Tetzlaff , J. , Akl , E. , Brennan , S. , Chou , R. , Glanville , J. , Grimshaw , J. , Hróbjartsson , A. , Lalu , M. , Li , T. , Loder , E. , Mayo-Wilson , E. , McDonald , S. and Moher , D. ( 2021 ), “ The PRISMA 2020 statement: an updated guideline for reporting systematic reviews ”, BMJ , Vol. 372 , p. n71 , doi: 10.1136/bmj.n71 .

Parsons , K. , Calic , D. , Pattinson , M. , Butavicius , M. , McCormac , A. and Zwaans , T. ( 2017 ), “ The human aspects of information security questionnaire (HAIS-Q): two further validation studies ”, Computers and Security , Vol. 66 , pp. 40 - 51 .

Parsons , K. , McCormac , A. , Butavicius , M. , Pattinson , M. and Jerram , C. ( 2014 ), “ Determining employee awareness using the human aspects of information security questionnaire (HAIS-Q) ”, Computers and Security , Vol. 42 , pp. 165 - 176 , doi: 10.1016/j.cose.2013.12.003 .

Parsons , K.M. , Young , E. , Butavicius , M.A. , McCormac , A. , Pattinson , M.R. and Jerram , C. ( 2015 ), “ The influence of organizational information security culture on information security decision making ”, Journal of Cognitive Engineering and Decision Making , Vol. 9 No. 2 , pp. 117 - 129 , doi: 10.1177/1555343415575152 .

Pirbhulal , S. , Gkioulos , V. and Katsikas , S. ( 2021 ), “ A systematic literature review on RAMS analysis for critical infrastructures protection ”, International Journal of Critical Infrastructure Protection , Vol. 33 , p. 100427 .

Podsakoff , P.M. , MacKenzie , S.B. , Lee , J.-Y. and Podsakoff , N.P. ( 2003 ), “ Common method biases in behavioral research: a critical review of the literature and recommended remedies ”, Journal of Applied Psychology , Vol. 88 No. 5 , pp. 879 - 903 .

Pollini , A. , Callari , T.C. , Tedeschi , A. , Ruscio , D. , Save , L. , Chiarugi , F. and Guerri , D. ( 2021 ), “ Leveraging human factors in cybersecurity: an integrated methodological approach ”, Cognition, Technology and Work , Vol. 24 No. 2 , pp. 371 - 390 , doi: 10.1007/s10111-021-00683-y .

Reeves , A. , Parsons , K. and Calic , D. ( 2020 ), “ Whose risk Is it anyway: how do risk perception and organisational commitment affect employee information security awareness? ”, International Conference on Human-Computer Interaction , Springer , Cham , pp. 232 - 249 .

Sommestad , T. , Hallberg , J. , Lundholm , K. and Bengtsson , J. ( 2014 ), “ Variables influencing information security policy compliance: a systematic review of quantitative studies ”, Information Management and Computer Security , Vol. 22 No. 1 , pp. 42 - 75 .

Velki , T. , Mayer , A. and Norget , J. ( 2019 ), “ Development of a new international behavioral-cognitive internet security questionnaire: preliminary results from Croatian and German samples ”, 2019 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) , IEEE , pp. 1209 - 1212 .

Weidt , F. and Silva , R. ( 2016 ), “ Systematic literature review in computer science – a practical guide ”, Relatórios Técnicos Do DCC/UFJF , Vol. 1 No. 8 , doi: 10.13140/RG.2.2.35453.87524 .

Yin , R.K. ( 2018 ), Case Study Research and Applications , 6th ed ., Sage , London .

Young , F.W. ( 1981 ), “ Quantitative analysis of qualitative data ”, Psychometrika , Vol. 46 No. 4 , pp. 357 - 388 , doi: 10.1007/BF02293796 .

Further readings

Bulgurcu , B. , Cavusoglu , H. and Benbasat , I. ( 2010 ), “ Information security policy compliance: an empirical study of rationality-based beliefs and information security awareness ”, MIS Quarterly , Vol. 34 No. 3 , pp. 523 - 548 .

Pahnila , S. , Siponen , M. and Mahmood , A. ( 2007 ), “ Employees’ behavior towards IS security policy compliance ”, 2007 40th Annual Hawaii International Conference on System Sciences (HICSS’07) , IEEE , pp. 156b - 156b .

Acknowledgements

This work was supported by the Research Council of Norway under Project nr 323131 “How to improve Cyber Security performance by researching human behavior and improve processes in an industrial environment” and Project nr 310105 “Norwegian Centre for Cyber Security in Critical Sectors (NORCICS).”


A systematic literature review of cyber-security data repositories and performance assessment metrics for semi-supervised learning

  • Open access
  • Published: 06 April 2023
  • Volume 1, article number 4 (2023)

  • Paul K. Mvula,
  • Paula Branco,
  • Guy-Vincent Jourdan &
  • Herna L. Viktor


In Machine Learning, the datasets used to build models are one of the main factors limiting what these models can achieve and how good their predictive performance is. Machine Learning applications for cyber-security or computer security are numerous, including cyber threat mitigation and security infrastructure enhancement through pattern recognition, real-time attack detection, and in-depth penetration testing. For these applications in particular, the datasets used to build the models must therefore be carefully chosen to be representative of real-world data. However, because of the scarcity of labelled data and the cost of manually labelling positive examples, there is a growing corpus of literature utilizing Semi-Supervised Learning with cyber-security data repositories. In this work, we provide a comprehensive overview of publicly available data repositories and datasets used for building computer security or cyber-security systems based on Semi-Supervised Learning, where only a few labels are necessary or available for building strong models. We highlight the strengths and limitations of the data repositories and datasets and provide an analysis of the performance assessment metrics used to evaluate the built models. Finally, we discuss open challenges and provide future research directions for using cyber-security datasets and evaluating models built upon them.


1 Introduction

As a result of the significant technological advancements made throughout the years, people’s lifestyles are shifting from traditional to more electronic. This shift has resulted in an increase in cybercrimes on the Internet. Therefore, adequate measures have to be put in place to secure computer systems; moreover, computer security or cyber-security systems must be capable of detecting and preventing cyber-attacks in real time. The intersection of the Machine Learning (ML) and cyber-security fields has recently been growing rapidly, as researchers make use of fully labelled datasets with Supervised Learning (SL), unlabeled datasets with Unsupervised Learning (UL), or a combination of labelled and unlabeled data with Semi-Supervised Learning (SSL) to identify the various types of cyber-attacks. Due to the high cost and scarcity of labelled data in the cyber-security domain, SSL applications for cyber-security tasks have gained traction, and several datasets have been made available to the public to build ML-based defensive mechanisms. In ML, the quality of the output is determined by the quality of the input [ 1 ]; in other words, for ML models to generalize effectively, the datasets upon which they are built must be representative of real-world data. Surveys of the available datasets and of the performance evaluation metrics used to build and evaluate SSL models are therefore needed: they give up-to-date information on recent cyber-security datasets and suitable performance metrics for SSL frameworks, and they provide a starting point for new researchers who wish to investigate this vital subject.

Several works focusing on cyber-security provide discussions of datasets and data repositories that can be used for building ML models. For instance, Ring et al. [ 2 ] presented an extensive survey on network-based intrusion detection datasets discussing datasets containing packet-based, flow-based and neither packet- nor flow-based data while Glass-Vanderlan et al. [ 3 ] focused on Host-Based Intrusion Detection Systems (HIDS) and touched upon datasets and sources mainly related to HIDS. Other articles described datasets for (i) intrusion, malware and spam detection (e.g. [ 4 , 5 , 6 , 7 , 8 ]); (ii) network anomaly detection (e.g. [ 9 ]); or (iii) phishing URL detection (e.g. [ 10 ]). However, these works often focus on a particular cyber-security domain and do not examine in detail the characteristics of the available datasets and the performance evaluation metrics that are suitable for the various research challenges.

Because of the expanding interest in this area and the rapid speed of research, these surveys quickly become outdated; there is, therefore, an obvious need for a comprehensive survey to present the most recent datasets and evaluation metrics and their usage in the literature. To fill this gap, we present an exhaustive evaluation of the cyber-security datasets used to build SSL models. In this paper, we conduct a systematic literature review (SLR) of publicly available cyber-security datasets and performance assessment metrics used for building and evaluating SSL models. To this end, we provide a summary of datasets used to construct models for cyber-security-related tasks; the covered areas include not only network- and host-based intrusion detection, but also spam and phishing detection, Sybil and botnet detection, Internet traffic and domain name classification, malware detection and categorization, and power grid attacks detection. Additionally, we examine the performance assessment metrics used to evaluate the SSL models and discuss their usage in the selected papers. Furthermore, we provide a list of datasets, tools, and resources used to collect and analyze the data that have been made publicly available in the literature. Finally, we provide a discussion on the open research challenges and a list of observations with regard to datasets and performance metrics. This is, to the best of our knowledge, the first SLR analyzing a wide array of cyber-security datasets and performance evaluation metrics for SSL tasks, as well as providing easy access to publicly available datasets.

Our key contributions are the following:

We provide a description of the most commonly used SSL techniques.

We provide insights on the major cybercrimes for which SSL solutions have been explored.

We present a systematic literature review of the publicly available cyber-security datasets, repositories and performance evaluation metrics used.

We analyze the open challenges found in the literature and provide a set of recommendations for future research.

The remaining sections are organized as follows. Section  2 presents the definitions, important concepts, and basic assumptions of SSL, as well as a brief introduction to the methods utilized in the literature we reviewed and an overview of the different cybercrimes the included articles’ authors propose to counter. Additionally, we provide examples that highlight successful industrial deployments of ML for countering cyber threats, demonstrating the practical applications of the methods discussed in the literature. In Sect.  3 , we present the methodology we used to construct our survey and in Sect.  4 , an in-depth analysis of the publicly available datasets and the different evaluation metrics used in the selected papers is presented. Section  5 discusses the open challenges faced by the reviewed methods applying SSL for cyber-security, with respect to the datasets and evaluation metrics, presents a set of observations and the lessons learned, and highlights strategies for bridging the gap between research and practice. Finally, Sect.  6 concludes the work.

2 Background on SSL and cyber-security

Machine Learning (ML), the core subset of Artificial Intelligence (AI), may be defined as the systematic study of computer algorithms and systems that allow computer programs to automatically improve their knowledge or performance through experience [ 11 ]. It is a branch of computer science whose goal is to teach computers with sample data, i.e., training data, to make predictions or decisions on unseen data. ML algorithms can be categorized into three main types: SL, UL, and Reinforcement Learning (RL). In SL, the task is to infer a function that maps input data points from an instance space to their corresponding labels in the output space using labelled examples [ 12 , 13 ]; the task is classification when the function being learned is discrete, i.e., input data points are mapped to categorical values, and regression when it is continuous, i.e., input data points are mapped to real values. In UL, no labels are available, so the goal is to capture important patterns or extract relationships from untagged (unlabeled) data, e.g. as probability density distributions [ 14 ]. In RL, the algorithm’s goal is to maximize the feedback (reward) it is provided with. SSL conceptually stands between SL and UL [ 15 , 16 , 17 ]. Out-of-core, Incremental or Online Learning (OL) is a learning setting in which the data become available sequentially, one sample at a time [ 18 ]; in OL, the model can learn from newly available data in addition to making predictions from it. Information Technology (IT) security, computer security or simply cyber-security is the protection of computer systems and networks from cyber-attacks, i.e., information disclosure, loss, theft, or damage to their hardware, software, or electronic data, as well as from the disruption or misdirection of the services they offer [ 19 ].

SSL and ML, in general, have brought significant benefits to the cyber-security domain, including improved detection capabilities, adaptive learning, automation, and threat intelligence [ 20 ] (see Sect.  2.3 for industrial examples). However, there are also challenges that need to be addressed, including the lack of quality data, adversarial attacks, model explainability, and bias and discrimination [ 21 , 22 ]. Addressing these challenges will be critical to ensuring that ML remains a useful tool in the fight against cyber threats.

In the remainder of this section, we introduce the key principles and techniques of SSL, provide a summary of cybercrimes examined in the literature, and present examples that demonstrate the potential of ML in mitigating cyber threats in the real world.

2.1 SSL concepts and methods

We will first introduce some notation. Let \(\mathcal {D}_L=(x_i,l(x_i ))_{i=1}^k\) denote a labelled dataset where each sample \((x_i,l(x_i))\) consists of a data point \(x_i\) from the instance space \(\mathcal {X}\) and a target variable \(l(x_i)\) in the output space \(\mathcal {Y}\) . Let \(\mathcal {D}_U=(x_i)_{i=k+1}^{k+u}\) denote an unlabeled dataset. In SL, when \(l(x_i)\) consists of categorical values we face a classification task, and when it consists of real values we have a regression task. In UL, the model is only provided with unlabeled data, i.e., \(\mathcal {D}_U\) . SL can build strong models to predict labels for unlabeled samples, but it requires \(\mathcal {D}_L\) to contain diverse samples manually labelled by domain experts, which may not only be too costly but may also contain inaccurate labels due to human mistakes; therefore, in practice, \(u \gg k\) . On the other hand, even though UL does not require labelled samples to infer patterns, it is prone to overfitting. SSL makes use of both \(\mathcal {D}_L\) and \(\mathcal {D}_U\) to infer a function whose performance surpasses one built with either SL or UL alone, by relying on at least one of the main learning assumptions: the smoothness, low-density and manifold assumptions [ 23 ] and the cluster assumption [ 24 ].
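The notation above can be made concrete with a small sketch. The data, the ground-truth rule and the use of -1 to mark unlabeled points are purely illustrative (the -1 marker is a common library convention, not part of the paper's formalism):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance space X: 1,000 two-dimensional points with a simple
# linear ground-truth labelling (purely illustrative).
X = rng.normal(size=(1000, 2))
labels = (X[:, 0] + X[:, 1] > 0).astype(int)

# In practice u >> k: keep labels for only k = 50 points, leaving
# u = 950 unlabeled (marked with -1).
k = 50
y = np.full(1000, -1)
idx = rng.choice(1000, size=k, replace=False)
y[idx] = labels[idx]

X_L, y_L = X[y != -1], y[y != -1]   # labelled dataset D_L (k samples)
X_U = X[y == -1]                    # unlabeled dataset D_U (u samples)
print(X_L.shape, X_U.shape)         # (50, 2) (950, 2)
```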

The smoothness assumption holds that if two data points, \(x_1\) and \(x_2\) , lie close in the instance space \(\mathcal {X}\) , their corresponding class labels, \(l(x_1)\) and \(l(x_2)\) , should also be close (i.e., the same) in the output space \(\mathcal {Y}\) . Transitivity is an important consequence: if \(x_1\) lies close to \(x_2\) and \(x_2\) lies close to \(x_3\) , then \(x_1\) lies transitively close to \(x_3\) ; since close points in \(\mathcal {X}\) have the same label, the assumption implies that if \(x_2\) is a noisy version of \(x_1\) , they should still have the same predicted label. The low-density assumption holds that data points with the same label are clustered in high-density sections of the instance space, so the decision boundary must pass through a low-density region \(\mathcal {R} \subset \mathcal {X}\) in which the probability \(p(x_i)\) of any data point falling is low; this also ensures that the smoothness assumption is satisfied. Under the manifold assumption, the instance space \(\mathcal {X}\) consists of one or more Riemannian manifolds \(\mathcal {M}\) on which samples share the same label. Finally, according to the cluster assumption, which can be seen as a generalization of the other three assumptions [ 16 ], data points in the same cluster are likely to share the same label, and several clusters may constitute the same class [ 15 ].

Based on [ 16 , 25 , 26 ], the taxonomy in Fig.  1 provides a general overview of the SSL approaches which will be described in more detail in Sects.  2.1.1 and  2.1.2 . An overview of the key concepts in the taxonomy is presented next.

Taxonomy of SSL techniques (adapted from [ 16 , 25 , 26 ])

SS Classification and Regression methods can be either transductive or inductive [ 15 , 27 , 28 ]. In inductive SSL, the model is first built using information from \(\mathcal {D}_L\) and \(\mathcal {D}_U\) ; it can then be used, like a model built with SL, to generate predictions for previously unseen, unlabeled samples, so there is a clear distinction between a training phase and a testing phase. In transductive SSL, on the other hand, the goal is to generate labels for the unlabeled samples fed to the learner, so there is no clear distinction between a training and a testing phase. Transductive approaches frequently create a graph across all data points, labelled and unlabeled, expressing their pairwise similarity with weighted edges, and are therefore incapable of handling additional unseen data [ 17 ]. We group SS Classification and Regression together because both predict output values for input samples, but note that most SS Classification approaches are incompatible with SS Regression; we specify in Sect.  2.1.1 when they may be compatible.

In SS Clustering, the learner’s goal is clustering, but a small amount of knowledge is available in the form of constraints: must-link constraints (two samples must be within the same cluster) and cannot-link constraints (two data points cannot be within the same cluster). It differs from traditional clustering in how the constraints are accommodated: either by biasing the search for relevant clusters or by altering a distance or similarity metric [ 29 ]. When an SL method cannot work, even in a transductive form, because the available knowledge is too far from being representative of a target classification of the items, the cluster assumption may still allow the available knowledge to guide the clustering process [ 30 ]. Bair [ 25 ] provides a survey on SS Clustering methods and groups them into constraint-based, partial-labels, SS hierarchical clustering and outcome-variable-associated methods.

A plethora of SSL approaches have been proposed in the literature, each making use of at least one of the SSL assumptions described. The following sections briefly describe the frequently used SSL methods showing how they relate to the SSL assumptions.

2.1.1 SSL for classification and regression

We divide the classification and regression methods into the two main classes: inductive SSL and transductive SSL.

2.1.1.1 Inductive methods

The goal of inductive methods is to build a model from labelled and unlabeled data and then use it, like a model built with SL (labelled data only), to make predictions on unlabeled data. Inductive methods can be further divided into wrapper methods, unsupervised preprocessing, and intrinsically semi-supervised methods. In wrapper methods, one or more supervised learners are first trained on the labelled data only; the learner or set of learners is then applied to the unlabeled data to generate pseudo-labels, which are used for training in the next iterations. Pseudo-labels, \(l(x_i)\) , \(k<i<k+u\) , are simply the most confident labels produced by the learner or set of learners for a set of unlabeled samples, \(\mathcal {X}_U \subset \mathcal {D}_U\) [ 31 ]. The wrapper methods we will consider are self-training and co-training. According to the way they make use of the unlabeled data, unsupervised preprocessing methods can be divided into feature extraction, unsupervised clustering and parameter initialization or pre-training.

2.1.1.1.1 Wrapper methods In wrapper methods, a model is first trained on labelled data to generate pseudo-labels for an unlabeled subset, \(\mathcal {X}_U \subset \mathcal {D}_U\) ; the model is then iteratively re-trained, until all unlabeled data are labelled or some stopping criterion is met, on a new dataset containing both the labelled dataset, \(\mathcal {D}_L\) , and the pseudo-labels, \(l{(x_i)}\) , \(k<i<k+u\) , of the subset \(\mathcal {X}_U\) generated in previous iterations. They are the most well-known and oldest SSL methods [ 27 , 31 ]. Wrapper methods may be used for classification and regression and are divided into three categories: self-training, co-training, and boosting.

Self-training. Self-training [ 32 ], also referred to as self-learning, is a wrapper method consisting of a single base SL learner that is iteratively trained on a training set made up of the original labelled data and the high-confidence predictions (pseudo-labels) from previous iterations. It is the most basic wrapper method [ 31 ] and may be applied to most, if not all, SL algorithms, such as Random Forests (RF) [ 33 ], Support Vector Machines (SVM) [ 34 ], etc.
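A minimal self-training sketch using scikit-learn's `SelfTrainingClassifier` with a Random Forest base learner; the dataset, 10% label budget and confidence threshold are illustrative choices:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.semi_supervised import SelfTrainingClassifier

X, true_y = make_classification(n_samples=600, n_features=10, random_state=0)

# Keep only 10% of the labels; -1 marks unlabeled samples.
y = np.full(600, -1)
y[:60] = true_y[:60]

# Self-training: the base RF is first fit on the 60 labelled points,
# then iteratively re-fit after adding its own high-confidence
# predictions (probability >= threshold) on the unlabeled pool
# as pseudo-labels.
base = RandomForestClassifier(n_estimators=100, random_state=0)
model = SelfTrainingClassifier(base, threshold=0.9)
model.fit(X, y)

# transduction_ holds given labels plus pseudo-labels (-1 = never labelled).
n_pseudo = (model.transduction_ != -1).sum() - 60
print("pseudo-labels added during self-training:", n_pseudo)
```

The fitted wrapper is inductive: `model.predict` works on unseen samples exactly like the underlying supervised classifier.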

Co-training. Co-training methods, [ 35 , 36 ], assume that (i) features can be split into two or more distinct sets or views; (ii) each feature subset is sufficient to train a good classifier; (iii) the views are conditionally independent given the class label. Co-training extends the principle of self-training to multiple SL learners that are each iteratively trained with the pseudo-labels from the other learners, in other words, learners “teach” each other with the added pseudo-labels to improve global performance. For co-training to work well, the sufficiency (ii) and independence (iii) assumptions should be satisfied [ 35 ]. Multi-view co-training, the basic form of co-training, constructs two learners on distinct feature sets or views. When no natural feature split is known a priori, single-view co-training may be used to build two or more weak learners with different hyper-parameters on the same feature set. There exist several approaches based on single-view co-training such as tri-training [ 37 ], co-forest [ 38 ], co-regularization [ 39 ], etc. In co-regularization, the two terms of the objective function minimize the error rate and optimize the disagreement between base learners [ 39 ].

2.1.1.1.2 Unsupervised preprocessing Unsupervised preprocessing methods use \(\mathcal {D}_U\) and \(\mathcal {D}_L\) in two different steps. The first step often consists of extraction (feature extraction) or transformation (unsupervised clustering) of the feature space, or of initialization of a model’s parameters (pre-training), while the second step uses knowledge from \(\mathcal {D}_L\) to label the unlabeled data points in \(\mathcal {D}_U\) . We briefly describe these methods below.

Feature Extraction: Feature extraction is one of the most critical steps in ML. It consists of extracting a set of relevant features for ML models to work with. Typically, SSL feature extraction methods consist of either finding lower-dimensional feature spaces from \(\mathcal {X}\) without sacrificing significant amounts of information, or finding lower-dimensional vector representations of high-dimensional data objects by considering the relationships between the inputs. Examples of SSL feature extraction methods are the autoencoder (AE) [ 14 ] and a few of its variants, such as the denoising autoencoder [ 40 ] and contractive autoencoder [ 41 ], and methods in NLP (Natural Language Processing) such as Word2Vec [ 42 ], GloVe [ 43 ], etc.

Unsupervised clustering: Also referred to as cluster-then-label methods, these methods explicitly combine SL or SSL classification or regression algorithms with UL or SSL clustering algorithms. The UL or SSL clustering algorithm first clusters all the data points; those clusters are then fed to the SL or SSL classifier or regressor for label inference [ 44 , 45 , 46 ].
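A cluster-then-label pipeline can be sketched as follows; the toy k-means, the majority-vote labelling rule, and the data are illustrative choices, not a specific method from [ 44 , 45 , 46 ].

```python
# Cluster-then-label sketch: all points (labelled and unlabeled) are
# clustered first, then each cluster inherits the majority label of its
# labelled members.
import math
from collections import Counter

def kmeans(X, k=2, iters=10):
    cents = X[:k]                       # naive initialisation: first k points
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in X:
            j = min(range(k), key=lambda j: math.dist(x, cents[j]))
            groups[j].append(x)
        cents = [[sum(v) / len(v) for v in zip(*g)] if g else cents[j]
                 for j, g in enumerate(groups)]
    return cents

def cluster_then_label(X_l, y_l, X_u, k=2):
    X = X_l + X_u                       # cluster everything
    cents = kmeans(X, k)
    assign = lambda x: min(range(k), key=lambda j: math.dist(x, cents[j]))
    votes = [Counter() for _ in range(k)]
    for x, y in zip(X_l, y_l):          # majority label per cluster
        votes[assign(x)][y] += 1
    return [votes[assign(x)].most_common(1)[0][0] for x in X_u]

# Toy run: two labelled points, three unlabeled points.
X_l = [(0.0, 0.0), (5.0, 5.0)]
y_l = [0, 1]
X_u = [(0.3, 0.2), (4.8, 5.1), (5.2, 4.9)]
print(cluster_then_label(X_l, y_l, X_u))  # → [0, 1, 1]
```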

2.1.1.1.3 Intrinsically semi-supervised Intrinsically semi-supervised methods are typically extensions of existing SL methods that directly include the information from unlabeled data points in the loss function. Depending on the SSL assumption they rely on, these methods can be further grouped into four categories: (i) maximum-margin methods, where the goal is to maximize the distance between data points and the decision boundary (low-density assumption); (ii) perturbation-based methods, often implemented with neural networks (NN), which rely directly on the smoothness assumption (a noisy, or perturbed, version of a data point should have the same predicted label as the original data point); (iii) manifold-based methods, which either explicitly or implicitly estimate the manifolds on which the data points lie; and (iv) generative models, whose primary goal is to infer a function that can generate samples, similar to the available samples, from random noise.

2.1.1.2 Transductive methods

A learner is said to be transductive if it only works on the labelled and unlabeled data available at training time and cannot handle unseen data [ 17 ]. The goal of a transductive learner is to infer labels for an unlabeled dataset \(\mathcal {D}_U\) , using \(\mathcal {D}_L\) . If a new unlabeled data point, \(x_u \notin \mathcal {D}_U\) , is given, the learner must be reapplied from scratch to all the data, i.e., \(\mathcal {D}_L\) , \(\mathcal {D}_U\) , and \(x_u\) . Graph-based methods, which are often transductive in nature, define a graph whose nodes are the labelled and unlabeled samples in the dataset and whose (weighted) edges reflect the similarity of the samples. These methods usually assume label smoothness over the graph. Graph methods are non-parametric and discriminative [ 17 ]. The defined loss function is optimized to achieve two goals: (i) for already labelled samples from \(\mathcal {D}_L\) , the inferred labels should correspond to their true labels, and (ii) the predicted labels of similar samples on the graph should be the same. A transductive learner’s task may be classification or regression.
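The two goals above can be sketched with a simple iterative label propagation over a Gaussian similarity graph. This is a hedged illustration rather than a specific published algorithm: the kernel width, iteration count, and toy data are arbitrary assumptions.

```python
# Transductive label-propagation sketch: labels spread over a similarity
# graph until convergence; labelled nodes stay clamped to their true labels.
import math

def propagate(X, labels, sigma=1.0, iters=50):
    """labels[i] is 0/1 for labelled nodes, None for unlabeled ones."""
    n = len(X)
    # Gaussian similarity weights; no self-edges.
    W = [[math.exp(-math.dist(X[i], X[j]) ** 2 / sigma ** 2) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    f = [float(l) if l is not None else 0.5 for l in labels]
    for _ in range(iters):
        for i in range(n):
            if labels[i] is None:        # only unlabeled nodes are updated
                s = sum(W[i])
                f[i] = sum(W[i][j] * f[j] for j in range(n)) / s
    return [round(v) for v in f]

# Toy run: one label per cluster, four unlabeled points in between.
X = [(0.0,), (0.3,), (0.6,), (4.0,), (4.3,), (4.6,)]
labels = [0, None, None, 1, None, None]
print(propagate(X, labels))  # → [0, 0, 0, 1, 1, 1]
```

Note that, being transductive, this procedure must be rerun from scratch if a new point is added to the graph.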

2.1.2 SSL for clustering

Semi-supervised clustering methods can be used with partially labelled data as well as other types of outcome measures. When cluster assignments, or partial labels, are known beforehand for a subset of the data, the objective is to classify the unlabeled samples using the known cluster assignments [ 47 ]; this is, in a sense, equivalent to an SL problem. When more complex relationships among the samples are known in the form of constraints, the problem becomes a generalization of the previous objective and is called either constrained clustering [ 48 ], i.e., an existing clustering method is modified to satisfy the constraints, or distance-based (metric-based) clustering, i.e., an alternative distance metric is used to satisfy the constraints [ 49 , 50 ].

Hierarchical and partitional clustering techniques are the two main types of clustering algorithms. Hierarchical clustering methods recursively locate nested clusters in either agglomerative or divisive mode. In agglomerative (bottom-up) mode, they start with each data point in its own cluster and successively merge the most similar clusters to form a cluster hierarchy; in divisive (top-down) mode, they start with all the data points in one cluster and recursively divide each cluster into smaller clusters [ 51 ]. Semi-supervised hierarchical clustering methods group samples using a tree-like structure, known as a hierarchy. They either build separate hierarchies for must-link and cannot-link constrained samples [ 52 , 53 , 54 , 55 ] or use other types of constraints [ 56 , 57 , 58 , 59 , 60 ]. Finally, semi-supervised clustering may be used to build clusters related to a given outcome variable [ 61 ].
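As an illustration of how must-link and cannot-link constraints can be enforced in agglomerative clustering, the sketch below pre-merges must-link pairs and skips any merge that would join a cannot-link pair. This is a simplified, hypothetical scheme, not any particular method from the references above.

```python
# Constrained single-link agglomerative clustering sketch (toy data):
# must-link pairs start in the same cluster; merges violating a
# cannot-link constraint are skipped.
import math

def constrained_agglomerative(X, k, must_link=(), cannot_link=()):
    clusters = [{i} for i in range(len(X))]
    for a, b in must_link:               # pre-merge must-link pairs
        ca = next(c for c in clusters if a in c)
        cb = next(c for c in clusters if b in c)
        if ca is not cb:
            clusters.remove(cb); ca |= cb
    def dist(c1, c2):                    # single-link cluster distance
        return min(math.dist(X[i], X[j]) for i in c1 for j in c2)
    while len(clusters) > k:
        pairs = sorted((dist(c1, c2), n1, n2)
                       for n1, c1 in enumerate(clusters)
                       for n2, c2 in enumerate(clusters) if n1 < n2)
        for _, n1, n2 in pairs:          # try closest admissible merge first
            merged = clusters[n1] | clusters[n2]
            if not any(a in merged and b in merged for a, b in cannot_link):
                clusters[n1] = merged
                del clusters[n2]
                break
        else:                            # no admissible merge remains
            break
    return clusters

# Without the constraint, points 0 and 1 (distance 0.2) would merge first;
# the cannot-link pair keeps them in different clusters.
X = [(0.0, 0.0), (0.2, 0.0), (0.4, 0.0), (5.0, 5.0)]
clusters = constrained_agglomerative(X, 2, cannot_link=[(0, 1)])
```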

We refer the interested reader to [ 16 , 25 , 26 , 29 , 62 ] for detailed descriptions of the methods mentioned in this section.

2.2 Cybercrimes

As mentioned in Sect.  1 , a cyber-attack is any offensive maneuver that targets computer systems, aiming at information disclosure, theft of or damage to their hardware, software, or electronic data, or at the disruption or misdirection of the services they provide; cyber-security can be defined as the protection of computer systems against cyber-attacks [ 19 ]. Cybercrimes are criminal activities that involve the use of digital technologies such as computers, smartphones, the internet, and other digital devices [ 63 ]. From a legal perspective, cybercrimes can be defined as criminal offences that involve the use of a computer or a computer network [ 64 ]. The cyber-attacks covered in this article can all be seen as specific types of cybercrime; we therefore use the two terms interchangeably. Note that different jurisdictions may have different laws regarding what constitutes a cybercrime or cyber-attack: an activity considered a cyber-attack in one jurisdiction may not be considered a cybercrime in another, depending on the specific laws in each location, but cybercrimes typically involve the illegal or unauthorized use of digital technologies such as computers [ 63 , 64 ]. Additionally, some activities that are not considered cyber-attacks in some jurisdictions may still be considered cybercrimes if they violate specific laws related to computer systems and networks [ 63 ]. Cybercrimes may also be viewed from technical [ 65 ] and procedural [ 66 , 67 ] perspectives.

The IBM X-Force Incident Response and Intelligence Services (IRIS) estimated the profit made by one group of attackers at over US$123 million in 2020 [ 68 ], and the Cost of a Data Breach report published in 2021 by IBM Security estimates the global average cost per incident at US$4.24 million [ 69 ]. Cybercriminals consistently take advantage of catastrophes, disasters, and hot events for their own gain; a clear example is the surge in cybercrimes of all sorts witnessed at the beginning of the pandemic.

The following subsections briefly describe the cybercrimes countered in the covered literature.

2.2.1 Network intrusion

Any unlawful action on a digital network is referred to as network intrusion. Network intrusions or breaches can be thought of as a succession of acts carried out one after the other, each dependent on the success of the last. The stages of an intrusion are sequential, beginning with reconnaissance and ending with the compromise of sensitive data [ 70 ]. Understanding these stages is useful for planning proactive measures and detecting bad actors’ behaviour. Network intrusions often include the theft of valuable network resources and virtually always compromise network and/or data security [ 71 , 72 ]. Living off the land, multi-routing, buffer overwriting, covert CGI scripts, protocol-specific attacks, traffic flooding, Trojan horse malware, and worms are the most frequent intrusion attacks.

Some intruders will attempt to implant code that cracks passwords, logs keystrokes, or imitates a website in order to lead unaware users to their own. Others will infiltrate the network and steal data on a regular basis or alter websites accessible to the public with a range of messages. Intruders may get access to a computer system in a number of ways, including internally, externally, or even physically.

2.2.2 Phishing

IBM X-Force identified phishing as one of the most used attack vectors in 2021 because of its ease of use and low resource requirements [ 73 ]. Phishing is a form of cybercrime in which attackers aim to trick users into revealing sensitive data, including personal information, banking and credit card details, IDs, passwords, and other valuable information, via replicas of legitimate websites of trusted organizations. Phishing attacks can be grouped into deceptive phishing and technical subterfuge [ 74 ]. Deceptive phishing is often performed via email, SMS, calendar invitations, telephony, etc., while technical subterfuge tricks individuals into downloading malicious code into their systems, which then discloses their sensitive information. We refer the reader to a recent in-depth study on phishing attacks [ 74 ].

2.2.3 Spam

Spam, not to be mistaken for canned meat, may be defined as unsolicited and unwanted messages, typically sent in bulk, that can take several forms such as email, text messages, phone calls, or social media messages. The content of spam messages varies widely, but they are often commercial in nature and aim to advertise a product or service, promote a fraudulent scheme, or solicit donations [ 75 ].

2.2.4 Malware

Malware, or malicious software, is defined as any software that intentionally executes malicious payloads on victim machines (computers, smartphones, computer networks, and so on) to cause disruptions. There exist several varieties of malware, such as computer viruses, worms, Trojan horses, ransomware, spyware, adware, rogue software, wipers, and scareware. In the 2022 Threat Intelligence Index, IBM X-Force reported that ransomware, a type of malware, was again the top attack type in 2021, although its share decreased from 23% in 2020 to 21% [ 73 ]. Defensive tactics vary depending on the type of malware, but most may be avoided by installing antivirus software and firewalls, applying regular patches to decrease zero-day threats, safeguarding networks from intrusion, performing regular backups, and isolating infected devices.

2.2.5 Other cyber-attacks

In addition to intrusions, spam, phishing and malware, we also discuss SSL applications for:

Traffic classification —traffic classification may be used to detect patterns suggestive of denial-of-service attacks, prompt automated re-allocation of network resources for priority customers, or identify customer use of network resources that in some manner violates the operator’s terms of service [ 76 ];

Sybil detection —a Sybil attack may be defined as an attack against identity in which an individual entity masquerades as numerous identities at the same time [ 77 ];

Stock market manipulation detection —market manipulation may be defined as an illegal practice that attempts to boost or reduce stock prices by generating an illusion of active trading [ 78 , 79 ];

Social bot detection —a social bot may be defined as a social media account that is operated by a computer algorithm to automatically generate content and interact with humans (or other bot users) on social media, in an attempt to mimic and possibly modify their behaviour [ 80 , 81 ];

Shilling attack detection —a Shilling attack is a particular type of attack in which a malicious user profile is injected into an existing collaborative filtering dataset to influence the recommender system’s outcome. The injected profiles explicitly rate items in a way that either promotes or demotes the target items [ 82 ];

Pathogenic social media account detection —Pathogenic Social Media (PSM) accounts refer to accounts that have the capability to spread harmful misinformation on social media to viral proportions. Terrorist supporters, water armies, and fake news writers are among the accounts in this category [ 83 , 84 ];

Fraud detection —in the banking industry such as credit card fraud detection. Credit card fraud may happen when unauthorized individuals obtain access to a person’s credit card information and use it to make purchases, other transactions, or open new accounts [ 85 ]; and

Detection of attacks on other platforms such as the power grid —the smart grid enables energy customers and providers to manage and generate electricity more effectively. Like other emerging technologies, the smart grid raises new security issues [ 86 ].

2.3 Examples of industry deployments of ML in cyber-security

This section presents examples of successful industrial deployments of ML for countering cyber threats. The first example is “IBM X-Force Threat Management” [ 87 ], a cloud-based security platform that leverages ML to provide advanced threat detection and response capabilities. It analyzes massive amounts of security data, including network traffic, system logs, and user behaviour, to identify and respond to potential threats in real time using ML algorithms. The ML models are trained on large datasets of historical security events, allowing the system to learn and adapt to new threats over time. Depending on the use case and data available, IBM X-Force Threat Management may use a combination of ML techniques, such as SSL and Reinforcement Learning, in addition to other optimization methods for enhancing security policies; however, without specific information from IBM, it cannot be definitively confirmed whether these techniques are actually employed. Nonetheless, the platform has demonstrated success in detecting various types of cyber threats, including banking Trojans such as IcedID, TrickBot and QakBot.

The second example is the Deep Packet Inspection (DPI) system developed by Darktrace, a cyber-security company. The system uses unsupervised ML algorithms to learn the expected behaviour of a network and detect anomalies that may indicate malicious activity. It can also automatically respond to detected threats by initiating a range of actions, such as quarantining a device or blocking network traffic. Darktrace has deployed its DPI system in various industries, including healthcare, finance, and energy. In one instance, a UK construction company used the system to detect and respond to a ransomware attack. The system identified the attack within minutes of it starting and initiated a range of responses, including blocking the attacker’s IP address and quarantining affected devices. The company was able to contain the attack and avoid paying the ransom demanded by the attackers.

Our third example is Feedzai, an ML platform that provides fraud prevention and anti-money laundering services for financial institutions and businesses. Feedzai employs a variety of ML techniques, including Deep Learning and combinations of SL and UL (SSL), to detect and prevent fraudulent activity in real time. After partnering with a large European bank, Feedzai’s platform reduced false positives and accurately identified fraudulent activity, resulting in lower losses due to fraud.

Overall, IBM X-Force Threat Management, Darktrace, and Feedzai demonstrate how ML can be successfully deployed in the industry to counter cyber threats and provide advanced threat detection and response capabilities.

3 Review methodology

This section provides the details of the methodology we followed. To achieve our goal of reviewing the datasets and evaluation metrics used in applications of SSL techniques to cyber-security, we followed the standard systematic literature review guidelines outlined in [ 88 ], including their recommendations for assessing the search’s completeness. The entire process was managed on Covidence [ 89 ], an online tool for systematic review management and production. We first defined the three research questions shown below. These are motivated by the need to examine the efforts being made to safeguard users and computer systems against attacks using SSL, since attacks are far more harmful than vulnerability scans or related operations. We review the datasets as well as the evaluation metrics used in the literature for identifying cyber-attacks as early as possible so that the necessary countermeasures can be taken.

With the introduction and use of SSL in cyber-security, what are the assessment metrics used to evaluate the built models?

What datasets are the proposed SSL approaches built upon? What are the most used datasets?

What are the open challenges with respect to the datasets and performance assessment metrics?

Our inclusion and exclusion criteria were then derived from the above research questions. A paper is included if it directly applies SSL to detect at least one of the cyber-attacks mentioned in Sect.  2.2 with enough detail to address our research questions. On the other hand, a paper is excluded if (i) another paper by the same authors supersedes the work, in which case the latest work is considered, (ii) it does not apply SSL to any of the attacks in the inclusion criteria, or (iii) the approach is discussed at a high level, with insufficient information to fulfill the research questions. We then queried IEEE Xplore and the ACM Digital Library for articles having (“semi-supervised learning” AND “cyber-security”), (“semi-supervised” AND “cyber-security”) and (“semi-supervised” AND “security”) anywhere within the article.

The keywords (“semi-supervised learning” AND “cyber-security”) have been chosen because SSL has been increasingly used in cyber-security to improve the accuracy of detection and classification systems [ 90 ]. This combination has been used to find articles that specifically focus on using SSL in cyber-security tasks such as intrusion detection, malware detection, network traffic analysis, etc. Similarly, the combination of (“semi-supervised” AND “cyber-security”) has been used to find articles that discuss semi-supervised learning in a cyber-security context, even if they do not explicitly mention the phrase “semi-supervised learning”. Finally, the combination (“semi-supervised” AND “security”) has been used to broaden the search beyond just cyber-security and potentially include other domains where SSL has been applied to security-related tasks.

Note that we did not limit the search to the title, abstract, or keywords because it was essential to ensure that we found all the articles discussing and applying SSL methods for cyber-security. We chose these databases because they are among the top databases suggested by our university library for conducting Computer Science research and they contain papers published in top-tier venues. To complement the results obtained from IEEE Xplore and the ACM Digital Library, we submitted the same search queries to Google Scholar and extracted the top 200 search results sorted by relevance. This search strategy allowed us to find articles relevant to the use of SSL in cyber-security and to gain a better understanding of how it is being, and has been, used to improve security systems.

As seen in Fig.  2 , a total of 1914 studies were imported for screening; 267 duplicates were automatically removed, and the titles and abstracts of the remaining 1647 studies were manually screened for relevance. Based on our inclusion and exclusion criteria, 1319 studies were found irrelevant because they did not discuss either SSL methods or cyber-attack defences. The full texts of the remaining 328 studies were further assessed as they were either partially or fully related to our inclusion criteria, and finally, 210 relevant studies were included for data extraction. Furthermore, we used state-of-the-art surveys and review articles on SSL [ 16 , 27 ] and ML for cyber-security [ 4 ] to construct this extensive review of cyber-security datasets and performance evaluation metrics for SSL models.

Fig. 2 Review methodology

4 Datasets and performance assessment metrics

In this section, we summarize and analyze the public datasets and performance assessment metrics used in the selected papers.

4.1 Datasets and repositories

AI, especially ML, has proven to be a particularly useful tool in cyber-security, as in other fields of computer science, and features extensively in the literature on cybercrime and malicious activity detection. The “Cost of a Data Breach” report [ 69 ], published by IBM Security, reported a US$3.81 million (almost 80%) difference between the breach costs of companies with fully deployed security AI/ML and automation and those without. In this section, we present the public datasets used in the covered literature, grouped by type of attack, and show their usage in the selected papers in Figs.  3 , 4 ,   5 , and   6 . Although we acknowledge in Subsections  2.2.2 and  2.2.3 that Spam and Phishing are different attack vectors, due to the scarcity of these datasets we have combined them in a single section.

4.1.1 Network intrusion datasets and sources

In terms of network intrusion, we found a total of 18 public datasets and sources in the papers we reviewed. We begin by providing a brief description of each dataset; we then provide a summary of their main characteristics as well as some key data usage statistics.

KDD’99 and NSL-KDD . The KDD’99 dataset, available since 1999 from DARPA [ 91 ], is a statistically preprocessed update of the DARPA98 dataset and the most used dataset in the selected papers. The dataset has three components (basic, content, and traffic features), making a total of 41 features for normal and simulated attack traffic. The NSL-KDD dataset, proposed by Tavallaee et al. [ 92 ], is a version of KDD’99 in which redundant records are removed to enable classifiers to produce unbiased results. The two datasets contain various attack types such as Neptune-DoS, pod-DoS, Smurf-DoS, and buffer overflow. Table  1 gives a brief composition of the KDD’99 and NSL-KDD datasets.

Moore Set . The Moore Set [ 93 ] was prepared in 2005 by researchers at Intel Research. It comprises real-world traces collected by a high-performance network monitor. Each object in the Moore Set represents a single flow of TCP packets between client and server and consists of 248 features. The features are derived from packet header information alone, while the classification class has been derived using content-based analysis. Table  2 shows a brief composition of the Moore Set.

LBNL2005. The Lawrence Berkeley National Laboratory (LBNL) 2005 traffic traces were collected at LBNL/ICSI under the Enterprise Tracing Project over a period of three months in 2004 and 2005 on two routers [ 94 ]. The collection contains full-header network traffic recorded at a medium-sized enterprise covering 22 subnets and includes trace data for a wide range of traffic, including web, email, backup, and streaming media. Because the traffic traces are completely anonymized, the packets carry no payload. As seen in Table  3 , the LBNL trace consists of five datasets labelled D0–D4. The “Per Tap” row specifies the number of traces collected on each monitored router port, while the “Snaplen” row gives the maximum number of bytes recorded for each packet.

CAIDA Datasets . The Center for Applied Internet Data Analysis (CAIDA), based at the University of California San Diego’s Supercomputer Center, collects a variety of data from geographically and topologically diverse locations and makes it available to the research community to the extent possible while respecting the privacy of the individuals and organizations who donate data or network access. The CAIDA-DDoS Dataset [ 95 ] comprises approximately one hour of anonymized traffic from a DDoS attack on August 4, 2007 (20:50:08 UTC to 21:56:16 UTC). This type of denial-of-service attack tries to prevent access to the targeted server by consuming all of the server’s computational power and all of the bandwidth on the network linking the server to the Internet. The traces only include attack traffic to the victim and responses to the attack from the victim; non-attack traffic has been eliminated to the greatest extent practicable.

Kyoto2006+ . The Kyoto2006+ is a publicly available benchmark dataset, consisting of 24 statistical features, that is built on three years of network traffic, from November 2006 to August 2009 [ 96 ]. It covers both regular servers and honeypots deployed at Kyoto University in Japan labelled as normal (no attack), attack (known attack) and unknown attack. It includes a variety of attacks performed against the honeypots such as shellcode, exploits, DoS, port scans, backscatter, and malware, shown in Table  4 . An updated version of the dataset contains additional data collected from November 2006 to December 2015 [ 97 ].

UNIBS2009 . The UNIBS-2009 trace [ 98 ] was compiled by the University of Brescia in 2009. It consists of traffic traces collected by running Tcpdump on the edge router of the university’s campus network, which connects the network to the Internet through a 100 Mbps uplink, on three consecutive working days (2009.09.30, 2009.10.01 and 2009.10.02). As shown in Table  5 , the dataset supplies the true labels, and the traffic traces include Web (HTTP and HTTPS), Mail (POP3, IMAP4, SMTP and their Secure Sockets Layer variants), Skype, P2P (BitTorrent, eDonkey), SSH (Secure Shell), FTP (File Transfer Protocol) and MSN traffic.

UNB ISCX-2012 . The ISCX-2012 dataset was prepared at the Information Security Centre of Excellence (ISCX) at the University of New Brunswick [ 99 ]. It is built on 7 days of network traffic, shown in Table  6 , and consists of over two million traffic packets characterized by 20 features taking nominal, integer, or float values. The dataset includes full packet payloads in pcap format.

CTU-13 . The CTU-13 dataset was compiled by the Czech Technical University [ 100 ]. It consists of botnet traffic captured at the university in 2011. The dataset includes thirteen scenarios, shown in Table  7 , covering different botnet attacks that use a variety of protocols and perform different actions, mixed with normal and background traffic. The dataset is available in the forms of unidirectional flow, bidirectional flow, and packet capture.

SCADA 2014 . The Supervisory Control And Data Acquisition (SCADA) dataset [ 101 ] was proposed by the Mississippi State University Critical Infrastructure Protection Center in 2014 to evaluate industrial network intrusion detection models. It is one of the standard databases commonly used in current industrial control network intrusion detection experiments. It includes a gas system dataset and a water storage system dataset from the Industrial Control System network layer.

UNSW-NB15 . The UNSW-NB15 dataset was compiled in 2015 at the School of Engineering and IT, UNSW Canberra at ADFA, using a small emulated network capturing normal and malicious raw network packets over 31 h. It covers nine attack types: analysis, backdoors, DoS, exploits, generic, fuzzers, reconnaissance, shellcode and worms, and consists of over two million records, each characterized by 49 features taking nominal, integer, or float values. The dataset’s data distribution is shown in Table  8 .

AWID 2015 . The Aegean Wi-Fi Intrusion Dataset (AWID), published in 2015 [ 102 ], comprises the largest amount of Wi-Fi network data (normal and attack) collected from real network environments. Its 16 attack types can be grouped into flooding, impersonation, and injection. As seen in Table  9 , the dataset contains over 5 million samples, each characterized by 154 features representing the WLAN frame fields along with physical-layer metadata.

ISCXVPN2016 . The ISCXVPN2016 dataset [ 103 ], published by the UNB in 2016, comprises traffic captured using Wireshark and tcpdump, totalling 28 GB of data. For the VPN traffic, an external VPN service provider was connected to using OpenVPN (UDP mode). To generate SFTP and FTPS traffic, an external service provider and FileZilla as a client were used. Table  10 shows the data distribution in the ISCXVPN2016 dataset.

CIDDS . The Coburg Intrusion Detection Datasets (CIDDS), prepared at Coburg University of Applied Sciences (Hochschule Coburg), consist of several labelled flow-based datasets created in virtual environments using OpenStack. The most used dataset in the CIDDS database, CIDDS-001, released in 2017, covers four weeks of unidirectional traffic flows, each characterized by 19 features taking nominal, integer, or float values. As seen in Table  11 , the dataset includes attacks such as DoS, port scans and SSH brute force.

CICIDS2017 . The Canadian Institute for Cybersecurity Intrusion Detection Evaluation Dataset (CICIDS2017) was produced in an emulated network environment at the CIC [ 104 ]. It is built on 5 days (July 3 to July 7, 2017) of network traffic, shown in Table  12 , and includes many of the most common attack types, including FTP-Patator, SSH-Patator, DoS slowloris, DoS Slowhttptest, DoS Hulk, DoS GoldenEye, Heartbleed, brute force, XSS, SQL injection, infiltration, Bot, DDoS (distributed denial of service), and port scan, each characterized by 80 features extracted using CICFlowMeter [ 103 , 105 ]. The dataset also includes full packet payloads in pcap format.

UGR’16 . The UGR’16 dataset, proposed in 2018 by Maciá-Fernández et al. [ 106 ], comprises NetFlow network traces collected from a real Tier 3 ISP network made up of several organizations’ and clients’ virtualized and hosted services including WordPress, Joomla, email, FTP, etc. NetFlow sensors were installed in the network’s border routers to capture all incoming and outgoing traffic from the ISP. As seen in Table  13 , two sets of data are provided: one for training models (calibration set) and the other for testing the models’ outputs (test set).

Kitsune2019 . The Kitsune Network Attack Dataset (Kitsune2019) was prepared at Ben-Gurion University of the Negev, Israel, and released in May 2018 [ 107 ]. The dataset comprises 9 files covering 9 distinct attack situations on a commercial IP-based video surveillance system and an IoT network: OS (Operating System) Scan, Fuzzing, Video Injection, ARP Man in the Middle, Active Wiretap, SSDP Flood, SYN DoS, Secure Sockets Layer Renegotiation and Mirai Botnet. It contains 27,170,754 samples, each characterized by 115 real-valued features. The Violation column in Table  14 indicates the attacker’s security violation of the network’s confidentiality (C), integrity (I), and availability (A).

NETRESEC is a software company that specializes in network security monitoring and forensics. It also maintains a list of freely accessible public packet capture (.pcap) repositories on the Internet, gathered from various sources [ 108 ]. Most of the websites listed provide Full Packet Capture (FPC) files, while others only provide truncated frames.

MAWI archive . The MAWI archive [ 109 ] consists of an ongoing collection of daily Internet traffic traces captured within the WIDE backbone network at several sampling points. Tcpdump is used to retrieve traffic traces, and the IP (Internet Protocol) addresses in the traces are encrypted using a modified version of Tcpdpriv (MAWI Working Group Traffic Archive ( http://www.wide.ad.jp )). The samplepoint-F consists of daily traces at the transit link of WIDE to the upstream ISP and has been in operation since 01/07/2006.

Kaggle Footnote 5 is an online data sharing and publishing platform. It includes security-based datasets such as KDD’99 and NSL-KDD. Registered users can also upload and explore data analysis models.

A breakdown of the usage of the Intrusion Detection datasets in the selected papers is shown in Fig.  3 , and an overview of the Network Intrusion datasets is provided in Table  15 . As seen in Fig.  3 , the KDD’99 dataset, despite being old and containing redundant and noisy records, is the most used of the 17 intrusion detection datasets described in this section: 45 of the 100 selected papers used the KDD’99 either alone or in conjunction with another intrusion detection dataset. It is followed by the NSL-KDD dataset, a smaller version of KDD’99 without its redundant and noisy records. Additionally, none of these datasets are balanced; therefore, suitable evaluation metrics should be used when evaluating models built on them. We must highlight that the four most recent datasets used in the reviewed papers were published as far back as 2017 and 2018, and they have not been extensively explored in an SSL context. Finally, we refer the interested reader to a recent comprehensive survey of network-based intrusion datasets [ 2 ].

Figure 3: Usage of intrusion detection datasets and sources in selected papers

4.1.2 Spam and phishing datasets and sources

Spam Email . The SPAM Email Dataset contains a total of 4601 emails including 1813 spam emails and 2788 legitimate emails each characterized by 58 attributes. It was donated to the UCI Machine Learning Repository by Hewlett Packard in 1999 [ 110 ].

Ling-Spam . The Ling-Spam dataset, proposed by Androutsopoulos et al. [ 111 ] in 2000, contains both spam and legitimate emails retrieved from an email distribution list, the Linguistic list, focusing on linguistic interests around research opportunities, job postings, and software discussion. The dataset contains 2,893 different emails, of which 2,412 are genuine emails collected from the list’s digests and 481 are spam emails retrieved from one of the corpus’ authors.

WEBSPAM-UK2006 . The WEBSPAM-UK2006 dataset was obtained using a set of .uk pages downloaded by the Laboratory of Web Algorithmics of the University of Milan (Università degli Studi di Milano) and manually assessed by a group of volunteers in 2006. The dataset consists of labels, URLs, hyperlinks and HTML page contents of 77,741,046 Web pages [ 112 ].

SpamAssassin (spamassassin.apache.org). Apache SpamAssassin is an Open-Source anti-spam platform providing a filter to classify email and block spam. The SpamAssassin Public mail corpus is a selection of 6,047 emails prepared by SpamAssassin in 2006. Of the total count, there are 1,897 spam messages and 4,150 legitimate emails.

TREC2007 Public Corpus . The TREC 2007 Public Corpus contains all email messages delivered to a particular server. The server hosted several accounts that had fallen into disuse, as well as several ‘honeypot’ accounts published on the web, which were used to sign up for a few services, some legitimate and some not. The TREC dataset contains 75,419 messages, of which 25,220 are legitimate emails and 50,199 are junk messages; the messages are divided into three subcorpora [ 113 ].

SMS Spam Collection . The SMS Spam Collection Dataset is a publicly available dataset created by Almeida et al. [ 114 , 115 , 116 ] in 2011. It is a labelled dataset of 5574 SMS messages, 747 spam and 4827 ham, collected from mobile phones.

“Gold standard” opinion spam . The “gold standard” opinion spam dataset was proposed by Ott et al. [ 117 ] in 2011. The corpus comprises 1,600 review texts, 800 deceptive and 800 genuine, on 20 hotels in the Chicago area. The genuine reviews were obtained from review websites such as TripAdvisor, Expedia and Yelp, and the deceptive ones were generated using Amazon Mechanical Turk (AMT). In the dataset, 400 reviews are written with a negative sentiment polarity and 400 depict a positive sentiment polarity.

Spear phishing email dataset (2011) & Benign email dataset (2013) . These two datasets have been prepared by Symantec’s enterprise mail scanning service. The spear phishing email dataset contains 1,467 emails from 8 campaigns and the benign email dataset contains 14,043 emails. The emails were sent between 2011 and 2013 and contain attachments, anonymized customer information and PII. The extraction process is described in [ 118 , 119 ].

MovieLens Dataset . The GroupLens Research has collected and made available rating datasets from the MovieLens website ( https://movielens.org ). The datasets were collected over various periods of time, depending on the size of the set. The MovieLens 20 M contains 20 million ratings and 465,000 tag applications applied to 27,000 movies by 138,000 users collected from January 1995 to March 2015 [ 120 ].

Netflix . The Netflix dataset Footnote 6 consists of listings of all the movies and TV shows available on Netflix, along with details such as cast, directors, ratings, release year and duration.

Twitter and Sina Weibo are two of the most influential social network media platforms in the world. Authors in the selected papers have either used crawlers or APIs to get sample data from these sources.

PhishTank , Footnote 7 DeltaPhish [ 121 ], PhishLabs Footnote 8 and the Anti-Phishing Working Group (APWG Footnote 9 ) are anti-phishing resources that publicly report phishing web pages in an effort to reduce fraud and identity theft caused by phishing and related incidents.

YELP Footnote 10 and delicious.com Footnote 11 publish crowd-sourced reviews about businesses. Similar to Twitter and Sina Weibo, APIs and crawlers may be used to extract data from these sources.

Figure 4: Usage of spam and phishing datasets and sources in selected papers

A breakdown of the usage of the described Spam and Phishing datasets in the selected papers is shown in Fig.  4 , and an overview of these datasets is provided in Table  16 . We observe that, in the reviewed works, there is no tendency towards using one or two specific datasets when tackling spam and/or phishing. Indeed, most of the datasets are used in a single publication, and only four of the nineteen, i.e. WEBSPAM-UK2006, Spam Email, SinaWeibo and “gold standard,” are used in two papers, as shown in Fig.  4 . Additionally, except for the “gold standard” dataset, none of these datasets is balanced.

4.1.3 Malware datasets and sources

Georgia Tech Packed-Executable Dataset . The Georgia Tech Packed-Executable dataset [ 122 ] was published in 2008. It consists of 2598 packed viruses collected from the Malfease Project dataset (http://malfease.oarci.net), and 2231 non-packed benign executables collected from a clean installation of Windows XP Home plus several common user applications. The authors also generated 669 packed benign executables by applying 17 different executable packing tools freely available on the Internet to the executables in the Windows XP start menu. Of the 3267 packed executables in their collection, PEiD ( http://peid.has.it ), one of the most used signature-based detectors for packed executables, was able to detect only 2262, whereas 1005 remained undetected. Therefore, those 1005 undetected samples were kept in the test set, and the training set contains 4493 samples: the 2231 non-packed benign executables and the 2262 packed executables detected by PEiD.

The Malimg Dataset [ 123 ], proposed in 2011 by the University of California, Santa Barbara, contains 9458 malware images from 25 families.

The Malware Genome Project [ 124 ], proposed by researchers at the North Carolina State University in 2011, contains 1260 Android Malware samples belonging to 49 different malware families collected from August 2010 to October 2011.

Malheur [ 125 , 126 ], proposed in 2011, is a tool for the automatic analysis of malware behaviour in a sandbox environment.

Malicia Dataset . The Malicia dataset [ 127 , 128 ], published in 2013, comprises 11,688 malware binaries in Windows Portable Executable format collected from 500 drive-by download servers over a period of 11 months. The objective of the work was to identify hosts that spread malware in the wild and to collect malware samples. To do so, the authors set up a honeypot whose clients queried the malware URL database, resolved the IP addresses of the distribution sites and repeatedly downloaded (“milked”) the hosted binaries.

CTU-Malware . The CTU-Malware dataset [ 129 ], also compiled by the Czech Technical University, consists of hundreds of captures (called scenarios) of different malware communication samples. Both malware and normal samples are included in the dataset as shown in Table  17 .

In 2015, Microsoft launched the Microsoft Malware Classification Challenge , along with the release of a dataset [ 130 ] consisting of over 20,000 malware samples belonging to nine families. Each malware file includes an identifier, which is a 20-character hash value that uniquely identifies the file, and a class label, which is an integer that represents one of the nine families to which the malware may belong.

USTC-TFC2016 . The USTC-TFC2016 dataset [ 131 ], published in 2017, consists of ten types of malware traffic from public websites which were collected from a real network environment from 2011 to 2015. Along with such malicious traffic, the benign part contains ten types of normal traffic which were collected using IXIA BPS, a professional network traffic simulation equipment. The dataset’s size is 3.71 GB in the pcap format. The dataset's composition is shown in Table 18 .

CICAndMal2017 . The CICAndMal2017 android malware dataset, published in 2018 by the CIC [ 132 ], consists of four malware categories namely Adware, Ransomware, Scareware, and SMS Malware and 80 traffic features extracted using CICFlowMeter [ 103 , 105 ]. The dataset includes 5,065 benign apps from the Google play market published in 2015, 2016, and 2017 and 426 malware samples belonging to 42 unique malware families. The dataset is fully labelled and contains network traffic, logs, API/SYS calls, phone statistics, and memory dumps of malware families shown in Table 19 .

CICMalDroid2020 . Also published by the CIC in 2020, the CICMalDroid2020 dataset [ 133 , 134 ] consists of more than 17,341 Android samples from several sources collected from December 2017 to December 2018. It includes complete capture of static and dynamic features and contains samples spanning five distinct categories: Adware, Banking malware, SMS malware, Riskware and Benign. Out of the 17,341 samples, 13,077 ran successfully while the rest failed due to errors such as time-outs, invalid APK files, and memory allocation failures. Of the 13,077 samples, 12% failed to be opened, mostly due to an “unterminated string” error. From the 11,598 remaining samples, three feature sets were extracted: 470 features comprising frequencies of system calls, binders, and composite behaviours; 139 features comprising frequencies of system calls; and 50,621 features comprising static information, such as intent actions, permissions, sensitive APIs, receivers, etc. A brief composition of the dataset is shown in Table 20 .

VxHeavens Footnote 12 is a website dedicated to providing information about malware. The archive comprises over 17,000 programs belonging to 585 malware families (Trojan, viruses, worms).

Figure 5: Usage of malware datasets and sources in selected papers

We provide an overview of the Malware datasets in Table  21 . In Fig.  5 , we also show a breakdown of the usage of the described Malware datasets in the selected papers. For these datasets, we observe that out of eleven datasets four have been used in three publications, one was used in two publications and the remaining six have been used only once. In addition, none of these datasets are balanced.

4.1.4 Additional datasets and sources

IEEE Test Feeders . For nearly two decades, the Distribution System Analysis (DSA) Subcommittee’s Test Feeder Working Group (TFWG) has been constructing publicly available distribution test feeders for use by academics. These test feeders aim to create distribution system models that reflect a wide range of design options and analytic issues. The 13-bus and 123-bus Feeders are part of the Test Feeder systems created in 1992 to evaluate and benchmark algorithms in solving unbalanced three-phase radial systems. The DSA Subcommittee approved them during the 2000 Power and Energy Society (PES) Summer Meeting. Schneider et al. [ 135 ] summarize the TFWG efforts and intended uses of Test Feeders.

XSSed Footnote 13 project was created in February 2007. It is an archive of cross-site scripting (XSS) vulnerable websites and provides information on things related to XSS vulnerabilities.

The NeCTAR (National eResearch Collaboration Tools and Resources) cloud platform, Footnote 14 launched in 2012 by the Australian Research Data Commons, provides Australia’s research community with fast, interactive, self-service access to large-scale computing infrastructure, software and data.

The Mobile-Sandbox [ 136 ] proposed by the University of Erlangen-Nurember, Germany, in 2014 is a static and dynamic analyzer system designed to support analysts detect malicious behaviours of malware.

Credit Card Fraud . The dataset has been collected and analyzed during a research collaboration of Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles) on big data mining and fraud detection [ 137 , 138 , 139 , 140 , 141 , 142 , 143 , 144 , 145 ]. The dataset contains transactions made by European cardholders over two days in September 2013, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions. It contains only numerical input variables, which are the result of a PCA transformation; unfortunately, the original features and more background information about the data are not provided due to confidentiality issues. The only features not transformed with PCA are ‘Time’ and ‘Amount’: ‘Time’ contains the seconds elapsed between each transaction and the first transaction in the dataset, and ‘Amount’ is the transaction amount.

Twitter ISIS Dataset . The Twitter ISIS dataset [ 84 ], published in 2018, consists of ISIS-related tweets/retweets in Arabic gathered from February 2016 to May 2016. The dataset includes tweets and associated information such as user ID, retweet ID, hashtags, number of followers, number of followees, content, date, and time. About 53 M tweets were collected based on 290 hashtags such as State of the Islamic-Caliphate and Islamic State. Table  22 provides a brief overview of the Twitter ISIS dataset composition.

Italian Retweets Timeseries . The Italian Retweets Timeseries dataset [ 146 ], published in 2019, contains temporal data of about 5,121,132 retweets from 47,947 users taken from the Italian Twittersphere published between 18/06/2018 and 01/07/2018.

Figure 6: Usage of additional datasets and sources in selected papers

The breakdown of the usage of the additional datasets in the selected papers is shown in Fig.  6 .

4.2 Performance assessment metrics

Frequently, a model’s performance is evaluated by constructing a confusion matrix [ 147 ], shown in Table  23 , and calculating several metrics from the values of the confusion matrix. Table  24 shows the metrics commonly used to evaluate the performance of ML models. TP represents the true positives, the samples predicted as malicious or attacks that were truly malicious, TN the true negatives, the samples predicted as benign that were truly benign, FP the false positives, the samples predicted as attacks that were in fact benign, and FN the false negatives, the samples predicted as benign that were in fact attacks or malicious.

The accuracy score represents the fraction of correctly predicted samples, benign and malicious, and the error rate considers the misclassified samples. The accuracy metric may be misleading, especially when classes are highly imbalanced. The precision rate is the ratio of correctly predicted malicious samples to all samples predicted as malicious, and the sensitivity is the ratio of correctly predicted malicious samples to all malicious samples. The Negative Predictive Value relates to the precision but considers the benign samples; similarly, the specificity relates to the sensitivity but considers the benign samples. The False Positive (Negative) Rate is the ratio of benign (malicious) samples predicted as malicious (benign) to all the benign (malicious) samples. The \(F_1\) -score is the harmonic mean of the precision and recall scores; it aggregates the two metrics to provide a more global view of the performance. The Geometric-Mean measures how balanced the prediction performances are on the majority and minority classes.
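All of these scalar metrics follow directly from the four confusion-matrix counts. A minimal sketch (the counts below are made up for illustration and are not taken from any reviewed paper):

```python
import math

def confusion_metrics(tp, tn, fp, fn):
    """Scalar metrics derived from confusion-matrix counts
    (positive class = attack/malicious, as defined in the text)."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)            # PPV
    recall = tp / (tp + fn)               # sensitivity / detection rate
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "precision": precision,
        "recall": recall,
        "npv": tn / (tn + fn),
        "specificity": specificity,
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
        "f1": 2 * precision * recall / (precision + recall),
        "g_mean": math.sqrt(recall * specificity),
    }

# Illustrative counts on an imbalanced test set: 990 benign, 10 attacks
m = confusion_metrics(tp=8, tn=980, fp=10, fn=2)
```

With these counts the accuracy is 0.988 while the precision is only about 0.44, illustrating how a high accuracy can mask poor performance on the minority class.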

The kappa ( \(\kappa \) ) statistic, introduced in [ 148 ], compares a model’s prequential accuracy, \(p_0\) , with the probability of randomly guessing a correct prediction, \(p_c\) . If the model is always correct, \(\kappa =1\) ; if the predictions are similar to random guessing, then \(\kappa =0\) ; and \(\kappa < 0\) indicates less agreement than would be expected by chance alone. The Matthews Correlation Coefficient (also known as the phi coefficient or mean square contingency coefficient), introduced in [ 149 ], may be seen as a discretization of the Pearson Correlation Coefficient [ 150 ], or Pearson’s r , for a binary confusion matrix. It measures the agreement between predicted and actual values and returns a value between \(-1\) and \(+1\) , where \(-1\) indicates a completely incorrect classifier and \(+1\) a perfect one.
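Both statistics can be computed from the same four counts. A sketch using the standard formulas (the counts are illustrative, not from any reviewed paper):

```python
import math

def kappa_and_mcc(tp, tn, fp, fn):
    """Cohen's kappa and the Matthews Correlation Coefficient
    for a binary confusion matrix."""
    n = tp + tn + fp + fn
    p0 = (tp + tn) / n                                   # observed accuracy
    # chance agreement: each class guessed at its marginal rate
    pc = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (p0 - pc) / (1 - pc)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return kappa, mcc

k, mcc = kappa_and_mcc(tp=8, tn=980, fp=10, fn=2)
```

On this imbalanced example (10 attacks among 1000 samples) both values land near 0.6 even though the raw accuracy is 0.988, showing how they discount chance agreement on the majority class.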

Researchers also use graphical-based metrics to observe the performance. However, these metrics make the comparison between different models more complex; for this reason, summarizations of graphical-based metrics are used. An example of such metrics is the receiver operating characteristic curve, or ROC curve, which provides a graphical representation of a binary classifier’s diagnostic performance as its discrimination threshold is varied. The Area Under the ROC (AUC ROC or AUROC) represents the probability that a uniformly drawn random positive sample is ranked higher than a uniformly drawn random negative sample. Like the ROC, the Precision-Recall Curve (PRC) employs multiple thresholds on the model’s predictions to compute distinct scores for precision and recall. Because computing the Area Under the PRC (AUPRC) is not as straightforward as the AUROC computation, the interested reader is referred to [ 151 ], where the main solutions proposed to compute the AUPRC are reviewed. Finally, training time and inference time are the time required to build a model and provide predictions, respectively.
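The probabilistic interpretation of the AUROC can be computed directly by ranking: for every (positive, negative) pair of samples, count how often the positive receives the higher score, with ties counting half. A small sketch with illustrative scores (not from any reviewed paper):

```python
def auroc(pos_scores, neg_scores):
    """AUROC as the probability that a uniformly drawn positive sample
    is ranked above a uniformly drawn negative sample (ties count 0.5)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))

# Attack samples scored [0.9, 0.8, 0.4], benign samples [0.5, 0.3, 0.2]:
# 8 of the 9 pairs are ranked correctly
score = auroc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

This pairwise formulation is equivalent to integrating the ROC curve, and makes the "ranking probability" reading of the AUROC concrete.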

As seen in Fig.  7 , where we present a breakdown of the usage of the evaluation metrics in the selected papers, the ACC is the most used of the 15 metrics considered for evaluation. It is used in 108 of the 210 selected papers, accounting for 22.1% of all metric usages. It is followed by the DR, used in 100 papers (20.5%), the PPV, used in 69 papers (14.1%), and the \(F_1\) -score, used in 61 papers (12.5%). As highlighted in Sect.  4.1 , except for the “gold standard” dataset, none of the presented datasets is balanced, which indicates that the ACC is not a suitable metric for performance assessment. The DR, PPV and \(F_1\) -score, however, are more suitable metrics than the accuracy as they account for the class imbalance in datasets. In cyber-security, the DR is useful because there is a high cost associated with missed attacks; similarly, the PPV is an important metric to consider, as a low PPV indicates that benign samples or transactions are being flagged as attacks, which renders the ML model useless. Due to the imbalanced nature of cyber-security datasets, seen in Sect.  4.1 , the \(F_1\) -score is a useful assessment metric as it balances the DR and PPV. The least used metrics are the NPV and the \(\kappa \) -score, which have each been used only once in the selected papers. The NPV is proportional to the frequency of attacks in the dataset; in other words, it is sensitive to imbalanced datasets. As a result, if the prevalence of attacks in the training dataset differs from the prevalence of attacks in the real world, the computed NPV may be inaccurate. That is, as the prevalence of attacks decreases, the NPV increases, because there are more true negatives for every false negative: a false negative would imply that a data point is actually an attack, which is improbable given the scarcity of attacks [ 152 ]. Similarly, the \(\kappa \) -score is also sensitive to imbalanced datasets and is therefore not well suited to the cyber-security domain, where attacks are less frequent than benign samples or transactions. Finally, the time complexity (training and inference) is reported in only 2.7% of the selected papers.

Figure 7: Usage of evaluation metrics in selected papers

5 Open issues and challenges

This section answers our third research question and presents the open challenges found in the literature. We cover open issues and challenges in the areas of the datasets and assessment metrics used, review the learnt lessons and recommend future research directions. Finally, we also discuss the challenge of the gap between research and practice in the field of cyber-security, particularly in the application of ML.

5.1 Datasets and repositories

In Sect.  4.1 , we have described 45 datasets, repositories and sources. We summarize the key issues found related to the datasets in this subsection.

Over 70 of the 100 reviewed articles focusing on intrusion detection used either the KDD’99 or the NSL-KDD dataset, both of which are closed, anonymized and outdated (over 20 years old). Similarly, the most recent spam and phishing email dataset used in the selected papers is from 2013. It is therefore possible that some of the patterns under consideration are no longer relevant due to changes in attack vectors, in addition to issues of availability and comparability. Moreover, the use of outdated datasets hinders the ability to generalize the results to current real-world scenarios [ 153 ].

Besides being outdated, both the Spam and Phishing datasets used in the selected papers, except for the TREC and WEBSPAM-UK, contain less data when compared to the intrusion datasets. They comprise 5000 or fewer samples, with the “gold standard” dataset containing only 1600 samples.

Moreover, in addition to containing synthetically generated and manually labelled data, the class imbalance in these datasets is not representative of real-world scenarios, rendering the proposed approaches ineffective when applied to real data. This is one of the primary reasons why most academic methods are not implemented in practice.

As shown in Table  15 , apart from the KDD’99, NSL-KDD, UNIBS2009, AWID2015, UNSW-NB15 and UGR’16 datasets, the datasets in the selected papers are not originally split into train and test partitions. Even for the datasets that are, authors often train and test their proposed approaches on random, narrower partitions instead of the provided train/test splits.

Most of the data collected from traffic or spam and/or phishing feeds are frequently kept private, making it impossible for other authors to reproduce results.

There are no updated, standard and public benchmark datasets for the different cyber-security problems. Due to these facts, accurate comparisons of the approaches are impossible without having to re-implement them and obtain the data from sources such as traffic or phishing feeds.

In computer science, the quality of the output is determined by the quality of the input, as captured by George Fuechsel’s dictum “Garbage in, garbage out.” We acknowledge the limitations of the reviewed datasets and repositories and advocate the development of more up-to-date, standardized and open benchmark cyber-security datasets that reflect the current state of cyber threats and attack vectors; these datasets should also be properly separated into training, validation and test partitions. Additionally, we recommend that future studies consider using multiple datasets and testing the models on a variety of scenarios, to improve the generalizability of the results and allow proper evaluation, comparison and real-world application.

5.2 Performance assessment metrics

In Sect.  4.2 , we presented the 15 metrics used in the selected papers for assessing the performance of the SSL models built on the datasets presented in Sect.  4.1 . In this subsection, we present an overview of the significant issues identified in relation to the performance assessment metrics.

Throughout the selected papers, we have noticed that certain important assessment metrics are missing from most of the papers. For example, in [ 154 ], only the AUROC is reported; in [ 155 , 156 ], only the DR and FPR or FAR are reported; and in [ 157 ], only the DR and ACC are reported. This shows that authors give more importance to certain metrics while overlooking others, such as the PPV and \(F_1\) -score, which should be used in conjunction as they account for the class imbalance in datasets.

Accuracy is a misleading metric in imbalanced settings; nevertheless, it has been used alone in [ 158 , 159 , 160 , 161 ]. It is particularly inadequate in the real world, where data is typically unbalanced. In light of this, it is important to conduct assessments using realistic deployment scenarios with unbalanced data and adequate assessment frameworks; the chosen metrics must accommodate the needs of the target audience.
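The point is easy to see numerically. Using the class ratio of the credit-card fraud dataset from Sect. 4.1.4 (492 frauds out of 284,807 transactions), a degenerate model that labels every transaction as legitimate scores:

```python
# Class ratio taken from the credit-card fraud dataset (Sect. 4.1.4)
total, frauds = 284_807, 492

# "Always legitimate" classifier: no positive predictions at all
tn, fn = total - frauds, frauds
accuracy = tn / total          # above 0.998 despite detecting nothing
detection_rate = 0 / frauds    # recall on the fraud class is exactly 0
```

Despite an accuracy above 99.8%, the model's detection rate is zero, which is precisely why imbalance-aware metrics such as the DR, PPV and \(F_1\)-score are required alongside or instead of accuracy.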

Only 2.7% of the selected papers report time complexity measurements. This is an important metric in the cyber-security domain, where attacks should be detected as soon as possible and static models often need to be rebuilt from scratch to detect unseen attacks. More importance should be given to this metric, as it is imperative to detect and mitigate attacks in a timely manner.

An excessive number of false positives can be detrimental to cyber-security because they increase the likelihood that users will ignore or dismiss alarms, leaving them vulnerable to serious cyber threats they might otherwise have caught. The fact that only 59 of the 210 selected papers (12.1% of all metric usages) measure the FAR demonstrates that this metric, which should be given more weight, is not being prioritized enough.

The issue of imbalanced data in cyber-security has been the subject of several recent studies. In particular, researchers have explored alternative techniques to address this issue such as cost-sensitive learning [ 162 ], which assigns higher costs to the minority class (i.e., the class with fewer instances) than the majority class to encourage the model to focus more on correctly classifying instances of the minority class, thus improving the performance on the rare class. Additional techniques include data augmentation which can be done through methods such as over-/under-sampling, ensemble methods such as bagging and boosting, or using scalar and graphical metrics which are adequate for imbalanced settings [ 163 ].
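As an illustration of one such data-level technique, here is a minimal random-oversampling sketch (a hypothetical helper written for this example, not an implementation from any cited work; libraries such as imbalanced-learn provide production-grade versions, including SMOTE):

```python
import random

def oversample_minority(X, y, minority_label, seed=0):
    """Duplicate randomly chosen minority samples until both classes
    are equal in size, then shuffle the combined data."""
    rng = random.Random(seed)
    minority = [(x, label) for x, label in zip(X, y) if label == minority_label]
    majority = [(x, label) for x, label in zip(X, y) if label != minority_label]
    # draw (with replacement) enough minority copies to match the majority
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    data = majority + minority + extra
    rng.shuffle(data)
    X_bal, y_bal = zip(*data)
    return list(X_bal), list(y_bal)

# 1 attack among 9 benign samples -> balanced 9 vs 9 after oversampling
X_bal, y_bal = oversample_minority(list(range(10)), [1] + [0] * 9, minority_label=1)
```

Oversampling is applied to the training partition only; the test partition must keep its original imbalance so that the evaluation reflects deployment conditions.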

5.3 Bridging the gap between ML-based cyber-security research and practice

The field of cyber-security faces a significant challenge due to the gap between research and practice, especially in the applications of ML [ 153 , 164 ]. While several industries have successfully deployed ML-based solutions in the field of cyber-security (Sect.  2.3 ), and research has made significant advances in developing new ML algorithms, the ML algorithms developed by academia are often not practical to implement in real-world scenarios due to scalability, data availability, and regulatory compliance issues. Moreover, the lack of communication and collaboration between academic researchers and industry practitioners adds to the disconnect. As a result, several ML-based cyber-security solutions have not been widely adopted in the industry. This gap underscores the need for increased knowledge sharing and cooperation between researchers and practitioners, a better understanding of the industrial requirements and constraints from academia, as well as a good understanding of ML concepts from both academia and practitioners [ 165 , 166 ].

To address this gap, there is a need for more interdisciplinary collaboration and partnerships between academia and industry. Collaboration can help researchers better understand the practical challenges faced by practitioners, while practitioners can provide researchers with access to real-world data and feedback on the effectiveness of ML algorithms in practice [ 164 ]. Another way to bridge the gap is through the development of standardized evaluation frameworks for ML-based cyber-security solutions as discussed in Sect.  5.2 . Standardization can help ensure that ML algorithms are evaluated in a consistent and transparent manner, making it easier for practitioners to understand the effectiveness of a particular solution.

Moreover, it is important to develop ML algorithms that are explainable and interpretable. Several AI algorithms used in cyber-security and other fields, in general, are considered “black boxes” [ 167 ], meaning it can be difficult to understand how they make decisions. This lack of transparency can be a barrier to adoption, as it can be difficult for practitioners to trust and validate the results produced by these algorithms. The development of more explainable and interpretable ML algorithms can help address this issue [ 168 , 169 , 170 ].

In summary, bridging the gap between research and practice in ML-based cyber-security requires interdisciplinary collaboration, standardized evaluation frameworks, and the development of explainable and interpretable ML algorithms.

6 Conclusion

In this survey, we have reviewed the datasets, repositories and performance assessment metrics used in the state-of-the-art applications of SSL methods in the field of cyber-security, namely network intrusion detection, spam and phishing detection, malware detection and categorization, and additional cyber-security areas. Good datasets are necessary for building and evaluating strong SSL models. Our main contribution is an extensive analysis of the cyber-security datasets and repositories. This in-depth analysis attempts to assist readers in identifying datasets and sources that are appropriate for their needs. The review of the datasets reveals that the research community has recognized that there is a lack of publicly available cyber-security datasets and has recently attempted to address this gap by publishing several datasets. Because multiple research organizations are working in this field, further intrusion detection datasets and advancements can be expected in the near future.

We investigated the datasets used in the different papers applying SSL methods for cyber-attack prevention as improvements over conventional security systems and over fully SL or UL methods, which would not be adequate in the cyber-security field, where labelled data is often scarce and difficult to obtain. We have reviewed the subcategories of SSL methods and provided a taxonomy based on previous studies. To the best of our knowledge, this is the first work that analyzes the datasets used in the literature applying SSL methods for intrusion, spam, phishing and malware detection. We have also summarized the performance evaluation metrics used for assessing the built models. In addition, where applicable, we have provided brief descriptions, compositions and usage trends of the datasets used in the reviewed literature. There are no up-to-date and representative benchmark datasets available for each threat domain; nevertheless, the datasets reviewed, despite being outdated, are still heavily used in research. Furthermore, most of the publicly available datasets are either imbalanced or not initially split into train/test/validation partitions, making the comparison of results a tedious task. Moreover, we have outlined the primary open challenges and issues identified in the literature, highlighted strategies for bridging the gap between research and practice, and compiled a comprehensive bibliography in this area. The aforementioned issues and challenges deserve particular attention in future research. Finally, we acknowledge the potential constraints associated with literature reviews, such as limitations on search thoroughness and content selection, which may influence our findings; we have therefore made our best efforts to minimize these limitations.



Acknowledgements

We thank the anonymous reviewers, the editor and the assistant editor for their constructive comments and suggestions. We are also thankful to Professor Daniel Amyot for providing his valuable guidance throughout the development of the literature review.

This research was supported by the Natural Sciences and Engineering Research Council of Canada, the Vector Institute, and The IBM Center for Advanced Studies (CAS) Canada within Research Project 1059.

Author information

Paul K. Mvula, Paula Branco, Guy-Vincent Jourdan & Herna L. Viktor

Present address: School of Electrical Engineering and Computer Science (EECS), University of Ottawa, 800 King Edward Avenue, Ottawa, ON K1N 6N5, Canada

Paula Branco, Guy-Vincent Jourdan and Herna L. Viktor contributed equally to this work


Contributions

P.M. worked on the conceptualization, methodology, software, visualization, and writing of the original draft. P.B., G.-V. J. and H.V. aided in the conceptualization, supervision, validation, reviewing and editing of the final manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Paul K. Mvula .

Ethics declarations

Competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Mvula, P.K., Branco, P., Jourdan, GV. et al. A systematic literature review of cyber-security data repositories and performance assessment metrics for semi-supervised learning. Discov Data 1 , 4 (2023). https://doi.org/10.1007/s44248-023-00003-x


Received: 27 January 2023

Accepted: 21 March 2023

Published: 06 April 2023



  • Cyber-security
  • Performance metrics
  • Phishing detection
  • Intrusion detection
  • Malware detection


Cyber security: challenges for society-literature review


Related Papers


Vikramajeet Khatri , Leo Hippeläinen , Monshizadeh Mehrnoosh

Cloud computing has attracted the attention of telecommunications operators as a potential cost saver, because it enables sharing computing resources within network infrastructure and between operators. The concept of Telecommunications network as a Service (TaaS) has been proposed as a renovation direction for mobile operators. However, information security, one of the major challenges of cloud computing, must be seriously investigated and discussed in order to realize TaaS in practice. To this end, we review new threats introduced by TaaS and discuss prevention mechanisms to resist them. Based on the cloud deployment model, we further propose a security framework, the “Cloud Security Framework for Operators (CSFO)”, to support TaaS. We also examine open research issues concerning TaaS security and propose a future research focus.


Computer Science & Information Technology (CS & IT) Computer Science Conference Proceedings (CSCP)

The Internet of Things (IoT) is the interconnection of heterogeneous smart devices through the Internet, with diverse application areas. The huge number of smart devices and the complexity of networks have made it difficult to secure the data and communication between devices. Various conventional security controls are insufficient to prevent the numerous attacks against these information-rich devices. Alongside enhancements to existing approaches, a perimeter defence, the Intrusion Detection System (IDS), has proved efficient in most scenarios. However, conventional IDS approaches are unsuitable for mitigating continuously emerging zero-day attacks. Intelligent mechanisms that can detect unfamiliar intrusions seem a promising solution. This article explores popular attacks against the IoT architecture and the relevant defence mechanisms, to identify appropriate protective measures for different networking practices and attack categories. In addition, a security framework for the IoT architecture is provided, with a list of security enhancement techniques.



REVIEW article

Integrated cybersecurity for Metaverse systems operating with artificial intelligence, blockchains, and cloud computing (provisionally accepted).

  • 1 University of Oxford, United Kingdom

The final, formatted version of the article will be published soon.

In the ever-evolving realm of cybersecurity, the increasing integration of Metaverse systems with cutting-edge technologies such as Artificial Intelligence (AI), Blockchain, and Cloud Computing presents a host of new opportunities alongside significant challenges. This article employs a methodological approach that combines an extensive literature review with focused case study analyses to examine the changing cybersecurity landscape within these intersecting domains. The emphasis is particularly on the Metaverse, exploring its current state of cybersecurity, potential future developments, and the influential roles of AI, blockchain, and cloud technologies.

Our thorough investigation assesses a range of cybersecurity standards and frameworks to determine their effectiveness in managing the risks associated with these emerging technologies. Special focus is directed towards the rapidly evolving digital economy of the Metaverse, investigating how AI and blockchain can enhance its cybersecurity infrastructure whilst acknowledging the complexities introduced by cloud computing.

The results highlight significant gaps in existing standards and a clear necessity for regulatory advancements, particularly concerning blockchain's capability for self-governance and the early-stage development of the Metaverse. The article underscores the need for proactive regulatory involvement, stressing the importance of cybersecurity experts and policymakers adapting and preparing for the swift advancement of these technologies. Ultimately, this study offers a comprehensive overview of the current scenario, foresees future challenges, and suggests strategic directions for integrated cybersecurity within Metaverse systems utilising AI, blockchain, and cloud computing.

Keywords: artificial intelligence, cybersecurity, cyber assurance, cyber risk, cybersecurity standards, cybersecurity frameworks, cloud security, blockchain security

Received: 20 Dec 2023; Accepted: 02 Apr 2024.

Copyright: © 2024 Radanliev. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Dr. Petar Radanliev, University of Oxford, Oxford, United Kingdom



COMMENTS

  1. Cyber security: challenges for society- literature review

    Cyber security: challenges for society- literature review. This paper describes the challenges due to lack of coordination between Security agencies and the Critical IT Infrastructure, and focuses on cyber security emerging trends while adopting new technologies such as mobile computing, cloud computing, e-commerce, and social networking.

  2. PDF Cyber security: challenges for society- literature review

    3) Security of accounts while using social-networking sites against hijacking. 4) One key to improved cyber security is a better understanding of the threat and of the vectors used by the attacker to circumvent cyber defences [5]. 6) Need of a separate unit handling security of the organization.

  3. Cyber security: challenges for society- literature review

    Download Citation | On Jan 1, 2013, Atul M Tonge published Cyber security: challenges for society- literature review | Find, read and cite all the research you need on ResearchGate

  4. Cyber security: State of the art, challenges and future directions

    This article provides an overview of the state of the art in cyber security, challenges, and tactics, current conditions, and global trends of cyber security. ... The recent trends in cyber security: a review. J. King Saud Univ.- Comput. Inform. ... Cyber security: challenges for society- literature review. IOSR J. Comput. Eng., 12 (2) ...

  5. Cyber Security: Challenges for Society-Literature Review

    Cyber Security: Challenges for Society-Literature Review. Cyber security is the activity of protecting information and information systems (networks, computers, data bases, data centers and ...

  6. (PDF) A Systematic Literature Review on the Cyber Security

    A Systematic Literature Review on the Cyber Security. Dr. Yusuf Perwej, Prof. (Dr.) Syed Qamar Abbas, Jai Pratap Dixit, Dr. Nikhat Akhtar, Anurag Kumar Jaiswal. Professor, Department of ...

  7. Societal Impacts of Cyber Security in Academic Literature

    The 2020 Allianz Risk Barometer, with 39% of responses, ranked cyber incidents as the number one risk threatening business continuity. Any organisation may face a number of challenges e.g. costly ...

  8. Cyber security: challenges for society- literature review

    Atul M. Tonge | IOSR Journal of Computer Engineering | Cyber security is the activity of protecting information and information systems (networks, computers, dat… | 10.6084/m9.figshare.1104181 | Cyber security: challenges for society- literature review

  9. A Review on Cybersecurity: Challenges & Emerging Threats

    With this research study, the aim was to identify challenges to cybersecurity in aspects of governance, risk management, culture and awareness as well as emerging threats. Through analysis of 33 research articles we have defined the critical components of cybersecurity and the significance of challenges and emerging threats.

  10. Research communities in cyber security: A comprehensive literature review

    The analysis discovered twelve top-level communities: access control, authentication, biometrics, cryptography (I & II), cyber-physical systems, information hiding, intrusion detection, malwares, quantum cryptography, sensor networks, and usable security. These top-level communities were in turn composed of a total of 80 sub-communities.

  11. (PDF) Societal Impacts of Cyber Security in Academic Literature

    A Systematic Literature Review on the Cyber Security. 2021 • ... 4.0 environments with a broad mix of technologies have their own set of security and privacy challenges and typical cybersecurity challenges. The current cybersecurity trends that Industry 4.0 technologies face will be discussed in this chapter. ... society, cyber security ...

  12. A systematic literature review of how cybersecurity-related behavior

    The extent to which an employee is aware of and complies with information security policy defines the extent of their information security awareness (ISA). ISA is critical in mitigating the risks associated with cybersecurity and is defined by two components, namely, understanding and compliance. Compliance is the employees' commitment to follow best-practice rules defined by the organization ...

  13. A systematic literature review of mitigating cyber security risk

    A systematic review encourages the production of quality evidence with more substantial results (Mallet et al., 2012). Although the current cyber security literature corpus is vast, systematic reviews of cyber security studies, pattern identification, and potential theme development are limited. Specifically, the procedures of a review ...

  14. PDF Cybersecurity Literature Review and Efforts Report

    • Performing a strategic literature review and investigation of ongoing security efforts • Review state of the art technologies across multiple disciplines • Assess representative TMS systems and equipment • Suggest "Red Team" high risk equipment • Develop Guidance for state and local agencies that aids in identifying:

  15. PDF Societal Impacts of Cyber Security in Academic Literature: Systematic

    The purpose of this literature review is to identify how societal impacts of cyber security, and how the impacts of cyber security issues to individuals, communities, organizations or societies ...

  16. Cyber Security: A Review of Cyber Crimes, Security Challenges and

    Cyber Security: A Review of Cyber Crimes, Security Challenges and Measures to Control. ... An Official Publication of the Society for Risk Analysis. Crossref. PubMed. Google Scholar. Cimpanu C. (2020, 19 January. ... & Hossain M. (2016). A literature review on phishing crime, prevention review and investigation of gaps. In 10th International ...

  17. Internet governance and cyber-security: a systematic literature review

    The study also found that as the Internet and its governance issues offload the privacy and security burden and supervision concerns characterized the telecommunications are heightened in the context of social awareness in cyberspace, cybersecurity has become necessary with businesses and the government spending much time and resources to ...

  18. A Systematic Literature Review on the Cyber Security

    security framework, and section 10 cyber security tools. Finally, in section 11 cyber security challenges. II. Related Work IT security includes cyber security as a subset. Cyber security protects the digital data on your networks, computers, and devices from unauthorized access, attack, and destruction. While IT security protects both

  19. A systematic literature review of cyber-security data repositories and

    This section provides the details of the methodology we followed. To achieve our goal of reviewing the datasets and evaluation metrics used in the applications of SSL techniques to cyber-security, we followed the standard systematic literature review guidelines outlined in [] for assessing the search's completeness.The entire process was done on Covidence [], an online tool for systematic ...

  20. A Systematic Literature Review on Cyber Threat Intelligence for ...

    To overcome network security challenges, Li ... article provides a valuable overview of the emergence of deep fake technology and its potential impact on business and society ... Sarah A. Suayyid, Manal S. Al-Ghamdi, Hayfa Al-Muhaisen, and Abdullah M. Almuhaideb. 2023. "A Systematic Literature Review on Cyber Threat Intelligence for ...

  21. Cyber security: challenges for society-literature review

    The next 2 billion users will be connecting from mobile devices, and many of those devices are in developing countries. The sheer numbers are likely to have social impacts like flash mobs. A lot more politics is migrating to cyber space, with parallel calls to ...

  22. [Pdf] a Review: Importance of Cyber Security and Its Challenges to

    This paper focusses on the cyber security trends and corresponding challenges in the world of interconnected systems. - Cyber security means being in a state of security from the vulnerabilities available in the network. Being in a world, where millions and trillions of systems are interconnected, it is quite important to save our precious data from cyber-attacks.

  23. Integrated Cybersecurity for Metaverse Systems Operating with

    In the ever-evolving realm of cybersecurity, the increasing integration of Metaverse systems with cutting-edge technologies such as Artificial Intelligence (AI), Blockchain, and Cloud Computing presents a host of new opportunities alongside significant challenges. This article employs a methodological approach that combines an extensive literature review with focused case study analyses to ...

  24. A Review on Cybersecurity: Challenges & Emerging Threats

    A bibliographic analysis of the literature is applied until 2016 to identify and discuss the cybersecurity value conflicts and ethical issues in national security.