
Journal of Software Engineering Research and Development


Metric-centered and technology-independent architectural views for software comprehension

The maintenance of applications is a crucial activity in the software industry. The high cost of this process is due to the effort invested in software comprehension since, in most cases, there is no up-to-...


Back to the future: origins and directions of the “Agile Manifesto” – views of the originators

In 2001, seventeen professionals set up the manifesto for agile software development. They wanted to define values and basic principles for better software development. On top of being brought into focus, the ...

Investigating the effectiveness of peer code review in distributed software development based on objective and subjective data

Code review is a potential means of improving software quality. To be effective, it depends on different factors, and many have been investigated in the literature to identify the scenarios in which it adds qu...

On the benefits and challenges of using kanban in software engineering: a structured synthesis study

Kanban is increasingly being used in diverse software organizations. There is extensive research regarding its benefits and challenges in Software Engineering, reported in both primary and secondary studies. H...

Challenges on applying genetic improvement in JavaScript using a high-performance computer

Genetic Improvement is an area of Search Based Software Engineering that aims to apply evolutionary computing operators to the software source code to improve it according to one or more quality metrics. This ...

Actor’s social complexity: a proposal for managing the iStar model

Complex systems are inherent to modern society, in which individuals, organizations, and computational elements relate with each other to achieve a predefined purpose, which transcends individual goals. In thi...

Investigating measures for applying statistical process control in software organizations

The growing interest in improving software processes has led organizations to aim for high maturity, where statistical process control (SPC) is required. SPC makes it possible to analyze process behavior, pred...

An approach for applying Test-Driven Development (TDD) in the development of randomized algorithms

TDD is a technique traditionally applied in applications with deterministic algorithms, in which the input and the expected result are known. However, the application of TDD with randomized algorithms has bee...

Supporting governance of mobile application developers from mining and analyzing technical questions in stack overflow

There is a need to improve the direct communication between large organizations that maintain mobile platforms (e.g. Apple, Google, and Microsoft) and third-party developers to solve technical questions that e...

Working software over comprehensive documentation – Rationales of agile teams for artefacts usage

Agile software development (ASD) promotes working software over comprehensive documentation. Still, recent research has shown agile teams to use quite a number of artefacts. Whereas some artefacts may be adopt...

Development as a journey: factors supporting the adoption and use of software frameworks

From the point of view of the software framework owner, attracting new and supporting existing application developers is crucial for the long-term success of the framework. This mixed-methods study explores th...

Applying user-centered techniques to analyze and design a mobile application

Techniques that help in understanding and designing user needs are increasingly being used in Software Engineering to improve the acceptance of applications. Among these techniques we can cite personas, scenar...

A measurement model to analyze the effect of agile enterprise architecture on geographically distributed agile development

Efficient and effective communication (active communication) among stakeholders is thought to be central to agile development. However, in geographically distributed agile development (GDAD) environments, it c...

A survey of search-based refactoring for software maintenance

This survey reviews published materials related to the specific area of Search-Based Software Engineering that concerns software maintenance and, in particular, refactoring. The survey aims to give a comprehen...

Guest editorial foreword for the special issue on automated software testing: trends and evidence

Similarity testing for role-based access control systems

Access control systems demand rigorous verification and validation approaches, otherwise, they can end up with security breaches. Finite state machines based testing has been successfully applied to RBAC syste...

An algorithm for combinatorial interaction testing: definitions and rigorous evaluations

Combinatorial Interaction Testing (CIT) approaches have drawn attention of the software testing community to generate sets of smaller, efficient, and effective test cases where they have been successful in det...

How diverse is your team? Investigating gender and nationality diversity in GitHub teams

Building an effective team of developers is a complex task faced by both software companies and open source communities. The problem of forming a “dream”

Investigating factors that affect the human perception on god class detection: an analysis based on a family of four controlled experiments

Evaluation of design problems in object oriented systems, which we call code smells, is mostly a human-based task. Several studies have investigated the impact of code smells in practice. Studies focusing on h...

On the evaluation of code smells and detection tools

Code smells refer to any symptom in the source code of a program that possibly indicates a deeper problem, hindering software maintenance and evolution. Detection of code smells is challenging for developers a...

On the influence of program constructs on bug localization effectiveness

Software projects often reach hundreds or thousands of files. Therefore, manually searching for code elements that should be changed to fix a failure is a difficult task. Static bug localization techniques pro...

DyeVC: an approach for monitoring and visualizing distributed repositories

Software development using distributed version control systems has become more frequent recently. Such systems bring more flexibility, but also greater complexity to manage and monitor multiple existing reposi...

A genetic algorithm based framework for software effort prediction

Several prediction models have been proposed in the literature using different techniques obtaining different results in different contexts. The need for accurate effort predictions for projects is one of the ...

Elaboration of software requirements documents by means of patterns instantiation

Studies show that problems associated with the requirements specifications are widely recognized for affecting software quality and impacting effectiveness of its development process. The reuse of knowledge ob...

ArchReco: a software tool to assist software design based on context aware recommendations of design patterns

This work describes the design, development and evaluation of a software Prototype, named ArchReco, an educational tool that employs two types of Context-aware Recommendations of Design Patterns, to support us...

On multi-language software development, cross-language links and accompanying tools: a survey of professional software developers

Non-trivial software systems are written using multiple (programming) languages, which are connected by cross-language links. The existence of such links may lead to various problems during software developmen...

SoftCoDeR approach: promoting Software Engineering Academia-Industry partnership using CMD, DSR and ESE

The Academia-Industry partnership has been increasingly encouraged in the software development field. The main focus of the initiatives is driven by the collaborative work where the scientific research work me...

Issues on developing interoperable cloud applications: definitions, concepts, approaches, requirements, characteristics and evaluation models

Among research opportunities in software engineering for cloud computing model, interoperability stands out. We found that the dynamic nature of cloud technologies and the battle for market domination make clo...

Game development software engineering process life cycle: a systematic review

Software game is a kind of application that is used not only for entertainment, but also for serious purposes that can be applicable to different domains such as education, business, and health care. Multidisc...

Correlating automatic static analysis and mutation testing: towards incremental strategies

Traditionally, mutation testing is used as test set generation and/or test evaluation criteria once it is considered a good fault model. This paper uses mutation testing for evaluating an automated static anal...

A multi-objective test data generation approach for mutation testing of feature models

Mutation approaches have been recently applied for feature testing of Software Product Lines (SPLs). The idea is to select products, associated to mutation operators that describe possible faults in the Featur...

An extended global software engineering taxonomy

In Global Software Engineering (GSE), the need for a common terminology and knowledge classification has been identified to facilitate the sharing and combination of knowledge by GSE researchers and practition...

A systematic process for obtaining the behavior of context-sensitive systems

Context-sensitive systems use contextual information in order to adapt to the user’s current needs or requirements failure. Therefore, they need to dynamically adapt their behavior. It is of paramount importan...

Distinguishing extended finite state machine configurations using predicate abstraction

Extended Finite State Machines (EFSMs) provide a powerful model for the derivation of functional tests for software systems and protocols. Many EFSM based testing problems, such as mutation testing, fault diag...

Extending statecharts to model system interactions

Statecharts are diagrams comprised of visual elements that can improve the modeling of reactive system behaviors. They extend conventional state diagrams with the notions of hierarchy, concurrency and communic...

On the relationship of code-anomaly agglomerations and architectural problems

Several projects have been discontinued in the history of the software industry due to the presence of software architecture problems. The identification of such problems in source code is often required in re...

An approach based on feature models and quality criteria for adapting component-based systems

Feature modeling has been widely used in domain engineering for the development and configuration of software product lines. A feature model represents the set of possible products or configurations to apply i...

Patch rejection in Firefox: negative reviews, backouts, and issue reopening

Writing patches to fix bugs or implement new features is an important software development task, as it contributes to raise the quality of a software system. Not all patches are accepted in the first attempt, ...

Investigating probabilistic sampling approaches for large-scale surveys in software engineering

Establishing representative samples for Software Engineering surveys is still considered a challenge. Specialized literature often presents limitations on interpreting surveys’ results, mainly due to the use o...

Characterising the state of the practice in software testing through a TMMi-based process

The software testing phase, despite its importance, is usually compromised by the lack of planning and resources in industry. This can risk the quality of the derived products. The identification of mandatory ...

Self-adaptation by coordination-targeted reconfigurations

A software system is self-adaptive when it is able to dynamically and autonomously respond to changes detected either in its internal components or in its deployment environment. This response is expected to ensu...

Templates for textual use cases of software product lines: results from a systematic mapping study and a controlled experiment

Use case templates can be used to describe functional requirements of a Software Product Line. However, to the best of our knowledge, no efforts have been made to collect and summarize these existing templates...

F3T: a tool to support the F3 approach on the development and reuse of frameworks

Frameworks are used to enhance the quality of applications and the productivity of the development process, since applications may be designed and implemented by reusing framework classes. However, frameworks ...

NextBug: a Bugzilla extension for recommending similar bugs

Due to the characteristics of the maintenance process followed in open source systems, developers are usually overwhelmed with a great amount of bugs. For instance, in 2012, approximately 7,600 bugs/month were...

Assessing the benefits of search-based approaches when designing self-adaptive systems: a controlled experiment

The well-orchestrated use of distilled experience, domain-specific knowledge, and well-informed trade-off decisions is imperative if we are to design effective architectures for complex software-intensive syst...

Revealing influence of model structure and test case profile on the prioritization of test cases in the context of model-based testing

Test case prioritization techniques aim at defining an order of test cases that favors the achievement of a goal during test execution, such as revealing failures as early as possible. A number of techniques ...

A metrics suite for JUnit test code: a multiple case study on open source software

The code of JUnit test cases is commonly used to characterize software testing effort. Different metrics have been proposed in literature to measure various perspectives of the size of JUnit test cases. Unfort...

Designing fault-tolerant SOA based on design diversity

Over recent years, software developers have been evaluating the benefits of both Service-Oriented Architecture (SOA) and software fault tolerance techniques based on design diversity. This is achieved by creat...

Method-level code clone detection through LWH (Light Weight Hybrid) approach

Many researchers have investigated different techniques to automatically detect duplicate code in programs exceeding thousand lines of code. These techniques have limitations in finding either the structural o...

The problem of conceptualization in god class detection: agreement, strategies and decision drivers

The concept of code smells is widespread in Software Engineering. Despite the empirical studies addressing the topic, the set of context-dependent issues that impacts the human perception of what is a code sme...


software engineering Recently Published Documents


Identifying Non-Technical Skill Gaps in Software Engineering Education: What Experts Expect But Students Don’t Learn

As the importance of non-technical skills in the software engineering industry increases, the skill sets of graduates match less and less with industry expectations. A growing body of research exists that attempts to identify this skill gap. However, only a few studies so far explicitly compare opinions of the industry with what is currently being taught in academia. By aggregating data from three previous works, we identify the three biggest non-technical skill gaps between industry and academia for the field of software engineering: devoting oneself to continuous learning, being creative by approaching a problem from different angles, and thinking in a solution-oriented way by favoring outcome over ego. Eight follow-up interviews were conducted to further explore how the industry perceives these skill gaps, yielding 26 sub-themes grouped into six bigger themes: stimulating continuous learning, stimulating creativity, creative techniques, addressing the gap in education, skill requirements in industry, and the industry selection process. With this work, we hope to inspire educators to give the necessary attention to the uncovered skills, further mitigating the gap between the industry and the academic world.

Opportunities and Challenges in Code Search Tools

Code search is a core software engineering task. Effective code search tools can help developers substantially improve their software development efficiency and effectiveness. In recent years, many code search studies have leveraged different techniques, such as deep learning and information retrieval approaches, to retrieve expected code from a large-scale codebase. However, there is a lack of a comprehensive comparative summary of existing code search approaches. To understand the research trends in existing code search studies, we systematically reviewed 81 relevant studies. We investigated the publication trends of code search studies, analyzed key components, such as the codebase, query, and modeling technique used to build code search tools, and classified existing tools according to the seven different search tasks they focus on supporting. Based on our findings, we identified a set of outstanding challenges in existing studies and a research roadmap for future code search research.

Psychometrics in Behavioral Software Engineering: A Methodological Introduction with Guidelines

A meaningful and deep understanding of the human aspects of software engineering (SE) requires psychological constructs to be considered. Psychology theory can facilitate the systematic and sound development as well as the adoption of instruments (e.g., psychological tests, questionnaires) to assess these constructs. In particular, to ensure high quality, the psychometric properties of instruments need evaluation. In this article, we provide an introduction to psychometric theory for the evaluation of measurement instruments for SE researchers. We present guidelines that enable using existing instruments and developing new ones adequately. We conducted a comprehensive review of the psychology literature framed by the Standards for Educational and Psychological Testing. We detail activities used when operationalizing new psychological constructs, such as item pooling, item review, pilot testing, item analysis, factor analysis, statistical property of items, reliability, validity, and fairness in testing and test bias. We provide an openly available example of a psychometric evaluation based on our guideline. We hope to encourage a culture change in SE research towards the adoption of established methods from psychology. To improve the quality of behavioral research in SE, studies focusing on introducing, validating, and then using psychometric instruments need to be more common.

Towards an Anatomy of Software Craftsmanship

Context: The concept of software craftsmanship has early roots in computing, and in 2009, the Manifesto for Software Craftsmanship was formulated as a reaction to how the Agile methods were practiced and taught. But software craftsmanship has seldom been studied from a software engineering perspective. Objective: The objective of this article is to systematize an anatomy of software craftsmanship through literature studies and a longitudinal case study. Method: We performed a snowballing literature review based on an initial set of nine papers, resulting in 18 papers and 11 books. We also performed a case study following seven years of software development of a product for the financial market, eliciting qualitative and quantitative results. We used thematic coding to synthesize the results into categories. Results: The resulting anatomy is centered around four themes, containing 17 principles and 47 hierarchical practices connected to the principles. We present the identified practices based on the experiences gathered from the case study, triangulating with the literature results. Conclusion: We provide our systematically derived anatomy of software craftsmanship with the goal of inspiring more research into the principles and practices of software craftsmanship and how these relate to other principles within software engineering in general.

On the Reproducibility and Replicability of Deep Learning in Software Engineering

Context: Deep learning (DL) techniques have gained significant popularity among software engineering (SE) researchers in recent years. This is because they can often solve many SE challenges without enormous manual feature engineering effort and complex domain knowledge. Objective: Although many DL studies have reported substantial advantages over other state-of-the-art models on effectiveness, they often ignore two factors: (1) reproducibility —whether the reported experimental results can be obtained by other researchers using authors’ artifacts (i.e., source code and datasets) with the same experimental setup; and (2) replicability —whether the reported experimental result can be obtained by other researchers using their re-implemented artifacts with a different experimental setup. We observed that DL studies commonly overlook these two factors and declare them as minor threats or leave them for future work. This is mainly due to high model complexity with many manually set parameters and the time-consuming optimization process, unlike classical supervised machine learning (ML) methods (e.g., random forest). This study aims to investigate the urgency and importance of reproducibility and replicability for DL studies on SE tasks. Method: In this study, we conducted a literature review on 147 DL studies recently published in 20 SE venues and 20 AI (Artificial Intelligence) venues to investigate these issues. We also re-ran four representative DL models in SE to investigate important factors that may strongly affect the reproducibility and replicability of a study. Results: Our statistics show the urgency of investigating these two factors in SE, where only 10.2% of the studies investigate any research question to show that their models can address at least one issue of replicability and/or reproducibility. More than 62.6% of the studies do not even share high-quality source code or complete data to support the reproducibility of their complex models. 
Meanwhile, our experimental results show the importance of reproducibility and replicability, where the reported performance of a DL model could not be reproduced because of an unstable optimization process. Replicability could be substantially compromised if the model training is not convergent, or if performance is sensitive to the size of the vocabulary and testing data. Conclusion: It is urgent for the SE community to provide a long-lasting link to a high-quality reproduction package, enhance DL-based solution stability and convergence, and avoid performance sensitivity on different sampled data.

Predictive Software Engineering: Transform Custom Software Development into Effective Business Solutions

The paper examines the principles of the Predictive Software Engineering (PSE) framework. The authors examine how PSE enables custom software development companies to offer transparent services and products while staying within the intended budget and a guaranteed budget. The paper will cover all 7 principles of PSE: (1) Meaningful Customer Care, (2) Transparent End-to-End Control, (3) Proven Productivity, (4) Efficient Distributed Teams, (5) Disciplined Agile Delivery Process, (6) Measurable Quality Management and Technical Debt Reduction, and (7) Sound Human Development.

Software—A New Open Access Journal on Software Engineering

Software (ISSN: 2674-113X) [...]

Improving bioinformatics software quality through incorporation of software engineering practices

Background: Bioinformatics software is developed for collecting, analyzing, integrating, and interpreting life science datasets that are often enormous. Bioinformatics engineers often lack the software engineering skills necessary for developing robust, maintainable, reusable software. This study presents review and discussion of the findings and efforts made to improve the quality of bioinformatics software. Methodology: A systematic review was conducted of related literature that identifies core software engineering concepts for improving bioinformatics software development: requirements gathering, documentation, testing, and integration. The findings are presented with the aim of illuminating trends within the research that could lead to viable solutions to the struggles faced by bioinformatics engineers when developing scientific software. Results: The findings suggest that bioinformatics engineers could significantly benefit from the incorporation of software engineering principles into their development efforts. This leads to suggestion of both cultural changes within bioinformatics research communities as well as adoption of software engineering disciplines into the formal education of bioinformatics engineers. Open management of scientific bioinformatics development projects can result in improved software quality through collaboration amongst both bioinformatics engineers and software engineers. Conclusions: While strides have been made both in identification and solution of issues of particular import to bioinformatics software development, there is still room for improvement in terms of shifts in both the formal education of bioinformatics engineers as well as the culture and approaches of managing scientific bioinformatics research and development efforts.

Inter-team communication in large-scale co-located software engineering: a case study

Large-scale software engineering is a collaborative effort where teams need to communicate to develop software products. Managers face the challenge of how to organise work to facilitate necessary communication between teams and individuals. This includes a range of decisions from distributing work over teams located in multiple buildings and sites, through work processes and tools for coordinating work, to softer issues including ensuring well-functioning teams. In this case study, we focus on inter-team communication by considering geographical, cognitive and psychological distances between teams, and factors and strategies that can affect this communication. Data was collected for ten test teams within a large development organisation, in two main phases: (1) measuring cognitive and psychological distance between teams using interactive posters, and (2) five focus group sessions where the obtained distance measurements were discussed. We present ten factors and five strategies, and how these relate to inter-team communication. We see three types of arenas that facilitate inter-team communication, namely physical, virtual and organisational arenas. Our findings can support managers in assessing and improving communication within large development organisations. In addition, the findings can provide insights into factors that may explain the challenges of scaling development organisations, in particular agile organisations that place a large emphasis on direct communication over written documentation.

Aligning Software Engineering and Artificial Intelligence With Transdisciplinary

This study examined AI and SE transdisciplinarity to find ways of aligning them to enable the development of AI-SE transdisciplinary theory. A literature review and analysis method was used. The findings are that AI and SE transdisciplinarity is tacit, with islands within and between them that can be linked to accelerate their transdisciplinary orientation by codification and by internally developing and externally borrowing and adapting transdisciplinary theories. Lack of theory has been identified as the major barrier towards maturing the two disciplines as engineering disciplines. Creating AI and SE transdisciplinary theory would contribute to maturing AI and SE as engineering disciplines. The implications of the study are that transdisciplinary theory can support mode 2 and 3 AI and SE innovations and provide an alternative for maturing the two disciplines as engineering disciplines. The study’s originality is that it is the first in SE, AI, or their intersections.


Software Engineering

At Google, we pride ourselves on our ability to develop and launch new products and features at a very fast pace. This is made possible in part by our world-class engineers, but our approach to software development enables us to balance speed and quality, and is integral to our success. Our obsession with speed and scale is evident in our developer infrastructure and tools. Developers across the world continually write, build, test and release code in multiple programming languages like C++, Java, Python, JavaScript and others, and the Engineering Tools team, for example, is challenged to keep this development ecosystem running smoothly. Our engineers leverage these tools and infrastructure to produce clean code and keep software development running at an ever-increasing scale. In our publications, we share associated technical challenges and lessons learned along the way.


An empirically based model of software prototyping: a mapping study and a multi-case study

Open access
Published: 30 August 2023
Volume 28, article number 115 (2023)


Elizabeth Bjarnason (ORCID: orcid.org/0000-0001-9070-0008), Franz Lang & Alexander Mjöberg


Prototyping is an established practice within product and user interface design that is also used as a requirements engineering (RE) practice within agile development. Even so, there is a lack of theory on prototyping.

Our main research objective is to support practitioners in improving on their prototyping practices.

We have designed a model that describes key aspects of the practice of prototyping. The model is based on a systematic mapping study consisting of thirty-three primary studies and on empirical data from twelve case companies. We validate and demonstrate the applicability of our model through a focus group at one company and through semi-structured interviews at eleven (other) startup companies.

Our prototyping aspects model (PAM) consists of five aspects of prototyping, namely purpose, prototype scope, prototype media, prototype use, and exploration strategy. This model has enabled practitioners to discuss their prototyping practices in terms of the concepts provided by our model.

Conclusions

The model can be used to categorise prototyping instances and can thereby support practitioners in reflecting and improving on their prototyping practices.

Perspectives on Future Prototyping—Results from an Expert Discussion

Case Studies on End-User Engagement and Prototyping during Software Development

House of Prototyping Guidelines: A Framework to Develop Theoretical Prototyping Strategies for Human-Centered Design


1 Introduction

Prototyping is a creative practice commonly applied within product design (Tronvoll et al. 2017), usability and user-interface design (Hakim and Spitzer 2000; Lauesen 2005; Hartson 2019), and software development (Acosta et al. 1994, 1993; Goldman and Narayanaswamy 1992) to explore the problem and/or the solution domain through use of a prototype, i.e. an early sample, model, or release of a product that simulates one or many dimensions of the (future) product. A prototype can range from a simple paper sketch through a computer-generated mock-up to an (incomplete) version of the production software, e.g. a minimum viable product (MVP), and can be used to learn about user problems and to evaluate solution ideas (Sergio 2015). Thus, prototyping enables validating a solution proposal before developing the full product through cost-effective testing with real users (Nielsen 1993). In our research, we focus on the practice of prototyping, namely the use of prototypes to obtain learning, rather than on the construction of prototypes, meaning that any representation that aids in exploring and learning about a feature or dimension of a new product or business model can be used for prototyping. This broad definition of a prototype also includes entities such as PowerPoint sketches and videos (Karras et al. 2017), e.g. used as marketing material, and interview questions (Batova et al. 2016), e.g. used to explore the customer and user domain.

Within human–computer interaction (HCI) and design, the practice of prototyping is well established and is used to explore and test ideas and solutions for user interface designs. However, as early as 2008, researchers within HCI pointed out a lack of knowledge about the fundamental nature and characteristics of prototyping (Lim et al. 2008). While there is more recent research on prototyping, we find no comprehensive theory of the practice or its methodology, e.g. overarching principles or procedures for achieving a certain outcome, and only one related literature review of prototyping concerning the definition of MVP (Lenarduzzi et al. 2016). For these reasons, we were interested in exploring prototyping methodology with the aim of supporting practitioners in describing and discussing their prototyping practices, thereby enabling them to improve on these through reflection-based learning (Bjarnason et al. 2014).

The overall objective of our study was to explore how to categorise prototyping practices from the perspective of prototyping methodology. The initial part of our study was performed in collaboration with our case company Telavox, who were interested in further improving their use of prototyping. We based our research on the current body of knowledge and defined the following four research questions to guide our research:

RQ1: What are the main aspects of prototyping methodology?

RQ2: What fields of research have previously investigated the main aspects of prototyping methodology (RQ1)?

RQ3: What types of research have investigated the main aspects of prototyping methodology (RQ1)?

RQ4: How can the initial version of our prototyping aspects model be improved to better support practitioners?

In this paper, we extend on previously published results for RQ1 (Bjarnason et al. 2021a ) by presenting a revised version of our prototyping aspects model (PAM) including improvements to the initial version of the model (RQ4). These improvements are based on additional empirical data from a multi-case study of eleven startup companies. The initial version of PAM was based on a systematic mapping study (RQ2 and RQ3) and was validated through a focus group at one case company. Herein, we also present results for RQ4 based on semi-structured interviews with twelve practitioners where PAM was used to discuss and categorise forty-three prototyping instances applied within the case companies. This additional empirical material and analysis enables us to improve on PAM and provide a revised answer to RQ1. Initial results on the prototyping practices of startups are published in Bjarnason ( 2021 ) based on four of the twelve interviews. This article is based on the full set of interviews and focuses on validating and improving PAM rather than on describing the overall prototyping practices of the companies, as in Bjarnason ( 2021 ).

Our prototyping aspects model (PAM) may be used to characterise and compare prototyping instances, e.g. regarding the scope and media of a prototype, how it is reviewed with users, and how this affects the cost–benefit balance of using the practice of prototyping for exploring the problem domain and developing software products. We believe that the additional refinements of the model presented in this paper further improve the applicability of PAM and the model’s usefulness for practitioners.

The rest of this article is organised as follows. We describe related work on prototyping in Section 2. In Section 3, we outline our research method, and in Section 4, we describe the case companies. The model of prototyping (PAM), which is our main contribution, is presented in Section 5, in response to RQ1, based on previous work on prototyping and on our findings from the multi-case study. Section 6 contains results on RQ2-RQ4, which are discussed in Section 7, before we conclude in Section 8.

2 Related work

While our point of departure and main area of competence is requirements engineering (RE), the research presented in this paper is based on previous work on prototyping within software engineering and RE, as well as within human factors, usability and user-interaction design. Furthermore, prototyping is commonly used within software startups and in the early stages of software development to explore and validate user needs and requirements. In this section, we provide a brief introduction to related work on prototyping within human–computer interaction and design, agile RE, and for software startups. More detailed references are provided in Section 5, as part of describing our prototyping aspects model.

2.1 Prototyping within human–computer interaction and design

Within product- and human–computer interaction design, prototyping is commonly used to design and to evaluate the user interface by “ producing or building a model or mockup of a design that can be manipulated and used … to simulate a user experience ” and thereby test this experience without having “to build the real thing” (Hartson 2019 ). While prototyping is mainly used to support design and evaluation within usability and user interaction design, several other advantages of the practice are described including its role in supporting communication and creativity. Prototypes can facilitate communication of design ideas by providing “a concrete basis for discussions” between developers, designers, and users (Budde and Zullighoven 1990 ). As such, prototypes “ serve as a vehicle that enables users and designers to develop a common language ” (Hakim and Spitzer 2000 ). Thus, the use of prototypes can stimulate user involvement, and support marketing and selling new product ideas to customers and to (internal) management. Many of these benefits and uses of prototyping are covered in our model by the aspect of Purpose (see Section 5.1 ) .

Prototyping can be performed throughout the design process starting with simple sketches that evolve into wireframes and into gradually more complete representations of the design. The variation in prototype richness and scope was characterized by Nielsen using the terms breadth and depth (or detailing) of functionality represented by the prototype, where a horizontal prototype provides a broad set of features but with a low degree of refinement (depth), while a vertical prototype can provide detailed (deep) representation of one feature only, i.e. a narrow breadth (Nielsen 1993 ). Other variants of prototype scope include local prototypes that focus on specific issues and are thus both shallow and narrow, and T prototypes that represent a broad set of features but only details (depth) for a few parts (Hartson 2019 ). We base one of the aspects of our model ( Scope ) on this terminology and include the dimension of breadth, while expanding the dimension of depth to distinguish between the refinement of one or more facets, namely functional, visual, interactive and data, see Section 5.2 .
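
The breadth and depth terminology above can be illustrated with a minimal sketch. The record type `PrototypeScope`, the function `classify_scope`, and the `broad_threshold` cut-off are hypothetical names and values introduced here for illustration only; they are not part of Nielsen's or Hartson's definitions, nor of our model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrototypeScope:
    """Hypothetical record of what a prototype represents."""
    features_covered: int   # breadth: number of features represented
    features_detailed: int  # depth: how many of those are refined in detail


def classify_scope(scope: PrototypeScope, broad_threshold: int = 3) -> str:
    """Map breadth and depth onto the scope variants named in the text.

    broad_threshold is an assumed cut-off for calling a prototype "broad".
    """
    if not 0 <= scope.features_detailed <= scope.features_covered:
        raise ValueError("features_detailed must be between 0 and features_covered")
    broad = scope.features_covered >= broad_threshold
    deep = scope.features_detailed > 0
    if broad and not deep:
        return "horizontal"  # many features, low degree of refinement
    if broad and scope.features_detailed < scope.features_covered:
        return "T"           # many features, depth for a few parts only
    if broad:
        return "broad and deep (no named variant in the text)"
    return "vertical" if deep else "local"


print(classify_scope(PrototypeScope(features_covered=5, features_detailed=1)))
```

In practice the boundary between narrow and broad is a judgment call, so the threshold should be read as a placeholder rather than a recommended value.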

The term fidelity is used to indicate how close to the final product the prototype is with regard to appearance and interaction (Tullis 1990). Low- versus high-fidelity prototypes are often used for different purposes within the design community. Low-fidelity prototypes can convey a general look of an interface and are often used to communicate, educate and inform. High-fidelity prototypes are more expensive to build, but they can be used to test and evaluate further details of the design, and even serve as a basis for development of the product under design (Rudd et al. 1996). In our model, fidelity is represented by a combination of the two aspects Scope (see Section 5.2) and Prototype Media (see Section 5.3).

Research on the use of prototyping in generating new user interaction designs found that prototyping multiple designs in parallel and sharing these led to more creative and better final designs (Dow et al. 2011). In our previous research on the use of prototyping in startups, we found that very few startups work with parallel exploration and prototyping, and that those who did work with parallel prototypes tended to have a background in usability and human-interaction design (Bjarnason 2021). This dimension of the practice of prototyping is covered in our model by the aspect of Exploration Strategy.

Finally, prototypes can be used as a means to present or demonstrate a product idea, or to evaluate a design by allowing users to interact with the prototype. Such interactions often occur in meeting or lab settings, e.g. during usability testing, either through free testing or by providing scenarios or other protocols for the user to follow (Tronvoll et al. 2017). Evaluating prototypes in “the wild”, i.e. in the environment of the final product, provides a more realistic setting for evaluating a prototype both in terms of the actual physical environment and with respect to the people available (Hendry et al. 2005). These dimensions are covered in our model by the aspect of Prototype Use, see Section 5.4.

2.2 Prototyping in agile requirements engineering

Prototyping has been identified as a practice commonly applied within agile software development to manage rapidly evolving requirements through its ability to support customer communication and to validate and refine requirements (Ramesh et al. 2010). Käpyaho and Kauppinen report from a case study of using prototyping for user interface development at a large retail company applying an agile approach. Their findings indicate that prototyping addresses several challenges within agile but also identify a need to complement prototyping with additional practices. They observed that prototyping helped with challenges related to managing with very little requirements documentation, such as intangible and unaligned views and plans on what to develop. Furthermore, prototyping increased the quality of stakeholder communication at the case company, and mutual understanding between the development team and the customers was reached faster. Prototyping also increased the motivation for requirements work since updating a prototype was considered more motivating than writing requirements documents (Käpyaho and Kauppinen 2015).

The benefits of using prototypes in the customer communication have also been observed by several other researchers including Ramesh et al. and Zink et al. Illustrating and communicating product requirements through a prototype reduces risks related to low requirements specification quality (Ramesh et al. 2010 ). As the prototype representation of the (future) product becomes more concrete with each iteration, the product requirements thereby gradually become more specific (Zink et al. 2017 ). In particular, an executable prototype provides a rich means of communicating requirements details, and reduces the risk of ambiguity, incompleteness and inconsistency in the requirements communication (Acosta et al. 1994 ).

While prototyping provides benefits to agile development, the practice also imposes risks. In particular, research shows that the use of production software in prototyping (rather than throw-away prototypes) incurs risks related to product quality such as scalability, security, and robustness (Ramesh et al. 2010). In this case, demonstrating a fully functioning (but early) version of production software may convey an overly positive view of development status to customers and other stakeholders, which in turn may lead to a push to release software prematurely, before sufficient quality has been achieved. Agile projects need to be aware of these risks and apply other practices to mitigate them. Käpyaho and Kauppinen suggest complementary practices such as clearly stating customer responsibilities, management of quality requirements, and consideration of the bigger picture (Käpyaho and Kauppinen 2015).

2.3 Prototyping in software startups

Software startups are new companies that develop novel software-based products or services, and that commonly apply prototyping (Bjarnason 2021) for exploring and evaluating new ideas and technology in a quick and relatively cheap way (Lauesen 2005), which enables cost-effective testing with users (Shepherd and Gruber 2020). This cost–benefit aspect is especially important to software startups, which operate under very uncertain and resource-constrained conditions with the aim of exploring new business opportunities and developing innovative products (Giardino et al. 2014; Paternoster et al. 2014). While availability of open-source software and pay-as-you-go services provide software-based business opportunities, startups struggle to define solutions for which there is product-market fit and risk wasting time and resources on developing unsuccessful features (Giardino et al. 2015). One important success factor is to test the business idea early on to validate its viability in the market (Block and MacMillan 1985).

Software startups typically perform light-weight, informal and ad-hoc requirements engineering, in particular during the early stages of the startup venture due to limited resources (Giardino et al. 2014 ; Klotins et al. 2019 ; Nguyen-Duc et al. 2017 ; Terho et al. 2016 ). Requirements are initially elicited and prioritised mainly based on internal sources and on problems experienced by founders. The source of requirements gradually shifts as potential customers are identified, and prototypes are produced that can be used to elicit new ideas and priorities also from customers and other external parties (Nguyen-Duc et al. 2017 ; Terho et al. 2016 ). Tripathi et al. provide similar findings albeit with some more details and based on a broad coverage of literature and cases (Klotins et al. 2019 ).

Within startups, an early version of the final product is often used to validate product ideas with users (Alves et al. 2020 ; Tripathi et al. 2018 ), e.g. by demonstrating mock-ups to customers, as a cost-effective way to obtain market feedback. Thus, prototyping is used as a means to learn about the user and the market (Giardino et al. 2015 ; Paternoster et al. 2014 ), and to increase the chances of business success. However, the use of a live product version as a prototype poses conflicts between the need for quick feedback and a focus on product quality (Ciriello et al. 2017 ). One case study of prototyping in twenty Norwegian startups, by Nguyen-Duc et al., identified factors that affect the speed of prototyping, including the choice of prototype tools and components, varying competences, and the communication and involvement of customers and external stakeholders in the prototyping. They observed that the purpose of prototyping and how the prototype is used regarding customer involvement are factors that need to be considered when selecting prototyping practices (Nguyen-Duc et al. 2017 ).

In an earlier publication, we reported on the prototyping practices of four early-stage startups (Companies A-D of this article) to understand how they use prototyping to elicit, prioritise, validate, and communicate requirements. In that study (Bjarnason 2021), we found that prototyping is commonly used within early-stage startups as a natural part of developing and validating a product, but also that prototyping is implicitly required to ensure funding of the startup venture. Internally, prototyping is used to explore and communicate detailed product requirements, while prototypes are used externally to communicate and validate product-market fit. Thus, for startups, prototyping plays a vital role in market validation and in obtaining paying customers. This validation is required by most investors and prototyping thus plays a critical part in securing the funding needed for startup ventures. Prototyping instances tend to be gradually refined from sketches and interactive mock-ups to fully functioning software versions, often MVPs. The more refined prototype versions (realistic mock-ups and early versions of production software) are more costly to produce and require software engineering expertise. Despite this, our case study found that many startups prefer using these more refined prototypes instead of prototyping using simpler media such as paper sketches or mock-ups. This preference, though more costly, may be because a more refined prototype appears conducive to a higher degree of customer trust. Thus, software startups face the challenge of balancing the cost of prototype scope and media against the gains and value that can be obtained from more refined prototypes.

3 Research method

We have addressed our four research questions (RQ1-RQ4) by designing a model of prototyping (PAM) using a combination of a systematic mapping study, a focus group at one case company, and a multi-case study of eleven additional case companies, see Fig. 1 . The mapping study provided a broad base of scientific knowledge that supported our design of the model. The focus group with practitioners provided an initial validation of our model. The model and its practical applicability and usefulness were further validated through semi-structured interviews at the other eleven case companies by using the model with practitioners to discuss and describe their prototyping practices.

Fig. 1 Overview of the research method used to design and validate the Prototyping Aspects Model (PAM)

The model was designed iteratively in multiple steps. The initial design was performed by the last two authors resulting in a draft of the model. This draft was then revised by the first author after re-analysis of the literature and after using the model with the initial case companies resulting in the first version of the model (published in Bjarnason et al. ( 2021a )). The first version was then further validated by the first author through semi-structured interviews with practitioners from eleven startup companies and revised based on analysis of the prototyping instances described by the interviewees. The changes to the model were discussed and agreed among all three authors resulting in the second version of our model, which is presented in this paper.

3.1 Systematic mapping study

We performed a systematic mapping study based on guidelines by Petersen et al. (Petersen et al. 2015 ) to explore and draw from the current body of knowledge on prototyping to compile and provide an overarching view of prototyping methodology based on current knowledge. The literature review was guided by our research questions RQ1-RQ3 and consisted of searches , study selection , data extraction , and data analysis . The list of scanned articles and our categorisation of the ones included are provided on-line (Bjarnason et al. 2021b ) to enable other researchers to validate and to facilitate replicating the analysis of our systematic map.

Searches were defined iteratively in line with RQ1 through test searches and by consulting with two experts in prototyping. The initial searches were combinations of “software prototype”, “software prototyping”, “prototype”, and “prototyping” that yielded a large number of hits (almost 80,000). In a second iteration, the test searches were limited to the terms “prototyping agile”, “requirements engineering prototyping”, and “agile requirement engineering prototyping”. To further guide our study and help focus our searches, we consulted two experts in user experience and design, i.e. one senior manager at the initial case company (Telavox) and one senior researcher in user experience at Lund University. These experts provided insights into how to extend the searches beyond software engineering, thereby increasing the quality of our searches.

Test searches were performed in three search engines, of which two were selected for our review. We selected the Lund University Libraries search engine (LUBsearch) and the ACM Digital Library, and excluded Google Scholar. LUBsearch provides a broad base since it includes other search engines such as Scopus, IEEE Xplore, and ScienceDirect. ACM was selected because it provides a set that suited the scope of our mapping study. ACM provides good coverage of software engineering in general, while complementing the content provided by LUBsearch. In addition, our early test searches with ACM resulted in identifying several articles that are highly relevant to our review.

The search strings were specified through an exploratory process with test searches and customised for each of the two selected search engines depending on their specific search facilities. For LUBsearch, we derived keywords from a smaller set of matching papers. The final search consisted of the search string “Prototyping AND (Fidelity OR Software Prototype OR Agile)” for the subject term, with the option “Peer reviewed” selected and the language restricted to English. For ACM, we found that many articles lacked keywords and settled on the search string “prototype OR prototyping” for the title. The final searches were performed in February 2020.

Study selection was performed through a gradually extended scan of title, abstract, and full text using inclusion and exclusion criteria to guide the selection decisions (see below), which resulted in identifying 33 primary studies. The two last authors each performed this selection on the set from ACM and LUBsearch respectively. They aligned their selection by comparing and reaching consensus on inclusion and exclusion for a random set of 10 publications. We included articles on prototyping from all fields and excluded articles that do not explicitly discuss prototyping methodology or prototype dimensions. We defined the inclusion criteria as articles published before February 2020 that cover meta-level or methodological aspects of prototyping or prototype use, since this is the aim of our main research question RQ1. The exclusion criteria were defined as articles that merely describe the use of prototypes without considering methodological perspectives such as principles and procedures for the prototyping, articles that are not peer reviewed or written in English, and duplicates of studies already included in the systematic map. We provide a demographic overview of our map in Section 6.1 including the number of articles per search engine and for each selection step.
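
The inclusion and exclusion criteria above can be expressed as a simple filter. The following is a minimal sketch, not the procedure actually used (which relied on manual scanning of title, abstract, and full text); the `Article` record, its field names, and the function `select_primary_studies` are hypothetical, and the judgment of whether an article covers methodological aspects is assumed to be supplied by a human reviewer as a boolean.

```python
from dataclasses import dataclass
from datetime import date

# Assumed encoding of "published before February 2020".
CUTOFF = date(2020, 2, 1)


@dataclass(frozen=True)
class Article:
    """Hypothetical record for a candidate article."""
    key: str                  # identifier used to detect duplicates
    published: date
    peer_reviewed: bool
    language: str
    covers_methodology: bool  # reviewer's judgment: meta-level or methodological aspects


def select_primary_studies(candidates: list[Article]) -> list[Article]:
    """Apply the inclusion and exclusion criteria in order."""
    selected: list[Article] = []
    included_keys: set[str] = set()
    for article in candidates:
        if article.key in included_keys:
            continue  # duplicate of a study already included
        if article.published >= CUTOFF:
            continue  # outside the publication window
        if not article.peer_reviewed or article.language != "English":
            continue  # not peer reviewed or not written in English
        if not article.covers_methodology:
            continue  # merely describes prototype use
        included_keys.add(article.key)
        selected.append(article)
    return selected
```

Encoding the criteria this way mainly helps two reviewers check that they apply the same rules in the same order; the methodological judgment itself remains manual.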

Data extraction, data analysis and classification of general and specific items of the primary studies were performed by the last two authors, and then validated by the first author. The general items extracted were publication year and research field (RQ2), and the primary studies were classified according to these and according to research type (RQ3, Wieringa et al. 2006). The resulting classification of the primary studies is reported by describing the demographics of the systematic map in Section 6.1. Furthermore, specific items related to our main research question (RQ1) were extracted. These specific items consisted of the aspects of prototyping covered in the primary studies. We extracted the information specific to our enquiry on prototyping by reading the full text of each primary study and wrote a short summary of how it relates to our study. Initial categories, or aspects of prototyping, emerged and were identified based on these summaries, similar to open coding in grounded theory. When these aspects had been established, each primary study was classified according to these aspects.

3.2 Design of prototyping aspects model including initial validation

Our model was designed iteratively based on the open coding of our systematic mapping study. First, the commonly occurring aspects of purpose (why) and prototype scope (what) were included, followed by exploration strategy and prototype use (how). Each aspect was detailed into further facets through analysis of the primary studies. An additional aspect that was considered, but at this point excluded, was the method used to produce a prototype, e.g., paper prototyping or computer simulation. Since our focus was prototyping methodology, and since computer-simulated prototypes can achieve similar effects as paper prototypes, we decided to exclude this aspect from the first version of our model. However, a similar fifth aspect of prototype media was added in the second version of our model based on the insights gained from our multi-case study. For the first version of the model, an additional design iteration was performed to increase internal validity of our model.

We performed an initial validation of the draft version of the prototyping aspects model by re-analysing the articles in our systematic map (see below). This internal validation mainly led to renaming aspects, adding, and restructuring some facets. The aspect of strategy was renamed exploration strategy and the aspect of review method was renamed prototype use to more clearly reflect what these aspects represent. The facets related to validation & testing were grouped to provide a better overview of the purposes of prototyping. Two additional facets were added to prototype scope , namely interactive & haptic behaviour , and realistic data , both of which were observed in the original analysis, but initially not included. Finally, the facet of usage environment was added to prototype use since the feedback that can be obtained from demonstrating or testing a prototype may vary for, e.g., a loud or a dark environment compared to in a meeting setting.

The design of our model was further refined after subsequent validation with practitioners at Telavox and through a multi-case study of eleven software startup companies. These validation steps are described below. The first and second versions of the model are described in Section 5, and the differences between the two versions are described and motivated in Section 6.3 based on our subsequent multi-case study.

3.3 Validation of draft version: Focus Group with Telavox and Reanalysis

The initial draft version of our prototyping aspects model was validated through reanalysis of the primary studies of our systematic map and through a focus group at a case company. The reanalysis was performed to improve reliability and internal validation of our prototyping aspects model, while the focus group was performed to validate the relevance and usefulness of our model with practitioners. The first author performed triangulation of the model through an independent re-analysis of the primary studies of our systematic map including (re)coding the full text of the articles in NVivo. The differences were then discussed and agreed with the other two authors, and the model was updated as described in this article.

The focus group was prepared by designing a focus group protocol structured according to the main activities or stages of a requirements process, namely Concept exploration, Eliciting customer needs, Identifying system scope & requirements, Validate and improve system scope & requirements, and Confirm system scope & requirements. These stages were inspired by Lauesen and correspond to the main activities of requirements engineering common to all projects, both traditional and agile (Lauesen 2002 ). For each stage, we considered the main RE goals and suitable prototyping instances, or scenarios based on two of the aspects of our model, namely purpose and prototype scope. These prototyping scenarios were discussed at the focus group, supported by a set of questions. The protocol was iteratively designed by the last two authors, reviewed with the first author and with a contact person at the company, and then improved upon. The focus group protocol is available in Appendix A .

The focus group was conducted with five practitioners from Telavox’s user experience team. This team elicits and details product requirements through prototyping of the user interface. The participants had degrees in either industrial economics or interaction design; two B.Sc. and three M.Sc. The focus group was managed, led, and moderated by the last two authors. To ensure equal airtime, the participants were given turns to initiate the discussion for each talking point. The focus group was performed on-line due to the ongoing Covid-19 pandemic. The meeting was recorded, transcribed, and analysed using open coding to identify information related to the different stages of requirements, and to the different aspects of the model. The empirical data from the focus group is treated as confidential since it may contain company information and the participants were promised anonymity to encourage them to speak freely. Furthermore, the description of the case company and the case-specific results have been reviewed by the case company.

3.4 Broader validation: multi-case study

We performed a multi-case study of software startups using semi-structured interviews to gain insights into current prototyping practices. Software startups were selected as our object of study due to these companies’ frequent use of prototyping to explore and shape their business models and product offerings, as part of the Lean startup method (Ries 2011) where new products are designed through continuously building, measuring, and learning what the customers want and are willing to pay for (Olsen 2015). Through the case study, we explored the practical applicability of our model by using it to discuss and categorise prototyping instances, thereby validating the model further and identifying improvements to it (RQ4). The case study consisted of four stages, namely preparations, data collection, data analysis, and validation.

In the preparation stage, a case study protocol and an interview guide were designed (available in Appendix B), and an initial set of case companies and interviewees were recruited. The interview guide is based on the first version of our prototyping aspects model (Bjarnason et al. 2021a) and on previous research on software startups (progression model (Klotins et al. 2019), typical characteristics (Berg et al. 2018; Giardino et al. 2015)). The interviews were designed to investigate how startup companies work with requirements and prototyping, and to support a broad exploration of (possibly) influencing factors by covering questions about the interviewee and the company. After the first two interviews, it became clear that the terms prototyping and prototype are not uniformly understood, which caused some initial confusion at these interviews. The interview guide was then extended with a question about what prototyping means to the interviewee. Case companies and interviewees were selected through convenience sampling and consisted of startups recruited through local business incubators: Minc, VentureLab, SmiLe, and Ideon Innovation, of which some were previously involved in our RE course. The main criterion for selecting startups to include in our study was that they use or plan to use prototyping for developing their business and/or software. In total, eleven startups (Company A-K) were investigated through twelve interviews with thirteen interviewees. The companies and interviewees are described in Section 4.2. In general, one interview was held per company, except for Company D where the founder and the technical lead were interviewed separately. In addition, the interview at Company H was held with two founders of different profiles (at their request).

The data collection consisted of semi-structured interviews with ample opportunity for interviewees to speak freely and to ask follow-up questions. The interviews were held and recorded in a video conferencing system (Zoom), and each lasted for about one hour. At the interviews, we presented our prototyping aspects model (version 1) and opened up for a discussion of how their prototyping practices relate to the different aspects (purpose, scope, use, and exploration strategy of prototyping). At the beginning of each interview, the participants were informed that their participation is voluntary, that they and their companies can remain anonymous if they so wish, and that the interview data is treated as confidential since it may contain company information. Furthermore, each participant was sent a draft of the resulting articles for the parts derived from their interview for validation purposes.

In the data analysis stage , the audio recordings were transcribed and analysed by applying thematic coding in several iterations using NVivo. In the first analysis iteration , codes representing the different parts of the interview were used, such as interviewee roles and background, company and product characteristics, startup life-cycle maturity and challenges, RE and prototyping practices. In the second analysis iteration , each interview transcript was re-read and prototyping instances mentioned by the interviewees were identified. The relevant parts of the transcripts for each prototyping instance were coded using codes denoting the case company and a sequence number for each prototyping instance. For example, for Company B for which three prototyping instances were identified, the codes B1-B3 were used to denote and tag the relevant parts of the interview transcripts for each of these instances. In the third analysis iteration , the interview data per prototyping instance was analysed and each prototyping instance was categorized according to the prototyping aspects model (see table in Section 6.3 ). Observations about how well the model corresponded to the described prototyping instances were noted in memos, together with descriptions of each prototyping instance, and illustrative quotes found in the transcripts. These memos were then used to report the results and to adjust the prototyping aspects model as described in Section 6.3 .
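The tagging scheme of the second analysis iteration can be illustrated with a minimal sketch. It assumes that a code is simply the company letter followed by a sequence number, as in the B1-B3 example; the function name `instance_codes` is hypothetical and is not part of the study's tooling (the actual coding was done in NVivo).

```python
def instance_codes(company: str, n_instances: int) -> list[str]:
    """Build one code per prototyping instance, e.g. B1, B2, B3.

    Assumption: a code is the company letter plus a 1-based sequence number.
    """
    return [f"{company}{i}" for i in range(1, n_instances + 1)]


# Company B had three identified prototyping instances.
codes_b = instance_codes("B", 3)
print(codes_b)  # ['B1', 'B2', 'B3']
```

Each code can then be attached to the relevant transcript passages, so that all data for one prototyping instance can be retrieved together in the third iteration.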

Finally, in the validation stage of our case study, the interviewees were asked to validate the descriptions related to their companies and the co-authors discussed and agreed to the revised version of the prototyping aspects model. All interviewees were contacted and asked to validate the information relevant to their companies, and to indicate if they wished their company name to be included in the article or not. The interviewees were provided with the prototyping instances identified for their startup, the memos related to these including the quotes extracted from the transcripts, and the company and interviewee characteristics reported in Section 4.2 . The interviewees responded, mainly by agreeing to the descriptions and by highlighting some minor misunderstandings and additional information, after which the manuscript was revised accordingly.

4 Case companies

The initial draft of the prototyping aspects model was validated through a focus group at Telavox. The first published version of the model was then further validated through the multi-case study of eleven software startup companies, where thirteen practitioners were interviewed about their use of prototyping.

4.1 Initial case company: Telavox

Telavox offers cloud-based Private Branch Exchange solutions. The company was founded in 2002 and currently has around 250 employees. The initial part of this study, i.e. the mapping study and initial design of the model, was carried out in 2020 at the company’s site in Malmö, Sweden, by the 2nd and 3rd author. At this site there are about 15 development teams for areas such as app development for Android and iOS, user experience, and web development. Product support is coordinated and provided via key account managers that work closely with customers. The company applies Scrum, and thus uses an agile development model. New and improved product ideas are explored and communicated through prototyping and general product statements. Product owners coordinate product development and prioritise product requirements. When an idea is ready for development, the appropriate teams are assigned user stories. The teams apply test-driven development and continuous delivery. Product scope is validated with key account managers and customers before release.

4.2 Case companies A-K: software startups

Eleven software startups (Company A-K) were investigated in our multi-case study, see Table 1 . Most of these companies were in the inception or stabilisation phase (Klotins et al. 2019 ) of their business ventures, while Companies E and J were in the growth stage (Klotins et al. 2019 ). In total, we interviewed thirteen practitioners at the case companies, mostly founders and co-founders, but also some product owners and technical leads. An overview description of our interviewees is provided in Table 2 .

5 Results: Prototyping aspects model (RQ1)

Our model characterizes the methodology of prototyping by five aspects: purpose of prototyping , prototype scope , prototype media, use of prototype, and exploration strategy , and is here described based on related work. Table 3 provides an overview of the model and the primary studies from our initial mapping study for each aspect. In this article, a revised version of the model is presented based on a multi-case study where the model was used to characterise prototyping instances. The changes to the first version of the model are described and motivated in Section 6.3 . Table 3 contains both the first version of the model (published in Bjarnason et al. ( 2021a )) and the revised (second) version (presented herein) to facilitate comparison.

5.1 Purpose of prototyping – why prototype?

The practice of prototyping can achieve many different purposes that often vary throughout the life cycle of a project. We have identified eight purposes, namely exploration , communication , incremental development , quality improvement , and validation & testing of problem–solution fit, product-market fit, technical feasibility, and usability . When prototyping, several purposes may be satisfied simultaneously, e.g., communicating the product idea to potential customers while also validating its market desirability. A project’s purpose of prototyping often evolves from exploration and communication to validation & testing (Ratcliff 1988 ).

5.1.1 Exploration & learning

Prototyping is commonly used to explore the solution space (Dow et al. 2011 ; Lim et al. 2008 ; Rahman et al. 2017 ; Wiberg and Stolterman 2014 ) and learn by experimenting with ideas (Budde and Zullighoven 1990 ; Chen et al. 1994 ; Tronvoll et al. 2017 ), and is a foundational purpose for any prototyping. Problems and new solution directions can be discovered and explored through prototyping and can lead to new ideas and direct further exploration (Lim et al. 2008 ). Such exploration of multiple solutions can mitigate the risk of overinvesting in a single concept (Dow et al. 2011 ). However, Schneider found that when users are left out of the process and prototype use is purely internal, only the developers learn (Schneider 1996 ). Lichter et al. suggest combining the purpose of exploration with communication and testing of product-market fit to clarify requirements (Lichter et al. 1994 ).

5.1.2 Communication: sales, alignment of requirements

Visions and ideas about a product can be communicated through prototyping, which provides a common language between developers and stakeholders (Budde and Zullighoven 1990 ; Ciriello et al. 2017 ; Rahman et al. 2017 ; Zink et al. 2017 ) and acts as an anchor for group communication (Dow et al. 2011 ). Prototyping thus facilitates presenting, discussing, and evaluating a product with external parties, such as customers and investors, and internally within a project, thereby supporting decision making (Budde and Zullighoven 1990 ; Raja 2009 ). Ciriello et al. note that prototypes can support requirements elicitation by clarifying problems early on (Ciriello et al. 2017 ). However, a prototype may convey an overly positive impression of the current status that can lead to unrealistic customer expectations and subsequent requests to evolve the prototype into the final system (Lichter et al. 1994 ).

5.1.3 Incremental development

One purpose of prototyping may be to evolve the prototype into a final product based on user feedback and priorities (Budde and Zullighoven 1990 ; Lichter et al. 1994 ; Schneider 1996 ; Toffolon and Dakhli 2008 ). In these cases, the prototype may be a pilot system or a partial product version such as alpha or beta, or an MVP, that is developed with the expressed intention of exploring or validating a solution option (according to our definition of prototype). In agile development, prototyping is often an integral part of the development process with regular feedback from users and other stakeholders (Fairley and Willshire 2005 ), e.g. through validation of software at end-of-sprint demonstrations. Thus, prototyping is often used within agile development to detail and validate requirements, and to reduce uncertainty (Bellomo et al. 2013 ). Fern et al. propose a prototyping methodology that covers throw-away prototypes as well as prototypes that will be developed further into the deliverable system (Fern and Donaldson 1989 ).

5.1.4 Quality improvement

Prototyping can be used to optimise quality aspects, e.g. by focusing on response times while all other behaviour is retained. Kordon observed that care needs to be taken to avoid measurement overhead and to ensure accuracy in the evaluation (Kordon 1994 ). Arano et al. propose the use of hybrid prototyping for improving quality aspects since this can enable measuring quality in early development stages prior to full implementation (Arano et al. 1993 ).

5.1.5 Validation & testing

One of the main purposes of prototyping is to validate requirements by testing a solution option with internal and/or external stakeholders (Ciriello et al. 2017; Tronvoll et al. 2017) for perspectives such as problem–solution fit, product-market fit, technical feasibility, and usability. Problem–solution fit concerns the degree to which the envisioned product solves an actual customer or end-user problem. Prototyping can be used to investigate customer needs, validate and clarify customer requirements and tasks (Budde and Zullighoven 1990; Ciriello et al. 2017), and thus support a company in designing a solution to address customer problems. Product-market fit is a premise for business viability. Prototyping can be used to explore a product’s business potential from the customer perspective, and to assess the value of the product and likelihood of purchase (McCurdy et al. 2006; Zink et al. 2017) and the product’s ability to fit within time and budget constraints (Zink et al. 2017). The insights gained from such prototyping can be used to support business-related decision making (Ciriello et al. 2017). Technical feasibility concerns a system’s technical capabilities and the feasibility of realising requirements in the solution space (Budde and Zullighoven 1990; Zink et al. 2017), e.g. to operate at scale, to resolve structural uncertainties, or to fulfill security requirements. Proof-of-concept prototypes are built for this purpose (Tronvoll et al. 2017) and can be used to evaluate novel technical approaches (Fern and Donaldson 1989). Similarly, breadboard prototypes are used to investigate technical aspects, e.g. in circuit design, and to support system specification and coding (Budde and Zullighoven 1990; Ciriello et al. 2017; Lichter et al. 1994; Lim et al. 2008). Furthermore, Lichter et al. found that a prototype built to demonstrate feasibility can also support project acquisition (Lichter et al. 1994). Finally, Zink et al. found that feasibility testing requires a high share of the time invested in a project (Zink et al. 2017). Usability testing was reported by Zink et al. as the least common singular purpose for prototyping, though it is often combined with other purposes (Zink et al. 2017). User interface design can be validated through prototyping (Derboven et al. 2010; Hakim and Spitzer 2000; Zink et al. 2017), which can uncover various usability issues (Lim et al. 2008). When prototyping for usability testing, Hakim et al. recommend ensuring that metrics, such as task completion time, can be captured and considering the protocol for use (e.g. associated instructions and data) since this may affect the results of the testing (Hakim and Spitzer 2000).

5.2 Scope of prototype – what to prototype?

The scope of a prototype describes the extent to which a prototype resembles the final product and is, in our model, represented by the breadth of the prototype’s functionality (Bruegger et al. 2009; Budde and Zullighoven 1990; Goldman and Narayanaswamy 1992; Hakim and Spitzer 2000; Lim et al. 2008; McCurdy et al. 2006; Tronvoll et al. 2017), and the facets of functional refinement (Bruegger et al. 2009; Budde and Zullighoven 1990; Goldman and Narayanaswamy 1992; Hakim and Spitzer 2000; Lim et al. 2008; McCurdy et al. 2006; Tronvoll et al. 2017), visual appearance (Budde and Zullighoven 1990; Goldman and Narayanaswamy 1992; Hakim and Spitzer 2000; Hendry et al. 2005; Jaskiewicz and Helm 2018; Lim et al. 2008; Liu and Khooshabeh 2003; McCurdy et al. 2006; Zainuddin and Liu 2012), interactive & haptic behaviour (Budde and Zullighoven 1990; Goldman and Narayanaswamy 1992; Hakim and Spitzer 2000; Hendry et al. 2005; Lim et al. 2008; Liu and Khooshabeh 2003; McCurdy et al. 2006) and data realism (Lim et al. 2008; McCurdy et al. 2006). Breadth represents the extent to which a prototype covers a product’s full functionality, e.g., all or only one of the product’s features (broad or narrow prototype), while the other facets represent the degree of detailing of this (deep or shallow) regarding functionality, visual appearance, interactive behaviour, and data realism. Functional refinement represents the degree of detailing for each function or feature. The visual appearance of a prototype concerns aesthetics, e.g., layouts, fonts, and user interface elements, and is interrelated with functional refinement since visuals are represented by functionality. The user’s experience is also affected by how well a prototype mimics a product’s interactive & (sometimes) haptic behaviour, and to what degree the available data can simulate normal and realistic use.

The scope affects the cost of producing a prototype and the feedback that can be obtained (Derboven et al. 2010 ; Lim et al. 2008 ; Liu and Khooshabeh 2003 ; McCurdy et al. 2006 ; Tronvoll et al. 2017 ). Thus, prototype scope relates to the purpose of prototyping. A rough paper prototype of one product feature is an example of a prototype with a small functional breadth and a low degree of refinement, i.e. a narrow and shallow prototype. In contrast, a minimal viable product may have a broad functional scope by representing all menu options, but varying degrees of refinement (and depth) depending on the implementation status for each feature and for the overall product, e.g., the user interface, and for the amount of user data provided in the prototype.

Yasar describes three perspectives to consider when prototyping: role (of product in a user’s life), look & feel, and implementation (Yasar 2007). These perspectives can help pinpoint the purpose of prototyping and the scope of a prototype. In our model, look & feel is captured by the facets of visual appearance and interactive & haptic behaviour. The facet of data realism relates to look & feel, but primarily to the perspective of role since the degree to which provided data simulates actual use affects the user’s ability to relate to their situation.

There are different views on how the degree of prototype refinement, e.g., for visuals, affects the type and amount of feedback that can be obtained. Sefelin et al. report on a study on usability testing with prototypes where the number of suggestions and critiques did not significantly differ between prototypes with a high vs. low degree of refinement, i.e. between deep and shallow prototypes. However, the users preferred testing with a prototype with a high degree of interactive refinement since this provided freedom to explore the system by themselves (Sefelin et al. 2003). Several researchers report that some usability issues cannot be discovered unless the prototype provides a broad functional scope with a high degree of refinement, in particular for visual appearance and interactive behaviour (Liu and Khooshabeh 2003; McCurdy et al. 2006). However, a mix of low and high degrees of refinement is suggested as more economical, e.g., for usability testing. Yasar even reports that simpler prototypes with narrow functionality scope and a low degree of refinement are cost effective and useful for validation purposes and to capture major issues (Yasar 2007). Similarly, Bellomo et al. describe how a company consciously selects the breadth and degree of refinement of prototype scope to match the prototyping purpose, and to validate the concept in focus (Bellomo et al. 2013).

The terms hi- and lo-fidelity are often used to describe resemblance to the final product. Our terminology is more fine-grained and provides a more precise way of categorising prototypes by their breadth and degree of refinement w.r.t. different facets. We believe that our model can support informed decisions about which prototype scope is required to meet the desired purposes of prototyping.

5.3 Prototype media – what media to use for prototype?

A prototype can be constructed using a range of techniques and media, including sketching on paper (Hendry et al. 2005 ) or in PowerPoint, using a prototyping tool to produce a wireframe or a mock-up (Liu and Khooshabeh 2003 ), or using an early version of product software as your prototype (McCurdy et al. 2006 ; Sefelin et al. 2003 ). Even a video or a physical model is included in our definition of a prototype, and can be used to, e.g., validate product-market fit, or to communicate with stakeholders.

While there is a correspondence between different kinds of prototype media and the scope of these, the cost of producing prototypes and the gains and types of learning that can be obtained from these vary with the media (McCurdy et al. 2006; Zainuddin and Liu 2012). The choice of media may also affect the ease with which the prototype can be evaluated in some environments (Hendry et al. 2005). Sefelin et al. compared the use of paper sketches to that of computer-based prototypes of identical breadth and level of refinement, i.e. different media were used to represent the same prototype scope. These prototypes were used for usability testing, and the results showed no major differences in the obtained feedback depending on prototype media. However, the computer-based prototypes tended to yield more comments on graphical details, while the paper sketches stimulated participants to “a greater willingness to draw their suggestions” (Sefelin et al. 2003).

Liu and Khooshabeh made a similar comparison between paper sketches and computer-based (interactive) mock-ups (Liu and Khooshabeh 2003). They found that while paper sketches provide more flexibility for designers in early stages, this kind of prototype media requires more effort to use with larger sets of users, e.g., to manually simulate interaction. Furthermore, Liu and Khooshabeh found that interactive computer-based mock-ups yielded more feedback when used for user testing.

While paper sketches are often cheaper and faster to construct, Sefelin et al. suggest selecting prototype media based on a number of factors, including the competence of those constructing the prototype, the abilities of the available prototyping tools, and the envisioned continuation of the prototyping. Finally, Hendry et al. found that paper sketches were a good medium for eliciting and validating requirements in actual environments and with some stakeholder groups, such as in public locations where people of all age categories can be accessed (Hendry et al. 2005).

5.4 Prototype use – how to use the prototype?

This aspect concerns how a prototype is used to achieve a purpose and covers who the reviewers are (internal, external, or family-friends-and-foes, FFF), whether direct prototype interaction is used, what review approach is used (scenario-based or free), and in what environment the prototype is presented and reviewed. With these four facets of prototype use, the main uses of prototypes identified in our literature review can be represented, namely internal prototype use without any user presentation (Budde and Zullighoven 1990; Dow et al. 2011; Schneider 1996), prototype demonstrations (Bellomo et al. 2013; Heisler et al. 1989), scenario testing (Tronvoll et al. 2017), and free testing (Heisler et al. 1989; Tronvoll et al. 2017). Lichter et al. emphasise the importance of user involvement in prototyping (Lichter et al. 1994) and several researchers address challenges with this (Ciriello et al. 2017; Lichter et al. 1994; Zainuddin and Liu 2012). Zainuddin and Liu propose a systematic approach to building prototypes with user involvement and capturing user feedback during prototype use (Zainuddin and Liu 2012).

The choice of reviewers affects the learnings that can be obtained from prototyping. Purely internal use of a prototype can support brainstorming and allow designers and engineers to generate and organize ideas. While internal use allows a project to focus on ideas and possibilities rather than on external expectations, there is a risk that knowledge remains with, e.g., the developers. Schneider suggests better capitalization of pure internal prototyping by capturing and documenting this knowledge (Schneider 1996). Using a prototype with external parties often increases the cost due to increased expectations and demands on quality. In addition, the pre-knowledge of customers and users affects the feedback that can be obtained, and Cafer and Misra suggest adapting the review method based on the customers’ cognitive abilities (Cafer and Misra 2009). Several of the startups included in our multi-case study described showing prototypes to family-friends-and-foes (FFF) before going to potential customers, as a means to obtain friendly feedback.

There is usually no direct reviewer interaction with the prototype during demonstrations. Instead, the presenter shows the prototype by operating it live or by using prepared media such as videos or photos. For more refined prototypes, e.g. computer-simulated mock-ups with a high degree of interactivity, reviewers may be encouraged to interact with and directly use the prototype, and thereby provide richer insights and user feedback. Such interaction can also be achieved with simpler prototypes, although additional human effort is required to simulate, e.g., button presses when using prototypes with a low degree of refinement for interactive & haptic behaviour (Liu and Khooshabeh 2003).

Different approaches can be used when evaluating a prototype, e.g. a scenario-based approach which can support reviewers in relating the prototype to their own problems and usage scenarios. When using prototyping for testing purposes, scenarios can be used to guide users through instructions and steps. Similarly, Ciriello et al. suggest using storytelling to increase customer involvement (Ciriello et al. 2017). An alternative approach when testing prototypes with external parties or FFFs is to encourage free testing, as, for example, in beta testing where minimal instructions are provided.

The usage environment for a prototype can affect the outcome (Grevet and Gilbert 2015; Lichter et al. 1994; Tronvoll et al. 2017) both in the possible feedback and the representation of subjects in the review. If a product’s natural habitat is very different from a lab or meeting room environment, this can affect the feedback, e.g., for a system intended for a loud environment. Using a prototype in conditions similar to those of the final product increases the chance of uncovering new setting-specific requirements (Lichter et al. 1994). Hendry et al. performed usability testing among homeless people and concluded that using a prototype in the streets enabled them to reach otherwise inaccessible user segments (Hendry et al. 2005). Thus, the environment of use may result in some future users being underrepresented in the prototyping, further limiting the feedback.

5.5 Exploration strategy – how to traverse the solution space?

This aspect concerns strategies used to traverse the solution space over time. The exploration strategy determines which instances to pursue, how resources and decisions are organised, and how uncertainties are managed. The needs and goals may change for each iteration, which can enable focusing on specific product aspects or parts. Within one iteration, several concurrent solution paths can be pursued in parallel, which can stimulate innovation (Dow et al. 2011 ) and avoid imposing unnecessary limitations (Budde and Zullighoven 1990 ), but also cause contradictions between prototypes and complicate decision making (Jaskiewicz and Helm 2018 ). When iterating over multiple parallel prototyping instances, the length of the iterations may impact the extent and quality of the prototyped solution option and thus the prototyping effectiveness (Tronvoll et al. 2017 ). Jaskiewicz et al. found that while longer iterations promote focusing on certain aspects, multiple short iterations can lead to a more diverse but superficial set of solution options (Jaskiewicz and Helm 2018 ).

We model this aspect using the three facets: single vs parallel exploration, iteration focus, and iteration size. These facets are partly based on the four strategies identified by Tronvoll et al., namely point-based, parallel (or set-based solution arrays), optimisation (or performance set investigations), and flexible exploration (Tronvoll et al. 2017). Our facet of single exploration corresponds to Tronvoll’s point-based strategy where resources are focused on one single solution path and where alternatives are considered before being cemented. If the solution appears unsuitable it may be necessary to discard the prototype and redo the process. In contrast, several potential solution options are pursued simultaneously when using parallel exploration. Decisions are managed by merging prototype variants either continuously or during stage-wise iterations.

Tronvoll’s other two strategies, namely optimisation and flexible exploration, can be viewed as either single or parallel exploration, though the focus and size of the iterations vary, which is captured by these two facets of our model. Tronvoll’s optimisation exploration uses parallel exploration with a focus on feature or product level, and with gradually decreasing iteration sizes. For such exploration, the number of simultaneously investigated options is kept to a minimum to reduce cross-contamination, and over time the solution converges by applying a systematic performance evaluation of the optimised characteristics. Decisions surrounding the solution options are postponed until they can be validated and, when encountering several alternatives, only the most promising one is pursued. Solution options are judged by performance, which affects the evaluation. Such prototypes should be evaluated against a range (weak-strong-good quality) and may involve additional factors that affect each other. For example, a certain design-level requirement may provide an efficient way of saving text, but drastically increase the size of the saved file.

Finally, Tronvoll’s flexible exploration strategy corresponds to a single exploration strategy where the solution options are based on best-guesses and on facilitating the change of quality requirements as the work progresses. This approach is well suited to agile development where the solution option is iterated, evaluated, and changed as required.
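To make the structure of the model concrete, the five aspects and their facets can be written down as a small data structure. The following is a minimal sketch only: the class and field names, and the string labels for facet values (e.g. "low", "short", "meeting room"), are illustrative assumptions and not part of the published model. The example instance encodes the rough paper prototype of a single feature mentioned in Section 5.2, with assumed values for media use and exploration strategy.

```python
from dataclasses import dataclass
from enum import Enum


class Purpose(Enum):
    """The eight purposes of prototyping listed in Section 5.1."""
    EXPLORATION_LEARNING = "exploration & learning"
    COMMUNICATION = "communication"
    INCREMENTAL_DEVELOPMENT = "incremental development"
    QUALITY_IMPROVEMENT = "quality improvement"
    PROBLEM_SOLUTION_FIT = "validation & testing: problem-solution fit"
    PRODUCT_MARKET_FIT = "validation & testing: product-market fit"
    TECHNICAL_FEASIBILITY = "validation & testing: technical feasibility"
    USABILITY = "validation & testing: usability"


@dataclass
class Scope:
    breadth: str                # "narrow" or "broad"
    functional_refinement: str  # "low" / "high" labels are assumptions
    visual_appearance: str
    interactive_haptic: str
    data_realism: str


@dataclass
class Use:
    reviewers: str              # "internal", "external", or "FFF"
    direct_interaction: bool
    review_approach: str        # "scenario-based" or "free"
    environment: str            # free-text description (assumption)


@dataclass
class ExplorationStrategy:
    parallel: bool              # False means single exploration
    iteration_focus: str        # e.g. "feature" or "product"
    iteration_size: str         # free-text label (assumption)


@dataclass
class PrototypingInstance:
    purposes: set[Purpose]      # several purposes may hold at once
    scope: Scope
    media: str
    use: Use
    strategy: ExplorationStrategy


# A narrow and shallow paper prototype of one product feature.
paper_sketch = PrototypingInstance(
    purposes={Purpose.EXPLORATION_LEARNING},
    scope=Scope("narrow", "low", "low", "low", "low"),
    media="paper sketch",
    use=Use("internal", False, "free", "meeting room"),
    strategy=ExplorationStrategy(False, "feature", "short"),
)
print(len(Purpose), paper_sketch.scope.breadth)
```

Keeping the purposes as a set reflects that several purposes may be satisfied by one prototyping instance, while the separate scope facets allow breadth and the different kinds of refinement to vary independently.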

6 Underpinning results: map demographics, focus group and multi-case study

We will now present the results used to design the prototyping aspects model presented in the previous section. In this section, we report on the demographics of our systematic map, the results from the focus group and from the interviews of our multi-case study. The model was initially designed based on previous literature, then validated and adjusted through the empirical data of the focus group and the multi-case study.

6.1 Demographics of Map (RQ2 and RQ3)

Our systematic map consists of thirty-three primary studies published between 1988 and 2018, see Table 4. The majority of these (17) were within software engineering (in general), 11 within human factors (HF) & design, and 5 within requirements engineering (RE), see Table 5. The number of articles appears to have increased somewhat over the years from on average 1 to 2 articles a year, except around the year 2000. The increase is mainly within the areas of HF & Design and RE. The research type for each paper was classified according to the categories by Wieringa et al. (Wieringa et al. 2006): (1) Evaluation research investigates a problem or technique in practice and provides new knowledge of causal or logical relationships, (2) Solution proposals present a solution without a full-blown validation, (3) Validation research presents a solution proposal validated outside of industrial practice, (4) Philosophical papers sketch new theories or frameworks, and (5) Experience papers describe personal experiences and may contain anecdotal evidence.

Around 40% of the articles (14 of 33) are based on in-vivo research and empirical data from industry ( Evaluation) , see Table 6 . Another 40% (12 of 33) propose solutions based on in vitro (only) validation or theoretical reasoning ( Solution proposal and Validation type). Our systematic map indicates an increase of in-vivo research concerning prototyping over the past 10–15 years.
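The rounded shares reported for the map can be checked with a few lines of arithmetic. This is a sketch using only the counts stated in this section; the variable and function names are hypothetical.

```python
TOTAL_STUDIES = 33

# Counts per research area, as reported for the systematic map.
by_area = {"software engineering": 17, "HF & design": 11, "RE": 5}
assert sum(by_area.values()) == TOTAL_STUDIES


def share(count: int, total: int = TOTAL_STUDIES) -> int:
    """Return the percentage share, rounded to the nearest integer."""
    return round(100 * count / total)


evaluation_share = share(14)  # in-vivo research (Evaluation)
proposal_share = share(12)    # Solution proposal and Validation types
print(evaluation_share, proposal_share)  # 42 36
```

Both values lie close to the approximate figure of 40% given in the text.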

6.2 Focus group results: Validation of Initial Version of Model Footnote 2

At the focus group, practitioners discussed the use of prototyping to perform requirements-related tasks such as elicitation, specification, and validation by reflecting on scenarios based on our prototyping aspects model. In general, the participants agreed that the scenarios correspond to a typical way of working and to their prototyping practices, which they see as a good way of working with agile software development. There was some disagreement about prototyping in early project stages. Several participants said that prototyping is time consuming and difficult, and rarely used during early phases and at the first meetings with customers. Instead, these participants preferred stakeholder interviews and competitor analysis to understand the problem domain. In contrast, two participants said that early prototyping, e.g., through sketching, can be useful in understanding the problem. Another participant said that early prototypes can be beneficial in loosely defined projects and facilitate discussions about user expectations. They had used simple prototypes in the form of sketches, which helped designers and users understand what they were discussing and assisted in framing the role of the system.

6.2.1 Purpose

The participants described several purposes for prototyping. Its value in exploration & learning was described as “ the more you work with design [of the prototype] the more you know what works for you ”. Footnote 3 Several participants described the value of prototyping for internal use and the importance of “ building understanding before you start thinking about solutions .” One participant described creating personal prototypes to get an overview of the future system, essentially working as a specification . Another participant said that prototypes are more useful than a formal specification to communicate requirements. Two participants described the importance of exposing developers to the prototype as a way of testing the technical feasibility and, if possible, adjust the requirements to “ the easiest way … to implement the solution ”. Another participant described the importance of communicating and involving developers by testing prototypes and discussing requirements, although some developers just want information on “ the components I need to build .”

6.2.2 Prototype scope

The focus group provided validation for this aspect of our model. They recognised the relevance of considering breadth and depth of functionality, and refinement of all our identified facets (of the first version of our model) except for data richness, which was not mentioned during the focus group. The participants described that in early project stages, they prefer to use simpler prototypes, either paper sketches with shallow functionality and low refinement or a prototyping tool that supports producing prototypes with shallow functionality and mid to high degree of visual refinement. These simpler prototypes provide “ a way to understand the problem ” and to “ organize and explore … [the solution] space. ” Another participant implied that feedback on visual style could be avoided by using prototypes with a low degree of visual and interactive refinement such as wireframes that encourage feedback on functionality. In contrast, for more complex solutions “ you get value out of being able to click around ” and a high degree of interactive refinement is preferred. Some participants said that prototyping must involve interactivity and that they “ wouldn’t do a good job … [without] interactive prototypes ”. This illustrates the need for a common definition of prototyping. The fact that prototype scope was spontaneously discussed by our participants demonstrates that our model can provide support for this.

6.2.3 Prototype use

The participants confirmed all four review methods included in the first version of our model, i.e., internal use, demonstrations, scenario-based, and free testing. Several participants described using prototyping to test new ideas with colleagues and that they first show the prototype “ within the team and not with users ”. This was especially the case for early prototypes that one participant rarely showed to customers, despite experiencing challenges in discussing with users without any visuals. When prototypes are shown to external stakeholders it is often as a demonstration that takes place after first having discussed their problem and current situation. One participant said that users are only occasionally involved in interacting with prototypes while this is done “ all the time ” within the project. Instead, users are involved in beta testing “ all the time ”, which some, but not all, of the participants see as prototyping. One participant described the use of scenario testing a few times a year, e.g., with new employees or students.

Several participants mentioned the importance of structuring feedback sessions to “ get feedback on the right thing ”, i.e. relevant facets. One participant said that it is easier to get feedback on what is good rather than what needs improving. Another participant described that “ talking with just one person isn’t enough ” since stakeholders may have preconceived solution ideas. One participant suggested using solution-agnostic questions to encourage users to consider what problems the system should solve and to discuss requirements at domain and product level.

6.2.4 Exploration strategy

Our participants validated the four exploration strategies included in the first version of the model and described that the strategy is varied throughout development stages and for different prototyping purposes. One participant described the importance of creating several parallel prototyping variants to avoid getting stuck in a single solution too soon. She said that using multiple prototypes helps users express the direction the system should take, and thereby involves them in determining product scope and requirements. Similarly, another participant described that they “ throw a wide net with many different options ” in initial project stages. In addition, several participants said that building several variants of a prototype is useful for exploring alternative solutions and demonstrating these to stakeholders to “ collect ideas about how to proceed and what path to choose ” and thereby optimise the solution. Furthermore, one participant described that in a later stage, an incremental and flexible exploration strategy is needed to “ fit in [customer requests] with the concept you have in mind ”.

6.3 Multi-case study results: broader validation and adjustments (RQ4) Footnote 4

We validated the prototyping aspects model through our multi-case study with eleven startups. The model was presented to practitioners in the interviews and used in the analysis of the interview data to categorize the prototyping instances described by our interviewees. In total, we identified forty-three prototyping instances among our startups, see Table 7 . Prototyping in these startups ranges from using simple sketches through mock-ups, to using source code to explore and to obtain feedback on early software versions. We have categorised the identified prototyping instances using our prototyping aspects model. In this section, we describe and motivate changes to our model based on our empirical data. The first and the revised (second) version of the model are shown side-by-side in a table in Section 5 . We recommend that the reader consults that table while reading this section. Initial results of the prototyping practices of startup Companies A-D are available in a separate publication (Bjarnason 2021 ).

6.3.1 Purpose

This aspect mostly worked well in discussing and categorising how the startups use prototyping, and only two minor revisions are made to this aspect, namely for the purposes Communication and Validation .

One of the main purposes of prototyping is “ as a tool for selling products to customers ” (Company B) and is thus used in communicating “ about sales… to make them understand that there is a need ” (Company A). For this reason, Company A had invested “quite a lot of work” in building a high-fidelity interactive mockup (prototyping instance A2) which “ is fake, but quite good… communicates that we know and builds trust… Based on that we get investors. ” (Company A). For these reasons, we adjust the detailing of the aspect of Purpose in our model to highlight that communication includes sales and marketing, thus Communication & Alignment is changed to Communication: Sales, Alignment .

For Validation & Testing, the boundaries between Market viability and Business viability were hard to determine for the prototyping instances, and the difference between these two is unclear. Instead, we replace these with the terms Problem–Solution fit and Product-Market fit (Osterwalder et al. 2014). This more clearly distinguishes between prototyping to ensure that the product matches the needs of users and customers by providing a solution that addresses their problem, and prototyping to validate that customers are willing to pay for the product, and thus that a viable market and business opportunity exist. Different prototyping instances may be used for each dimension. For example, the mock-up C2 was used to “explore what is required [by the market]”, i.e. product-market fit, while the beta release C3 “to 2–3 [customers] is used to tune [the implementation] further” (Company C) and thus improve the problem–solution fit. Product-Market Fit is especially important for startups that “ must produce a prototype to be able to confirm that you have buyers, and thus revenue… [before] incurring more costs without knowing that it is sellable ” (Company A).

Furthermore, while we observe that prototyping is not directly used within our startups for the purpose of improving quality, we retain the purpose of Quality Improvement in our model. The lack of support for this prototyping option in our case study is likely due to the nature of early startups, rather than a general lack of relevance of this purpose. In startups, the main focus is on establishing a viable business model with a product solution that matches customer needs and problems, and on developing an initial product version to realise their solution ideas. Thus, prototyping is primarily used for sales & marketing (Communication), internal exploration of the solution domain, feasibility testing of technical aspects of the solution design, and to validate the problem–solution and product-market fit of their business model.

6.3.2 Prototype media

When discussing and analysing the prototyping in the startups it became clear that the media used to represent a prototype, e.g. paper or PowerPoint sketches, computer-generated mock-ups, or source code software, was an important aspect to consider. A similar aspect, related to paper prototyping, was discussed by the authors during our original design of the model, but discarded at that point in time since the aspect of prototype scope appeared to be sufficient. However, we reintroduce this aspect in our model since the multi-case study indicates that the choice of prototype media affects the costs and benefits of prototyping. Our interviewees described that they “ have chosen to produce a mock-up due to cost and time aspects ” (Company A). We believe that this is an important aspect of prototyping that can further support practitioners in making informed decisions about the kind of prototype media to use. In particular for early stages of product development, we want to highlight the cheaper and easier kinds of prototype media such as videos and interviewing, which some of our startups find very useful. For example, in prototyping instances H1, J6, I1 and I2, videos are used for sales & marketing (communication) and for validating product-market fit. One interviewee described videos as a way “ to present products far before they are ready ” (Company I). This interviewee also described prototyping instance I3 and interviews as a means of prototyping “ even without anything to show ” by “ talking to people about how they solve this problem today ” (Company I). Similarly, the now growing startup Company J used videos in social media channels for sales & marketing purposes; “ funny videos that became a bit viral ” (Company J). This startup also uses interviewing for exploring the problem domain. In the early prototyping instance J1, they “ asked people [users] … [and] that became an important signal ” that their solution idea was viable, and later used a similar approach in prototyping instance J5 to validate their revenue model.

6.3.3 Prototype scope

When discussing the scope of a prototype, the degree of refinement of the visual appearance, interactive behaviour, and functionality worked well. For example, one interviewee described that “ we do not focus that much on interactivity at the beginning ” (Company F). In addition, five interviewees expressed the importance of realistic data, e.g. to capture behaviour around “ errors and bad data ” (Company I). Also, “ it is really important to use that [realistic data] otherwise your [customers] probably get stuck on that” (Company F) and that when “ the data is fairly realistic … [it] can be shown to customer without any problems ” (Company G). Another startup that markets advanced AI algorithms described that they “ need accurate data to back up [our algorithm]”, and thus demonstrate their solution by using “ actual [customer] data ” (Company H).

The aggregated dimension of broad vs narrow functional scope was harder to convey, though we saw examples of both broad and narrow prototypes covering many vs a few features. For this reason, we revise this aspect to include Breadth as a stand-alone dimension, alongside the dimensions that can be refined, i.e. functionality , visual appearance , interactive & haptic behaviour , and data realism . We believe this provides a clearer and more easily comprehensible model for describing the scope of a prototype.

6.3.4 Prototype use

This is the aspect where the previous structure was less aligned with how the interviewees described their use of prototypes. The dimension of usage environment worked well, even though many of our startups only tested their prototypes in meeting settings. In part, this is due to challenges in accessing the actual usage environment. For example, for Company F, testing in a live environment is “ difficult since our product targets clinics… varies a lot how open they are with this .” Other likely reasons are cost and awareness of when testing in a live environment is important. One interviewee believed that “[environment] is an important aspect of mobile solutions, to test them in their correct context ” (Company I).

We find that the (previous) dimension of review method is not fine-grained enough to categorise the identified prototyping instances. Instead, we replace this dimension in our model with the three dimensions reviewers (who receives or gives feedback on the prototype), prototype interaction (directly with the prototype or not, e.g. when demoing), and review approach (scenario-based or free). This corresponds better to how the interviewees describe their use of prototypes.

All interviewees described with whom prototypes were used, i.e. Reviewers, and often the same prototype is used with multiple kinds of reviewers. For example, the sketches of prototyping instance A1 were “ tried out on ourselves and on people in our proximity, then we started developing ” (Company A). The later more refined “ mockup [A2] then becomes a blueprint for the product to be built by developers ” (Company A). Thus, reviewers are often internal, within the startup and development team, as well as external, such as customers, funders, or product users. In addition, some startups described using family and friends (so-called FFF: Family, Friends, and Foes) for obtaining feedback on prototypes. For example, for Company J, the prototyping instance J3 used “ FFF when we have made something new. What do they think? ” and for J4 they “ grabbed people in the corridors and ask them what they think… [and then] realised that we had missed several things ” (Company J).

There were also variations in how these reviewers could interact with the prototype. When demoing, e.g. of sketches or early mock-ups, there is no reviewer interaction with the prototype. There are also examples of choosing not to allow reviewer interaction with the prototype since “[customer] awareness is too low ” (Company A) and “ the market is still not that aware of their needs…to have the level of understanding to be active [in giving constructive feedback on premature mock-ups]” (Company E). For more refined prototypes w.r.t. functionality, visual appearance, and interactivity, the users may be encouraged to try the prototypes out for themselves. For example, the fully functioning product prototype of B3 was available to the general public and “ customers get to use our stuff… [allowing the startup to] follow up how they are used in the field ” (Company B). Similarly, users are encouraged to interact directly with the mock-up of prototyping instance F2 “ to see reactions…. how they reason, how they select and react to things ” (Company F), which provides the startup with rich feedback. We have added a dimension of Interactivity to our model to cover this dimension.

Finally, while scenario testing was included in our initial version of the model, our interviewees primarily described the use of scenarios as an approach that could be applied both when demoing a prototype and when allowing users to interact directly with the prototype. For example, when demonstrating the paper mock-ups of prototyping instance F1, the startup would “ try to get them [customers] to think about what they do then [in that scenario] ” (Company F). Similarly, a startup that provides a technical solution uses scenarios to help non-techie customers “ see the value [in the solution] …[through] presenting these cases to them… and avoid just clicking us through ” (Company G). A similar approach is seen for prototyping instances K4 and K5, where the startup “ goes through a simple scenario, explains the challenges, and shows the solution ” (Company K). Thus, we added Review approach as a dimension with the options scenario-based or free, where free covers both free testing, e.g. for beta releases, and demos without any clear scenarios.
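To make the revised structure of prototype use concrete, the three dimensions that replace review method can be written down as a small record type. The following is a minimal sketch only: the class and field names are our own illustrative assumptions and not part of the model, the usage environment dimension is omitted for brevity, and the beta release is a hypothetical instance rather than one reported by the startups.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Reviewer(Enum):
    INTERNAL = "internal"  # within the startup and development team
    EXTERNAL = "external"  # e.g. customers, funders, or product users
    FFF = "family, friends, and foes"


class ReviewApproach(Enum):
    SCENARIO_BASED = "scenario-based"
    FREE = "free"  # free testing, or demos without any clear scenarios


@dataclass(frozen=True)
class PrototypeUse:
    """Illustrative record of how a prototype is used (names are assumptions)."""
    reviewers: FrozenSet[Reviewer]
    direct_interaction: bool  # False when the prototype is only demonstrated
    review_approach: ReviewApproach

    def is_demo(self) -> bool:
        # A demo is a use where reviewers do not interact with the prototype.
        return not self.direct_interaction


# F1: paper mock-ups demonstrated to customers while walking through scenarios.
f1 = PrototypeUse(
    reviewers=frozenset({Reviewer.EXTERNAL}),
    direct_interaction=False,
    review_approach=ReviewApproach.SCENARIO_BASED,
)

# Hypothetical beta release that external users test freely.
beta = PrototypeUse(
    reviewers=frozenset({Reviewer.EXTERNAL}),
    direct_interaction=True,
    review_approach=ReviewApproach.FREE,
)

print(f1.is_demo(), f1.review_approach.value)
print(beta.is_demo(), beta.review_approach.value)
```

Separating the fields in this way reflects the observation above that scenarios can be combined with either a demonstration or direct interaction, which the single review method dimension could not express.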

6.3.5 Exploration strategy

This final aspect was the hardest to discuss with our interviewees, and the one with the fewest responses, with the exception of parallel exploration. Using multiple prototype versions in parallel was a clear and understandable strategy, though most of our startups stated that due to resource constraints they tend to focus on one version at a time. As one interviewee said: “ we go with one [option] first since we are a small team ” (Company H). Another interviewee mentioned that “ we always try to sketch different [options] ” (Company F) though without giving any concrete example of this (and it is therefore not seen in any of the reported prototyping instances). A third interviewee said: “ focus one thing at a time and learn about. But, in some cases [parallel exploration is useful] … e.g. for a pricing model ” (Company I). Thus, we have modified the aspect of exploration strategy to include three facets: single or parallel exploration, iteration focus (business, product, feature or optimisation level), and iteration size. The two facets related to prototyping iterations were indicated by a few of our interviewees. For example, one interviewed technical lead advocated the “ need to iterate slowly… [to ensure] what the stakeholder originally wished for ” (Company D). Initially, startups often iterate at the level of the product, as was described by the interviewee from Company E: “ we have an idea and want to see if it works. ” They then move towards “ more gradual development … and more structured feature growth from the very basic experimental ideas up to the ready-for-market products ” (Company E).

The dimension of iteration size replaces the other three exploration options (of the initial version of our model), namely point-based , optimisation , and flexible exploration . The difference between these previous strategies mainly connects to the size of the change between prototype versions. Prototyping with an optimisation strategy, and thus with small changes between versions, could be conveyed through giving the Purpose of Quality improvement. Similarly, the difference between point-based and flexible could also be deferred to the size of changes and connected to the stage and maturity of a product or an idea. For example, in the early stages of a startup prototyping could be used for broad exploration of a suitable solution approach, in which case flexible exploration appears suitable. As a startup and their solution approach matures, they are more likely to want to explore more fine-grained variations, e.g. in what features or what user-interface design to implement, in which case point-based exploration, or even optimisation exploration is more relevant.

7 Discussion

We have investigated the current body of knowledge regarding prototyping methodology through a systematic mapping study and a multi-case study of eleven startups. Our research identifies five main aspects of prototyping (RQ1) that cover the Why? and How? and What? of prototyping. These aspects can be used to characterize prototyping instances and thereby provide practitioners and researchers with a model (see Section 5 ) that can help them to reflect on, analyse, and improve their prototyping practices. This was the case at our focus group and in the twelve interviews of our multi-case study where the four aspects included in the initial version of our model were covered, namely purpose , prototype scope , prototype use , and exploration strategy . When categorising the prototyping instances described in the interviews of the multi-case study, a fifth aspect emerged, namely the kind of prototype media used, e.g. sketch, mock-up, or source code software.

The systematic map on which our model is based consists of thirty-three primary studies and represents a wide set of papers from the past thirty years from the areas of human factors and design, requirements engineering, and agile software development (RQ2). We observe an increase in publication rate during the past decade and of empirically based research (RQ3).

7.1 The aspects of prototyping (RQ1 and RQ4)

7.1.1 Purpose of prototyping

The purpose of prototyping can vary and may consist of a combination of reasons. At the heart of prototyping lies exploration of the solution space through experimenting with ideas, gathering feedback, and iteratively detailing, validating, and communicating product requirements. Within agile, prototyping is a known RE practice where requirements are gradually defined, validated, and communicated through prototyping as part of the incremental development process (Ramesh et al. 2010). With a prototype, new requirements can be elicited through exploration & learning and validated by testing business viability, market desirability, technical feasibility, or usability.

Prototyping also provides a powerful tool for communicating with customers and for creating a good communication climate within a development team (Dow et al. 2011 ) by showing rather than just telling. In addition, for startups prototypes play a vital role in sales & marketing and in obtaining funding for their venture (Bjarnason 2021 ). However, there is also a risk that prototypes can convey an inaccurate perception of development status and create unrealistic expectations (Lichter et al. 1994 ). This risk should be considered when prototyping to validate product-market fit to avoid making unrealistic business decisions regarding budget and development plans (Ciriello et al. 2017 ; Zink et al. 2017 ).

Prototyping can also support exploring the problem at hand. An additional purpose of prototyping is to specify requirements and to act as a requirements specification, as was mentioned by several of our case companies. This is a topic that requires further research to understand in what contexts a prototype can be used as a specification and what is required of a prototype to fulfil this purpose.

7.1.2 Prototype media

The kind of prototype media used affects the cost of constructing it, but also the learnings and benefits that can be obtained from using it. Simpler and cheaper media such as paper or interviewing, or computer-based mock-ups can be used with good benefits, especially in the early stages of ideation and product design and development, such as in software startups (Nguyen-Duc et al. 2017). However, our previous research indicates that some startups tend to prefer and strive to use source code software for prototyping since they believe this enables them to demonstrate ability and build trust with customers and investors (Bjarnason 2021). This preference for certain types of prototype media is often due to the skills and previous experiences of prototyping technologies and tools of those involved in the startup, which play an important role when selecting the media used for prototyping (Gupta et al. 2021; Nguyen-Duc et al. 2017). This connection between skill set and prototype media is also found in our interview material. For example, while UX designers can quickly produce mock-ups using tools, e.g. Figma, software developers often prefer sketching and exploring their ideas directly in source code software.

While we advocate considering current experience and competence, we also encourage practitioners to consider other factors when selecting the kind of media to use for prototyping, such as what kind of feedback is sought and the risks involved in using source code software for prototyping. While choosing source code as the prototype media enables quickly getting started with actual development, this also comes with risks related to product quality and to cementing solution ideas too early on. While there may be stakeholders, e.g. sponsors, that are eager to quickly move on to realising and delivering a novel idea, moving too quickly to production source code comes with a risk of having to cancel the project later on (Ciriello et al. 2017). For these reasons, throw-away prototyping is often advocated since companies are then forced to separate identifying the ‘right’ requirements (through prototyping) from implementing the requirements ‘right’.

7.1.3 Scope of a prototype

The prototype scope also affects the cost of prototyping and the type of feedback that can be obtained. Thus, the scope should be selected to match the intended purposes of the prototyping effort. The breadth of a prototype and its degree of refinement for functionality, visual appearance, interactive behaviour, and data realism, can be varied. For example, a high-quality mock-up that covers all the features of a future system with toy examples has a broad prototype scope, covering all major features. It has a low degree of functional refinement of these features and a high degree of refinement for their visual appearance and interactive behaviour, while it provides a low degree of refinement for realistic data (only toy examples). The two dimensions of breadth and refinement (in general) have been described in previous literature using the terms horizontal versus vertical prototyping (Budde and Zullighoven 1990), where horizontal prototyping explores the breadth and vertical prototyping the refinement of the scope of the software being developed. Compared to previous literature, our model also distinguishes between the type of facet that is refined, e.g. visuals or functionality, and provides a more fine-grained terminology for describing the scope of a prototype.
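The mock-up example above can also be expressed as a small data structure, which may help when categorising prototyping instances consistently. This is a minimal sketch under stated assumptions: the type and field names are illustrative, and the three-level refinement scale (low, mid, high) and two-level breadth scale are simplifications chosen for the sketch rather than scales defined by the model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Refinement(Enum):
    # Assumed three-level scale; the model itself does not fix the levels.
    LOW = "low"
    MID = "mid"
    HIGH = "high"


class Breadth(Enum):
    NARROW = "narrow"  # covers a few features
    BROAD = "broad"    # covers many features


@dataclass(frozen=True)
class PrototypeScope:
    """Illustrative record of prototype scope (names are assumptions)."""
    breadth: Breadth
    functionality: Refinement
    visual_appearance: Refinement
    interactive_behaviour: Refinement
    data_realism: Refinement

    def facets_at(self, level: Refinement) -> List[str]:
        """Return the names of the refinable facets that are at the given level."""
        facets = {
            "functionality": self.functionality,
            "visual_appearance": self.visual_appearance,
            "interactive_behaviour": self.interactive_behaviour,
            "data_realism": self.data_realism,
        }
        return [name for name, value in facets.items() if value == level]


# The high-quality mock-up with toy examples described in the text.
mockup = PrototypeScope(
    breadth=Breadth.BROAD,
    functionality=Refinement.LOW,
    visual_appearance=Refinement.HIGH,
    interactive_behaviour=Refinement.HIGH,
    data_realism=Refinement.LOW,
)

print(mockup.facets_at(Refinement.LOW))
print(mockup.facets_at(Refinement.HIGH))
```

Keeping breadth separate from the four refinable facets mirrors the revised model, in which breadth is a stand-alone dimension rather than part of an aggregated broad versus narrow scope.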

While our focus group participants actively related to three of the facets of refinement, namely functionality, visual appearance and interactive behaviour, they did not mention data richness. However, five of our interviewees in the startups could relate to the importance of using realistic data, mainly to convey realistic scenarios and trust to customers, but also for validating error cases connected to missing or badly formatted data. We interpret the omission of this facet at the focus group and among the remaining seven interviewees as an indication of low awareness of the importance of the facet of data for prototyping. Since it is also the facet with the fewest supporting references in our mapping study, we interpret this as an indication that further knowledge and research on the role of data richness in prototyping is needed. For example, to what degree does the use of realistic data, e.g., in a demonstration, affect what feedback can be obtained?

Furthermore, we note that studies report contradicting results on the relationship between prototype scope and the feedback that can be obtained, and thus the purpose that can be fulfilled. For example, for usability testing, one study shows no significant difference in feedback for prototyping with a low versus a high degree of refinement (Sefelin et al. 2003 ), while other studies suggest that the most cost-effective scope is either a mix of low and high refinement (Liu and Khooshabeh 2003 ; McCurdy et al. 2006 ), or to use simple prototypes with a low degree of refinement (Yasar 2007 ). These conflicting findings indicate that the matter is complex, and that further empirical research is needed to identify the relevant factors of prototyping practice and the environment, and the relations between these.

7.1.4 Use of a prototype

The aspect of prototype use covers with whom the prototype is presented and reviewed (internal, external, or with family-friends-and-foes), how (with or without user interaction, and based on scenarios or more freely), and in which environment this takes place, e.g., in a lab setting or in the actual environment in which the actual product is to be used. At our initial case company, prototypes are frequently used internally to try out new ideas. Our focus group participants described using prototypes with customers either through a demonstration without any direct user interaction, or as free testing of a beta version of the software. In general, our startup companies describe using prototypes primarily internally in the early stages of their business venture, and gradually extending the use to family-friends-and-foes and then to external stakeholders such as potential customers and investors.

Direct interaction with prototypes is common for internal use and when the prototype is slightly more refined. Several startups also describe using scenarios both when demonstrating a prototype and when asking users to interact with it as a means to enhance understandability and communicate how their product solution can address user problems. Who uses a prototype and how it is used affects the type of knowledge that can be obtained through the interactions that take place both with the actual prototype and between the people involved in, e.g., a prototype demonstration.

The obtained feedback can be steered to specific aspects by structuring the prototype demo sessions. For example, the focus group participants described using solution-agnostic questions to focus a prototype demonstration on the problem description, rather than on details in the solution. The interactions around prototypes relate to the cognitive abilities (Cafer and Misra 2009) and communication skills (Ciriello et al. 2017) of those involved and are a complex and interesting area for further research. For example, future work could investigate how to optimise the feedback obtained from users through communication techniques such as storytelling, and whether a smaller, less refined, and thus cheaper, prototype scope can be compensated for by boosting the prototype use through designing extensive and highly realistic usage scenarios.

The environment in which a prototype is used is another facet of use that affects the feedback that can be obtained. Previous research describes the impact of the physical environment (Lichter et al. 1994 ). We suggest that the digital environment also may play an important role here and is a facet to consider for use of a prototype. In our previous research on digital work environments, we have identified systems interplay and work interplay as two important factors to consider in RE for systems and tools intended to be used in the work place, i.e. the interaction with other systems and with current work practices (Håkansson and Bjarnason 2020 ). Further research is needed to investigate in which situations a prototype should be used in the targeted digital and physical environment, and in which situations a lab or meeting setting is sufficient to gain the knowledge needed at a specific stage in the development process and for a specific product domain, type of feature, and prototyping purpose.

7.1.5 The exploration strategy

The strategy used to explore the solution space affects which prototyping instances are pursued and how resources are utilised. Previous research identifies four main strategies, namely point-based, parallel, optimisation and flexible exploration (Tronvoll et al. 2017). While all these strategies are relevant to prototyping, we found that when discussing exploration strategy with practitioners a different set of facets is more suitable. This set of facets can also be used to describe all four strategies included in the first version of the model. For that reason, we modify our model to cover single or parallel exploration, where parallel exploration means that multiple solution paths are explored at once, as well as the size and the focus of the iteration, e.g. business, product, feature, or optimisation level.

Exploring multiple options in parallel enables optimising, e.g. a detail in the design, or different parameters in a revenue model. While this approach requires more resources, it also allows for delaying decisions on alternative requirement options until more knowledge has been obtained for these. Exploring one solution path or quality aspect at a time through prototyping allows for freely selecting solution options based on current knowledge and as requirements change. Single option exploration is the most common approach among our startups since it is more cost-effective. The interviewees that did mention using a parallel exploration approach tended to have a background in user-interaction design. This is similar to our focus group participants who described only using single exploration in later development stages, while preferring a strategy of parallel exploration in early stages to keep an open mind to alternative solutions and avoid getting locked into one solution option at an early stage. This corresponds to previous research that found that parallel exploration stimulates innovation (Dow et al. 2011).

We note that previous research identifies a connection between the length of an iteration and the number and quality of solution options, or requirement possibilities, that are obtained (Jaskiewicz and Helm 2018). This highlights an interesting avenue for further research in considering the perspective of time in relation to prototyping scope and considering the costs and benefits of taking multiple short prototyping steps covering only a few requirements for each step, compared to performing fewer iterations on a larger set of requirements for each step.

Furthermore, we note a parallel between prototyping in small optimising iterations and managing quality (non-functional) requirements (Berntsson Svensson and Regnell 2015). In both cases, the outcome is evaluated against a range of values, as opposed to a simple pass/fail, and complexity increases because influencing factors need to be considered (Tronvoll et al. 2017). This complexity in managing quality requirements has also been observed in the context of another agile RE practice, namely that of using test cases as requirements (Bjarnason et al. 2016). It would be interesting to investigate if this is due to the same underlying characteristics of quality requirements, or if there are other factors in common between prototyping and testing, and, if so, how these two practices relate and can be aligned.

7.2 Threats to validity

We discuss the validity of our study and the presented model in view of descriptive and theoretical validity, and generalizability.

Interpretative and descriptive validity concern how reasonable the conclusions are given the data and the extent to which these are objectively described. We judge that both these aspects are high for our study. Several steps were taken to mitigate the risks of misunderstanding and misinterpreting the literature on which our model is based, the focus group participants, and the interviewees. The primary studies were analysed twice, first as part of the initial design by the 2nd and 3rd authors who calibrated their views, and then independently by the 1st author, as part of the initial validation (after the focus group). We interpret the fact that this re-analysis only led to minor modifications and improvements of the presented model (see Section 3.3) as an indication of high descriptive validity, although further research is required to strengthen the evidence for our model. The risk of misinterpreting the focus group participants was mitigated using the same kind of independent re-analysis of the transcript. In addition, the results from the literature study and the focus group were presented at the case company, who also reviewed an earlier version of this paper describing the first version of our model without raising any concerns about misinterpretation. Furthermore, the risk of the (single) first researcher misinterpreting the interview material was partly mitigated by asking the interviewees to read through the summarising memos with selected quotes from the material and the set of identified prototyping instances. The feedback from the interviewees mainly concerned additions, e.g. mentioning more prototyping instances, which were then incorporated in the paper.

Theoretical validity is determined by our ability to capture what was intended. While we believe our model provides a good representation of the existing research on prototyping methodology, there is a risk that we have missed relevant previous research and that there are additional primary studies relevant to the topic of our systematic mapping study. This remains an open threat that could be addressed in future research through triangulating the search results, e.g. through snowballing. Furthermore, there is also a risk that our model does not fully align with current prototyping practices in industry, even though we have taken initial steps to explore and validate this. Thus, there is a risk that our model is not complete and that there are additional aspects of prototyping, and in particular additional facets, that should be included in our model. This is especially relevant to the aspects that were modified in this second version of the model, primarily Prototype media, Prototype use, and Exploration strategy. While these modified aspects were used in our analysis of the interview material, they were not presented to the interviewees. Thus, further research, in particular of the practical applicability of our model, is needed to strengthen the completeness of our model, e.g. through case studies and other empirical investigations of industrial prototyping practices. Furthermore, our research indicates a number of potential relationships within the entities of the model, e.g. between prototype scope and prototyping purpose, that appear to affect the learnings that can be obtained. Exploring these relationships, e.g. through case studies, quasi-controlled experiments, and extended literature reviews, poses an interesting area for further research that can provide insights into how to optimise prototyping practices from a cost–benefit perspective.

Generalizability of our model beyond our case companies and the area of RE is believed to be medium. Since our model is based on previous research within HF, RE and agile software development, we believe that our model may be valid and relevant beyond the case companies involved in our study. However, further research is required to validate this. In particular, prototyping practices for large and established software product companies need to be investigated since their prototyping practices may differ greatly from those of the smaller and newer companies investigated so far in our research. Furthermore, additional research is needed to understand the lack of empirical data in our case studies on some of the aspects and facets of our model, in particular exploration strategy and quality improvement. The lack of evidence for these in our case studies may be due to contextual differences in how the practice of prototyping is used, but may also be an indication of areas for which increased insight may support practitioners in improving their prototyping practices. For these reasons, we suggest further case studies where our prototyping model is applied to additional companies and organisational contexts to further improve generalizability.

8 Conclusions and future work

While there is a host of research on prototyping within user interaction design, software engineering and agile development, we find only some research on the use of prototyping as a requirements engineering (RE) practice. Also, most of the research on the methodological aspects of the practice of prototyping concerns the scope of a prototype (e.g. horizontal vs. vertical, hi- vs. lo-fidelity, paper prototyping, sketching, mockups etc.) rather than considering the overall practice of prototyping (that also includes how and with whom a prototype is used and for what purposes). To address this lack, we have designed a model of prototyping. Our prototyping aspects model (PAM) is based on a systematic mapping study of previous research and has been iteratively validated and improved upon through a focus group at one case company and through interviews at eleven startup companies (RQ4). We have identified five main aspects of prototyping (RQ1) that are included in our model, namely the purpose of prototyping, the scope of a prototype, the prototype media, the method of prototype use, and the exploration strategy used to explore the solution space. In this paper, we provide a description of each aspect and their more detailed facets based on previous research and on empirical data from our case studies. We conclude that research on prototyping methodology has mainly been performed within software engineering (in general) and within human factors & design (RQ2), and that there is roughly the same amount of in-vivo and in-vitro research on this topic (RQ3), although there appears to be an increase of in-vivo research during the past decade.

We believe that our model can support agile development teams in reflecting on their prototyping practices and in making conscious choices regarding how to explore the solution space in an effective way considering their goals and resources. Practitioners are encouraged to consider the following:

Purpose of prototyping: What is to be achieved with prototyping, in general and for a prototyping instance, e.g. learning or validation, communication or optimisation? Select the prototype scope and the method of prototype use to match the intended purpose based on existing knowledge from research and your own experience.

Prototype scope: To what extent does the prototype need to represent the final product to achieve the intended purpose? Is a broad representation of all future product features needed, or is a narrower scope sufficient? What level of refinement is needed w.r.t. functionality, visual appearance, interactive behaviour, and data? Depending on the aim, focus on detailing a specific feature (narrow and refined prototype scope) or providing an overall system view (broad prototype scope). Balance the cost of a broader and more refined prototype scope against the possible benefits.

Prototype media: Given the purpose of the prototyping and the selected scope, consider which media will yield the best learnings and benefits with the least amount of effort. If the purpose is to explore an early idea within the development team, consider using simpler media such as sketches either on paper or in digital form. If the purpose is to test the technical feasibility of a new component, this will likely be best achieved by prototyping in actual source code software. We encourage practitioners to consider simpler forms of media, such as paper sketches, interviewing, and videos that enable exploring ideas very early on in the design and development process. Also, consider the available resources and competences within your team and select the prototype media accordingly. If someone is skilled in producing computer-based mock-ups, this is a fast and cheap way to explore and validate product ideas involving user interaction. In contrast, it is quicker for an experienced software developer to validate, e.g. a new algorithm in a technically complex product through prototyping in source code software. Furthermore, consider the overall cost of development from a long-term perspective. In particular, if considering prototyping in source code software, also consider if and how this prototype is to be thrown out or carried on into subsequent development stages with the associated risks to product quality and of cementing ideas too early on.

Prototype use: Which stakeholders and user categories can provide the feedback needed to fulfil the purpose? Should they interact directly with the prototype or is a pure demo sufficient? Should a scenario-based approach be used to increase understandability? Design the review to align with the purpose, i.e. to focus on problem or on solution understanding, and on the relevant facets of scope. Consider the usage environment, both physical and digital, and adapt it to yield the desired type of feedback, both regarding content and stakeholder representation.

Exploration strategy: How broad is our current potential solution space? Is the main focus currently on product or feature level? What size of changes is suitable to manage between iterations? If in the initial stages of development, consider using a parallel exploration strategy to avoid fixating on a single solution option too early on. Switch to a single exploration strategy as more certainty is gained.

The presented model poses a starting point for further research into prototyping in specific organisational contexts, e.g. for startups, and for exploring the relationships between prototyping aspects. The impact of different factors can be studied, and different prototyping practices compared, by categorising prototyping instances using the five aspects of our model and by comparing to other contextual factors. Areas for future research on prototyping practices include the use of prototyping as a specification practice, the effect of realistic data, the interplay between how a prototype is used/reviewed and the obtained feedback, the communication around prototypes, and the influence of cognitive abilities. Through evidence-based guidelines and insight into prototyping, practitioners may be supported in selecting their prototyping practices from a cost–benefit perspective, and thus improve their abilities to effectively elicit, specify, validate, and communicate novel business ideas and requirements. We believe that effective prototyping can help software development organisations to optimise their use of resources to pinpoint and develop successful products.
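As a minimal sketch of how prototyping instances could be categorised by the five aspects and then compared against a contextual factor, consider the following Python snippet. The class, the field names, the facet values and the example records are hypothetical and chosen for illustration only; they are not part of the definition of PAM and do not represent data from our case studies.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    SINGLE = "single"
    PARALLEL = "parallel"


@dataclass(frozen=True)
class PrototypingInstance:
    """One prototyping instance described by the five aspects (hypothetical fields)."""
    purpose: str         # e.g. "learning", "validation", "communication"
    scope: str           # e.g. "broad and shallow", "narrow and refined"
    media: str           # e.g. "paper sketch", "computer-based mock-up", "source code"
    use: str             # e.g. "demo", "direct interaction"
    strategy: Strategy   # single or parallel exploration
    context: str = ""    # an additional contextual factor, e.g. development stage


def count_by(instances, aspect, context=None):
    """Count instances per value of one aspect, optionally within a single context."""
    selected = [i for i in instances if context is None or i.context == context]
    return Counter(getattr(i, aspect) for i in selected)


# Invented example records, not study data.
instances = [
    PrototypingInstance("learning", "broad and shallow", "paper sketch",
                        "demo", Strategy.PARALLEL, "early"),
    PrototypingInstance("validation", "narrow and refined", "source code",
                        "direct interaction", Strategy.SINGLE, "late"),
    PrototypingInstance("communication", "broad and shallow", "computer-based mock-up",
                        "demo", Strategy.SINGLE, "late"),
]

print(count_by(instances, "use"))
print(count_by(instances, "strategy", context="late"))
```

Keeping each aspect as a separate field makes it straightforward to tabulate one aspect at a time, or to restrict the comparison to a given context such as an early or late development stage.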

Finally, since prototyping is used in several areas, such as user interaction design, requirements engineering, software design and development, the practice has the potential to bridge and integrate different development activities throughout the development life-cycle. As such, it is an interesting practice for further research, and in particular, to investigate how prototyping can facilitate a better integration and alignment between different software development activities such as requirements engineering, user interface and software design, implementation, and testing.

Data availability

The dataset for the systematic map of literature and the protocols for the focus group and the case study including interview guide are available online (Bjarnason 2021b). Data collected during the focus group and the interviews are not publicly available due to reasons of confidentiality.

The focus group was performed prior to the reanalysis and thus on the initial draft of PAM, as depicted in Fig. 1.

The aspect of prototype media was not included in the model at this stage of our study and thus not covered in the focus group.

Direct quotes from focus group participants are noted within citations and italicized.

Direct quotes from interviewees are noted within citations and italicized.

Questions primarily for interviewees with the business perspective

Acosta, R.D., Burns, C.L., Rzepka, W.E. and Sidoran, J.L. (1994). A case study of applying rapid prototyping techniques in the Requirements Engineering Environment. Proceedings of IEEE International Conference on Requirements Engineering 66–73

Alves, C., Cunha, J. and Araujo, J. (2020). On the Pragmatics of Requirements Engineering Practices in a Startup Ecosystem. 311–321

Arano, T., Chang, C.K., Mongkolwat, P., Liu, Y. and Shu, X. (1993). An object-oriented prototyping approach to system development. Proc. of 17th Int. Computer Softw. and Applications Conf. 56–62

Batova, T., Clark, D. and Card, D. (2016). Challenges of lean customer discovery as invention. 2016 IEEE International Professional Communication Conference (IPCC) 1–5

Bellomo, S., Nord, R.L. and Ozkaya, I. (2013). Elaboration on an integrated architecture and requirement practice: Prototyping with quality attribute focus. 2nd Int Workshop on TwinPeaks 8–13

Berg V, Birkeland J, Nguyen-Duc A, Pappas IO, Jaccheri L (2018) Software startup engineering: A systematic mapping study. J Syst Softw 144(2018):255–274

Berntsson Svensson R, Regnell B (2015) A Case Study Evaluation of the Guideline-Supported QUPER Model for Elicitation of Quality Requirements. REFSQ 15:230–246

Bjarnason E, Hess A, Berntsson Svensson R, Regnell B, Doerr J (2014) Reflecting on Evidence-Based Timelines. IEEE Software 31(4):37–43. https://doi.org/10.1109/MS.2014.26

Bjarnason E, Unterkalmsteiner M, Borg M, Engström E (2016) A multi-case study of agile requirements engineering and the use of test cases as requirements. Inf Softw Technol 77(2016):61–79

Bjarnason E, Lang F, Mjöberg A (2021a) A Model of Software Prototyping based on a Systematic Map. ESEM 2021:1–11

Bjarnason, E., Lang, F., Mjöberg, A. (2021b) Data set for systematic mapping study on prototyping aspects and protocol for focus group and case study. https://serg.cs.lth.se/experiment-packages/pam/ . Accessed: 2023-08-14

Bjarnason, E. (2021). Prototyping Practices in Software Startups: Initial Case Study Results. 29th International Requirements Engineering Conference Workshops (REW) 206–211

Block, Z. and MacMillan, I.C. (1985) Milestones for Successful Venture Planning. Harvard Business Review, Sept 1985.

Bruegger, P., Lalanne, D., Lisowska, A. and Hirsbrunner, B. (2009). Tools for designing and prototyping activity-based pervasive application. Proc of 7th Int Conf on Mobile Comp and Multim 129–136

Budde R, Zullighoven H (1990) Prototyping revisited. Proc of COMPEURO 90:418–427

Cafer, F. and Misra, S. (2009). A cognitive requirement specification model. 24th Int Symp on Computer and Information Sciences 518–521

Chen Chen, Porter, A. and Purtilo, J. (1994) Tool support for tailored software prototyping. 3rd Symp on Assessm of Quality Softw Dev Tools 171–181

Ciriello, R.F., Richter, A. and Schwabe, G. (2017). When Prototyping Meets Storytelling: Practices and Malpractices in Innovating Software Firms. Proc 39th Int Conf on Softw Engineering in Practice Track 163–172.

Derboven J, De Roeck D, Verstraete M, Geerts D, Schneider-Barnes J, Luyten K (2010) Comparing user interaction with low and high fidelity prototypes of tabletop surfaces. Proc NordCHI 2010:148–157

Dow, S., Fortuna, J., Schwartz, D., Altringer, B., Schwartz, D. and Klemmer, S. (2011) Prototyping dynamics: sharing multiple designs improves exploration, group rapport, and results. Conf on HFs in Computing Systems 2807–2816

Fairley RE, Willshire MJ (2005) Iterative rework: the good, the bad, and the ugly. Computer. 38(9):34–41

Fern, D.A. and Donaldson, S.E. (1989). Tri-Cycle: a prototype methodology for advanced software development. Proc of 22nd HICSS 377–386

Giardino C, Unterkalmsteiner M, Paternoster N, Gorschek T, Abrahamsson P (2014) What Do We Know about Software Development in Startups? IEEE Software. 31(5):28–32

Giardino C, Bajwa SS, Wang X, Abrahamsson P (2015) Key Challenges in Early-Stage Software Startups. Agile Processes in Software Engineering and Extreme Programming 2015:52–63

Goldman N, Narayanaswamy K (1992) Software Evolution through Iterative Prototyping. Int Conf on Software Engineering 1992:158–172

Grevet, C. and Gilbert, E. (2015). Piggyback Prototyping. 33rd Annual ACM Conf on Human Factors in Computing Systems 4047–4056

Gupta V, Rubalcaba L, Gupta C (2021) Multimedia Prototyping for Early-Stage Startups Endurance: Stage for New Normal? IEEE MultiMedia. 28(4):107–116. https://doi.org/10.1109/MMUL.2021.3122539

Håkansson, E. and Bjarnason, E. (2020) Including Human Factors and Ergonomics in Requirements Engineering for Digital Work Environments. IEEE First International Workshop on Requirements Engineering for Well-Being, Aging, and Health (REWBAH) 57–66

Hakim, J. and Spitzer, T. (2000). Effective prototyping for usability. 18th Conf on Computer Documentation 47–54

Rex Hartson (2019). The UX Book. Morgan Kaufmann Publishers

Heisler, K.G., Tsai, W.T. and Ramamoorthy, C.V. (1989). Integrating the role of requirements specification into the process of prototyping: the protospec. 22nd Annual Hawaii Int Conf on System Sciences 348–357

Hendry DG, Mackenzie S, Kurth A, Spielberg F, Larkin J (2005) Evaluating paper prototypes on the street. CHI 05:1447–1450

Jaskiewicz, T. and van der Helm, A. (2018) Unlocking the Interactive Office: Concurrent Prototyping Approach. DIS 18 547–558

Käpyaho, M. and Kauppinen, M. (2015) Agile requirements engineering with prototyping: A case study. 23rd Int Requirements Engineering Conf 334–343

Karras, O., Unger-Windeler, C., Glauer, L. and Schneider, K. (2017). Video as a By-Product of Digital Prototyping: Capturing the Dynamic Aspect of Interaction. 2017 IEEE 25th International Requirements Engineering Conference Workshops (REW) 118–124

Klotins E, Unterkalmsteiner M, Chatzipetrou P, Gorschek T, Prikladnicki R, Tripathi N, Pompermaier L (2019) A progression model of software engineering goals, challenges, and practices in start-ups. IEEE TOSEM 2019:1–1

Kordon F (1994) Proposal for a generic prototyping approach. ETFA 94:396–403

Lauesen S (2002) Software Requirements: Styles and Techniques. Addison Wesley

Lauesen, S. (2005). User interface design: a software engineering perspective. Pearson/Addison-Wesley

Lenarduzzi, V. and Taibi, D. (2016) MVP Explained: A Systematic Mapping Study on the Definitions of Minimal Viable Product. 42th Euromicro Conf on Softw Engineering and Adv Applications (SEAA) 112–119

Lichter H, Schneider-Hufschmidt M, Zullighoven H (1994) Prototyping in industrial software projects-bridging the gap between theory and practice. IEEE Trans on Software Engineering. 20(11):825–832

Lim, Y.-K., Stolterman, E. and Tenenberg, J. (2008). The anatomy of prototypes: Prototypes as filters, prototypes as manifestations of design ideas. ACM Trans on Computer-Human Interaction. 15(2):7:1–7:27

Liu, L. and Khooshabeh, P. (2003) Paper or interactive?: A study on prototyping techniques for ubiquitous computing environments. CHI ’03 Extended Abstracts on Human Factors in Computing Systems 1030–1031

McCurdy, M., Connors, C., Pyrzak, G., Kanefsky, B. and Vera, A. (2006). Breaking the fidelity barrier: an examination of our current characterization of prototypes and an example of a mixed-fidelity success. Proc of SIGCHI Conf on HFs in Computing Systems 1233–1242

Nguyen-Duc A, Wang X, Abrahamsson P (2017) What Influences the Speed of Prototyping? An Empirical Investigation of Twenty Software Startups. Agile Proc in Softw Eng 2017:20–36

Nielsen, J. (1993). Usability engineering. AP Professional

Olsen D (2015) The Lean Product Playbook - How to Innovate with Minimum Viable Products and Rapid Customer Feedback. Wiley

Osterwalder A, Pigneur Y, Bernarda G, Smith A, Papadakos T (2014) Value Proposition Design. Wiley

Paternoster N, Giardino C, Unterkalmsteiner M, Gorschek T, Abrahamsson P (2014) Software development in startup companies: A systematic mapping study. Inf & Softw Techn. 56(10):1200–1218

Petersen K, Vakkalanka S, Kuzniarz L (2015) Guidelines for conducting systematic mapping studies in software engineering: An update. Inf Softw Technol 64(2015):1–18

Rahman, A., Razek, A. and van Husen, C. (2017) Innovation by service prototyping design dimensions and attributes, key design aspects, and toolbox. Int Conf on Engin, Techn and Innovation (ICE/ITMC) 571–576

Raja, U.A. (2009). Empirical studies of requirements validation techniques. 2nd Int Conf on Computer Control & Comm 1–9

Ramesh, B., Cao, L. and Baskerville, R. (2010). Agile requirements engineering practices and challenges: an empirical study. Information Systems Journal. 20(5)

Ratcliff B (1988) Early and not-so-early prototyping-rationale and tool support. Proc COMPSAC 88(1988):127–134

Ries, E. (2011). The Lean Startup: How Today’s Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses. Currency

Rudd J, Stern K, Isensee S (1996) Low vs. high-fidelity prototyping debate. Interactions. 3(1):76–85. https://doi.org/10.1145/223500.223514

Schneider K (1996) Prototypes as assets, not toys. Why and how to extract knowledge from prototypes. (Experience report). ICSE 96:522–531

Sefelin, R., Tscheligi, M. and Giller, V. (2003) Paper prototyping - what is it good for? a comparison of paper- and computer-based low-fidelity prototyping. CHI ’03 Extended Abstracts on HFs in Comp. Systems 778–779

Sergio, L.M. (2015) Why is important prototyping? in Human-Computer Interaction (course material), Universitat d’Alacant: http://desarrolloweb.dlsi.ua.es/cursos/2015/hci/why-is-important-prototyping . Accessed: 2023-08-14

Shepherd, D.A. and Gruber, M. (2020). The Lean Startup Framework: Closing the Academic–Practitioner Divide. Entrepreneurship Theory and Practice

Teece DJ (2010) Business Models, Business Strategy and Innovation. Long Range Planning. 43(2):172–194

Terho H, Suonsyrjä S, Systä K (2016) The Developers Dilemma: Perfect Product Development or Fast Business Validation? Product-Focused Software Process Improvement 2016:571–579

Toffolon, C. and Dakhli, S. (2008). An Iterative Meta-Lifecycle for Software Development, Evolution and Maintenance. 3rd Int Conference on Software Engineering Advances 284–289

Tripathi N, Klotins E, Prikladnicki R, Oivo M, Pompermaier LB, Kudakacheril AS, Unterkalmsteiner M, Liukkunen K, Gorschek T (2018) An anatomy of requirements engineering in software startups using multi-vocal literature and case survey. J Syst Softw 146(2018):130–151

Tronvoll SA, Elverum CW, Welo T (2017) Prototype Experiments: Strategies and Trade-offs. Procedia CIRP 2017:554–559

Tullis, T.S. (1990). High-fidelity prototyping throughout the design process. Proceedings of the Human Factors Society 34th Annual Meeting 266

Wiberg, M. and Stolterman, E. (2014). What makes a prototype novel? Proc of 8th Nordic Conf on HCI 531–540

Wieringa R, Maiden N, Mead N, Rolland C (2006) Requirements engineering paper classification and evaluation criteria: a proposal and a discussion. Requirements Engineering. 11(1):102–107

Yasar, A.-U.-H. (2007) Enhancing experience prototyping by the help of mixed-fidelity prototypes. Proc of 4th Int Conf on Mobile Technology 468–473

Zainuddin, F.B. and Liu, S. (2012). An Approach to Low-fidelity Prototyping Based on SOFL Informal Specification. 2012 19th Asia-Pacific Software Engineering Conference 654–663

Zink, L., Hostetter, R., Böhmer, A.I., Lindemann, U. and Knoll, A. (2017). The use of prototypes within agile product development. 2017 Int. Conf. on Engin, Techn and Innovation (ICE/ITMC) 68–77

Acknowledgements

We thank the participants at Telavox and the interviewees at the startups for good collaboration, and for investing their time and engagement in this study. This work was partly funded by the Swedish strategic research environment ELLIIT.

Open access funding provided by Lund University. The research presented in this article was partly funded by the Swedish strategic research environment ELLIIT.

Author information

Authors and affiliations.

Department of Computer Science, Lund University, Lund, Sweden

Elizabeth Bjarnason, Franz Lang & Alexander Mjöberg

Corresponding author

Correspondence to Elizabeth Bjarnason.

Ethics declarations

Conflicts of interests.

Bjarnason’s contribution to this work was funded by the Swedish strategic research environment ELLIIT.

She has no other financial or non-financial interests, or connection to any of the involved case companies, to disclose. Lang’s and Mjöberg’s initial contribution to this work (the initial mapping study) was part of their MSc project, which was performed at the case company Telavox during 2020. Since completing their MSc project, they have no financial or non-financial interests, or connections to Telavox or any of the other involved case companies, to disclose.

Additional information

Communicated by Maria Teresa Baldassarre, Markos Kalinowski.

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on International Symposium on Empirical Software Engineering and Measurement (ESEM)

APPENDIX A: Focus Group Protocol

The following protocol was used to perform an initial validation of our model at a focus group with practitioners at our initial case company Telavox. The aim of the focus group was to discuss prototyping scenarios (categorised by the initial draft of our model) for five stages of RE with generic and stage-specific questions. The following questions were used for all stages:

• What is a good/bad outcome for this stage?

• How do you ensure a good outcome for this stage?

• What is clear/unclear with this stage?

• What knowledge is required for a good outcome of this stage?

• Which people should be involved in this stage?

1.1 Stage 1 – Concept exploration.

Purpose: exploration of problem domain and solutions

Scope: shallow functionality, low degree of refinement

Use: mainly internal usage

Strategy: parallel exploration

Domain- and product-level requirements are elicited by exploring ideas and solution space. Prototypes of shallow functional scope and low refinement are tested with low-cost methods such as paper prototyping. The main focus is on internal learning. Sharing knowledge externally is optional.

• Which are the best ways to brainstorm ideas for a new system? What are the pros and cons with these?

• Pros and cons of not having thought of a new idea prior to talking to a customer about a future system?

1.2 Stage 2 - Eliciting customer needs.

Purpose: Testing market desirability, Exploration

Scope: narrow and shallow functionality, low degree of refinement

Use: any review method with users/customers.

Strategy: flexible exploration

Domain-level requirements are elicited and market desirability tested. The aim is to understand customer needs with a focus on the role of the system from the users’ perspective. Simple prototypes of selected system concepts are designed and presented to users.

• In this stage, a simple paper prototype can be of use. What can you vs. can you not learn from such a prototype?

• It can feel difficult to present early prototypes for discussions. Do you agree, what pros and cons do you see with this?

1.3 Stage 3 – Identify system scope & requirements.

Purpose: Testing market desirability, Exploration (external), Communication (internal)

Use: any review method, internal and external use

Strategy: point-based exploration

Prototyping is used to identify system scope and requirements. A simple prototype with shallow functional scope and a low degree of refinement is used externally to pinpoint requirements that will satisfy customer needs, and internally to communicate and align regarding requirements.

• Requirements are identified in traditional, as well as in agile projects, but noted in different ways and forms. In traditional RE, requirements are documented in an SRS, often kept in a spreadsheet. Do you experience a need for such documentation in agile projects? How do you achieve this?

1.4 Stage 4 – Test and improve system scope & requirements.

Purpose: Usability testing, Test market desirability, Communication (internal & external)

Scope: broad and shallow functionality, low degree of refinement

Use: any review method, external and internal use

Product scope and requirements are communicated, and usability and market desirability are validated. Communication and alignment of requirements between customer and development, and within a project, are facilitated by prototypes that act as requirements specifications. Simple prototypes (broad and shallow functionality with low refinement) represent the current understanding. User feedback is captured by demonstrations, scenario testing, or free testing. A flexible exploration strategy is used to develop prototypes based on feedback.

• How early on is it good to test prototypes?

1.5 Stage 5 – Confirm system scope & requirements.

Purpose: Communication, Validation of Market desirability and Usability, Optimisation, Incremental development

Scope: broad and mid/deep functionality, low/mid visual refinement

Use: any review method

Prototyping is used to communicate with customers and to agree on system scope and requirements. The prototype is broader and more refined than in the previous stage, particularly for functionality, and can be a true (throw-away) prototype or an early version of the system.

• When do you need to perform a more formal validation of the requirements and user-interface design for a product?

• What is the difference between performing a formal validation and having a colleague perform the validation?

APPENDIX B: Interview Guide for Multi-Case Study of Startups

The following interview guide was used with entrepreneurs in our multi-case study of software startup companies. The main aim of the interviews was to explore prototyping practices of software startups and contextual factors that may influence these. The interviews were also designed to validate the prototyping aspects identified in our theoretical model (PAM) and to explore the model’s usefulness in supporting practitioners in describing their prototyping practices.

1.1 Interview introduction – 10 min

1) Present the study (purpose and time frame), main researcher, policy for NDA & confidentiality. GDPR paper, recording etc

2) Interviewee presentation : current role, main area of expertise, #years at startup/in field, current and previous experience of startup ventures

1.2 Contextual characteristics [Business & Tech] – 10 min

4) Company/startup : company origin/history, age (years), size (employees, teams)

5) Product : domain, type of VP [SW-based product, content, service, experiences, user data (Teece 2010)]

5) [B] Business model : customer type (market/bespoke, B2B), revenue model, channels

6) [B] Show model of start-up life-cycle maturity (Klotins et al. 2019 )

a) What stage and status are you currently in?

b) Describe current main focus and goals for

i) product development (ideation, building, variations)

ii) marketing/growth

iii) operations & customer support

c) What stages & status have you been through?

7) [B] Startup challenges & characteristics (based on (Berg et al. 2018 ) and (Giardino et al. 2015 ), grouped by Time & Resources, Business vs Technology focus, Organisation)

a) How does your startup relate to these, for each category?

1.3 RE practices – 10 min

For your current stage:

8) What are your main requirements sources : internal/external, Tech/Business focus?

9) Do you currently have any software development? Development model (agile, traditional, hybrid), Size (#engineers & teams)

10) How do you handle the following? Techniques based on (Klotins et al. 2019) and (Lauesen 2002) as checklist

a) Elicitation

b) Validation

c) Communication of ideas & requirements (primarily externally to customers and sponsors/investors)

1.4 Prototyping – 25 min

11) What does prototyping mean to you? Simple sketches, mock-ups, MVPs?

12) Describe how you use prototyping, in what stages, for what purposes, with what scope and how.

13) Present prototyping aspects model. How do you relate to each aspect in your prototyping practices :

b) Scope of prototype

c) Use of prototype: review method, environment.

d) Strategy for handling uncertainties

14) How do you reason concerning the cost-benefit balance for prototyping?

15) Do you use any tools for prototyping?

16) Does your prototyping approach vary, if so how and why?

a) for different purposes, such as eliciting, validating, and communicating?

b) for different stakeholders?

c) due to different points in time, e.g. as your start-up matures?

1.5 Future work – 5 min

17) What would you like to improve around prototyping in your startup?

18) What topics/areas/questions/problems within prototyping would you as a startup want research to address & investigate?

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Bjarnason, E., Lang, F. & Mjöberg, A. An empirically based model of software prototyping: a mapping study and a multi-case study. Empir Software Eng 28 , 115 (2023). https://doi.org/10.1007/s10664-023-10331-w


Accepted : 19 April 2023

Published : 30 August 2023

DOI : https://doi.org/10.1007/s10664-023-10331-w


Prototyping
Requirements engineering
Systematic mapping study


Research Trends in Software Development Effort Estimation


Open access
Published: 26 March 2024

Predicting and improving complex beer flavor through machine learning

Michiel Schreurs ORCID: orcid.org/0000-0002-9449-5619 1 , 2 , 3 na1 ,
Supinya Piampongsant 1 , 2 , 3 na1 ,
Miguel Roncoroni ORCID: orcid.org/0000-0001-7461-1427 1 , 2 , 3 na1 ,
Lloyd Cool ORCID: orcid.org/0000-0001-9936-3124 1 , 2 , 3 , 4 ,
Beatriz Herrera-Malaver ORCID: orcid.org/0000-0002-5096-9974 1 , 2 , 3 ,
Christophe Vanderaa ORCID: orcid.org/0000-0001-7443-5427 4 ,
Florian A. Theßeling 1 , 2 , 3 ,
Łukasz Kreft ORCID: orcid.org/0000-0001-7620-4657 5 ,
Alexander Botzki ORCID: orcid.org/0000-0001-6691-4233 5 ,
Philippe Malcorps 6 ,
Luk Daenen 6 ,
Tom Wenseleers ORCID: orcid.org/0000-0002-1434-861X 4 &
Kevin J. Verstrepen ORCID: orcid.org/0000-0002-3077-6219 1 , 2 , 3

Nature Communications volume 15, Article number: 2368 (2024)


Chemical engineering
Gas chromatography
Machine learning
Metabolomics
Taste receptors

The perception and appreciation of food flavor depends on many interacting chemical compounds and external factors, and therefore proves challenging to understand and predict. Here, we combine extensive chemical and sensory analyses of 250 different beers to train machine learning models that allow predicting flavor and consumer appreciation. For each beer, we measure over 200 chemical properties, perform quantitative descriptive sensory analysis with a trained tasting panel and map data from over 180,000 consumer reviews to train 10 different machine learning models. The best-performing algorithm, Gradient Boosting, yields models that significantly outperform predictions based on conventional statistics and accurately predict complex food features and consumer appreciation from chemical profiles. Model dissection allows identifying specific and unexpected compounds as drivers of beer flavor and appreciation. Adding these compounds results in variants of commercial alcoholic and non-alcoholic beers with improved consumer appreciation. Together, our study reveals how big data and machine learning uncover complex links between food chemistry, flavor and consumer perception, and lays the foundation to develop novel, tailored foods with superior flavors.


Introduction

Predicting and understanding food perception and appreciation is one of the major challenges in food science. Accurate modeling of food flavor and appreciation could yield important opportunities for both producers and consumers, including quality control, product fingerprinting, counterfeit detection, spoilage detection, and the development of new products and product combinations (food pairing) 1 , 2 , 3 , 4 , 5 , 6 . Accurate models for flavor and consumer appreciation would contribute greatly to our scientific understanding of how humans perceive and appreciate flavor. Moreover, accurate predictive models would also facilitate and standardize existing food assessment methods and could supplement or replace assessments by trained and consumer tasting panels, which are variable, expensive and time-consuming 7 , 8 , 9 . Lastly, apart from providing objective, quantitative, accurate and contextual information that can help producers, models can also guide consumers in understanding their personal preferences 10 .

Despite the myriad of applications, predicting food flavor and appreciation from its chemical properties remains a largely elusive goal in sensory science, especially for complex food and beverages 11 , 12 . A key obstacle is the immense number of flavor-active chemicals underlying food flavor. Flavor compounds can vary widely in chemical structure and concentration, making them technically challenging and labor-intensive to quantify, even in the face of innovations in metabolomics, such as non-targeted metabolic fingerprinting 13 , 14 . Moreover, sensory analysis is perhaps even more complicated. Flavor perception is highly complex, resulting from hundreds of different molecules interacting at the physiochemical and sensorial level. Sensory perception is often non-linear, characterized by complex and concentration-dependent synergistic and antagonistic effects 15 , 16 , 17 , 18 , 19 , 20 , 21 that are further convoluted by the genetics, environment, culture and psychology of consumers 22 , 23 , 24 . Perceived flavor is therefore difficult to measure, with problems of sensitivity, accuracy, and reproducibility that can only be resolved by gathering sufficiently large datasets 25 . Trained tasting panels are considered the prime source of quality sensory data, but require meticulous training, are low throughput and high cost. Public databases containing consumer reviews of food products could provide a valuable alternative, especially for studying appreciation scores, which do not require formal training 25 . Public databases offer the advantage of amassing large amounts of data, increasing the statistical power to identify potential drivers of appreciation. However, public datasets suffer from biases, including a bias in the volunteers that contribute to the database, as well as confounding factors such as price, cult status and psychological conformity towards previous ratings of the product.

Classical multivariate statistics and machine learning methods have been used to predict flavor of specific compounds by, for example, linking structural properties of a compound to its potential biological activities or linking concentrations of specific compounds to sensory profiles 1 , 26 . Importantly, most previous studies focused on predicting organoleptic properties of single compounds (often based on their chemical structure) 27 , 28 , 29 , 30 , 31 , 32 , 33 , thus ignoring the fact that these compounds are present in a complex matrix in food or beverages and excluding complex interactions between compounds. Moreover, the classical statistics commonly used in sensory science 34 , 35 , 36 , 37 , 38 , 39 require a large sample size and sufficient variance amongst predictors to create accurate models. They are not fit for studying an extensive set of hundreds of interacting flavor compounds, since they are sensitive to outliers, have a high tendency to overfit and are less suited for non-linear and discontinuous relationships 40 .

In this study, we combine extensive chemical analyses and sensory data of a set of different commercial beers with machine learning approaches to develop models that predict taste, smell, mouthfeel and appreciation from compound concentrations. Beer is particularly suited to model the relationship between chemistry, flavor and appreciation. First, beer is a complex product, consisting of thousands of flavor compounds that partake in complex sensory interactions 41 , 42 , 43 . This chemical diversity arises from the raw materials (malt, yeast, hops, water and spices) and biochemical conversions during the brewing process (kilning, mashing, boiling, fermentation, maturation and aging) 44 , 45 . Second, the advent of the internet saw beer consumers embrace online review platforms, such as RateBeer (ZX Ventures, Anheuser-Busch InBev SA/NV) and BeerAdvocate (Next Glass, inc.). In this way, the beer community provides massive data sets of beer flavor and appreciation scores, creating extraordinarily large sensory databases to complement the analyses of our professional sensory panel. Specifically, we characterize over 200 chemical properties of 250 commercial beers, spread across 22 beer styles, and link these to the descriptive sensory profiling data of a 16-person in-house trained tasting panel and data acquired from over 180,000 public consumer reviews. These unique and extensive datasets enable us to train a suite of machine learning models to predict flavor and appreciation from a beer’s chemical profile. Dissection of the best-performing models allows us to pinpoint specific compounds as potential drivers of beer flavor and appreciation. Follow-up experiments confirm the importance of these compounds and ultimately allow us to significantly improve the flavor and appreciation of selected commercial beers. 
Together, our study represents a significant step towards understanding complex flavors and reinforces the value of machine learning to develop and refine complex foods. In this way, it represents a stepping stone for further computer-aided food engineering applications 46 .

To generate a comprehensive dataset on beer flavor, we selected 250 commercial Belgian beers across 22 different beer styles (Supplementary Fig. S1 ). Beers with ≤ 4.2% alcohol by volume (ABV) were classified as non-alcoholic and low-alcoholic. Blonds and Tripels constitute a significant portion of the dataset (12.4% and 11.2%, respectively) reflecting their presence on the Belgian beer market and the heterogeneity of beers within these styles. By contrast, lager beers are less diverse and dominated by a handful of brands. Rare styles such as Brut or Faro make up only a small fraction of the dataset (2% and 1%, respectively) because fewer of these beers are produced and because they are dominated by distinct characteristics in terms of flavor and chemical composition.

Extensive analysis identifies relationships between chemical compounds in beer

For each beer, we measured 226 different chemical properties, including common brewing parameters such as alcohol content, iso-alpha acids, pH, sugar concentration 47 , and over 200 flavor compounds (Methods, Supplementary Table S1 ). A large portion (37.2%) are terpenoids arising from hopping, responsible for herbal and fruity flavors 16 , 48 . A second major category are yeast metabolites, such as esters and alcohols, that result in fruity and solvent notes 48 , 49 , 50 . Other measured compounds are primarily derived from malt, or other microbes such as non- Saccharomyces yeasts and bacteria (‘wild flora’). Compounds that arise from spices or staling are labeled under ‘Others’. Five attributes (caloric value, total acids and total ester, hop aroma and sulfur compounds) are calculated from multiple individually measured compounds.

As a first step in identifying relationships between chemical properties, we determined correlations between the concentrations of the compounds (Fig. 1, upper panel; Supplementary Data 1 and 2; Supplementary Fig. S2; for the sake of clarity, only a subset of the measured compounds is shown in Fig. 1). Compounds of the same origin typically show a positive correlation, while absence of correlation hints at parameters varying independently. For example, the hop aroma compounds citronellol and alpha-terpineol show moderate correlations with each other (Spearman’s rho=0.39 and 0.57), but not with the bittering hop component iso-alpha acids (Spearman’s rho=0.16 and −0.07). This illustrates how brewers can independently modify hop aroma and bitterness by selecting hop varieties and dosage time. If hops are added early in the boiling phase, chemical conversions increase bitterness while aromas evaporate; conversely, late addition of hops preserves aroma but limits bitterness 51 . Similarly, hop-derived iso-alpha acids show a strong anti-correlation with lactic acid and acetic acid, likely reflecting growth inhibition of lactic acid and acetic acid bacteria, or the consequent use of fewer hops in sour beer styles, such as West Flanders ales and Fruit beers, that rely on these bacteria for their distinct flavors 52 . Finally, yeast-derived esters (ethyl acetate, ethyl decanoate, ethyl hexanoate, ethyl octanoate) and alcohols (ethanol, isoamyl alcohol, isobutanol, and glycerol) correlate with Spearman coefficients above 0.5, suggesting that these secondary metabolites are correlated with the yeast genetic background and/or fermentation parameters and may be difficult to influence individually, although the choice of yeast strain may offer some control 53 .

Spearman rank correlations are shown. Descriptors are grouped according to their origin (malt (blue), hops (green), yeast (red), wild flora (yellow), Others (black)), and sensory aspect (aroma, taste, palate, and overall appreciation). Please note that for the chemical compounds, for the sake of clarity, only a subset of the total number of measured compounds is shown, with an emphasis on the key compounds for each source. For more details, see the main text and Methods section. Chemical data can be found in Supplementary Data 1 , correlations between all chemical compounds are depicted in Supplementary Fig. S2 and correlation values can be found in Supplementary Data 2 . See Supplementary Data 4 for sensory panel assessments and Supplementary Data 5 for correlation values between all sensory descriptors.
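To make the correlation measure concrete, the following is a minimal sketch of a Spearman rank correlation: each variable is replaced by its ranks (tied values share the mean rank) and a Pearson correlation is computed on those ranks. The function names and the concentration values are hypothetical and are not taken from the study's data or code.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical concentrations for six beers (illustration only).
compound_a = [1.2, 3.4, 2.2, 8.9, 5.1, 0.7]
compound_b = [10.0, 45.0, 18.0, 300.0, 60.0, 5.0]

rho_same_order = spearman_rho(compound_a, compound_b)
rho_reversed = spearman_rho(compound_a, compound_b[::-1])
```

Because only the ordering matters, the strongly non-linear relationship between the two hypothetical compounds still yields the maximal coefficient, which is one reason rank correlations suit concentration data spanning several orders of magnitude.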

Interestingly, different beer styles show distinct patterns for some flavor compounds (Supplementary Fig. S3 ). These observations agree with expectations for key beer styles, and serve as a control for our measurements. For instance, Stouts generally show high values for color (darker), while hoppy beers contain elevated levels of iso-alpha acids, compounds associated with bitter hop taste. Acetic and lactic acid are not prevalent in most beers, with notable exceptions such as Kriek, Lambic, Faro, West Flanders ales and Flanders Old Brown, which use acid-producing bacteria ( Lactobacillus and Pediococcus ) or unconventional yeast ( Brettanomyces ) 54 , 55 . Glycerol, ethanol and esters show similar distributions across all beer styles, reflecting their common origin as products of yeast metabolism during fermentation 45 , 53 . Finally, low/no-alcohol beers contain low concentrations of glycerol and esters. This is in line with the production process for most of the low/no-alcohol beers in our dataset, which are produced through limiting fermentation or by stripping away alcohol via evaporation or dialysis, with both methods having the unintended side-effect of reducing the amount of flavor compounds in the final beer 56 , 57 .

Besides expected associations, our data also reveals less trivial associations between beer styles and specific parameters. For example, geraniol and citronellol, two monoterpenoids responsible for citrus, floral and rose flavors and characteristic of Citra hops, are found in relatively high amounts in Christmas, Saison, and Brett/co-fermented beers, where they may originate from terpenoid-rich spices such as coriander seeds instead of hops 58 .

Tasting panel assessments reveal sensorial relationships in beer

To assess the sensory profile of each beer, a trained tasting panel evaluated each of the 250 beers for 50 sensory attributes, including different hop, malt and yeast flavors, off-flavors and spices. Panelists used a tasting sheet (Supplementary Data 3 ) to score the different attributes. Panel consistency was evaluated by repeating 12 samples across different sessions and performing ANOVA. In 95% of cases no significant difference was found across sessions ( p > 0.05), indicating good panel consistency (Supplementary Table S2 ).
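The consistency check can be illustrated with a one-way ANOVA in which the tasting session is the grouping factor. The sketch below assumes this simple design (the exact ANOVA model used for the panel is not specified in this section) and uses hypothetical scores. It computes only the F statistic; turning F into a p value requires the F distribution, for example via `scipy.stats.f_oneway`.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists.

    Each inner list holds the scores given in one session (assumed design).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of session means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own session mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical bitterness scores for one repeated beer in three sessions.
sessions = [[3, 4, 5], [4, 5, 6], [3, 5, 4]]
f_stat, df_b, df_w = one_way_anova_f(sessions)
```

A small F (session means differ no more than scores vary within a session) corresponds to a large p value, which is the pattern reported for 95% of the repeated samples.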

Aroma and taste perception reported by the trained panel are often linked (Fig. 1 , bottom left panel and Supplementary Data 4 and 5 ), with high correlations between hops aroma and taste (Spearman’s rho=0.83). Bitter taste was found to correlate with hop aroma and taste in general (Spearman’s rho=0.80 and 0.69), and particularly with “grassy” noble hops (Spearman’s rho=0.75). Barnyard flavor, most often associated with sour beers, is identified together with stale hops (Spearman’s rho=0.97) that are used in these beers. Lactic and acetic acid, which often co-occur, are correlated (Spearman’s rho=0.66). Interestingly, sweetness and bitterness are anti-correlated (Spearman’s rho = −0.48), confirming the hypothesis that they mask each other 59 , 60 . Beer body is highly correlated with alcohol (Spearman’s rho = 0.79), and overall appreciation is found to correlate with multiple aspects that describe beer mouthfeel (alcohol, carbonation; Spearman’s rho= 0.32, 0.39), as well as with hop and ester aroma intensity (Spearman’s rho=0.39 and 0.35).

Similar to the chemical analyses, sensorial analyses confirmed typical features of specific beer styles (Supplementary Fig. S4 ). For example, sour beers (Faro, Flanders Old Brown, Fruit beer, Kriek, Lambic, West Flanders ale) were rated acidic, with flavors of both acetic and lactic acid. Hoppy beers were found to be bitter and showed hop-associated aromas like citrus and tropical fruit. Malt taste is most detected among scotch, stout/porters, and strong ales, while low/no-alcohol beers, which often have a reputation for being ‘worty’ (reminiscent of unfermented, sweet malt extract) appear in the middle. Unsurprisingly, hop aromas are most strongly detected among hoppy beers. Like its chemical counterpart (Supplementary Fig. S3 ), acidity shows a right-skewed distribution, with the most acidic beers being Krieks, Lambics, and West Flanders ales.

Tasting panel assessments of specific flavors correlate with chemical composition

We find that the concentrations of several chemical compounds strongly correlate with specific aroma or taste, as evaluated by the tasting panel (Fig. 2 , Supplementary Fig. S5 , Supplementary Data 6 ). In some cases, these correlations confirm expectations and serve as a useful control for data quality. For example, iso-alpha acids, the bittering compounds in hops, strongly correlate with bitterness (Spearman’s rho=0.68), while ethanol and glycerol correlate with tasters’ perceptions of alcohol and body, the mouthfeel sensation of fullness (Spearman’s rho=0.82/0.62 and 0.72/0.57 respectively) and darker color from roasted malts is a good indication of malt perception (Spearman’s rho=0.54).

Heatmap colors indicate Spearman’s Rho. Axes are organized according to sensory categories (aroma, taste, mouthfeel, overall), chemical categories and chemical sources in beer (malt (blue), hops (green), yeast (red), wild flora (yellow), Others (black)). See Supplementary Data 6 for all correlation values.

Interestingly, for some relationships between chemical compounds and perceived flavor, correlations are weaker than expected. For example, the rose-smelling phenethyl acetate only weakly correlates with floral aroma. This hints at more complex relationships and interactions between compounds and suggests a need for a more complex model than simple correlations. Lastly, we uncovered unexpected correlations. For instance, the esters ethyl decanoate and ethyl octanoate appear to correlate slightly with hop perception and bitterness, possibly due to their fruity flavor. Iron is anti-correlated with hop aromas and bitterness, most likely because it is also anti-correlated with iso-alpha acids. This could be a sign of metal chelation of hop acids 61 , given that our analyses measure unbound hop acids and total iron content, or could result from the higher iron content in dark and Fruit beers, which typically have less hoppy and bitter flavors 62 .

Public consumer reviews complement expert panel data

To complement and expand the sensory data of our trained tasting panel, we collected 180,000 reviews of our 250 beers from the online consumer review platform RateBeer. This provided numerical scores for beer appearance, aroma, taste, palate, overall quality as well as the average overall score.

Public datasets are known to suffer from biases, such as price, cult status and psychological conformity towards previous ratings of a product. For example, prices correlate with appreciation scores for these online consumer reviews (rho=0.49, Supplementary Fig. S6 ), but not for our trained tasting panel (rho=0.19). This suggests that prices affect consumer appreciation, which has been reported in wine 63 , while blind tastings are unaffected. Moreover, we observe that some beer styles, like lagers and non-alcoholic beers, generally receive lower scores, reflecting that online reviewers are mostly beer aficionados with a preference for specialty beers over lager beers. In general, we find a modest correlation between our trained panel’s overall appreciation score and the online consumer appreciation scores (Fig. 3 , rho=0.29). Apart from the aforementioned biases in the online datasets, serving temperature, sample freshness and surroundings, which are all tightly controlled during the tasting panel sessions, can vary tremendously across online consumers and can further contribute to (among others, appreciation) differences between the two categories of tasters. Importantly, in contrast to the overall appreciation scores, for many sensory aspects the results from the professional panel correlated well with results obtained from RateBeer reviews. Correlations were highest for features that are relatively easy to recognize even for untrained tasters, like bitterness, sweetness, alcohol and malt aroma (Fig. 3 and below).

RateBeer text mining results can be found in Supplementary Data 7 . Rho values shown are Spearman correlation values, with asterisks indicating significant correlations ( p < 0.05, two-sided). All p values were smaller than 0.001, except for Esters aroma (0.0553), Esters taste (0.3275), Esters aroma—banana (0.0019), Coriander (0.0508) and Diacetyl (0.0134).

Besides collecting consumer appreciation from these online reviews, we developed automated text analysis tools to gather additional data from review texts (Supplementary Data 7 ). Processing review texts on the RateBeer database yielded comparable results to the scores given by the trained panel for many common sensory aspects, including acidity, bitterness, sweetness, alcohol, malt, and hop tastes (Fig. 3 ). This is in line with what would be expected, since these attributes require less training for accurate assessment and are less influenced by environmental factors such as temperature, serving glass and odors in the environment. Consumer reviews also correlate well with our trained panel for 4-vinyl guaiacol, a compound associated with a very characteristic aroma. By contrast, correlations for more specific aromas like ester, coriander or diacetyl are underrepresented in the online reviews, underscoring the importance of using a trained tasting panel and standardized tasting sheets with explicit factors to be scored for evaluating specific aspects of a beer. Taken together, our results suggest that public reviews are trustworthy for some, but not all, flavor features and can complement or substitute taste panel data for these sensory aspects.

Models can predict beer sensory profiles from chemical data

The rich datasets of chemical analyses, tasting panel assessments and public reviews gathered in the first part of this study provided us with a unique opportunity to develop predictive models that link chemical data to sensorial features. Given the complexity of beer flavor, basic statistical tools such as correlations or linear regression may not always be the most suitable for making accurate predictions. Instead, we applied different machine learning models that can model both simple linear and complex interactive relationships. Specifically, we constructed a set of regression models to predict (a) trained panel scores for beer flavor and quality and (b) public reviews’ appreciation scores from beer chemical profiles. We trained and tested 10 different models (Methods), 3 linear regression-based models (simple linear regression with first-order interactions (LR), lasso regression with first-order interactions (Lasso), partial least squares regressor (PLSR)), 5 decision tree models (AdaBoost regressor (ABR), extra trees (ET), gradient boosting regressor (GBR), random forest (RF) and XGBoost regressor (XGBR)), 1 support vector regression (SVR), and 1 artificial neural network (ANN) model.
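The effect of interaction terms on model size can be checked with simple arithmetic. The sketch below assumes that "first-order interactions" means all pairwise products of predictors and that all 226 measured chemical properties are used as predictors; both are assumptions for illustration, and the helper names are hypothetical.

```python
from itertools import combinations
from math import comb

def n_linear_terms(n_features):
    """Main effects plus all pairwise interaction terms (intercept excluded)."""
    return n_features + comb(n_features, 2)

def add_pairwise_interactions(row):
    """Append the product of every pair of features to one sample."""
    return list(row) + [a * b for a, b in combinations(row, 2)]

# Assumed: 226 predictors, as many as the measured chemical properties.
n_terms = n_linear_terms(226)

# Tiny worked example with three hypothetical feature values.
expanded = add_pairwise_interactions([2.0, 3.0, 5.0])
```

Under these assumptions the expanded linear model has 226 + 25,425 = 25,651 coefficients, roughly a hundred times the 250 beers available, so an unregularized fit has ample freedom to reproduce the training data exactly.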

To compare the performance of our machine learning models, the dataset was randomly split into a training and test set, stratified by beer style. After a model was trained on data in the training set, its performance was evaluated on its ability to predict the test dataset obtained from multi-output models (based on the coefficient of determination, see Methods). Additionally, individual-attribute models were ranked per descriptor and the average rank was calculated, as proposed by Korneva et al. 64 . Importantly, both ways of evaluating the models’ performance agreed in general. Performance of the different models varied (Table 1 ). It should be noted that all models perform better at predicting RateBeer results than results from our trained tasting panel. One reason could be that sensory data is inherently variable, and this variability is averaged out with the large number of public reviews from RateBeer. Additionally, all tree-based models perform better at predicting taste than aroma. Linear models (LR) performed particularly poorly, with negative R² values, due to severe overfitting (training set R² = 1). Overfitting is a common issue in linear models with many parameters and limited samples, especially with interaction terms further amplifying the number of parameters. L1 regularization (Lasso) successfully overcomes this overfitting, out-competing multiple tree-based models on the RateBeer dataset. Similarly, the dimensionality reduction of PLSR avoids overfitting and improves performance, to some extent. Still, tree-based models (ABR, ET, GBR, RF and XGBR) show the best performance, out-competing the linear models (LR, Lasso, PLSR) commonly used in sensory science 65 .
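The two evaluation criteria mentioned above can be sketched as follows. The first function computes the coefficient of determination, which becomes negative when predictions are worse than always predicting the mean of the test values. The second ranks models per descriptor and averages the ranks. All scores below are hypothetical and only illustrate the mechanics; ties are not handled, and this is not the study's evaluation code.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def average_rank(scores):
    """scores: {model: {descriptor: score}} -> {model: mean rank}; rank 1 is best.

    Simplification: tied scores are ranked in arbitrary order.
    """
    models = list(scores)
    descriptors = list(next(iter(scores.values())))
    totals = {m: 0 for m in models}
    for d in descriptors:
        ordered = sorted(models, key=lambda m: scores[m][d], reverse=True)
        for rank, m in enumerate(ordered, start=1):
            totals[m] += rank
    return {m: totals[m] / len(descriptors) for m in models}

# Hypothetical test-set scores and predictions.
y_test = [3.0, 3.5, 4.0, 4.5]
r2_good = r_squared(y_test, [3.1, 3.4, 4.1, 4.4])     # close predictions
r2_overfit = r_squared(y_test, [4.5, 3.0, 5.0, 3.0])  # worse than the mean

# Hypothetical per-descriptor scores for three models.
ranks = average_rank({
    "GBR":   {"bitter": 0.70, "sweet": 0.60, "malt": 0.50},
    "Lasso": {"bitter": 0.60, "sweet": 0.65, "malt": 0.40},
    "LR":    {"bitter": -1.0, "sweet": -0.5, "malt": -2.0},
})
```

Averaging ranks rather than raw scores keeps a single descriptor with an extreme negative value from dominating the comparison between models.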

GBR models showed the best overall performance in predicting sensory responses from chemical information, with R 2 values up to 0.75 depending on the predicted sensory feature (Supplementary Table S4 ). The GBR models predict consumer appreciation (RateBeer) better than our trained panel’s appreciation (R 2 value of 0.67 compared to R 2 value of 0.09) (Supplementary Table S3 and Supplementary Table S4 ). ANN models showed intermediate performance, likely because neural networks typically perform best with larger datasets 66 . The SVR shows intermediate performance, mostly due to the weak predictions of specific attributes that lower the overall performance (Supplementary Table S4 ).

Model dissection identifies specific, unexpected compounds as drivers of consumer appreciation

Next, we leveraged our models to infer important contributors to sensory perception and consumer appreciation. Consumer preference is a crucial sensory aspect, because a product that shows low consumer appreciation scores often does not succeed commercially 25 . Additionally, the requirement for a large number of representative evaluators makes consumer trials one of the more costly and time-consuming aspects of product development. Hence, a model for predicting chemical drivers of overall appreciation would be a welcome addition to the available toolbox for food development and optimization.

Since GBR models on our RateBeer dataset showed the best overall performance, we focused on these models. Specifically, we used two approaches to identify important contributors. First, rankings of the most important predictors for each sensorial trait in the GBR models were obtained based on impurity-based feature importance (mean decrease in impurity). High-ranked parameters were hypothesized to be either the true causal chemical properties underlying the trait, to correlate with the actual causal properties, or to take part in sensory interactions affecting the trait 67 (Fig. 4A ). In a second approach, we used SHAP 68 to determine which parameters contributed most to the model for making predictions of consumer appreciation (Fig. 4B ). SHAP calculates parameter contributions to model predictions on a per-sample basis, which can be aggregated into an importance score.
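To make the idea of per-sample contributions concrete, the sketch below computes exact Shapley values for a toy three-feature function and aggregates them into an importance score as the mean absolute contribution. This is not the SHAP library and not the model used in this study: the toy function, the all-zero background sample and the aggregation by mean absolute value are assumptions made for illustration only.

```python
from itertools import permutations

def shapley_values(model, sample, background):
    """Exact Shapley values for one sample.
    Features not yet 'present' take their value from a single background
    vector (a simplification of averaging over a background dataset)."""
    n = len(sample)
    contributions = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(background)
        previous = model(current)
        for i in order:
            current[i] = sample[i]          # switch feature i to its real value
            updated = model(current)
            contributions[i] += updated - previous
            previous = updated
    return [c / len(orders) for c in contributions]

def toy_model(x):
    """Hypothetical predictor: one main effect plus one interaction."""
    return 2.0 * x[0] + x[1] * x[2]

background = [0.0, 0.0, 0.0]
samples = [[1.0, 1.0, 1.0], [2.0, 0.0, 1.0], [0.0, 2.0, 2.0]]

per_sample = [shapley_values(toy_model, s, background) for s in samples]

# Aggregate per-sample contributions into one importance score per feature.
importance = [
    sum(abs(phi[i]) for phi in per_sample) / len(per_sample)
    for i in range(len(background))
]
print(per_sample, importance)
```

For each sample, the contributions sum to the difference between the prediction for that sample and the prediction for the background, and the interaction term is split equally between the two features involved.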

A The impurity-based feature importance (mean decrease in impurity, MDI) calculated from the Gradient Boosting Regression (GBR) model predicting RateBeer appreciation scores. The top 15 highest ranked chemical properties are shown. B SHAP summary plot for the top 15 parameters contributing to our GBR model. Each point on the graph represents a sample from our dataset. The color represents the concentration of that parameter, with bluer colors representing low values and redder colors representing higher values. Greater absolute values on the horizontal axis indicate a higher impact of the parameter on the prediction of the model. C Spearman correlations between the 15 most important chemical properties and consumer overall appreciation. Numbers indicate the Spearman Rho correlation coefficient, and the rank of this correlation compared to all other correlations. The top 15 important compounds were determined using SHAP (panel B).

Both approaches identified ethyl acetate as the most predictive parameter for beer appreciation (Fig. 4 ). Ethyl acetate is the most abundant ester in beer with a typical ‘fruity’, ‘solvent’ and ‘alcoholic’ flavor, but is often considered less important than other esters like isoamyl acetate. The second most important parameter identified by SHAP is ethanol, the most abundant beer compound after water. Apart from directly contributing to beer flavor and mouthfeel, ethanol drastically influences the physical properties of beer, dictating how easily volatile compounds escape the beer matrix to contribute to beer aroma 69 . Importantly, it should also be noted that the importance of ethanol for appreciation is likely inflated by the very low appreciation scores of non-alcoholic beers (Supplementary Fig. S4 ). Despite not often being considered a driver of beer appreciation, protein level also ranks highly in both approaches, possibly due to its effect on mouthfeel and body 70 . Lactic acid, which contributes to the tart taste of sour beers, is the fourth most important parameter identified by SHAP, possibly due to the generally high appreciation of sour beers in our dataset.

Interestingly, some of the most important predictive parameters for our model are not well-established as beer flavors or are even commonly regarded as being negative for beer quality. For example, our models identify methanethiol and ethyl phenyl acetate, an ester commonly linked to beer staling 71 , as key factors contributing to beer appreciation. Although there is no doubt that high concentrations of these compounds are considered unpleasant, the positive effects of modest concentrations are not yet known 72 , 73 .

To compare our approach to conventional statistics, we evaluated how well the 15 most important SHAP-derived parameters correlate with consumer appreciation (Fig. 4C ). Interestingly, only 6 of the properties derived by SHAP rank amongst the top 15 most correlated parameters. For some chemical compounds, the correlations are so low that they would have likely been considered unimportant. For example, lactic acid, the fourth most important parameter, shows a bimodal distribution for appreciation, with sour beers forming a separate cluster, that is missed entirely by the Spearman correlation. Additionally, the correlation plots reveal outliers, emphasizing the need for robust analysis tools. Together, this highlights the need for alternative models, like the Gradient Boosting model, that better grasp the complexity of (beer) flavor.

Finally, to observe the relationships between these chemical properties and their predicted targets, partial dependence plots were constructed for the six most important predictors of consumer appreciation 74 , 75 , 76 (Supplementary Fig. S7 ). One-way partial dependence plots show how a change in concentration affects the predicted appreciation. These plots reveal an important limitation of our models: appreciation predictions remain constant at ever-increasing concentrations. This implies that once a threshold concentration is reached, further increasing the concentration does not affect appreciation. This is false, as it is well-documented that certain compounds become unpleasant at high concentrations, including ethyl acetate (‘nail polish’) 77 and methanethiol (‘sulfury’ and ‘rotten cabbage’) 78 . The inability of our models to grasp that flavor compounds have optimal levels, above which they become negative, is a consequence of working with commercial beer brands where (off-)flavors are rarely too high to negatively impact the product. The two-way partial dependence plots show how changing the concentration of two compounds influences predicted appreciation, visualizing their interactions (Supplementary Fig. S7 ). In our case, the top 5 parameters are dominated by additive or synergistic interactions, with high concentrations for both compounds resulting in the highest predicted appreciation.
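The one-way partial dependence calculation and the plateau described above can be sketched as follows. The step-function "model" is a hypothetical stand-in for a trained tree ensemble, and its thresholds and the toy data rows are made up; the point is only that a tree-like model cannot change its prediction beyond the last split it learned.

```python
def partial_dependence(model, rows, feature_index, grid):
    """For each grid value, set one feature to that value in every row
    and average the model predictions."""
    curve = []
    for value in grid:
        predictions = []
        for row in rows:
            modified = list(row)
            modified[feature_index] = value
            predictions.append(model(modified))
        curve.append(sum(predictions) / len(predictions))
    return curve

def toy_tree_model(x):
    """Hypothetical piecewise-constant predictor (thresholds are made up)."""
    score = 3.0
    if x[0] >= 20:   # feature 0: concentration of some compound
        score += 0.4
    if x[1] >= 5:    # feature 1: concentration of a second compound
        score += 0.2
    return score

rows = [[10, 4], [25, 6], [30, 5]]   # made-up samples
grid = [10, 20, 40, 80]              # concentrations to evaluate for feature 0

curve = partial_dependence(toy_tree_model, rows, 0, grid)
print(curve)
```

In this toy example the curve rises once at the learned threshold and is flat for every higher concentration, which mirrors the limitation noted above: without samples at unpleasantly high concentrations, the model has no split from which to learn a decline.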

To assess the robustness of our best-performing models and model predictions, we performed 100 iterations of the GBR, RF and ET models. In general, all iterations of the models yielded similar performance (Supplementary Fig. S8 ). Moreover, the main predictors (including the top predictors ethanol and ethyl acetate) remained virtually the same, especially for GBR and RF. For the iterations of the ET model, we did observe more variation in the top predictors, which is likely a consequence of the model’s inherent random architecture in combination with co-correlations between certain predictors. However, even in this case, several of the top predictors (ethanol and ethyl acetate) remain unchanged, although their rank in importance changes (Supplementary Fig. S8 ).

Next, we investigated if a combination of RateBeer and trained panel data into one consolidated dataset would lead to stronger models, under the hypothesis that such a model would suffer less from bias in the datasets. A GBR model was trained to predict appreciation on the combined dataset. This model underperformed compared to the RateBeer model, both in the native case and when including a dataset identifier (R 2 = 0.67, 0.26 and 0.42 respectively). For the latter, the dataset identifier is the most important feature (Supplementary Fig. S9 ), while most of the feature importance remains unchanged, with ethyl acetate and ethanol ranking highest, like in the original model trained only on RateBeer data. It seems that the large variation in the panel dataset introduces noise, weakening the models’ performances and reliability. In addition, it seems reasonable to assume that both datasets are fundamentally different, with the panel dataset obtained by blind tastings by a trained professional panel.

Lastly, we evaluated whether beer style identifiers would further enhance the model’s performance. A GBR model was trained with parameters that explicitly encoded the styles of the samples. This did not improve model performance (R 2 = 0.66 with style information vs R 2 = 0.67 without). The most important chemical features are consistent with the model trained without style information (e.g. ethanol and ethyl acetate), and with the exception of the most preferred (strong ale) and least preferred (low/no-alcohol) styles, none of the styles were among the most important features (Supplementary Fig. S9 , Supplementary Table S5 and S6 ). This is likely due to a combination of style-specific chemical signatures, such as iso-alpha acids and lactic acid, that implicitly convey style information to the original models, as well as the low number of samples belonging to some styles, making it difficult for the model to learn style-specific patterns. Moreover, beer styles are not rigorously defined, with some styles overlapping in features and some beers being misattributed to a specific style, all of which leads to more noise in models that use style parameters.

Model validation

To test if our predictive models give insight into beer appreciation, we set up experiments aimed at improving existing commercial beers. We specifically selected overall appreciation as the trait to be examined because of its complexity and commercial relevance. Beer flavor comprises a complex bouquet rather than single aromas and tastes 53 . Hence, adding a single compound to the extent that a difference is noticeable may lead to an unbalanced, artificial flavor. Therefore, we evaluated the effect of combinations of compounds. Because Blond beers represent the most extensive style in our dataset, we selected a beer from this style as the starting material for these experiments (Beer 64 in Supplementary Data 1 ).

In the first set of experiments, we adjusted the concentrations of compounds that made up the most important predictors of overall appreciation (ethyl acetate, ethanol, lactic acid, ethyl phenyl acetate) together with correlated compounds (ethyl hexanoate, isoamyl acetate, glycerol), bringing them up to 95 th percentile ethanol-normalized concentrations (Methods) within the Blond group (‘Spiked’ concentration in Fig. 5A ). Compared to controls, the spiked beers were found to have significantly improved overall appreciation among trained panelists, with panelists noting increased intensity of ester flavors, sweetness, alcohol, and body fullness (Fig. 5B ). To disentangle the contribution of ethanol to these results, a second experiment was performed without the addition of ethanol. This resulted in a similar outcome, including increased perception of alcohol and overall appreciation.

Adding the top chemical compounds, identified as best predictors of appreciation by our model, into poorly appreciated beers results in increased appreciation from our trained panel. Results of sensory tests between base beers and those spiked with compounds identified as the best predictors by the model. A Blond and Non/Low-alcohol (0.0% ABV) base beers were brought up to 95th-percentile ethanol-normalized concentrations within each style. B For each sensory attribute, tasters indicated the more intense sample and selected the sample they preferred. The numbers above the bars correspond to the p values that indicate significant changes in perceived flavor (two-sided binomial test: alpha 0.05, n = 20 or 13).

In a last experiment, we tested whether using the model’s predictions can boost the appreciation of a non-alcoholic beer (beer 223 in Supplementary Data 1 ). Again, the addition of a mixture of predicted compounds (omitting ethanol, in this case) resulted in a significant increase in appreciation, body, ester flavor and sweetness.
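The two-sided binomial test used for the paired comparisons (Fig. 5B) can be sketched as below. The exact variant used in the study is not stated here, so this sketch assumes the common definition that sums the probabilities of all outcomes no more likely than the observed one under a null preference of 0.5. The counts in the example are hypothetical, not results from the tastings.

```python
from math import comb

def binomial_two_sided_p(successes, n, p=0.5):
    """Exact two-sided binomial p value: total probability of all outcomes
    that are at most as likely as the observed count under the null."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[successes]
    # Small relative tolerance guards against floating-point ties.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-12))
    return min(1.0, total)

# Hypothetical example: 16 of 20 tasters prefer the spiked sample.
p_value = binomial_two_sided_p(16, 20)
significant = p_value < 0.05
print(p_value, significant)
```

With these made-up counts the p value is about 0.012, below the alpha of 0.05; an even 10 to 10 split would give a p value of 1.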

Predicting flavor and consumer appreciation from chemical composition is one of the ultimate goals of sensory science. A reliable, systematic and unbiased way to link chemical profiles to flavor and food appreciation would be a significant asset to the food and beverage industry. Such tools would substantially aid in quality control and recipe development, offer an efficient and cost-effective alternative to pilot studies and consumer trials and would ultimately allow food manufacturers to produce superior, tailor-made products that better meet the demands of specific consumer groups more efficiently.

A limited set of studies have previously tried, to varying degrees of success, to predict beer flavor and beer popularity based on (a limited set of) chemical compounds and flavors 79 , 80 . Current sensitive, high-throughput technologies allow measuring an unprecedented number of chemical compounds and properties in a large set of samples, yielding a dataset that can train models that help close the gaps between chemistry and flavor, even for a complex natural product like beer. To our knowledge, no previous research gathered data at this scale (250 samples, 226 chemical parameters, 50 sensory attributes and 5 consumer scores) to disentangle and validate the chemical aspects driving beer preference using various machine-learning techniques. We find that modern machine learning models outperform conventional statistical tools, such as correlations and linear models, and can successfully predict flavor appreciation from chemical composition. This could be attributed to the natural incorporation of interactions and non-linear or discontinuous effects in machine learning models, which are not easily grasped by the linear model architecture. While linear models and partial least squares regression represent the most widespread statistical approaches in sensory science, in part because they allow interpretation 65 , 81 , 82 , modern machine learning methods allow for building better predictive models while preserving the possibility to dissect and exploit the underlying patterns. Of the 10 different models we trained, tree-based models, such as our best performing GBR, showed the best overall performance in predicting sensory responses from chemical information, outcompeting artificial neural networks. This agrees with previous reports for models trained on tabular data 83 . Our results are in line with the findings of Colantonio et al. who also identified the gradient boosting architecture as performing best at predicting appreciation and flavor (of tomatoes and blueberries, in their specific study) 26 . Importantly, besides our larger experimental scale, we were able to directly confirm our models’ predictions in vivo.

Our study confirms that flavor compound concentration does not always correlate with perception, suggesting complex interactions that are often missed by more conventional statistics and simple models. Specifically, we find that tree-based algorithms may perform best in developing models that link complex food chemistry with aroma. Furthermore, we show that massive datasets of untrained consumer reviews provide a valuable source of data, that can complement or even replace trained tasting panels, especially for appreciation and basic flavors, such as sweetness and bitterness. This holds despite biases that are known to occur in such datasets, such as price or conformity bias. Moreover, GBR models predict taste better than aroma. This is likely because taste (e.g. bitterness) often directly relates to the corresponding chemical measurements (e.g., iso-alpha acids), whereas such a link is less clear for aromas, which often result from the interplay between multiple volatile compounds. We also find that our models are best at predicting acidity and alcohol, likely because there is a direct relation between the measured chemical compounds (acids and ethanol) and the corresponding perceived sensorial attribute (acidity and alcohol), and because even untrained consumers are generally able to recognize these flavors and aromas.

The predictions of our final models, trained on review data, hold even for blind tastings with small groups of trained tasters, as demonstrated by our ability to validate specific compounds as drivers of beer flavor and appreciation. Since adding a single compound to the extent of a noticeable difference may result in an unbalanced flavor profile, we specifically tested our identified key drivers as a combination of compounds. While this approach does not allow us to validate if a particular single compound would affect flavor and/or appreciation, our experiments do show that this combination of compounds increases consumer appreciation.

It is important to stress that, while it represents an important step forward, our approach still has several major limitations. A key weakness of the GBR model architecture is that amongst co-correlating variables, the largest main effect is consistently preferred for model building. As a result, co-correlating variables often have artificially low importance scores, both for impurity and SHAP-based methods, like we observed in the comparison to the more randomized Extra Trees models. This implies that chemicals identified as key drivers of a specific sensory feature by GBR might not be the true causative compounds, but rather co-correlate with the actual causative chemical. For example, the high importance of ethyl acetate could be (partially) attributed to the total ester content, ethanol or ethyl hexanoate (rho=0.77, rho=0.72 and rho=0.68), while ethyl phenylacetate could hide the importance of prenyl isobutyrate and ethyl benzoate (rho=0.77 and rho=0.76). Expanding our GBR model to include beer style as a parameter did not yield additional power or insight. This is likely due to style-specific chemical signatures, such as iso-alpha acids and lactic acid, that implicitly convey style information to the original model, as well as the smaller sample size per style, limiting the power to uncover style-specific patterns. This can be partly attributed to the curse of dimensionality, where the high number of parameters results in the models mainly incorporating single parameter effects, rather than complex interactions such as style-dependent effects 67 . A larger number of samples may overcome some of these limitations and offer more insight into style-specific effects. On the other hand, beer style is not a rigid scientific classification, and beers within one style often differ a lot, which further complicates the analysis of style as a model factor.

Our study is limited to beers from Belgian breweries. Although these beers cover a large portion of the beer styles available globally, some beer styles and consumer patterns may be missing, while other features might be overrepresented. For example, many Belgian ales exhibit yeast-driven flavor profiles, which is reflected in the chemical drivers of appreciation discovered by this study. In future work, expanding the scope to include diverse markets and beer styles could lead to the identification of even more drivers of appreciation and better models for special niche products that were not present in our beer set.

In addition to inherent limitations of GBR models, there are also some limitations associated with studying food aroma. Even if our chemical analyses measured most of the known aroma compounds, the total number of flavor compounds in complex foods like beer is still larger than the subset we were able to measure in this study. For example, hop-derived thiols, that influence flavor at very low concentrations, are notoriously difficult to measure in a high-throughput experiment. Moreover, consumer perception remains subjective and prone to biases that are difficult to avoid. It is also important to stress that the models are still immature and that more extensive datasets will be crucial for developing more complete models in the future. Besides more samples and parameters, our dataset does not include any demographic information about the tasters. Including such data could lead to better models that grasp external factors like age and culture. Another limitation is that our set of beers consists of high-quality end-products and lacks beers that are unfit for sale, which limits the current model in accurately predicting products that are appreciated very badly. Finally, while models could be readily applied in quality control, their use in sensory science and product development is restrained by their inability to discern causal relationships. Given that the models cannot distinguish compounds that genuinely drive consumer perception from those that merely correlate, validation experiments are essential to identify true causative compounds.

Despite the inherent limitations, dissection of our models enabled us to pinpoint specific molecules as potential drivers of beer aroma and consumer appreciation, including compounds that were unexpected and would not have been identified using standard approaches. Important drivers of beer appreciation uncovered by our models include protein levels, ethyl acetate, ethyl phenyl acetate and lactic acid. Currently, many brewers already use lactic acid to acidify their brewing water and ensure optimal pH for enzymatic activity during the mashing process. Our results suggest that adding lactic acid can also improve beer appreciation, although its individual effect remains to be tested. Interestingly, ethanol appears to be unnecessary to improve beer appreciation, both for blond beer and alcohol-free beer. Given the growing consumer interest in alcohol-free beer, with a predicted annual market growth of >7% 84 , it is relevant for brewers to know what compounds can further increase consumer appreciation of these beers. Hence, our model may readily provide avenues to further improve the flavor and consumer appreciation of both alcoholic and non-alcoholic beers, which is generally considered one of the key challenges for future beer production.

Whereas we see a direct implementation of our results for the development of superior alcohol-free beverages and other food products, our study can also serve as a stepping stone for the development of novel alcohol-containing beverages. We want to echo the growing body of scientific evidence for the negative effects of alcohol consumption, both on the individual level by the mutagenic, teratogenic and carcinogenic effects of ethanol 85 , 86 , as well as the burden on society caused by alcohol abuse and addiction. We encourage the use of our results for the production of healthier, tastier products, including novel and improved beverages with lower alcohol contents. Furthermore, we strongly discourage the use of these technologies to improve the appreciation or addictive properties of harmful substances.

The present work demonstrates that despite some important remaining hurdles, combining the latest developments in chemical analyses, sensory analysis and modern machine learning methods offers exciting avenues for food chemistry and engineering. Soon, these tools may provide solutions in quality control and recipe development, as well as new approaches to sensory science and flavor research.

Beer selection

250 commercial Belgian beers were selected to cover the broad diversity of beer styles and corresponding diversity in chemical composition and aroma. See Supplementary Fig. S1 .

Chemical dataset

Sample preparation.

Beers within their expiration date were purchased from commercial retailers. Samples were prepared in biological duplicates at room temperature, unless explicitly stated otherwise. Bottle pressure was measured with a manual pressure device (Steinfurth Mess-Systeme GmbH) and used to calculate CO 2 concentration. The beer was poured through two filter papers (Macherey-Nagel, 500713032 MN 713 ¼) to remove carbon dioxide and prevent spontaneous foaming. Samples were then prepared for measurements by targeted Headspace-Gas Chromatography-Flame Ionization Detector/Flame Photometric Detector (HS-GC-FID/FPD), Headspace-Solid Phase Microextraction-Gas Chromatography-Mass Spectrometry (HS-SPME-GC-MS), colorimetric analysis, enzymatic analysis, Near-Infrared (NIR) analysis, as described in the sections below. The mean values of biological duplicates are reported for each compound.

HS-GC-FID/FPD

HS-GC-FID/FPD (Shimadzu GC 2010 Plus) was used to measure higher alcohols, acetaldehyde, esters, 4-vinyl guaiacol, and sulfur compounds. Each measurement comprised 5 ml of sample pipetted into a 20 ml glass vial containing 1.75 g NaCl (VWR, 27810.295). 100 µl of 2-heptanol (Sigma-Aldrich, H3003) (internal standard) solution in ethanol (Fisher Chemical, E/0650DF/C17) was added for a final concentration of 2.44 mg/L. Samples were flushed with nitrogen for 10 s, sealed with a silicone septum, stored at −80 °C and analyzed in batches of 20.

The GC was equipped with a DB-WAXetr column (length, 30 m; internal diameter, 0.32 mm; layer thickness, 0.50 µm; Agilent Technologies, Santa Clara, CA, USA) to the FID and an HP-5 column (length, 30 m; internal diameter, 0.25 mm; layer thickness, 0.25 µm; Agilent Technologies, Santa Clara, CA, USA) to the FPD. N 2 was used as the carrier gas. Samples were incubated for 20 min at 70 °C in the headspace autosampler (Flow rate, 35 cm/s; Injection volume, 1000 µL; Injection mode, split; Combi PAL autosampler, CTC analytics, Switzerland). The injector, FID and FPD temperatures were kept at 250 °C. The GC oven temperature was first held at 50 °C for 5 min and then allowed to rise to 80 °C at a rate of 5 °C/min, followed by a second ramp of 4 °C/min until 200 °C, which was held for 3 min, and a final ramp of 4 °C/min until 230 °C, which was held for 1 min. Results were analyzed with the GCSolution software version 2.4 (Shimadzu, Kyoto, Japan). The GC was calibrated with a 5% EtOH solution (VWR International) containing the volatiles under study (Supplementary Table S7 ).
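As a worked check of the oven temperature program, the sketch below adds up the hold and ramp times. It assumes that "kept for 3 min" and "for 1 min" denote isothermal holds at 200 °C and 230 °C, respectively; the function name is hypothetical.

```python
def program_minutes(start_temp, initial_hold, ramps):
    """Total oven program time in minutes.
    ramps: list of (rate in deg C per min, target temp in deg C, hold in min)."""
    total = initial_hold
    temp = start_temp
    for rate, target, hold in ramps:
        total += (target - temp) / rate + hold
        temp = target
    return total

# 50 C held 5 min; 5 C/min to 80 C; 4 C/min to 200 C held 3 min;
# 4 C/min to 230 C held 1 min.
ramps = [(5, 80, 0), (4, 200, 3), (4, 230, 1)]
total_minutes = program_minutes(50, 5, ramps)
print(total_minutes)  # 52.5
```

Under these assumptions the program lasts 5 + 6 + 30 + 3 + 7.5 + 1 = 52.5 min per run.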

HS-SPME-GC-MS

HS-SPME-GC-MS (Shimadzu GCMS-QP-2010 Ultra) was used to measure additional volatile compounds, mainly comprising terpenoids and esters. Samples were analyzed by HS-SPME using a triphase DVB/Carboxen/PDMS 50/30 μm SPME fiber (Supelco Co., Bellefonte, PA, USA) followed by gas chromatography (Thermo Fisher Scientific Trace 1300 series, USA) coupled to a mass spectrometer (Thermo Fisher Scientific ISQ series MS) equipped with a TriPlus RSH autosampler. 5 ml of degassed beer sample was placed in 20 ml vials containing 1.75 g NaCl (VWR, 27810.295). 5 µl internal standard mix was added, containing 2-heptanol (1 g/L) (Sigma-Aldrich, H3003), 4-fluorobenzaldehyde (1 g/L) (Sigma-Aldrich, 128376), 2,3-hexanedione (1 g/L) (Sigma-Aldrich, 144169) and guaiacol (1 g/L) (Sigma-Aldrich, W253200) in ethanol (Fisher Chemical, E/0650DF/C17). Each sample was incubated at 60 °C in the autosampler oven with constant agitation. After 5 min equilibration, the SPME fiber was exposed to the sample headspace for 30 min. The compounds trapped on the fiber were thermally desorbed in the injection port of the chromatograph by heating the fiber for 15 min at 270 °C.

The GC-MS was equipped with a low polarity RXi-5Sil MS column (length, 20 m; internal diameter, 0.18 mm; layer thickness, 0.18 µm; Restek, Bellefonte, PA, USA). Injection was performed in splitless mode at 320 °C, a split flow of 9 ml/min, a purge flow of 5 ml/min and an open valve time of 3 min. To obtain a pulsed injection, a programmed gas flow was used whereby the helium gas flow was set at 2.7 mL/min for 0.1 min, followed by a decrease in flow of 20 ml/min to the normal 0.9 mL/min. The temperature was first held at 30 °C for 3 min and then allowed to rise to 80 °C at a rate of 7 °C/min, followed by a second ramp of 2 °C/min till 125 °C and a final ramp of 8 °C/min with a final temperature of 270 °C.

Mass acquisition range was 33 to 550 amu at a scan rate of 5 scans/s. Electron impact ionization energy was 70 eV. The interface and ion source were kept at 275 °C and 250 °C, respectively. A mix of linear n-alkanes (from C7 to C40, Supelco Co.) was injected into the GC-MS under identical conditions to serve as external retention index markers. Identification and quantification of the compounds were performed using an in-house developed R script as described in Goelen et al. and Reher et al. 87 , 88 (for package information, see Supplementary Table S8 ). Briefly, chromatograms were analyzed using AMDIS (v2.71) 89 to separate overlapping peaks and obtain pure compound spectra. The NIST MS Search software (v2.0 g) in combination with the NIST2017, FFNSC3 and Adams4 libraries were used to manually identify the empirical spectra, taking into account the expected retention time. After background subtraction and correcting for retention time shifts between samples run on different days based on alkane ladders, compound elution profiles were extracted and integrated using a file with 284 target compounds of interest, which were either recovered in our identified AMDIS list of spectra or were known to occur in beer. Compound elution profiles were estimated for every peak in every chromatogram over a time-restricted window using weighted non-negative least square analysis after which peak areas were integrated 87 , 88 . Batch effect correction was performed by normalizing against the most stable internal standard compound, 4-fluorobenzaldehyde. Out of all 284 target compounds that were analyzed, 167 were visually judged to have reliable elution profiles and were used for final analysis.

Discrete photometric and enzymatic analysis

Discrete photometric and enzymatic analysis (Thermo Scientific TM Gallery TM Plus Beermaster Discrete Analyzer) was used to measure acetic acid, ammonia, beta-glucan, iso-alpha acids, color, sugars, glycerol, iron, pH, protein, and sulfite. 2 ml of sample volume was used for the analyses. Information regarding the reagents and standard solutions used for analyses and calibrations is included in Supplementary Table S7 and Supplementary Table S9 .

NIR analyses

NIR analysis (Anton Paar Alcolyzer Beer ME System) was used to measure ethanol. Measurements comprised 50 ml of sample, and a 10% EtOH solution was used for calibration.

Correlation calculations

Pairwise Spearman Rank correlations were calculated between all chemical properties.
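A minimal sketch of a pairwise Spearman rank correlation is given below, computed as the Pearson correlation of ranks with tied values receiving their average rank. The helper names and the toy concentration values are hypothetical and only illustrate the calculation.

```python
from itertools import combinations
from math import sqrt

def rank_with_ties(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    return pearson(rank_with_ties(x), rank_with_ties(y))

def pairwise_spearman(table):
    """table: {property name: list of values} -> {(name_a, name_b): rho}."""
    return {(a, b): spearman(table[a], table[b])
            for a, b in combinations(table, 2)}

# Made-up measurements for five samples (not study data).
toy_table = {
    "ethanol": [0.0, 4.5, 6.0, 8.0, 9.5],
    "ethyl_acetate": [2.0, 15.0, 22.0, 30.0, 41.0],
    "lactic_acid": [1.0, 4.0, 9.0, 4.0, 1.0],
}
rho = pairwise_spearman(toy_table)
print(rho)
```

In the toy data, the two monotonically related properties give rho = 1, whereas the property that first rises and then falls gives rho = 0 against ethanol, showing how a rank correlation can miss a non-monotonic relationship.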

Sensory dataset

Trained panel.

Our trained tasting panel consisted of volunteers who gave prior verbal informed consent. All compounds used for the validation experiment were of food-grade quality. The tasting sessions were approved by the Social and Societal Ethics Committee of the KU Leuven (G-2022-5677-R2(MAR)). All online reviewers agreed to the Terms and Conditions of the RateBeer website.

Sensory analysis was performed according to the American Society of Brewing Chemists (ASBC) Sensory Analysis Methods 90 . 30 volunteers were screened through a series of triangle tests. The sixteen most sensitive and consistent tasters were retained as taste panel members. The resulting panel was diverse in age [22–42, mean: 29], sex [56% male] and nationality [7 different countries]. The panel developed a consensus vocabulary to describe beer aroma, taste and mouthfeel. Panelists were trained to identify and score 50 different attributes, using a 7-point scale to rate attributes’ intensity. The scoring sheet is included as Supplementary Data 3 . Sensory assessments took place between 10–12 a.m. The beers were served in black-colored glasses. Per session, between 5 and 12 beers of the same style were tasted at 12 °C to 16 °C. Two reference beers were added to each set and indicated as ‘Reference 1 & 2’, allowing panel members to calibrate their ratings. Not all panelists were present at every tasting. Scores were scaled by standard deviation and mean-centered per taster. Values are represented as z-scores and clustered by Euclidean distance. Pairwise Spearman correlations were calculated between taste and aroma sensory attributes. Panel consistency was evaluated by repeating samples on different sessions and performing ANOVA to identify differences, using the ‘stats’ package (v4.2.2) in R (for package information, see Supplementary Table S8 ).
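The per-taster scaling step can be sketched as follows. This minimal example assumes the sample standard deviation is used and that each taster has at least two non-identical scores; the taster and beer labels and the scores are made up.

```python
from collections import defaultdict
from statistics import mean, stdev

def zscore_per_taster(scores):
    """scores: list of (taster, beer, value) -> list of (taster, beer, z).
    Each value is mean-centered and divided by that taster's standard
    deviation (sample SD assumed; zero SD is not handled)."""
    by_taster = defaultdict(list)
    for taster, _, value in scores:
        by_taster[taster].append(value)
    stats = {t: (mean(v), stdev(v)) for t, v in by_taster.items()}
    return [(t, b, (v - stats[t][0]) / stats[t][1]) for t, b, v in scores]

# Taster A uses the whole scale, taster B only its upper end (made-up scores).
raw = [
    ("A", "beer1", 2), ("A", "beer2", 4), ("A", "beer3", 6),
    ("B", "beer1", 5), ("B", "beer2", 6), ("B", "beer3", 7),
]
scaled = zscore_per_taster(raw)
print(scaled)
```

After scaling, both hypothetical tasters give the same z-scores (−1, 0, 1) to the three beers, even though their raw scores differ in level and spread.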

Online reviews from a public database

The ‘scrapy’ package in Python (v3.6) (for package information, see Supplementary Table S8) was used to collect 232,288 online reviews (mean=922, min=6, max=5343) from RateBeer, an online beer review database. Each review entry comprised 5 numerical scores (appearance, aroma, taste, palate and overall quality) and an optional review text. The total number of reviews per reviewer was collected separately. Numerical scores were scaled and centered per rater, and mean scores were calculated per beer.

For the review texts, the language was estimated using the packages ‘langdetect’ and ‘langid’ in Python. Reviews that were classified as English by both packages were kept. Reviewers with fewer than 100 entries overall were discarded. 181,025 reviews from >6000 reviewers from >40 countries remained. Text processing was done using the ‘nltk’ package in Python. Texts were corrected for slang and misspellings; proper nouns and rare words that are relevant to the beer context were specified and kept as-is (‘Chimay’, ‘Lambic’, etc.). A dictionary of semantically similar sensorial terms was created, and each group of similar terms (for example ‘floral’ and ‘flower’) was collapsed into one term. Words were stemmed and lemmatized to avoid identifying words such as ‘acid’ and ‘acidity’ as separate terms. Numbers and punctuation were removed.
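
The two filtering rules above (agreement between both language detectors, and a minimum number of entries per reviewer) can be sketched as below. The two toy detectors are stand-ins so the example runs without extra packages; with the real libraries, the detector list could plausibly be `[langdetect.detect, lambda t: langid.classify(t)[0]]`. All names and sample data here are hypothetical.

```python
def english_by_all(text, detectors):
    """True only if every detector labels the text as English ('en')."""
    return all(detect(text) == "en" for detect in detectors)


def filter_reviews(reviews, reviewer_totals, detectors, min_total=100):
    """Keep reviews by sufficiently active reviewers that all detectors call English.

    reviews: list of (reviewer, text); reviewer_totals: reviewer -> total entries.
    """
    kept = []
    for reviewer, text in reviews:
        if reviewer_totals.get(reviewer, 0) < min_total:
            continue  # reviewer has too few entries overall
        if english_by_all(text, detectors):
            kept.append((reviewer, text))
    return kept


# Toy detectors for illustration only; they are NOT real language models.
def toy_detector_a(text):
    return "en" if " the " in f" {text.lower()} " else "xx"


def toy_detector_b(text):
    return "en" if text.isascii() else "xx"


reviews = [
    ("r1", "The aroma is fruity"),
    ("r1", "Très bon, the best"),
    ("r2", "The taste is sour"),
]
totals = {"r1": 250, "r2": 12}
kept = filter_reviews(reviews, totals, [toy_detector_a, toy_detector_b])
```

Requiring agreement between two independent detectors trades recall for precision: some English reviews are lost, but fewer non-English texts slip through.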

Sentences from up to 50 randomly chosen reviews per beer were manually categorized according to the aspect of beer they describe (appearance, aroma, taste, palate, overall quality—not to be confused with the 5 numerical scores described above) or flagged as irrelevant if they contained no useful information. If a beer contained fewer than 50 reviews, all reviews were manually classified. This labeled data set was used to train a model that classified the rest of the sentences for all beers 91 . Sentences describing taste and aroma were extracted, and term frequency–inverse document frequency (TFIDF) was implemented to calculate enrichment scores for sensorial words per beer.
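
To make the TFIDF enrichment step concrete, here is a minimal sketch that treats all taste and aroma tokens of one beer as a single document. It assumes the plain textbook weighting (relative term frequency times log(N / document frequency)); the exact variant used in the study is not stated, and the tokens below are hypothetical.

```python
from collections import Counter
from math import log


def tfidf(docs):
    """docs: {beer: list of tokens}. Returns {beer: {term: tf-idf score}}."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per beer
    scores = {}
    for beer, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[beer] = {
            term: (count / total) * log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


# Hypothetical pre-processed tokens per beer.
docs = {
    "beer_a": ["fruit", "banana", "sweet"],
    "beer_b": ["sour", "fruit", "sour"],
    "beer_c": ["roast", "coffee", "sweet", "fruit"],
}
scores = tfidf(docs)
top_b = max(scores["beer_b"], key=scores["beer_b"].get)
```

A term used for every beer (here ‘fruit’) gets a score of zero, while a term that is frequent for one beer and rare elsewhere (here ‘sour’ for `beer_b`) is ranked highest for that beer.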

The sex of the tasting subject was not considered when building our sensory database. Instead, results from different panelists were averaged, both for our trained panel (56% male, 44% female) and the RateBeer reviews (70% male, 30% female for RateBeer as a whole).

Beer price collection and processing

Beer prices were collected from the following stores: Colruyt, Delhaize, Total Wine, BeerHawk, The Belgian Beer Shop, The Belgian Shop, and Beer of Belgium. Where applicable, prices were converted to Euros and normalized per liter. Spearman correlations were calculated between these prices and mean overall appreciation scores from RateBeer and the taste panel, respectively.

Pairwise Spearman Rank correlations were calculated between all sensory properties.

Machine learning models

Predictive modeling of sensory profiles from chemical data.

Regression models were constructed to predict (a) trained panel scores for beer flavors and quality from beer chemical profiles and (b) public reviews’ appreciation scores from beer chemical profiles. Z-scores were used to represent sensory attributes in both data sets. Chemical properties with log-normal distributions (Shapiro-Wilk test, p < 0.05) were log-transformed. Missing chemical measurements (0.1% of all data) were replaced with mean values per attribute. Observations from 250 beers were randomly separated into a training set (70%, 175 beers) and a test set (30%, 75 beers), stratified per beer style. Chemical measurements (p = 231) were normalized based on the training set average and standard deviation. In total, ten models were trained: three linear regression-based models, namely linear regression with first-order interaction terms (LR), lasso regression with first-order interaction terms (Lasso) and partial least squares regression (PLSR); five decision tree models, namely Adaboost regressor (ABR), Extra Trees (ET), Gradient Boosting regressor (GBR), Random Forest (RF) and XGBoost regressor (XGBR); one support vector machine model (SVR); and one artificial neural network model (ANN). The models were implemented using the ‘scikit-learn’ package (v1.2.2) and ‘xgboost’ package (v1.7.3) in Python (v3.9.16). Models were trained, and hyperparameters optimized, using five-fold cross-validated grid search with the coefficient of determination (R²) as the evaluation metric. The ANN (scikit-learn’s MLPRegressor) was optimized using Bayesian Tree-Structured Parzen Estimator optimization with the ‘Optuna’ Python package (v3.2.0). Individual models were trained per attribute, and a multi-output model was trained on all attributes simultaneously.
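
The workflow above (stratified 70/30 split, normalization fitted on training data, five-fold grid search scored by R²) might look roughly like this in scikit-learn for a single model. The data are synthetic stand-ins, the hyperparameter grid is an arbitrary assumption, and only GBR is shown; this is not the authors' code.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 250 "beers", 8 "chemical" features, 5 "styles".
rng = np.random.default_rng(0)
X = rng.normal(size=(250, 8))
styles = np.repeat(np.arange(5), 50)
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.3, size=250)

# 70/30 split, stratified by style (175 training and 75 test observations).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=styles, random_state=0
)

# Putting the scaler in the pipeline means its mean and standard deviation
# are learned from training data only (within each CV fold and in the refit).
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("gbr", GradientBoostingRegressor(random_state=0)),
])

# Hypothetical grid; the grid used in the study is not given in the text.
param_grid = {"gbr__n_estimators": [50, 100], "gbr__max_depth": [2, 3]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)

test_r2 = search.score(X_test, y_test)  # R² of the refitted best model
```

Keeping the test set out of both scaling and hyperparameter selection is what makes `test_r2` an honest estimate of performance on unseen beers.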

Model dissection

GBR was found to outperform other methods, resulting in models with the highest average R² values in both trained panel and public review data sets. Impurity-based rankings of the most important predictors for each predicted sensorial trait were obtained using the ‘scikit-learn’ package. To observe the relationships between these chemical properties and their predicted targets, partial dependence plots (PDP) were constructed for the six most important predictors of consumer appreciation 74,75.
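
For readers unfamiliar with partial dependence, the idea can be shown from scratch: for each value on a grid, overwrite one feature with that value in every observation and average the model's predictions. The sketch below uses a hypothetical toy model and hypothetical feature names; in practice scikit-learn's `sklearn.inspection.partial_dependence` offers this for fitted estimators.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value.

    predict: function mapping a row (dict) to a number.
    rows: list of dicts, one per observation.
    """
    curve = []
    for value in grid:
        modified = [dict(row, **{feature: value}) for row in rows]
        curve.append(sum(predict(r) for r in modified) / len(modified))
    return curve


# Hypothetical stand-in for a fitted model (not the GBR from the study).
def toy_model(row):
    return 2 * row["ethyl_acetate"] + row["ethanol"]


# Hypothetical observations.
beers = [
    {"ethyl_acetate": 1, "ethanol": 4},
    {"ethyl_acetate": 3, "ethanol": 6},
]
curve = partial_dependence(toy_model, beers, "ethyl_acetate", [0, 10])
```

Because the other features keep their observed values, the curve shows the average effect of the chosen feature under the model, not a causal effect.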

The ‘SHAP’ package in Python (v0.41.0) was used to provide an alternative ranking of predictor importance and to visualize the predictors’ effects as a function of their concentration 68.

Validation of causal chemical properties

To validate the effects of the most important model features on predicted sensory attributes, beers were spiked with the chemical compounds identified by the models and descriptive sensory analyses were carried out according to the American Society of Brewing Chemists (ASBC) protocol 90 .

Compound spiking was done 30 min before tasting. Compounds were spiked into fresh beer bottles, which were immediately resealed and inverted three times. Fresh bottles of beer were opened for the same duration, resealed, and inverted three times to serve as controls. Pairs of spiked samples and controls were served simultaneously, chilled and in dark glasses as outlined in the Trained panel section above. Tasters were instructed to select the glass with the higher flavor intensity for each attribute (directional difference test 92) and to select the glass they preferred.

The final concentration after spiking was equal to the within-style average, after normalizing by ethanol concentration. This was done to ensure balanced flavor profiles in the final spiked beer. The same methods were applied to improve a non-alcoholic beer. Compounds were the following: ethyl acetate (Merck KGaA, W241415), ethyl hexanoate (Merck KGaA, W243906), isoamyl acetate (Merck KGaA, W205508), phenethyl acetate (Merck KGaA, W285706), ethanol (96%, Colruyt), glycerol (Merck KGaA, W252506), lactic acid (Merck KGaA, 261106).

Significant differences in preference or perceived intensity were determined by performing the two-sided binomial test on each attribute.
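
As an illustration of the test named above, the following sketch computes an exact two-sided binomial p-value under the null hypothesis that tasters pick either glass with probability 0.5. It uses the common convention of summing the probabilities of all outcomes no more likely than the observed one; the counts in the example are hypothetical.

```python
from math import comb


def binomial_two_sided(k, n, p=0.5):
    """Exact two-sided binomial p-value for k successes in n trials."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Sum all outcomes at most as likely as the observed one; the small
    # relative tolerance guards against floating-point ties.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical outcome: 14 of 16 tasters pick the spiked sample.
p_value = binomial_two_sided(14, 16)
```

With 14 of 16 choices in one direction the p-value is well below 0.05, whereas an even 8 of 16 split gives a p-value of 1.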

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The data that support the findings of this work are available in the Supplementary Data files and have been deposited to Zenodo under accession code 10653704 93. The RateBeer scores data are under restricted access: they are not publicly available because they are the property of RateBeer (ZX Ventures, USA). Access can be obtained from the authors upon reasonable request and with permission of RateBeer (ZX Ventures, USA). Source data are provided with this paper.

Code availability

The code for training the machine learning models, analyzing the models, and generating the figures has been deposited to Zenodo under accession code 10653704 93 .

Tieman, D. et al. A chemical genetic roadmap to improved tomato flavor. Science 355 , 391–394 (2017).

Plutowska, B. & Wardencki, W. Application of gas chromatography–olfactometry (GC–O) in analysis and quality assessment of alcoholic beverages – A review. Food Chem. 107 , 449–463 (2008).

Legin, A., Rudnitskaya, A., Seleznev, B. & Vlasov, Y. Electronic tongue for quality assessment of ethanol, vodka and eau-de-vie. Anal. Chim. Acta 534 , 129–135 (2005).

Loutfi, A., Coradeschi, S., Mani, G. K., Shankar, P. & Rayappan, J. B. B. Electronic noses for food quality: A review. J. Food Eng. 144 , 103–111 (2015).

Ahn, Y.-Y., Ahnert, S. E., Bagrow, J. P. & Barabási, A.-L. Flavor network and the principles of food pairing. Sci. Rep. 1 , 196 (2011).

Bartoshuk, L. M. & Klee, H. J. Better fruits and vegetables through sensory analysis. Curr. Biol. 23 , R374–R378 (2013).

Piggott, J. R. Design questions in sensory and consumer science. Food Qual. Prefer. 3293 , 217–220 (1995).

Kermit, M. & Lengard, V. Assessing the performance of a sensory panel-panellist monitoring and tracking. J. Chemom. 19 , 154–161 (2005).

Cook, D. J., Hollowood, T. A., Linforth, R. S. T. & Taylor, A. J. Correlating instrumental measurements of texture and flavour release with human perception. Int. J. Food Sci. Technol. 40 , 631–641 (2005).

Chinchanachokchai, S., Thontirawong, P. & Chinchanachokchai, P. A tale of two recommender systems: The moderating role of consumer expertise on artificial intelligence based product recommendations. J. Retail. Consum. Serv. 61 , 1–12 (2021).

Ross, C. F. Sensory science at the human-machine interface. Trends Food Sci. Technol. 20 , 63–72 (2009).

Chambers, E. IV & Koppel, K. Associations of volatile compounds with sensory aroma and flavor: The complex nature of flavor. Molecules 18 , 4887–4905 (2013).

Pinu, F. R. Metabolomics—The new frontier in food safety and quality research. Food Res. Int. 72 , 80–81 (2015).

Danezis, G. P., Tsagkaris, A. S., Brusic, V. & Georgiou, C. A. Food authentication: state of the art and prospects. Curr. Opin. Food Sci. 10 , 22–31 (2016).

Shepherd, G. M. Smell images and the flavour system in the human brain. Nature 444 , 316–321 (2006).

Meilgaard, M. C. Prediction of flavor differences between beers from their chemical composition. J. Agric. Food Chem. 30 , 1009–1017 (1982).

Xu, L. et al. Widespread receptor-driven modulation in peripheral olfactory coding. Science 368 , eaaz5390 (2020).

Kupferschmidt, K. Following the flavor. Science 340 , 808–809 (2013).

Billesbølle, C. B. et al. Structural basis of odorant recognition by a human odorant receptor. Nature 615 , 742–749 (2023).

Smith, B. Perspective: Complexities of flavour. Nature 486 , S6–S6 (2012).

Pfister, P. et al. Odorant receptor inhibition is fundamental to odor encoding. Curr. Biol. 30 , 2574–2587 (2020).

Moskowitz, H. W., Kumaraiah, V., Sharma, K. N., Jacobs, H. L. & Sharma, S. D. Cross-cultural differences in simple taste preferences. Science 190 , 1217–1218 (1975).

Eriksson, N. et al. A genetic variant near olfactory receptor genes influences cilantro preference. Flavour 1 , 22 (2012).

Ferdenzi, C. et al. Variability of affective responses to odors: Culture, gender, and olfactory knowledge. Chem. Senses 38 , 175–186 (2013).

Lawless, H. T. & Heymann, H. Sensory evaluation of food: Principles and practices. (Springer, New York, NY). https://doi.org/10.1007/978-1-4419-6488-5 (2010).

Colantonio, V. et al. Metabolomic selection for enhanced fruit flavor. Proc. Natl. Acad. Sci. 119 , e2115865119 (2022).

Fritz, F., Preissner, R. & Banerjee, P. VirtualTaste: a web server for the prediction of organoleptic properties of chemical compounds. Nucleic Acids Res 49 , W679–W684 (2021).

Tuwani, R., Wadhwa, S. & Bagler, G. BitterSweet: Building machine learning models for predicting the bitter and sweet taste of small molecules. Sci. Rep. 9 , 1–13 (2019).

Dagan-Wiener, A. et al. Bitter or not? BitterPredict, a tool for predicting taste from chemical structure. Sci. Rep. 7 , 1–13 (2017).

Pallante, L. et al. Toward a general and interpretable umami taste predictor using a multi-objective machine learning approach. Sci. Rep. 12 , 1–11 (2022).

Malavolta, M. et al. A survey on computational taste predictors. Eur. Food Res. Technol. 248 , 2215–2235 (2022).

Lee, B. K. et al. A principal odor map unifies diverse tasks in olfactory perception. Science 381 , 999–1006 (2023).

Mayhew, E. J. et al. Transport features predict if a molecule is odorous. Proc. Natl. Acad. Sci. 119 , e2116576119 (2022).

Niu, Y. et al. Sensory evaluation of the synergism among ester odorants in light aroma-type liquor by odor threshold, aroma intensity and flash GC electronic nose. Food Res. Int. 113 , 102–114 (2018).

Yu, P., Low, M. Y. & Zhou, W. Design of experiments and regression modelling in food flavour and sensory analysis: A review. Trends Food Sci. Technol. 71 , 202–215 (2018).

Oladokun, O. et al. The impact of hop bitter acid and polyphenol profiles on the perceived bitterness of beer. Food Chem. 205 , 212–220 (2016).

Linforth, R., Cabannes, M., Hewson, L., Yang, N. & Taylor, A. Effect of fat content on flavor delivery during consumption: An in vivo model. J. Agric. Food Chem. 58 , 6905–6911 (2010).

Guo, S., Na Jom, K. & Ge, Y. Influence of roasting condition on flavor profile of sunflower seeds: A flavoromics approach. Sci. Rep. 9 , 11295 (2019).

Ren, Q. et al. The changes of microbial community and flavor compound in the fermentation process of Chinese rice wine using Fagopyrum tataricum grain as feedstock. Sci. Rep. 9 , 3365 (2019).

Hastie, T., Friedman, J. & Tibshirani, R. The Elements of Statistical Learning. (Springer, New York, NY). https://doi.org/10.1007/978-0-387-21606-5 (2001).

Dietz, C., Cook, D., Huismann, M., Wilson, C. & Ford, R. The multisensory perception of hop essential oil: a review. J. Inst. Brew. 126 , 320–342 (2020).

Roncoroni, M. & Verstrepen, K. J. Belgian Beer: Tested and Tasted. (Lannoo, 2018).

Meilgaard, M. Flavor chemistry of beer: Part II: Flavor and threshold of 239 aroma volatiles. in (1975).

Bokulich, N. A. & Bamforth, C. W. The microbiology of malting and brewing. Microbiol. Mol. Biol. Rev. MMBR 77 , 157–172 (2013).

Dzialo, M. C., Park, R., Steensels, J., Lievens, B. & Verstrepen, K. J. Physiology, ecology and industrial applications of aroma formation in yeast. FEMS Microbiol. Rev. 41 , S95–S128 (2017).

Datta, A. et al. Computer-aided food engineering. Nat. Food 3 , 894–904 (2022).

American Society of Brewing Chemists. Beer Methods. (American Society of Brewing Chemists, St. Paul, MN, U.S.A.).

Olaniran, A. O., Hiralal, L., Mokoena, M. P. & Pillay, B. Flavour-active volatile compounds in beer: production, regulation and control. J. Inst. Brew. 123 , 13–23 (2017).

Verstrepen, K. J. et al. Flavor-active esters: Adding fruitiness to beer. J. Biosci. Bioeng. 96 , 110–118 (2003).

Meilgaard, M. C. Flavour chemistry of beer. part I: flavour interaction between principal volatiles. Master Brew. Assoc. Am. Tech. Q 12 , 107–117 (1975).

Briggs, D. E., Boulton, C. A., Brookes, P. A. & Stevens, R. Brewing 227–254. (Woodhead Publishing). https://doi.org/10.1533/9781855739062.227 (2004).

Bossaert, S., Crauwels, S., De Rouck, G. & Lievens, B. The power of sour - A review: Old traditions, new opportunities. BrewingScience 72 , 78–88 (2019).

Verstrepen, K. J. et al. Flavor active esters: Adding fruitiness to beer. J. Biosci. Bioeng. 96 , 110–118 (2003).

Snauwaert, I. et al. Microbial diversity and metabolite composition of Belgian red-brown acidic ales. Int. J. Food Microbiol. 221 , 1–11 (2016).

Spitaels, F. et al. The microbial diversity of traditional spontaneously fermented lambic beer. PLoS ONE 9 , e95384 (2014).

Blanco, C. A., Andrés-Iglesias, C. & Montero, O. Low-alcohol Beers: Flavor Compounds, Defects, and Improvement Strategies. Crit. Rev. Food Sci. Nutr. 56 , 1379–1388 (2016).

Jackowski, M. & Trusek, A. Non-alcoholic beer production – an overview. Pol. J. Chem. Technol. 20 , 32–38 (2018).

Takoi, K. et al. The contribution of geraniol metabolism to the citrus flavour of beer: Synergy of geraniol and β-citronellol under coexistence with excess linalool. J. Inst. Brew. 116 , 251–260 (2010).

Kroeze, J. H. & Bartoshuk, L. M. Bitterness suppression as revealed by split-tongue taste stimulation in humans. Physiol. Behav. 35 , 779–783 (1985).

Mennella, J. A. et al. “A spoonful of sugar helps the medicine go down”: Bitter masking by sucrose among children and adults. Chem. Senses 40 , 17–25 (2015).

Wietstock, P., Kunz, T., Perreira, F. & Methner, F.-J. Metal chelation behavior of hop acids in buffered model systems. BrewingScience 69 , 56–63 (2016).

Sancho, D., Blanco, C. A., Caballero, I. & Pascual, A. Free iron in pale, dark and alcohol-free commercial lager beers. J. Sci. Food Agric. 91 , 1142–1147 (2011).

Rodrigues, H. & Parr, W. V. Contribution of cross-cultural studies to understanding wine appreciation: A review. Food Res. Int. 115 , 251–258 (2019).

Korneva, E. & Blockeel, H. Towards better evaluation of multi-target regression models. in ECML PKDD 2020 Workshops (eds. Koprinska, I. et al.) 353–362 (Springer International Publishing, Cham, 2020). https://doi.org/10.1007/978-3-030-65965-3_23 .

Ares, G. Mathematical and Statistical Methods in Food Science and Technology. (Wiley, 2013).

Grinsztajn, L., Oyallon, E. & Varoquaux, G. Why do tree-based models still outperform deep learning on tabular data? Preprint at http://arxiv.org/abs/2207.08815 (2022).

Gries, S. T. Statistics for Linguistics with R: A Practical Introduction. in Statistics for Linguistics with R (De Gruyter Mouton, 2021). https://doi.org/10.1515/9783110718256 .

Lundberg, S. M. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2 , 56–67 (2020).

Ickes, C. M. & Cadwallader, K. R. Effects of ethanol on flavor perception in alcoholic beverages. Chemosens. Percept. 10 , 119–134 (2017).

Kato, M. et al. Influence of high molecular weight polypeptides on the mouthfeel of commercial beer. J. Inst. Brew. 127 , 27–40 (2021).

Wauters, R. et al. Novel Saccharomyces cerevisiae variants slow down the accumulation of staling aldehydes and improve beer shelf-life. Food Chem. 398 , 1–11 (2023).

Li, H., Jia, S. & Zhang, W. Rapid determination of low-level sulfur compounds in beer by headspace gas chromatography with a pulsed flame photometric detector. J. Am. Soc. Brew. Chem. 66 , 188–191 (2008).

Dercksen, A., Laurens, J., Torline, P., Axcell, B. C. & Rohwer, E. Quantitative analysis of volatile sulfur compounds in beer using a membrane extraction interface. J. Am. Soc. Brew. Chem. 54 , 228–233 (1996).

Molnar, C. Interpretable Machine Learning: A Guide for Making Black-Box Models Interpretable. (2020).

Zhao, Q. & Hastie, T. Causal interpretations of black-box models. J. Bus. Econ. Stat. Publ. Am. Stat. Assoc. 39 , 272–281 (2019).

Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning. (Springer, 2019).

Labrado, D. et al. Identification by NMR of key compounds present in beer distillates and residual phases after dealcoholization by vacuum distillation. J. Sci. Food Agric. 100 , 3971–3978 (2020).

Lusk, L. T., Kay, S. B., Porubcan, A. & Ryder, D. S. Key olfactory cues for beer oxidation. J. Am. Soc. Brew. Chem. 70 , 257–261 (2012).

Gonzalez Viejo, C., Torrico, D. D., Dunshea, F. R. & Fuentes, S. Development of artificial neural network models to assess beer acceptability based on sensory properties using a robotic pourer: A comparative model approach to achieve an artificial intelligence system. Beverages 5 , 33 (2019).

Gonzalez Viejo, C., Fuentes, S., Torrico, D. D., Godbole, A. & Dunshea, F. R. Chemical characterization of aromas in beer and their effect on consumers liking. Food Chem. 293 , 479–485 (2019).

Gilbert, J. L. et al. Identifying breeding priorities for blueberry flavor using biochemical, sensory, and genotype by environment analyses. PLOS ONE 10 , 1–21 (2015).

Goulet, C. et al. Role of an esterase in flavor volatile variation within the tomato clade. Proc. Natl. Acad. Sci. 109 , 19009–19014 (2012).

Borisov, V. et al. Deep Neural Networks and Tabular Data: A Survey. IEEE Trans. Neural Netw. Learn. Syst. 1–21 https://doi.org/10.1109/TNNLS.2022.3229161 (2022).

Statista. Statista Consumer Market Outlook: Beer - Worldwide.

Seitz, H. K. & Stickel, F. Molecular mechanisms of alcohol-mediated carcinogenesis. Nat. Rev. Cancer 7 , 599–612 (2007).

Voordeckers, K. et al. Ethanol exposure increases mutation rate through error-prone polymerases. Nat. Commun. 11 , 3664 (2020).

Goelen, T. et al. Bacterial phylogeny predicts volatile organic compound composition and olfactory response of an aphid parasitoid. Oikos 129 , 1415–1428 (2020).

Reher, T. et al. Evaluation of hop (Humulus lupulus) as a repellent for the management of Drosophila suzukii. Crop Prot. 124 , 104839 (2019).

Stein, S. E. An integrated method for spectrum extraction and compound identification from gas chromatography/mass spectrometry data. J. Am. Soc. Mass Spectrom. 10 , 770–781 (1999).

American Society of Brewing Chemists. Sensory Analysis Methods. (American Society of Brewing Chemists, St. Paul, MN, U.S.A., 1992).

McAuley, J., Leskovec, J. & Jurafsky, D. Learning Attitudes and Attributes from Multi-Aspect Reviews. Preprint at https://doi.org/10.48550/arXiv.1210.3926 (2012).

Meilgaard, M. C., Carr, B. T. & Carr, B. T. Sensory Evaluation Techniques. (CRC Press, Boca Raton). https://doi.org/10.1201/b16452 (2014).

Schreurs, M. et al. Data from: Predicting and improving complex beer flavor through machine learning. Zenodo https://doi.org/10.5281/zenodo.10653704 (2024).

Acknowledgements

We thank all lab members for their discussions and thank all tasting panel members for their contributions. Special thanks go out to Dr. Karin Voordeckers for her tremendous help in proofreading and improving the manuscript. M.S. was supported by a Baillet-Latour fellowship; L.C. acknowledges financial support from KU Leuven (C16/17/006); F.A.T. was supported by a PhD fellowship from FWO (1S08821N). Research in the lab of K.J.V. is supported by KU Leuven, FWO, VIB, VLAIO and the Brewing Science Serves Health Fund. Research in the lab of T.W. is supported by FWO (G.0A51.15) and KU Leuven (C16/17/006).

Author information

These authors contributed equally: Michiel Schreurs, Supinya Piampongsant, Miguel Roncoroni.

Authors and Affiliations

VIB—KU Leuven Center for Microbiology, Gaston Geenslaan 1, B-3001, Leuven, Belgium

Michiel Schreurs, Supinya Piampongsant, Miguel Roncoroni, Lloyd Cool, Beatriz Herrera-Malaver, Florian A. Theßeling & Kevin J. Verstrepen

CMPG Laboratory of Genetics and Genomics, KU Leuven, Gaston Geenslaan 1, B-3001, Leuven, Belgium

Leuven Institute for Beer Research (LIBR), Gaston Geenslaan 1, B-3001, Leuven, Belgium

Laboratory of Socioecology and Social Evolution, KU Leuven, Naamsestraat 59, B-3000, Leuven, Belgium

Lloyd Cool, Christophe Vanderaa & Tom Wenseleers

VIB Bioinformatics Core, VIB, Rijvisschestraat 120, B-9052, Ghent, Belgium

Łukasz Kreft & Alexander Botzki

AB InBev SA/NV, Brouwerijplein 1, B-3000, Leuven, Belgium

Philippe Malcorps & Luk Daenen

Contributions

S.P., M.S. and K.J.V. conceived the experiments. S.P., M.S. and K.J.V. designed the experiments. S.P., M.S., M.R., B.H. and F.A.T. performed the experiments. S.P., M.S., L.C., C.V., L.K., A.B., P.M., L.D., T.W. and K.J.V. contributed analysis ideas. S.P., M.S., L.C., C.V., T.W. and K.J.V. analyzed the data. All authors contributed to writing the manuscript.

Corresponding author

Correspondence to Kevin J. Verstrepen.

Ethics declarations

Competing interests.

K.J.V. is affiliated with bar.on. The other authors declare no competing interests.

Peer review

Peer review information.

Nature Communications thanks Florian Bauer, Andrew John Macintosh and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information, Peer Review File, Description of Additional Supplementary Files, Supplementary Data 1 to 7, Reporting Summary, Source Data.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Schreurs, M., Piampongsant, S., Roncoroni, M. et al. Predicting and improving complex beer flavor through machine learning. Nat Commun 15 , 2368 (2024). https://doi.org/10.1038/s41467-024-46346-0

Received : 30 October 2023

Accepted : 21 February 2024

Published : 26 March 2024

DOI : https://doi.org/10.1038/s41467-024-46346-0

Gabriela Martinez Sanchez

Sr Research Software Engineer at Central Engineering

Aria is Your AI Research Assistant Powered by GPT Large Language Models

lifan0127/ai-research-assistant

A.R.I.A. (Aria) - Your AI Research Assistant

Aria is a Zotero plugin powered by Large Language Models (LLMs). A-R-I-A is the acronym of "AI Research Assistant" in reverse order.

The easiest way to get started with Aria is to try one of the interactive prompts in the prompt library.

Use Drag-and-Drop to Reference Your Zotero Items and Collections

Autocompletion for Creators (Authors), Tags, Items and More

Visual Analysis (GPT-4 Vision)

How to use Zotero area annotation to create a draggable area in PDF?

Save Chat as Notes and Annotations

Zotero and GPT Requirements

Currently, only Zotero 6 is supported. Compatibility with Zotero 7 has not been tested.
Aria requires the OpenAI GPT-4 model family. ( how can I access GPT-4? )
The visual analysis feature requires the preview access to the GPT-4 Vision model.

Installation

For a detailed walkthrough of the installation process, please check out: https://twitter.com/MushtaqBilalPhD/status/1735221900584865904 (credit: Mushtaq Bilal, PhD - Syddansk Universitet)

Download the latest release (.xpi file) from GitHub: https://github.com/lifan0127/ai-research-assistant/releases/latest
In Zotero, select Tools from the top menu bar, and then click on Add-ons.
On the Add-ons Manager panel, click the gear icon at the top right corner and select Install Add-on From File
Select the .xpi file you just downloaded and click Open which will start the installation process.

Before using Aria, you need to provide an OpenAI API Key . Follow the in-app instruction to add a key and restart Zotero . ( screenshots )

After restart, you should see the activated Aria window (as shown above) and can start using it through conversations.

Preferences

Aria is configurable through Edit > Preferences > Aria. Please note that some changes require Zotero restart.

Model Selection : Choose between the base GPT-4 model and the new GPT-4 Turbo model (Preview).
Zoom Level : Adjust the zoom level to fit your screen resolution
Keyboard shortcut : Change the keyboard shortcut combination to better fit your workflow.

Aria can update automatically when internet access is available. To check for available updates, select Tools from the top menu bar, and then click on Add-ons.
To manually update Aria, click More under Aria and then click the gear icon at the top right corner. Select Check for Updates. ( screenshots )

Limitations

The following are known limitations based on user feedback.

Currently Aria can query your Zotero library through the Zotero search API. The ability to query the Zotero SQLite database for document count and other metrics will be delivered in a future release.
Aria has limited awareness of your Zotero application state (selected item, current tab, highlighted text). However, you can use the drag-and-drop and autocompletion features to provide such context within your message.

Troubleshooting

Interaction with Zotero, in an open conversational manner and through a probabilistic model, can lead to many different, often unexpected outcomes. If you experience any error, please create a GitHub issue with a screenshot of the error message from your Aria chat window. Thank you!

"Agent stopped due to max iterations": For certain questions, the bot will make multiple API calls iteratively for response synthesis. Sometimes, it may fail to produce an answer before reaching the max iterations.

Aria tab not in Preferences panel: You may choose the Advanced tab in Preferences and open the Configuration Editor under Advanced Configuration. From there, search for "aria" and then double-click on the "extensions.zotero.aria.OPENAI_API_KEY" entry to add your OpenAI API Key.

Development

Refer to the Zotero Plugin Development guide to find instructions on how to set up the plugin in your local environment.

You can now submit feedback and share your chat session to help improve Aria. Let's make Aria better together!

Exclusive: Software industry calls for more UK government support

Reporting by Martin Coulter, editing by Ed Osmond

Hanwha Aerospace to spin off industrial solutions businesses from defence

South Korea's Hanwha Aerospace Co said on Friday it will spin off its industrial solutions and semiconductor equipment businesses from its flagship defence division.

A man points his light at the Milky Way during the peak of the Perseid meteor shower in Macedonia

IMAGES

research-paper
How to format research paper in Word
Research papers on software reliability models
(PDF) Writing good software engineering research papers
Research Paper On Software Reengineering : Re-engineering Software
(PDF) A review of software engineering research from a design science

VIDEO

Software Ecosystems: A New Research Agenda
22413 software engineering question paper 2022/23 TY-CO 4th sem MSBTE Board exam question paper
Final Term Paper Software Engineering II
The Paper + Software I Use. #musiccomposition #music #writingmusic
Software Engineering Question Paper (4th semester) 2022 ||@BCA with Mannu||
Principles of software engineering ques paper📃of B.tech(CSE) 6th sem (2019)

COMMENTS

Journal of Software Engineering Research and Development
They wanted to define values and basic principles for better software development. On top of being brought into focus, the ... Philipp Hohl, Jil Klünder, Arie van Bennekum, Ryan Lockard, James Gifford, Jürgen Münch, Michael Stupperich and Kurt Schneider. Journal of Software Engineering Research and Development 2018 6:15.
Full article: The evolution of software development orchestration
There, scholars can research how existing development and deployment processes can be orchestrated using insights from research on software development governance and control. Another variation is the NoOps movement (Cockcroft, 2012; Gualtieri, 2011) that propagates total automation which would make operations obsolete.
Software development methodologies and practices in start‐ups
This knowledge gap motivated this study. We conducted a systematic mapping study along with empirical research with a focus on software development methodologies and practices in the start-up context. 3 Research methodology 3.1 Systematic mapping study. We followed the established SMS guidelines and procedures of Kitchenham and Charters.
Journal of Systems and Software
For JSS's full CfP including information on Special Issues, Industry, Trends, and Journal First tracks please continue to read for further details. The Journal of Systems and Software publishes papers covering all aspects of software engineering. All articles should provide evidence to support …
Current and Future Challenges of Software Engineering ...
This paper summarizes the challenges that the Software Engineering for Services and Applications (SE4SA) cluster is considering as relevant. © 2016 The Authors. ... From Distributed to Complete Computing. Keywords: Software; Services; Research Challenges; Collaboration; Software Development 1. Motivation ICT and, in particular, software is ...
DevOps and software quality: A systematic mapping
After a good quantity of research paper has been obtained the research paper are further classified into relevant categories. The filtered papers are screened thoroughly and the best papers are selected for the further research into the subject. ... RQ7: Does DevOps practices bridge development of software and software quality assurance? The ...
software engineering Latest Research Papers
End To End . Predictive Software. The paper examines the principles of the Predictive Software Engineering (PSE) framework. The authors examine how PSE enables custom software development companies to offer transparent services and products while staying within the intended budget and a guaranteed budget.
PDF Secure Software Development Methodologies: A Multivocal Literature Review
• our systematization covers practices integrated in the SDLC and auxiliary (non-technical) practices that support software security; • we systematize the existing evaluation approaches for secure software development methodologies; • we report on the discovered gaps that require more attention in the research community.
Software Engineering
Software Engineering. At Google, we pride ourselves on our ability to develop and launch new products and features at a very fast pace. This is made possible in part by our world-class engineers, but our approach to software development enables us to balance speed and quality, and is integral to our success. Our obsession for speed and scale is ...
(PDF) A review of software engineering research from a design science
Design science is recognized as a pragmatic research paradigm, addressing this and other characteristics of applied and prescriptive research. Applying the design science lens to software ...
Full article: Design and management of software development projects
We used software development project data from Li (2008), which the authors used in an SD model for their research. Because our study is also focused on project development with SD modelling, we found this data relevant to validate our decision-making model. This dataset contains project information such as the number of tasks in the project ...
PDF The Impact of AI on Developer Productivity: Evidence from GitHub Copilot
This paper studies the productivity effects of AI tools on software development. We present a controlled trial of GitHub Copilot, an AI pair programmer that suggests code and entire functions in real time based on context. GitHub Copilot is powered by OpenAI's generative AI model, Codex [Chen et al., 2021].
An empirically based model of software prototyping: a ...
The research type for each paper was classified according to the categories by Wieringa et al. ... In contrast, it is quicker for an experienced software developer to validate, e.g. a new algorithm in a technically complex product through prototyping in source code software. Furthermore, consider the overall cost of development from a long-term ...
Applying and Researching DevOps: A Tertiary Study
Abstract: DevOps is an emerging software development methodology, that differs from more traditional approaches due to the closer involvement of the customer and the adoption of "continuous-*" (e.g., integration, deployment, delivery, etc.) practices. The vast research on DevOps (including numerous secondary studies) published in a short timeframe, and the diversity of the authors ...
(PDF) Research Process on Software Development Model
The classic Software Development Life Cycle (SDLC) is composed of four stages (i) planning; (ii) analysis; (iii) design, and (iv) implementation [14]. From this, we can note that several different ...
(PDF) AGILE SOFTWARE DEVELOPMENT
Hence, it becomes necessary to use agile software development methodology in today's fast-paced revolutionizing software industry. This paper discusses the important subtopics of Agile Software ...
Why science needs more research software engineers
Paul Richmond is a research software engineer in the United Kingdom. In March 2012, a group of like-minded software developers gathered at the University of Oxford, UK ...
Research Trends in Software Development Effort Estimation
Abstract: Developing a software project without the appropriate amount of effort would significantly impede and even fail the project, putting the software developer's quality at risk. Therefore, software development effort estimation (SDEE) is the most critical activity in software engineering. SDEE has seen extensive research, resulting in a massive rise in the literature in a relatively ...
Papers for Software Engineers
A curated list of papers that may be of interest to Software Engineering students or professionals. See the sources and selection criteria below. List of papers by topic. Von Neumann's First Computer Program. Knuth (1970). Computer History; Early Programming. The Education of a Computer. Hopper (1952).
Research Paper
In this new research launch from the Developer Success Lab, we share original empirical research with 3000+ software engineers and developers across 12+ industries engaged in the transition to generative AI-assisted software work. We bring a human-centered approach to pressing questions that engineering organizations are facing on the rapidly-changing possibilities of AI-assisted coding.
Research Paper
Understanding and maintaining long-term developer productivity is a top priority for leaders, engineering managers, and developers themselves. In this in-depth research report on Developer Thriving, we share our findings from three research studies looking at the factors that drive developers' productivity, visibility for engineering work ...
[PDF] Microsoft's research paper on "What Makes a Great Software Engineer"
Point 1 is the ability to be able to read 76 pages of densely packed data about "What makes a quality engineer". Let me save you from opening the PDF. Our analysis (Li et al., 2015) identified a diverse set of 54 attributes of great software engineers. At a high level, our informants described great software engineers as people who are ...
Exclusive: Behind the plot to break Nvidia's grip on AI by targeting
Nvidia's CUDA is a compelling piece of software on paper, as it is full-featured and is consistently growing both from Nvidia's contributions and the developer community.
When Will Developers Be Able To 'Just' Develop?
The problem is, while the central activity software developers would like to be focused on is 'cutting code' i.e. writing command line instructions to wield algorithmic logic and harness data ...
(PDF) Software Development Methodologies
A software development methodology is a way of managing a software development project. This typically addresses issues like selecting features for inclusion in the current version, when software ...
Predicting and improving complex beer flavor through machine ...
The perception and appreciation of food flavor depends on many interacting chemical compounds and external factors, and therefore proves challenging to understand and predict. Here, we combine ...
Gabriela Martinez Sanchez at Microsoft Research
Sr Research Software Engineer at Central Engineering. Research areas: Systems and networking; ... Microsoft Research Lab - Redmond, Microsoft Building 99, 14820 NE 36th Street, Redmond, Washington, 98052 USA ...
GitHub
Aria is Your AI Research Assistant Powered by GPT Large Language Models - lifan0127/ai-research-assistant. Topics: zotero, gpt, research-paper, ai-assistant, large-language-models. License: AGPL-3.0.
Exclusive: Software industry calls for more UK government support
IRIS Software Group, founded in 1978, is the largest third-party tax filer with the British government, and is used by 21,000 of the country's accountancy practices.
(PDF) Agile Software Development with Artificial Intelligence System
This chapter provides a characterization and definition of agile software development, an overview of research through a summary of existing overview studies, an analysis of the research ...