An External Replication on the Effects of Test-driven Development Using a Multi-site Blind Analysis Approach

Davide Fucci, Giuseppe Scanniello, Simone Romano, Martin Shepperd, Boyce Sigweni, Fernando Uyaguari, Burak Turhan, Natalia Juristo, Markku Oivo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

Context: Test-driven development (TDD) is an agile practice claimed to improve the quality of a software product, as well as the productivity of its developers. A previous study (i.e., baseline experiment) at the University of Oulu (Finland) compared TDD to a test-last development (TLD) approach through a randomized controlled trial. The results failed to support the claims. Goal: We want to validate the original study results by replicating the study at the University of Basilicata (Italy), using a different design. Method: We replicated the baseline experiment, using a crossover design, with 21 graduate students. We kept the settings and context as close as possible to the baseline experiment. To limit researchers' bias, we involved two other sites (UPM, Spain, and Brunel, UK) to conduct a blind analysis of the data. Results: The Kruskal-Wallis tests did not show any significant difference between TDD and TLD in terms of testing effort (p-value = .27), external code quality (p-value = .82), and developers' productivity (p-value = .83). Nevertheless, our data revealed a difference based on the order in which TDD and TLD were applied, though no carryover effect. Conclusions: We verify the baseline study results, yet our results raise concerns regarding the selection of experimental objects, particularly with respect to their interaction with the order in which treatments are applied. We recommend that future studies survey the tasks used in experiments evaluating TDD. Finally, to lower the cost of replication studies and reduce researchers' bias, we encourage other research groups to adopt a multi-site blind analysis approach similar to the one described in this paper.

Original language: English
Title of host publication: 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2016
Publisher: IEEE Computer Society
Volume: 08-09-September-2016
ISBN (Electronic): 9781450344272
DOIs: https://doi.org/10.1145/2961111.2962592
Publication status: Published - Sep 8 2016
Event: 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2016 - Ciudad Real, Spain
Duration: Sep 8 2016 → Sep 9 2016

Other

Other: 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2016
Country: Spain
City: Ciudad Real
Period: 9/8/16 → 9/9/16

Fingerprint

Productivity
Experiments
Students
Testing
Costs

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Software

Cite this

Fucci, D., Scanniello, G., Romano, S., Shepperd, M., Sigweni, B., Uyaguari, F., ... Oivo, M. (2016). An External Replication on the Effects of Test-driven Development Using a Multi-site Blind Analysis Approach. In 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2016 (Vol. 08-09-September-2016). [a3] IEEE Computer Society. https://doi.org/10.1145/2961111.2962592
Fucci, Davide ; Scanniello, Giuseppe ; Romano, Simone ; Shepperd, Martin ; Sigweni, Boyce ; Uyaguari, Fernando ; Turhan, Burak ; Juristo, Natalia ; Oivo, Markku. / An External Replication on the Effects of Test-driven Development Using a Multi-site Blind Analysis Approach. 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2016. Vol. 08-09-September-2016 IEEE Computer Society, 2016.
@inproceedings{c2197c7791c3418781def320828b6103,
title = "An External Replication on the Effects of Test-driven Development Using a Multi-site Blind Analysis Approach",
abstract = "Context: Test-driven development (TDD) is an agile practice claimed to improve the quality of a software product, as well as the productivity of its developers. A previous study (i.e., baseline experiment) at the University of Oulu (Finland) compared TDD to a test-last development (TLD) approach through a randomized controlled trial. The results failed to support the claims. Goal: We want to validate the original study results by replicating the study at the University of Basilicata (Italy), using a different design. Method: We replicated the baseline experiment, using a crossover design, with 21 graduate students. We kept the settings and context as close as possible to the baseline experiment. To limit researchers' bias, we involved two other sites (UPM, Spain, and Brunel, UK) to conduct a blind analysis of the data. Results: The Kruskal-Wallis tests did not show any significant difference between TDD and TLD in terms of testing effort (p-value = .27), external code quality (p-value = .82), and developers' productivity (p-value = .83). Nevertheless, our data revealed a difference based on the order in which TDD and TLD were applied, though no carryover effect. Conclusions: We verify the baseline study results, yet our results raise concerns regarding the selection of experimental objects, particularly with respect to their interaction with the order in which treatments are applied. We recommend that future studies survey the tasks used in experiments evaluating TDD. Finally, to lower the cost of replication studies and reduce researchers' bias, we encourage other research groups to adopt a multi-site blind analysis approach similar to the one described in this paper.",
author = "Davide Fucci and Giuseppe Scanniello and Simone Romano and Martin Shepperd and Boyce Sigweni and Fernando Uyaguari and Burak Turhan and Natalia Juristo and Markku Oivo",
year = "2016",
month = "9",
day = "8",
doi = "10.1145/2961111.2962592",
language = "English",
volume = "08-09-September-2016",
booktitle = "10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2016",
publisher = "IEEE Computer Society",
address = "United States",

}

Fucci, D, Scanniello, G, Romano, S, Shepperd, M, Sigweni, B, Uyaguari, F, Turhan, B, Juristo, N & Oivo, M 2016, An External Replication on the Effects of Test-driven Development Using a Multi-site Blind Analysis Approach. in 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2016. vol. 08-09-September-2016, a3, IEEE Computer Society, 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2016, Ciudad Real, Spain, 9/8/16. https://doi.org/10.1145/2961111.2962592

An External Replication on the Effects of Test-driven Development Using a Multi-site Blind Analysis Approach. / Fucci, Davide; Scanniello, Giuseppe; Romano, Simone; Shepperd, Martin; Sigweni, Boyce; Uyaguari, Fernando; Turhan, Burak; Juristo, Natalia; Oivo, Markku.

10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2016. Vol. 08-09-September-2016 IEEE Computer Society, 2016. a3.

TY - GEN

T1 - An External Replication on the Effects of Test-driven Development Using a Multi-site Blind Analysis Approach

AU - Fucci, Davide

AU - Scanniello, Giuseppe

AU - Romano, Simone

AU - Shepperd, Martin

AU - Sigweni, Boyce

AU - Uyaguari, Fernando

AU - Turhan, Burak

AU - Juristo, Natalia

AU - Oivo, Markku

PY - 2016/9/8

Y1 - 2016/9/8

AB - Context: Test-driven development (TDD) is an agile practice claimed to improve the quality of a software product, as well as the productivity of its developers. A previous study (i.e., baseline experiment) at the University of Oulu (Finland) compared TDD to a test-last development (TLD) approach through a randomized controlled trial. The results failed to support the claims. Goal: We want to validate the original study results by replicating the study at the University of Basilicata (Italy), using a different design. Method: We replicated the baseline experiment, using a crossover design, with 21 graduate students. We kept the settings and context as close as possible to the baseline experiment. To limit researchers' bias, we involved two other sites (UPM, Spain, and Brunel, UK) to conduct a blind analysis of the data. Results: The Kruskal-Wallis tests did not show any significant difference between TDD and TLD in terms of testing effort (p-value = .27), external code quality (p-value = .82), and developers' productivity (p-value = .83). Nevertheless, our data revealed a difference based on the order in which TDD and TLD were applied, though no carryover effect. Conclusions: We verify the baseline study results, yet our results raise concerns regarding the selection of experimental objects, particularly with respect to their interaction with the order in which treatments are applied. We recommend that future studies survey the tasks used in experiments evaluating TDD. Finally, to lower the cost of replication studies and reduce researchers' bias, we encourage other research groups to adopt a multi-site blind analysis approach similar to the one described in this paper.

UR - http://www.scopus.com/inward/record.url?scp=84991666654&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84991666654&partnerID=8YFLogxK

U2 - 10.1145/2961111.2962592

DO - 10.1145/2961111.2962592

M3 - Conference contribution

VL - 08-09-September-2016

BT - 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2016

PB - IEEE Computer Society

ER -

Fucci D, Scanniello G, Romano S, Shepperd M, Sigweni B, Uyaguari F et al. An External Replication on the Effects of Test-driven Development Using a Multi-site Blind Analysis Approach. In 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2016. Vol. 08-09-September-2016. IEEE Computer Society. 2016. a3 https://doi.org/10.1145/2961111.2962592