
arXiv: 2201.04588
Massive data from software repositories and collaboration tools are widely used to study social aspects of software development. One question that several recent works have addressed is how a software project's size and structure influence team productivity, a question famously considered in Brooks' law. Recent studies using massive repository data suggest that developers in larger teams tend to be less productive than developers in smaller teams. Despite using similar methods and data, other studies argue for a positive linear or even super-linear relationship between team size and productivity, thus contesting the view of software economics that software projects exhibit diseconomies of scale. In our work, we study challenges that can explain the disagreement between recent studies of developer productivity in massive repository data. We further provide, to the best of our knowledge, the largest curated corpus of GitHub projects tailored to investigate the influence of team size and collaboration patterns on individual and collective productivity. Our work contributes to the ongoing discussion on the choice of productivity metrics in the operationalisation of hypotheses about determinants of successful software projects. It further highlights general pitfalls in big data analysis and shows that the use of bigger data sets does not automatically lead to more reliable insights.
Conference: ICSE 2022 - The 44th International Conference on Software Engineering, 25 pages, 4 figures, 3 tables
Software Engineering (cs.SE), Social and Information Networks (cs.SI), FOS: Computer and information sciences, Computer Science - Software Engineering, Computer Science - Computers and Society, Physics - Physics and Society, Computers and Society (cs.CY), FOS: Physical sciences, Computer Science - Social and Information Networks, Physics and Society (physics.soc-ph)
| Indicator | Description | Value |
| --- | --- | --- |
| selected citations | These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 3 |
| popularity | This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% |
| influence | This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| impulse | This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
