Publication: Article, Preprint. Published 01 Jul 2022. Embargo end date: 01 Jan 2022. Language: English.
Publisher: World Scientific Pub Co Pte Ltd
Journal: International Journal of Software Engineering and Knowledge Engineering, volume 32, pages 947-970 (ISSN: 0218-1940, eISSN: 1793-6403)

Authors: Jialiang Lin; Yingmin Wang; Yao Yu; Yu Zhou; Yidong Chen; Xiaodong Shi

DOI: 10.1142/s0218194022500358, 10.48550/arxiv.2209.14155

arXiv: http://arxiv.org/abs/2209.14155

Automatic Analysis of Available Source Code of Top Artificial Intelligence Conference Papers


Abstract

Source code is essential for researchers to reproduce the methods and replicate the results of artificial intelligence (AI) papers. Some organizations and researchers manually collect AI papers with available source code to contribute to the AI community. However, manual collection is labor-intensive and time-consuming. To address this issue, we propose a method to automatically identify papers with available source code and extract their source code repository URLs. With this method, we find that 20.5% of the regular papers published at 10 top AI conferences from 2010 to 2019 are identified as having available source code, and that 8.1% of these source code repositories are no longer accessible. We also create the XMU NLP Lab README Dataset, the largest dataset of labeled README files for source code document research. Using this dataset, we find that quite a few README files provide no installation instructions or usage tutorials. Further, we conduct a large-scale, comprehensive statistical analysis to give a general picture of the source code of AI conference papers. The proposed solution can also be applied beyond AI conference papers, to scientific papers from other journals and conferences, to shed light on more domains.
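The abstract names two automated steps: finding source code repository URLs in a paper's text, and checking whether a README provides installation instructions or usage tutorials. The following is a minimal Python sketch of both steps under stated assumptions, not the authors' method, which the abstract does not describe in detail. The regular expression, the list of hosting domains, the keyword lists, and the helper names `extract_repo_urls` and `readme_sections` are all hypothetical choices made for illustration; the paper's README analysis relies on its labeled dataset rather than a fixed keyword rule like this one.

```python
import re

# Hypothetical set of code-hosting domains; the paper's actual criteria may differ.
REPO_URL_PATTERN = re.compile(
    r"https?://(?:www\.)?(?:github\.com|gitlab\.com|bitbucket\.org)/"
    r"[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+",
    re.IGNORECASE,
)

# Hypothetical keyword lists standing in for labeled README categories.
INSTALL_KEYWORDS = ("install", "requirements", "setup")
USAGE_KEYWORDS = ("usage", "tutorial", "quick start", "getting started", "example")


def extract_repo_urls(paper_text: str) -> list[str]:
    """Return unique candidate repository URLs in order of first appearance."""
    seen: dict[str, None] = {}
    for match in REPO_URL_PATTERN.finditer(paper_text):
        # A sentence-ending period can be captured as part of the repo name; drop it.
        url = match.group(0).rstrip(".")
        seen.setdefault(url, None)
    return list(seen)


def readme_sections(readme_text: str) -> dict[str, bool]:
    """Flag whether any Markdown heading mentions installation or usage.

    Heuristic only: any line starting with '#' is treated as a heading, so
    shell comments inside fenced code blocks are also (wrongly) counted.
    """
    headings = [
        line.lstrip("#").strip().lower()
        for line in readme_text.splitlines()
        if line.startswith("#")
    ]
    return {
        "installation": any(k in h for h in headings for k in INSTALL_KEYWORDS),
        "usage": any(k in h for h in headings for k in USAGE_KEYWORDS),
    }


if __name__ == "__main__":
    text = "Code: https://github.com/example/repo. Mirror: https://github.com/example/repo."
    print(extract_repo_urls(text))
    print(readme_sections("# MyTool\n## Installation\npip install mytool\n"))
```

Deduplicating with an insertion-ordered dict keeps the first-mentioned URL first, which matters if the first link in a paper is usually the authors' own repository. Checking whether each extracted repository is still accessible would require a network request per URL and is left out of this sketch.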

Related Organizations

Xiamen University
China (People's Republic of)

Keywords

Software Engineering (cs.SE), Artificial Intelligence (cs.AI), Computation and Language (cs.CL), Machine Learning (cs.LG), Digital Libraries (cs.DL), FOS: Computer and information sciences

Related research products: 14 (10 listed here), each linked to this publication by an IsRelatedTo relation:

- pwc (software on GitHub)
- pytorch_geometric (software on GitHub)
- google-research (software on GitHub)
- LightGBM (software on GitHub)
- Top-AI-Conferences-Paper-with-Code (software on GitHub)
- wordcloud2 (software on GitHub)
- magenta (software on GitHub)
- models (software on GitHub)
- fairseq (software on GitHub)
- bi-tree-lstm-crf (software on GitHub)


Impact by BIP!

- Citations: 5. An alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 10%. Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Top 10%. Reflects the initial momentum of an article directly after its publication, based on the underlying citation network.



Open access route: Green

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
