This upload contains datasets and pre-trained models used for the paper Neural Code Search Revisited: Enhancing Code Snippet Retrieval through Natural Language Intent. The code for easily loading these datasets and models will be made available here: http://github.com/nokia/codesearch

Datasets

There are three types of datasets:

- snippet collections (code snippets + natural language descriptions): so-ds-feb20, staqc-py-cleaned, conala-curated
- code search evaluation data (queries linked to relevant snippets of one of the snippet collections): so-ds-feb20-{valid|test}, staqc-py-raw-{valid|test}, conala-curated-0.5-test
- training data (datasets used to train code retrieval models): so-duplicates-pacs-train, so-python-question-titles-feb20

The staqc-py-cleaned snippet collection and the conala-curated datasets were derived from existing corpora:

- staqc-py-cleaned was derived from the Python StaQC snippet collection. See https://github.com/LittleYUYU/StackOverflow-Question-Code-Dataset, LICENSE.
- conala-curated was derived from the conala corpus. See https://conala-corpus.github.io/, LICENSE.

The other datasets were mined directly from a recent Stack Overflow dump (https://archive.org/details/stackexchange, LICENSE).

Pre-trained models

Each model can embed queries and (annotated) code snippets in the same space. The models are released under a BSD 3-Clause License.

- ncs-embedder-so-ds-feb20
- ncs-embedder-staqc-py
- tnbow-embedder-so-ds-feb20
- use-embedder-pacs
- ensemble-embedder-pacs
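Because each model embeds queries and snippets in the same space, retrieval can be pictured as ranking snippets by their similarity to the query embedding. The following is a minimal sketch of that idea only. The names `toy_embed`, `cosine`, and `search`, and the three example snippets, are hypothetical and invented for illustration. This is not the API of the nokia/codesearch repository, and the bag-of-words vector merely stands in for the dense embeddings a released model would produce.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in embedder: a bag-of-words count vector.

    Assumption: a real pre-trained embedder would return a dense vector;
    token counts are used here only to keep the sketch self-contained.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, snippets, top_k=2):
    """Rank annotated snippets by similarity to the query, highest first."""
    q = toy_embed(query)
    scored = [
        (cosine(q, toy_embed(s["description"] + " " + s["code"])), s)
        for s in snippets
    ]
    # Sort on the score only, so the snippet dicts are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:top_k]]


# Made-up annotated snippets (description + code), not taken from the datasets.
snippets = [
    {"description": "read a csv file into a dataframe", "code": "df = pd.read_csv(path)"},
    {"description": "plot a histogram of a column", "code": "df[col].hist()"},
    {"description": "sort a list in descending order", "code": "sorted(xs, reverse=True)"},
]

best = search("read csv file", snippets, top_k=1)[0]
print(best["code"])
```

Embedding the description together with the code mirrors the "(annotated) code snippets" wording above. Swapping `toy_embed` for a learned embedder would leave the ranking logic unchanged.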
code search, machine learning, software reuse
| Indicator | Description | Value |
| --- | --- | --- |
| selected citations | These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 |
| popularity | This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average |
| influence | This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| impulse | This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
| views | | 62 |
| downloads | | 142 |

Views provided by UsageCounts
Downloads provided by UsageCounts