
Background While trying to integrate multiple data sets collected by different researchers, we noticed that the sample names were frequently entered inconsistently. Most of the variations appeared to involve punctuation, white space, or their absence, at the juncture between alphabetic and numeric portions of the cell line name. Results Reasoning that the variant names could be described in terms of mutations or deletions of character strings, we implemented a simple version of the Needleman-Wunsch global sequence alignment algorithm and applied it to the cell line names. All correct matches were found by this procedure. Incorrect matches only occured when a cell line was present in one data set but not in the other. The raw match scores tended to be substantially worse for the incorrect matches. Conclusions A simple application of the Needleman-Wunsch global sequence alignment algorithm provides a useful first pass at matching sample names from different data sets.
Short Report, Neoplasms. Tumors. Oncology. Including cancer and carcinogens, RC254-282
Short Report, Neoplasms. Tumors. Oncology. Including cancer and carcinogens, RC254-282
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
