
doi: 10.21236/ad0697403
Abstract: An information retrieval system was developed that uses technical word occurrences as the basis for classification. A set of words, designated a vocabulary, was selected from the middle range of a frequency listing of the words occurring in an experimental sample of 94 documents. The selection produced 115 non-function words whose technical definitions did not allow ambiguous usage, and each word was assigned one of eighty concept numbers. The frequencies of these concepts served as data for a factor analysis, and 39 factors were extracted to represent the orthogonal axes of a geometric subject-content space. The locations of the concepts in this space were used to determine the geometric position of each document according to the frequencies of the concepts in that document. A total of 194 documents was used to measure system effectiveness. Requests formulated for a previous experiment using the same data base were processed. Precision and recall measures were calculated; on average, 66% precision and 80% recall were attained with one of three dissemination thresholds. Overall analysis of the results supports the theory that statistical data about word occurrences are sufficient to represent documents accurately relative to their subject content.
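The pipeline described in the abstract (concept positions, then document positions, then thresholded dissemination scored by precision and recall) can be illustrated with a small sketch. The abstract does not state how concept frequencies were combined, which similarity measure was used, or what the threshold values were, so the frequency-weighted mean, the cosine similarity, the threshold of 0.7, and all names and toy coordinates below are assumptions chosen for illustration only. The sketch also uses a two-dimensional space rather than the 39 factors of the report.

```python
import math

def document_position(concept_freqs, concept_coords):
    """Place a document at the frequency-weighted mean of its concepts' coordinates.
    The weighting scheme is an assumption; the abstract does not specify it."""
    total = sum(concept_freqs.values())
    dims = len(next(iter(concept_coords.values())))
    pos = [0.0] * dims
    for concept, freq in concept_freqs.items():
        for i, x in enumerate(concept_coords[concept]):
            pos[i] += freq * x / total
    return pos

def cosine(u, v):
    """Cosine similarity between two vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def disseminate(request_pos, doc_positions, threshold):
    """Return the documents whose similarity to the request meets the threshold."""
    return {name for name, pos in doc_positions.items()
            if cosine(request_pos, pos) >= threshold}

def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical concept coordinates in a 2-factor space (toy values).
concept_coords = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.7, 0.7]}

# Hypothetical documents, each a mapping of concept number to frequency.
docs = {
    "A": {1: 4, 3: 1},
    "B": {2: 5},
    "C": {1: 1, 2: 1, 3: 2},
}

doc_positions = {name: document_position(freqs, concept_coords)
                 for name, freqs in docs.items()}
request_pos = document_position({1: 1}, concept_coords)  # request about concept 1

retrieved = disseminate(request_pos, doc_positions, threshold=0.7)
precision, recall = precision_recall(retrieved, relevant={"A"})
print(sorted(retrieved), precision, recall)
```

With these toy values, lowering the threshold disseminates more documents, which tends to raise recall at the cost of precision. That trade-off is presumably why the report evaluates three dissemination thresholds.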
