This article presents a methodology and coding examples for those who want to explore the content of a large corpus of digital publications stored online in PDF format and gain insight into their common content and how it changes over time. The method can ...