
handle: 1842/23368
In 1999, the Brazilian National Council for Scientific and Technological Development (CNPq) launched the Lattes CV Platform, and all Brazilian HEIs require their researchers and staff to enter their publication metadata on the Platform and keep it up to date. The Lattes CVs are thus a rich source of metadata for Brazilian HEIs that need to identify which publications should be in their IR: the IR can be populated with this metadata as hidden records until the full-text file is ingested. Although these data are publicly accessible on the web and belong to the HEIs, their automated extraction from the Lattes Platform has been restricted by the recent addition of a CAPTCHA. To overcome this, we developed a proxy server (available at https://github.com/nitmateriais/cnpqwsproxy) based on the OpenResty platform. The proxy shares access to the Lattes SOAP services, lets the HEI manage which of its internal IP addresses may access those services, and keeps local data caches so that multiple applications from the same institution do not overload the CNPq servers. The data are delivered in XML format and are processed by Python scripts using the lxml library and the XPath standard. Duplicate publications (i.e. identical metadata appearing in the curricula of different authors of the same paper) are detected by DOI or by title similarity according to the Jaccard metric. Applying this solution, we retrieved the 1,166 curricula of researchers working at our HEI in 11 minutes, amounting to 573 MB of XML data containing the metadata of 78,370 journal and proceedings papers. In this way, the specific objective of gaining direct and official access to the public metadata hosted on the Lattes Platform was attained.
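The duplicate-detection step described above (match by DOI, otherwise by title similarity under the Jaccard metric) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record layout (dictionaries with `doi` and `title` keys), the word-level tokenization, the rule that two differing DOIs mean distinct papers, and the 0.8 threshold are all assumptions made for the example.

```python
import re
import unicodedata


def title_tokens(title):
    """Lowercase, strip accents and punctuation, and return the set of words."""
    decomposed = unicodedata.normalize("NFKD", title)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return set(re.findall(r"[a-z0-9]+", without_accents.lower()))


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_duplicate(rec_a, rec_b, threshold=0.8):
    """Compare by DOI when both records have one, otherwise by title similarity.

    Assumption: two records with different DOIs are treated as distinct papers.
    """
    doi_a = (rec_a.get("doi") or "").strip().lower()
    doi_b = (rec_b.get("doi") or "").strip().lower()
    if doi_a and doi_b:
        return doi_a == doi_b
    similarity = jaccard(title_tokens(rec_a["title"]), title_tokens(rec_b["title"]))
    return similarity >= threshold


def deduplicate(records, threshold=0.8):
    """Keep the first occurrence of each publication (hypothetical helper)."""
    unique = []
    for rec in records:
        if not any(is_duplicate(rec, kept, threshold) for kept in unique):
            unique.append(rec)
    return unique
```

The pairwise comparison in `deduplicate` is quadratic in the number of records, so for a collection on the scale reported here it would probably need a blocking step first (for example, grouping by publication year or by DOI) before titles are compared.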
institutional repository
| Indicator | Description | Value |
| --- | --- | --- |
| selected citations | These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 |
| popularity | This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average |
| influence | This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| impulse | This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
