
AI has the potential to significantly impact the open access repository development landscape, for example by enabling better search and content recommendation, identifying new patterns in scholarly content, and promoting openness in datasets and content. Large language models (LLMs) have emerged as crucial and widely used resources in natural language processing (NLP), a subfield of artificial intelligence (AI) that shares common ground with machine learning (ML). LLMs allow computers to comprehend and produce text in a manner that resembles human communication. Our goal in this experiment was to create a conversational application that integrates the OpenAI API so that users can query DSpace in natural language. We explored LLMs, the OpenAI API, LangChain, embeddings, and vector stores. LLMs are deep learning models trained on large datasets. The OpenAI API provides a cloud interface for accessing OpenAI's machine learning models. LangChain is a framework for building applications on top of language models. Embeddings encode information as vectors in a high-dimensional space. Vector stores are databases that store vector embeddings of non-numerical data. To produce better responses, we used retrieval-augmented generation (RAG) to supply the model with additional, real-time data from DSpace, so that answers can draw on the most up-to-date content in the repository.
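The retrieval step of RAG described above can be illustrated with a minimal, dependency-free sketch. It is not the implementation used in the experiment: the `ITEMS` records, the `VectorStore` class, and the `embed` and `build_prompt` helpers are hypothetical, and a bag-of-words count vector stands in for a learned embedding such as one returned by the OpenAI API. The resulting prompt is what would be sent to an LLM.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for DSpace item metadata; not real repository records.
ITEMS = [
    {"handle": "demo/1", "title": "Open access policies in university repositories"},
    {"handle": "demo/2", "title": "Soil moisture measurements from field sensors"},
    {"handle": "demo/3", "title": "Machine learning for metadata quality in repositories"},
]


def embed(text):
    """Toy embedding (assumption): a sparse bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """In-memory store that keeps each item next to its embedding."""

    def __init__(self):
        self.rows = []

    def add(self, item):
        self.rows.append((embed(item["title"]), item))

    def search(self, query, k=2):
        """Return the k items whose embeddings are most similar to the query."""
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [item for _, item in ranked[:k]]


def build_prompt(question, store, k=2):
    """Augment the question with retrieved context before calling an LLM."""
    hits = store.search(question, k)
    context = "\n".join(f"- [{it['handle']}] {it['title']}" for it in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


store = VectorStore()
for item in ITEMS:
    store.add(item)

question = "Which items discuss machine learning in repositories?"
prompt = build_prompt(question, store)
print(prompt)
```

Because the context is retrieved at query time, adding a new item to the store changes the next answer without retraining the model, which is the property the abstract relies on for up-to-date responses.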
Large language models (LLMs), LangChain, Retrieval-augmented generation (RAG), DSpace, OR2024, Natural language processing (NLP)
