
Topic of the graduation thesis: «Development of a service for assessing the complexity of scientific articles and news articles of Russian universities». The aim of the work is to create a data corpus for a service that assesses article complexity. To achieve this, the following tasks were set: identifying suitable sources, automating data collection, classifying the data, and developing a service prototype. Several sources were considered, including "Polytech Science and Innovation", "Archive Projects Laboratory PSPOD", and SPbPU news articles from the "Science and Innovation" section. For automated data collection, a process was developed that uses crawlers designed specifically for each source. The crawlers extract the required data (headings, links, metadata, article text) and store it in a structured form. A fault-tolerant architecture with a task scheduler was implemented. The data classification task was also solved: articles are separated by their complexity. The classification algorithm is based on existing metrics and is integrated into the service prototype. The results show that automated data collection was successful. The collected data indicate that the main target audience of the articles is senior students, so the content is better suited to readers with a higher level of competence. It is important to draw the attention of the sources' authors to developing materials for beginning students; this would help attract new people to science and could interest young people in studying scientific disciplines. The data collection results are a valuable contribution to the further development of the service: they will be used to analyze and improve the quality of article complexity determination.
text readability, data collection, text complexity assessment, data collection automation
