
handle: 10919/114082
For this project, our team took 20,000 image samples from ETDs (electronic theses and dissertations) and annotated them using PyLabel, an open-source Python package for labeling PDF pages. PyLabel can also use a trained model to provide AI-aided annotations. We created a pipeline that divides the dataset into equal pieces, letting a user select the number of samples they want to annotate. The old sample data is then cleared out and replaced with new sample data containing the classes on which the model has low accuracy. Finally, we saved the annotations as YOLOv7 .txt files, which accumulate so that the model can be retrained first with 10,000 annotated images and then with 20,000 annotated pages. Using these annotated pages, we ran an experiment that timed the annotation work to see how the average annotation time per page improved as the successive models were trained. We concluded that annotating with the model trained on 10,000 pages was significantly faster than annotating with the original model.
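The batching, label-export, and timing steps described above can be illustrated with a minimal sketch. This is not the project's actual pipeline or PyLabel's API: the function names, file names, and box coordinates are hypothetical, and the only fixed convention assumed is the standard YOLO label line (class id followed by the normalized center x, center y, width, and height of the box).

```python
from typing import List, Sequence, Tuple


def split_into_batches(items: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split items into consecutive batches of batch_size.

    The last batch is shorter when len(items) is not a multiple of batch_size.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def to_yolo_line(class_id: int, box: Tuple[float, float, float, float],
                 img_w: int, img_h: int) -> str:
    """Convert a pixel box (x_min, y_min, x_max, y_max) into one YOLO label line."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def average_seconds_per_page(total_seconds: float, pages: int) -> float:
    """Average annotation time per page for one timed session."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    return total_seconds / pages


# Hypothetical usage: 10 page images annotated in batches of 4.
pages = [f"page_{i}.jpg" for i in range(10)]
batches = split_into_batches(pages, 4)
label = to_yolo_line(2, (100, 200, 300, 400), img_w=1000, img_h=800)
```

Comparing `average_seconds_per_page` across sessions run with different models is one simple way to express the timing comparison described in the abstract.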
ObjectDetectionReport.pdf - Final Report (PDF Version)
ObjectDetectionReport.docx - Final Report (Word Version)
ObjectDetectionPresentation.pdf - Final Presentation (PDF Version)
ObjectDetectionPresentation.pptx - Final Presentation (PowerPoint Version)
Machine Learning, Labeling, Object Detection
