Publication · Preprint · 2018

The Many Shapes of Archive-It

Jones, Shawn M.; Nwala, Alexander; Weigle, Michele C.; Nelson, Michael L.
Open Access · English
Published: 18 Jun 2018
Abstract
Web archives, a key area of digital preservation, meet the needs of journalists, social scientists, historians, and government organizations. The use cases for these groups often require that they guide the archiving process themselves, selecting their own original resources, or seeds, and creating their own web archive collections. We focus on the collections within Archive-It, a subscription service started by the Internet Archive in 2005 for the purpose of allowing organizations to create their own collections of archived web pages, or mementos. Understanding these collections could be done via their user-supplied metadata or via text analysis, but the metada...
Subjects
Free-text keywords: Computer Science - Digital Libraries, H.3.7, H.3.1