
This dataset was collected and annotated by Terms of Service; Didn't Read (ToS;DR), an independent project that analyzes and summarizes the terms of service and privacy policies of online services. ToS;DR helps users understand the legal agreements they accept when using online platforms by categorizing and evaluating specific cases related to these policies. The dataset includes structured information on individual cases, broader topics, specific services, detailed documents, and key points extracted from legal texts.

Cases refer to individual legal cases or specific issues related to the terms of service or privacy policies of a particular online service. Each case typically focuses on a specific aspect of a service's terms, such as data collection, user rights, content ownership, or security practices.
- id: a unique, incremental ID for each case.
- classification: one of good, bad, neutral, blocker.
- score: a value from 0 to 100.
- title.
- description.
- topic_id: links the case to its topic.
- created_at.
- updated_at.
- privacy_related: a flag indicating whether the case is related to privacy.
- docbot_regex: the regular expression used to check for specific words in the quoted text.

Topics are general categories or themes that encompass various cases. They organize and group similar cases based on the type of issue they address. For example, "Data Collection" could be a topic that includes cases related to how a service collects and uses user data.
- id: a unique, incremental ID for each topic.
- title.
- subtitle: a short description.
- description.
- created_at.
- updated_at.

Services represent specific online platforms, websites, or applications that have their own terms of service and privacy policies.
- id: a unique, incremental ID for each service.
- name.
- url.
- created_at.
- updated_at.
- wikipedia: the Wikipedia URL of the service.
- keywords.
- related: links the service to a known similar service in the same field.
- slug: derived from the name (lowercase, no spaces, and so on).
- is_comprehensively_reviewed: a flag indicating whether the service has been comprehensively reviewed.
- rating: the overall rating for the service, based on all of its cases.
- status: indicates whether the service is deleted (deleted, NaN).

Points are individual statements or aspects within a case that highlight important information about a service's terms of service or privacy policy. These points can be positive (e.g., strong privacy protections) or negative (e.g., data sharing with third parties).
- id: a unique, incremental ID for each point.
- rank: all values are zero.
- title: usually similar to the case title.
- source: the URL of the source.
- status: one of approved, declined, pending, changes-requested, disputed, draft.
- analysis.
- created_at.
- updated_at.
- service_id: links the point to its service.
- quote_text: the text quoted from the source that contains the information for this point.
- case_id: links the point to the related case.
- old_id: used for data migration.
- quote_start: the index of the first character of the quoted text in the document.
- quote_end: the index of the last character of the quoted text in the document.
- service_needs_rating_update: all values are False.
- document_id: links the point to the related document.
- annotation_ref.

Documents refer to the original terms of service and privacy policies of the services analyzed on ToS;DR. These documents are the source of information for the cases, points, and ratings provided on the platform. ToS;DR links to the actual documents, so users can review the full details if they choose to.
- id: a unique, incremental ID for each document.
- name: the name of the document, such as privacy policy or cookies policy.
- url: the URL of the document.
- xpath.
- text: the actual document text.
- created_at.
- updated_at.
- service_id: links the document to its service.
- reviewed: a flag indicating whether the document has been reviewed.
- status: indicates whether the service is deleted (deleted, NaN).
- crawler_server: the server used to crawl the document.
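
The field lists describe a small relational schema: a point references a service, a case, and a document, and a case references a topic. The following is a minimal sketch of how those references might be resolved in plain Python. The rows are hand-made placeholders, not values from the dataset, and the helper names (`index_by_id`, `extract_quote`, `resolve_point`) are hypothetical. It also assumes that `quote_end` is an inclusive index, as the field description suggests; this should be checked against `quote_text` in the real data.

```python
from collections import Counter

# Hand-made rows for illustration only; none of these values come from the dataset.
topics = [{"id": 1, "title": "Data Collection"}]
cases = [
    {"id": 10, "title": "Example case A", "classification": "bad", "topic_id": 1},
    {"id": 11, "title": "Example case B", "classification": "good", "topic_id": 1},
]
services = [{"id": 100, "name": "Example Service"}]
documents = [{
    "id": 1000, "name": "Privacy Policy", "service_id": 100,
    "text": "We collect your email address. You may delete your account.",
}]
points = [
    {"id": 1, "status": "approved", "service_id": 100, "case_id": 10,
     "document_id": 1000, "quote_start": 0, "quote_end": 29,
     "quote_text": "We collect your email address."},
    {"id": 2, "status": "approved", "service_id": 100, "case_id": 11,
     "document_id": 1000, "quote_start": 31, "quote_end": 58,
     "quote_text": "You may delete your account."},
    {"id": 3, "status": "declined", "service_id": 100, "case_id": 10,
     "document_id": 1000, "quote_start": 0, "quote_end": 29,
     "quote_text": "We collect your email address."},
]


def index_by_id(rows):
    """Build an id -> row lookup for one table."""
    return {row["id"]: row for row in rows}


def extract_quote(document, point, end_inclusive=True):
    """Slice the quoted span out of the document text.

    Assumption: quote_end is the index of the last character (inclusive).
    Pass end_inclusive=False if the real data uses an exclusive end.
    """
    end = point["quote_end"] + (1 if end_inclusive else 0)
    return document["text"][point["quote_start"]:end]


def resolve_point(point, tables):
    """Follow a point's references to its case, topic, service, and document."""
    case = tables["cases"][point["case_id"]]
    document = tables["documents"][point["document_id"]]
    return {
        "point_id": point["id"],
        "service": tables["services"][point["service_id"]]["name"],
        "document": document["name"],
        "case": case["title"],
        "classification": case["classification"],
        "topic": tables["topics"][case["topic_id"]]["title"],
        "quote": extract_quote(document, point),
    }


tables = {
    "topics": index_by_id(topics),
    "cases": index_by_id(cases),
    "services": index_by_id(services),
    "documents": index_by_id(documents),
}

# Keep only approved points, then resolve their references.
resolved = [resolve_point(p, tables) for p in points if p["status"] == "approved"]

# Count case classifications per service.
summary = Counter((r["service"], r["classification"]) for r in resolved)

for row in resolved:
    print(row["service"], "|", row["topic"], "|", row["classification"], "|", row["quote"])
```

Comparing the extracted slice with the stored `quote_text` for a handful of real points is a quick way to confirm whether the inclusive-end assumption holds before relying on the offsets.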
