
This set of Python scripts automates the standardization of lithology in borehole databases. It converts unstandardized terms (raw descriptions) to standardized terms (keywords), and is particularly useful for databases in which lithology keywords were never standardized. The dictionaries are customizable. Run the scripts in the order given by their leading numbers (00 to 06). Example data are included in the files.

00_stack_horizontals.py: For rows in the database where multiple intervals were entered on a single line (horizontals), extracts those intervals, computes their depths, and stacks them into individual rows.

01_convert_lowercase.py: Converts all descriptions to lowercase so that the following scripts only need to match lowercase terms.

02_reverse_adj_noun_pairs.py: Takes the leading adjective in adjective-noun pairs, such as "sandy clay" or "clayey sand", and places it after the noun, yielding "clay, sandy" or "sand, clayey". This simplifies the use of dictionaries in the following scripts.

03_extract_keywords.py: Uses a dictionary to look for terms and returns the position (as an integer) of each term in the string. This position is used to rank the primary and secondary lithology terms.

04_rank_sort_keywords.py: Ranks the terms by their position in the string, then replaces each term with its keyword from the dictionary.

05_tag_bedrock_tops.py: Optional script that finds bedrock terms. It can be used as a first-pass operation for mapping the bedrock surface.

06_convert_units.py: Converts depth units from U.S. standard to metric.
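The idea behind steps 01 to 04 can be illustrated with a minimal sketch. This is not the code from the scripts themselves: the word lists, the dictionary contents, and the function names (`reverse_adj_noun`, `extract_positions`, `rank_keywords`, `standardize`) are all assumptions made up for illustration, and the real scripts may organize the work differently.

```python
import re

# Hypothetical word lists and dictionary, for illustration only.
ADJECTIVES = ["sandy", "clayey", "silty", "gravelly"]
NOUNS = ["sand", "clay", "silt", "gravel"]
KEYWORDS = {  # raw term -> standardized keyword
    "sand": "sand", "sandy": "sand",
    "clay": "clay", "clayey": "clay",
    "silt": "silt", "silty": "silt",
    "gravel": "gravel", "gravelly": "gravel",
}

def reverse_adj_noun(description):
    """Move a leading adjective behind its noun: 'sandy clay' -> 'clay, sandy'."""
    pattern = r"\b(" + "|".join(ADJECTIVES) + r")\s+(" + "|".join(NOUNS) + r")\b"
    return re.sub(pattern, r"\2, \1", description)

def extract_positions(description):
    """Return {term: integer position} for every dictionary term found."""
    positions = {}
    for term in KEYWORDS:
        # Word boundaries keep 'sand' from matching inside 'sandy'.
        match = re.search(r"\b" + re.escape(term) + r"\b", description)
        if match:
            positions[term] = match.start()
    return positions

def rank_keywords(positions):
    """Sort terms by position, map them to keywords, and drop repeats."""
    ranked = []
    for term in sorted(positions, key=positions.get):
        keyword = KEYWORDS[term]
        if keyword not in ranked:
            ranked.append(keyword)
    return ranked

def standardize(description):
    """Lowercase, reverse adjective-noun pairs, then rank the keywords."""
    text = reverse_adj_noun(description.lower())
    return rank_keywords(extract_positions(text))

print(standardize("Sandy CLAY"))               # ['clay', 'sand']
print(standardize("clayey sand with gravel"))  # ['sand', 'clay', 'gravel']
```

In this sketch the first keyword in the returned list plays the role of the primary lithology and the second the secondary lithology. Reversing the adjective-noun pair first is what lets a simple position-based ranking put the noun ahead of its modifier.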
