ZENODO
Dataset . 2023
License: CC BY
Data sources: Datacite
ZENODO
Dataset . 2024
License: CC BY
Data sources: Datacite

Audio tagging of avian dawn chorus recordings in California, Oregon, and Washington

Authors: Matthew J. Weldy; Tom Denton; Abram B. Fleishman; Jaclyn Tolchin; Matthew Mckown; Robert S. Spaan; Zachary J. Ruff; +3 Authors

Abstract

General Summary

This acoustic data collection includes 1,575 5-minute soundscape recordings randomly selected from passive acoustic recordings made at 525 sites during 2022 on federally managed lands in western California, Oregon, and Washington, USA. We fully labeled 141 recordings (11.75 hrs) with 39,717 annotations for 118 sound types, including 58 avian species, two mammalian species, six aggregated biotic sounds, and eight non-biotic sound types. An additional 215 recordings were partially annotated with 1,466 annotations. The remaining unlabeled recordings are included to facilitate novel research applications and methodological evaluations. Beyond the labeled soundscape recordings, we include township and range identifications and 38 environmental covariates for each recording location.

Data Collection

Lesmeister et al. (2021) collected passive acoustic recordings during 2022 in support of long-term monitoring of federally threatened northern spotted owl (Strix occidentalis caurina) populations under the Northwest Forest Plan Effectiveness Monitoring Program (U. S. Fish and Wildlife Service 1990, U. S. Department of Agriculture and U. S. Department of the Interior 1994). These data were collected at 643 hexagons that were randomly selected from a tessellation of 5 km² hexagons covering the entire range of the northern spotted owl (Northern California, Oregon, Washington), under the constraint that each hexagon contain ≥ 50% forest-capable lands (defined as forested lands or lands capable of developing closed-canopy forests) and have ≥ 25% federal ownership (Davis et al., 2011). Each hexagon was sampled by four Song Meter 4 (SM4) acoustic recording units (Wildlife Acoustics, Maynard, MA) deployed in a standardized spatial arrangement, such that recorders on a site were placed ≥ 500 m apart and ≥ 200 m from the edge of the sampling hexagon boundary.
Recorders were mounted to small trees (15 – 20 cm diameter at breast height) approximately 1.5 m above the ground and were placed on mid-to-upper slopes and ≥ 50 m from roads, trails, and streams. The SM4 devices each have two built-in omnidirectional microphones with a signal-to-noise ratio of 80 dB, typical at 1 kHz, and a recording bandwidth of 20 Hz – 48 kHz. Each device recorded ~11 hours of audio daily for six weeks between March and August at a sampling rate of 32 kHz. The daily recording schedule included a 4-hour window from two hours before sunrise to two hours after sunrise, a 4-hour window from one hour before sunset to three hours after sunset, and 10-minute recordings at the start of every hour outside the two longer recording blocks.

Data Sampling

The goal of this project was to develop a tagged audio dataset (hereafter project dataset) focused on the avian dawn chorus, which is an ecologically important period for the study of avian behavior (McNamara et al. 1987, Staicer et al. 1996, Zhang et al. 2016) and for monitoring avian biodiversity (Bibby et al. 2000), but remains a challenging problem for acoustic classification systems (Duan et al. 2013, Stowell 2022). Passive acoustic monitoring on our sites occurs throughout the day. We filtered the full dataset to recordings collected between May and August during the hour immediately after sunrise. From the recordings meeting our filtering criteria, we randomly selected three 5-minute files from each site, which were assigned ordinal labels ‘A,’ ‘B,’ or ‘C.’ The final project dataset comprised 131.25 hours of acoustic data.

Annotation Protocol

We randomly selected 141 sites from the project dataset and fully annotated each recording at a 2-second resolution. We applied labels to each 2-second window of the selected recordings following a predefined sound phonology library (available in the ‘metadata.csv’ file), which concatenated the 2021 eBird taxonomy codes (Clements list; Clements et al. 2021) with standardized sonotype codes that incremented depending on the species repertoire (i.e., ‘call_1,’ ‘song_1,’ ‘drum_1’). For example, ‘herthr_song_1’ is the label for Hermit Thrush, song_1. Unknown signals were labeled ‘unknown,’ and clips with no biotic signals (or noise classes of interest documented in metadata.csv) were labeled ‘empty.’ Windows were labeled ‘complete’ and considered fully annotated when every signal was assigned an annotation. Files were deemed fully annotated when every 2-second window contained the ‘complete’ label.

Environmental Covariates

Sampling locations will not be published, to protect federally threatened or endangered species that may occur on our sites. However, we provide the State, Township, and Range for each sampling location along with the site-specific values for 38 forest structure, topographic, and climatic environmental covariates developed by the Landscape Ecology, Modeling, Mapping, and Analysis group in the Pacific Northwest (https://lemma.forestry.oregonstate.edu/data; Ohmann and Gregory 2002). State, Township, and Range values are sufficient to explore geographic variation in species- or community-specific call and song phenology, and the extracted environmental covariates may provide useful contextual information for novel machine-learning developments (Liu et al. 2018).

Description of Data Format

The fully annotated audio files can be accessed by downloading and extracting “annotated_recordings.zip.” Partially annotated and non-annotated audio files can be accessed by downloading and extracting “additional_recordings_part_1.zip” or “additional_recordings_part_2.zip.” Acoustic file names contain site and replicate indicators, such that file “Site_001_Rep_A.wav” was recorded on site 1 and is replicate A of the random draws from that site's available dawn chorus recordings.
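The naming convention just described can be unpacked programmatically. The following is a minimal sketch, assuming every file follows the same `Site_<number>_Rep_<letter>.wav` pattern as the single example given in the description; the function name `parse_recording_name`, the `RecordingName` type, and the exact pattern are illustrative assumptions, not part of the dataset.

```python
import re
from typing import NamedTuple


class RecordingName(NamedTuple):
    site_number: int   # numeric part of the site indicator, e.g. 1 for "Site_001"
    replicate: str     # ordinal replicate label: "A", "B", or "C"


# Pattern inferred from the one documented example ("Site_001_Rep_A.wav");
# real file names may deviate from it.
_NAME_RE = re.compile(r"^Site_(\d+)_Rep_([ABC])\.wav$")


def parse_recording_name(file_name: str) -> RecordingName:
    """Extract the site number and replicate label from a recording file name."""
    match = _NAME_RE.match(file_name)
    if match is None:
        raise ValueError(f"unexpected file name: {file_name!r}")
    return RecordingName(int(match.group(1)), match.group(2))


print(parse_recording_name("Site_001_Rep_A.wav"))
```

Raising on unexpected names, rather than returning a default, makes any file that departs from the assumed pattern visible immediately.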
The site and replicate numbers link to additional recording information in “files.csv,” annotations in “annotations.csv” and “partial_annotations.csv,” and site- and replicate-specific environmental characteristics in “environmental_characteristics.csv.” Metadata describing sound classes and environmental characteristics can be found in “metadata.csv” and “environmental_characteristics_metadata.csv.”

Acknowledgments

Acoustic data collection was funded and conducted by the US Forest Service and the US Bureau of Land Management. Annotation work was funded by Google. We would also like to thank the many biologists who collected and processed the data compiled here. The use of trade or firm names in this publication is for reader information and does not imply endorsement by the U.S. Government of any product or service.

Data Dictionaries

files.csv

column_name | description
site | Site name
replicate | An ordinal label indicating the random draw label: ‘A,’ ‘B,’ or ‘C.’
recording_date | Recording date and time formatted as “Year-Month-Day Hour:Minute:Second”
annotated | Categorical assignment describing whether a recording was completely annotated: ‘complete,’ ‘partial,’ or ‘not annotated.’
file | Wav file name
zip_file | The zip file location of the file

annotations.csv

column_name | description
file | Wav file name
clip_complete | Binary indicator for whether the clip was completely labeled
start | Start time of the 2-second clip in seconds
end | End time of the 2-second clip in seconds
eBird_2021 | 2021 species identification eBird code
label | Sonotype label comprising a concatenation of the 2021 eBird taxonomy code and the sound type label

partial_annotations.csv

column_name | description
file | Wav file name
clip_complete | Binary indicator for whether the clip was completely labeled
start | Start time of the 2-second clip in seconds
end | End time of the 2-second clip in seconds
label | Sonotype label comprising a concatenation of the 2021 eBird taxonomy code and the sound type label
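The annotation columns above support two simple checks: splitting a sonotype label into its eBird code and sound type, and confirming that a file meets the "fully annotated" rule (every 2-second window flagged complete). The sketch below is illustrative only. It assumes that eBird codes contain no underscore, that `clip_complete` is stored as the string "1" or "0", and that `start` is numeric; none of these encodings is stated in the description. The toy rows are invented and describe a pretend 6-second recording, not real data.

```python
import csv
import io
from collections import defaultdict

CLIP_SECONDS = 2                 # annotation resolution stated in the protocol
RECORDING_SECONDS = 5 * 60       # each recording is 5 minutes long
CLIPS_PER_RECORDING = RECORDING_SECONDS // CLIP_SECONDS  # 150 windows per file


def split_label(label):
    """Split 'herthr_song_1' into ('herthr', 'song_1').

    Assumption: the first underscore separates the eBird code from the
    sonotype. Labels without an underscore (e.g. 'unknown', 'empty') are
    returned unchanged with a sonotype of None.
    """
    code, sep, sonotype = label.partition("_")
    return (code, sonotype) if sep else (label, None)


def fully_annotated_files(rows, clips_per_recording=CLIPS_PER_RECORDING):
    """Return the files in which every 2-second window is flagged complete."""
    complete_starts = defaultdict(set)
    for row in rows:
        # Assumption: the binary indicator is stored as "1" / "0".
        if row["clip_complete"] == "1":
            complete_starts[row["file"]].add(float(row["start"]))
    return sorted(
        name for name, starts in complete_starts.items()
        if len(starts) == clips_per_recording
    )


# Invented toy rows (3 windows per pretend recording), not taken from the dataset.
TOY_CSV = """file,clip_complete,start,end,eBird_2021,label
Site_001_Rep_A.wav,1,0,2,herthr,herthr_song_1
Site_001_Rep_A.wav,1,2,4,herthr,herthr_song_1
Site_001_Rep_A.wav,1,2,4,,unknown
Site_001_Rep_A.wav,1,4,6,,empty
Site_002_Rep_B.wav,1,0,2,herthr,herthr_song_1
Site_002_Rep_B.wav,0,2,4,,unknown
"""

rows = list(csv.DictReader(io.StringIO(TOY_CSV)))
print(split_label("herthr_song_1"))
print(fully_annotated_files(rows, clips_per_recording=3))

# Sanity check on the summary figures: 141 files x 150 windows x 2 s, in hours.
print(141 * CLIPS_PER_RECORDING * CLIP_SECONDS / 3600)
```

Counting distinct window start times, rather than rows, matters because a single window can carry several annotation rows.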

metadata.csv

column_name | description
common_name | The common name of the sound source. For avian species, the common name follows the Clements taxonomy outlined in the 2021 eBird taxonomy.
scientific_name | The scientific name of the biotic sound source. For avian species, the scientific name follows the Clements taxonomy outlined in the 2021 eBird taxonomy.
eBird_2021 | 2021 eBird taxonomy species_code
sound | Sound type label
label | Sonotype label comprising a concatenation of the 2021 eBird taxonomy code and the sound type label
description | Biological and phonetic description of the target sound
n_files | Total number of audio files containing at least one instance of the target label
n_annotations | Total number of label-specific annotations in the fully annotated data

environmental_characteristics.csv

column_name | description
site | Site name
replicate | An ordinal label indicating whether the row describes a random sample ‘A,’ ‘B,’ or ‘C.’
state | State location of survey site
township_range | Township and range identifier of the survey site. The township data were obtained from three sources: https://spatialdata.oregonexplorer.info/geoportal/search;fq=Cadastral, https://gis.data.ca.gov/datasets/cadoc::public-land-survey-system-plss-township-and-range/about, https://geo.wa.gov/datasets/fde7d46b0adf46b68f177d850ce85042/explore?location=47.804173%2C-122.122059%2C9.96
age_dom_2017 | Basal area weighted stand age based on dominant and codominant trees
ba_ge_3_2017 | Basal area of live trees >= 2.5 cm dbh
bac_ge_3_2017 | Basal area of live conifers >= 2.5 cm dbh
bah_ge_3_2017 | Basal area of live hardwoods >= 2.5 cm dbh
bph_ge_3_crm_2017 | Component Ratio Method biomass of all live trees >= 2.5 cm
bphc_ge_3_crm_2017 | Component Ratio Method biomass of all live conifers >= 2.5 cm
bphh_ge_3_crm_2017 | Component Ratio Method biomass of all live hardwoods >= 2.5 cm
cancov_2017 | Canopy cover of all live trees
cancov_con_2017 | Canopy cover of all conifers
cancov_hdw_2017 | Canopy cover of all hardwoods
cancov_layers_2017 | Number of tree canopy layers present
conplba_2017 | Conifer tree species with the plurality of basal area
covcl_2017 | Cover class based on cancov
ddi_2017 | Diameter diversity index
fortypba_2017 | Forest type, which describes the dominant tree species of current vegetation
hdwplba_2017 | Hardwood tree species with the plurality of basal area
mndbhba_2017 | Basal-area weighted mean diameter of all live trees
mndbhba_con_2017 | Basal-area weighted mean diameter of all live conifers
mndbhba_hdw_2017 | Basal-area weighted mean diameter of all live hardwoods
qmd_dom_2017 | The quadratic mean diameter of all dominant and codominant trees
qmd_ht25_2017 | Quadratic mean diameter in inches of trees whose heights are in the top 25% of all tree heights
qmdc_dom_2017 | The quadratic mean diameter of all dominant and codominant conifers
qmdh_dom_2017 | The quadratic mean diameter of all dominant and codominant hardwoods
sbph_ge_25_2017 | Biomass of snags >= 25 cm dbh and >= 2 m tall
sdi_reineke_2017 | Reineke's stand density index
sizecl_2017 | Size class, based on QMD_DOM and CANCOV
stndhgt_2017 | Stand height, computed as the average height of all dominant and codominant trees
stph_ge_25_2017 | The density of snags >= 25 cm dbh and >= 2 m tall
struccond_2017 | Structural condition
svph_ge_25_2017 | Volume of snags >= 25 cm dbh and >= 2 m tall
tph_ge_3_2017 | The density of live trees >= 2.5 cm dbh
tphc_ge_3_2017 | The density of live conifers >= 2.5 cm dbh
tphh_ge_3_2017 | The density of live hardwoods >= 2.5 cm dbh
treeplba_2017 | Tree species with the plurality of basal area
vegclass_2017 | Vegetation class based on CANCOV, BAH_PROP, QMD_DOM
vph_ge_3_2017 | The volume of live trees >= 2.5 cm dbh
vphc_ge_3_2017 | The volume of live conifers >= 2.5 cm dbh
vphh_ge_3_2017 | The volume of live hardwoods >= 2.5 cm dbh

environmental_characteristics_metadata.csv

column_name | description
covariate | Covariate name
type | Value type of variable
range | The range of values extracted across our survey sites, given as the minimum value to the maximum value
description | A description of the variable, including a brief discussion of the methods used to create the variable
source | Variable source citation
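Because files.csv and environmental_characteristics.csv both carry `site` and `replicate` columns, the two tables can be linked on that pair. The sketch below shows one way to do so with the standard library; it is not the authors' workflow. The helper names are hypothetical, and every value in the toy tables (site names, dates, state code, canopy cover) is invented for illustration. Only the column names and the date format come from the data dictionaries.

```python
import csv
import io
from datetime import datetime

# Invented toy tables that mimic the documented column names; values are not real.
FILES_CSV = """site,replicate,recording_date,annotated,file,zip_file
Site_001,A,2022-06-01 05:45:00,complete,Site_001_Rep_A.wav,annotated_recordings.zip
Site_001,B,2022-06-15 05:40:00,not annotated,Site_001_Rep_B.wav,additional_recordings_part_1.zip
"""

ENV_CSV = """site,replicate,state,cancov_2017
Site_001,A,OR,85
Site_001,B,OR,85
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_site_replicate(file_rows, env_rows):
    """Attach environmental covariates to each recording via (site, replicate)."""
    env_by_key = {(r["site"], r["replicate"]): r for r in env_rows}
    joined = []
    for row in file_rows:
        env = env_by_key.get((row["site"], row["replicate"]))
        if env is None:
            continue  # skip recordings without a matching covariate row
        merged = {**env, **row}
        # Date format as documented: "Year-Month-Day Hour:Minute:Second".
        merged["recording_date"] = datetime.strptime(
            row["recording_date"], "%Y-%m-%d %H:%M:%S"
        )
        joined.append(merged)
    return joined


joined = join_on_site_replicate(read_rows(FILES_CSV), read_rows(ENV_CSV))
for record in joined:
    print(record["file"], record["state"], record["cancov_2017"],
          record["recording_date"].month)
```

Keying the lookup on the (site, replicate) pair rather than on site alone follows the data dictionary, which describes the environmental table as replicate-specific.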

References

Bibby, C. J. (2000). Bird census techniques (2nd ed.). Academic Press.

Clements, J. F., Schulenberg, T. S., Iliff, M. J., Billerman, S. M., Fredericks, T. A., Gerbracht, J. A., Lepage, D., Sullivan, B. L., and Wood, C. L. (2021). The eBird/Clements checklist of Birds of the World: v2021. Downloaded from https://www.birds.cornell.edu/clementschecklist/download/

Davis, R. J., Dugger, K. M., Mohoric, S., Evers, L., and Aney, W. C. (2011). Northwest Forest Plan - the first 15 Years (1994-2008): status and trends of Northern Spotted Owl populations and habitat (Gen. Tech. Rep. No. PNW-GTR-850). U.S. Department of Agriculture, Forest Service, Pacific Northwest Research Station, Portland, OR.

Duan, S., Zhang, J., Roe, P., Wimmer, J., Dong, X., Truskinger, A., and Towsey, M. (2013). Timed Probabilistic Automaton: A bridge between Raven and Song Scope for automatic species recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 27(2), 1519–1524. https://doi.org/10.1609/aaai.v27i2.18993

Landscape Ecology Modeling, Mapping, and Analysis (LEMMA) Team. (2020). Gradient Nearest Neighbor (GNN) raster dataset (version 2020.01). Modeled forest vegetation data using direct gradient analysis and nearest neighbor imputation. Retrieved from https://lemma.forestry.oregonstate.edu/data

Lesmeister, D. B., Appel, C. L., Davis, R. J., Yackulic, C. B., and Ruff, Z. J. (2021). Simulating the effort necessary to detect changes in Northern Spotted Owl (Strix occidentalis caurina) populations using passive acoustic monitoring (Research Paper PNW-RP-618; p. 55). U.S. Department of Agriculture, Forest Service, Pacific Northwest Research Station.

Liu, J., Zhang, Z., and Razavian, N. (2018). Deep EHR: Chronic disease prediction using medical notes. In Proceedings of the 3rd Machine Learning for Healthcare Conference. Proceedings of Machine Learning Research, pp. 440–464.

McNamara, J. M., Mace, R. H., and Houston, A. I. (1987). Optimal daily routines of singing and foraging in a bird singing to attract a mate. Behavioral Ecology and Sociobiology, 20(6), 399–405. https://doi.org/10.1007/BF00302982

Ohmann, J. L., and Gregory, M. J. (2002). Predictive mapping of forest composition and structure with direct gradient analysis and nearest-neighbor imputation in coastal Oregon, U.S.A. Canadian Journal of Forest Research, 32, 725–741.

Staicer, C. A., Spector, D. A., and Horn, A. G. (1996). The dawn chorus and other diel patterns in acoustic signaling. In D. E. Kroodsma & E. H. Miller (Eds.), Ecology and evolution of acoustic communication in birds. Cornell University Press.

Stowell, D. (2022). Computational bioacoustics with deep learning: a review and roadmap. PeerJ, 10:e13152. https://doi.org/10.7717/peerj.13152

U. S. Department of Agriculture, and U. S. Department of the Interior. (1994). Northwest Forest Plan - Record of Decision for amendments for Forest Service and Bureau of Land Management planning documents within the range of the Northern Spotted Owl.

U. S. Fish and Wildlife Service. (1990). 50 CFR part 17 endangered and threatened wildlife and plants; determination of threatened status for northern spotted owl; final rule. Federal Register 55, 26114–26194.

Zhang, V. Y., Celis-Murillo, A., and Ward, M. P. (2016). Conveying information with one song type: Changes in dawn song performance correspond to different female breeding stages. Bioacoustics, 25(1), 19–28. https://doi.org/10.1080/09524622.2015.1076348

Keywords

bird, avian, vocalization, annotated soundscapes, mammal, dawn chorus, forest ecology

Related to Research communities
Italian National Biodiversity Future Center