This Research product is the result of merged Research products in OpenAIRE.

huggingface/datasets: 2.0.0

Authors: Quentin Lhoest; Albert Villanova del Moral; Patrick von Platen; Thomas Wolf; Mario Šaško; Yacine Jernite; Abhishek Thakur; +24 Authors

Abstract

🤗 Datasets 2.0.0

We're happy to announce that our new documentation is available at hf.co/docs/datasets!

Dataset Features

Load a folder of images using the imagefolder dataset loader:
  • Add imagefolder dataset by @nateraw in https://github.com/huggingface/datasets/pull/2830
  • Faster ImageFolder + add option to drop labels by @mariosasko in https://github.com/huggingface/datasets/pull/3887

Push your image and audio datasets on the Hugging Face Hub with push_to_hub:
  • Add support for Audio and Image feature in push_to_hub by @mariosasko in https://github.com/huggingface/datasets/pull/3685

New processing methods for streaming datasets:
  • Add IterableDataset.filter by @lhoestq in https://github.com/huggingface/datasets/pull/3826
  • Manipulate columns on IterableDataset (rename columns, cast, etc.) by @lhoestq in https://github.com/huggingface/datasets/pull/3862
  • Add the new methods to IterableDatasetDict by @lhoestq in https://github.com/huggingface/datasets/pull/3923

And more:
  • Add more compression types for to_json by @bhavitvyamalik in https://github.com/huggingface/datasets/pull/3551
  • Multi-GPU support for FaissIndex by @rentruewang in https://github.com/huggingface/datasets/pull/3721

Breaking changes

API changes for map and shuffle for datasets loaded in streaming mode:
  • Align map when streaming: update instead of overwrite + add missing parameters by @lhoestq in https://github.com/huggingface/datasets/pull/3801
  • Align IterableDataset.shuffle with Dataset.shuffle by @lhoestq in https://github.com/huggingface/datasets/pull/3842

Other breaking changes:
  • Rename GenerateMode to DownloadMode by @albertvillanova in https://github.com/huggingface/datasets/pull/3759
  • Remove deprecated methods/params (preparation for v2.0) by @mariosasko in https://github.com/huggingface/datasets/pull/3803
  • Remove deprecated remove_columns param in filter by @mariosasko in https://github.com/huggingface/datasets/pull/3827
  • Module namespace cleanup for v2.0 by @mariosasko in https://github.com/huggingface/datasets/pull/3875

Dataset Changes

  • New: CFPB Consumer Complaints by @kayvane1 in https://github.com/huggingface/datasets/pull/3617
  • New: told-br (brazilian hate speech) by @JAugusto97 in https://github.com/huggingface/datasets/pull/3683
  • New: electricity load diagram by @kashif in https://github.com/huggingface/datasets/pull/3722
  • New: MIT Scene Parsing Benchmark by @mariosasko in https://github.com/huggingface/datasets/pull/3607
  • New: ElkarHizketak v1.0 by @antxa in https://github.com/huggingface/datasets/pull/3780
  • New: wikitablequestions by @SivilTaram in https://github.com/huggingface/datasets/pull/3870
  • New: ontonotes_conll by @richarddwang in https://github.com/huggingface/datasets/pull/3853
  • Update: BnL Historical Newspapers - make the dataset streamable by @albertvillanova in https://github.com/huggingface/datasets/pull/3616
  • Update: Common voice - add validated partition by @shalymin-amzn in https://github.com/huggingface/datasets/pull/3669
  • Update: Common Voice - add local paths to audio files by @lhoestq in https://github.com/huggingface/datasets/pull/3736
  • Update: Common Voice - simplify code by @lhoestq in https://github.com/huggingface/datasets/pull/3817
  • Update: Natural Questions - add dev-only configuration by @albertvillanova in https://github.com/huggingface/datasets/pull/3699
  • Update: pubmed - update data url by @albertvillanova in https://github.com/huggingface/datasets/pull/3692
  • Update: pubmed - make the dataset streamable by @abhi-mosaic in https://github.com/huggingface/datasets/pull/3740
  • Update: RedCaps - make the dataset streamable by @mariosasko in https://github.com/huggingface/datasets/pull/3737
  • Update: cats_vs_dogs - update metadata by @albertvillanova in https://github.com/huggingface/datasets/pull/3752
  • Update: newsroom - update manual download url by @albertvillanova in https://github.com/huggingface/datasets/pull/3779
  • Update: xcopa - update to new version by @albertvillanova in https://github.com/huggingface/datasets/pull/3810
  • Update: cats_vs_dogs size by @mariosasko in https://github.com/huggingface/datasets/pull/3878
  • Fix: sem_eval_2018_task_1 - fix download location by @maxpel in https://github.com/huggingface/datasets/pull/3643
  • Fix: newsqa - fix unique keys by @albertvillanova in https://github.com/huggingface/datasets/pull/3696
  • Fix: The Pile datasets - fix host urls by @albertvillanova in https://github.com/huggingface/datasets/pull/3627
  • Fix: Evidence Infer Treatment - fix dataset script by @albertvillanova in https://github.com/huggingface/datasets/pull/3718
  • Fix: NewsQA - fix dataset script by @albertvillanova in https://github.com/huggingface/datasets/pull/3734
  • Fix: head_qa - fix data url by @albertvillanova in https://github.com/huggingface/datasets/pull/3766
  • Fix: msr_sqa - fix unique keys by @albertvillanova in https://github.com/huggingface/datasets/pull/3771
  • Fix: reddit_tifu - fix data url by @albertvillanova in https://github.com/huggingface/datasets/pull/3774
  • Fix: wiki_lingua - fix spanish data file url by @albertvillanova in https://github.com/huggingface/datasets/pull/3806
  • Fix: beans - fix data urls by @mariosasko in https://github.com/huggingface/datasets/pull/3890
  • Fix: CRD3 - fix NonMatchingChecksumError by @albertvillanova in https://github.com/huggingface/datasets/pull/3921
  • Fix: MultiWOZ 2.2 - fix NonMatchingChecksumError by @albertvillanova in https://github.com/huggingface/datasets/pull/3922

Dataset cards

  • Add code example in wikipedia card by @lhoestq in https://github.com/huggingface/datasets/pull/3678
  • Fix Multi-News dataset metadata and card by @albertvillanova in https://github.com/huggingface/datasets/pull/3731
  • Reddit dataset card additions by @anna-kay in https://github.com/huggingface/datasets/pull/3781
  • Update gigaword card and info by @mariosasko in https://github.com/huggingface/datasets/pull/3775
  • Reddit dataset card contribution by @anna-kay in https://github.com/huggingface/datasets/pull/3797

Metric Changes

  • New: FrugalScore by @moussaKam in https://github.com/huggingface/datasets/pull/3674
  • New: Mahalanobis distance by @JoaoLages in https://github.com/huggingface/datasets/pull/3794
  • New: mIoU by @NielsRogge in https://github.com/huggingface/datasets/pull/3745
  • New: MSE and MAE - V2 by @dnaveenr in https://github.com/huggingface/datasets/pull/3874
  • Fix: METEOR - fix bug due to nltk version by @albertvillanova in https://github.com/huggingface/datasets/pull/3884

Metric cards

  • Add perplexity to metrics by @emibaylor in https://github.com/huggingface/datasets/pull/3757
  • Create SQuAD metric README.md by @sashavor in https://github.com/huggingface/datasets/pull/3873
  • SQuAD v2 metric: create README.md by @sashavor in https://github.com/huggingface/datasets/pull/3879
  • Update README.md for SQuAD v2 metric by @sashavor in https://github.com/huggingface/datasets/pull/3908
  • Update README.md for SQuAD metric by @sashavor in https://github.com/huggingface/datasets/pull/3907
  • Create README.md for WER metric by @sashavor in https://github.com/huggingface/datasets/pull/3898
  • Create README.md for GLUE by @sashavor in https://github.com/huggingface/datasets/pull/3916

New documentation

  • Update docs to new frontend/UI by @mishig25 in https://github.com/huggingface/datasets/pull/3690
  • Image process doc by @stevhliu in https://github.com/huggingface/datasets/pull/3882

General improvements and bug fixes

  • Better TQDM output by @mariosasko in https://github.com/huggingface/datasets/pull/3654
  • Prioritize module.builder_kwargs over defaults in TestCommand by @lvwerra in https://github.com/huggingface/datasets/pull/3672
  • Extend support for streaming datasets that use os.path.relpath by @albertvillanova in https://github.com/huggingface/datasets/pull/3623
  • Add Fon language tag by @albertvillanova in https://github.com/huggingface/datasets/pull/3620
  • Remove unnecessary 'r' arg in by @bryant1410 in https://github.com/huggingface/datasets/pull/3661
  • Fix TestCommand to copy dataset_infos to local dir with only data files by @albertvillanova in https://github.com/huggingface/datasets/pull/3680
  • Upgrade black to version ~=22.0 by @LysandreJik in https://github.com/huggingface/datasets/pull/3691
  • Fix streaming for servers not supporting HTTP range requests by @albertvillanova in https://github.com/huggingface/datasets/pull/3689
  • Pin ElasticSearch by @lhoestq in https://github.com/huggingface/datasets/pull/3701
  • Raise informative error when loading a save_to_disk dataset by @albertvillanova in https://github.com/huggingface/datasets/pull/3705
  • Fix ClassLabel to/from dict when passed names_file by @albertvillanova in https://github.com/huggingface/datasets/pull/3695
  • Fix CI code quality issue by @albertvillanova in https://github.com/huggingface/datasets/pull/3710
  • Check if indices values in Dataset.select are within bounds by @mariosasko in https://github.com/huggingface/datasets/pull/3719
  • Pin pandas to avoid bug in streaming mode by @albertvillanova in https://github.com/huggingface/datasets/pull/3725
  • Use config pandas version in CSV dataset builder by @albertvillanova in https://github.com/huggingface/datasets/pull/3726
  • Set base path to hub url for canonical datasets by @lhoestq in https://github.com/huggingface/datasets/pull/3709
  • Fix ValueError message formatting in int2str by @akulchik in https://github.com/huggingface/datasets/pull/3742
  • Patch all module attributes in its namespace by @albertvillanova in https://github.com/huggingface/datasets/pull/3727
  • Fix typo in train split name by @albertvillanova in https://github.com/huggingface/datasets/pull/3751
  • feat: 🎸 generate info if dataset_infos.json does not exist by @severo in https://github.com/huggingface/datasets/pull/3670
  • Support streaming in size estimation function in push_to_hub by @mariosasko in https://github.com/huggingface/datasets/pull/3732
  • Expose method and fix param by @severo in https://github.com/huggingface/datasets/pull/3767
  • Fix HfFileSystem docstring by @lhoestq in https://github.com/huggingface/datasets/pull/3768
  • process .opus files (for Multilingual Spoken Words) by @polinaeterna in https://github.com/huggingface/datasets/pull/3666
  • Fix: dataset name is stored in keys by @thomasw21 in https://github.com/huggingface/datasets/pull/3772
  • Use the same seed to shuffle shards and metadata in streaming mode by @lhoestq in https://github.com/huggingface/datasets/pull/3746
  • Start removing canonical datasets logic by @lhoestq in https://github.com/huggingface/datasets/pull/3777
  • Support passing str to iter_files by @albertvillanova in https://github.com/huggingface/datasets/pull/3783
  • Fix Google Drive URL to avoid Virus scan warning by @albertvillanova in https://github.com/huggingface/datasets/pull/3787
  • Skip checksum computation if ignore_verifications is True by @mariosasko in https://github.com/huggingface/datasets/pull/3796
  • Fix error message in CSV loader for newer Pandas versions by @mariosasko in https://github.com/huggingface/datasets/pull/3798
  • Add data_dir to data_files resolution and misc improvements to HfFileSystem by @mariosasko in https://github.com/huggingface/datasets/pull/3791
  • Error of writing with different schema, due to nonpreservation of nullability by @richarddwang in https://github.com/huggingface/datasets/pull/3782
  • Handle Nones in PyArrow struct by @mariosasko in https://github.com/huggingface/datasets/pull/3814
  • Fix iter_archive getting reset by @lhoestq in https://github.com/huggingface/datasets/pull/3815
  • Added computer vision tasks by @merveenoyan in https://github.com/huggingface/datasets/pull/3800
  • Fix typo in doc build yml by @mishig25 in https://github.com/huggingface/datasets/pull/3819
  • Allow not specifying feature cols other than predictions/references in Metric.compute by @mariosasko in https://github.com/huggingface/datasets/pull/3824
  • Logo float left by @mishig25 in https://github.com/huggingface/datasets/pull/3836
  • Pin responses to fix CI for Windows by @albertvillanova in https://github.com
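The streaming changes in these notes ("Align map when streaming: update instead of overwrite" and "Add IterableDataset.filter") can be illustrated with a plain-Python sketch. It does not use the datasets library or reproduce its implementation. The helper names map_overwrite, map_update, and lazy_filter are hypothetical. The sketch assumes that "update" means the columns returned by the mapped function are merged into the existing example, as the PR title suggests.

```python
from typing import Callable, Dict, Iterable, Iterator

Example = Dict[str, object]


def map_overwrite(examples: Iterable[Example],
                  fn: Callable[[Example], Example]) -> Iterator[Example]:
    """Hypothetical 'overwrite' semantics: each example is replaced by fn's output."""
    for example in examples:
        yield fn(example)


def map_update(examples: Iterable[Example],
               fn: Callable[[Example], Example]) -> Iterator[Example]:
    """Hypothetical 'update' semantics: fn's output is merged into the example."""
    for example in examples:
        yield {**example, **fn(example)}


def lazy_filter(examples: Iterable[Example],
                predicate: Callable[[Example], bool]) -> Iterator[Example]:
    """Lazily keep only the examples for which the predicate holds."""
    for example in examples:
        if predicate(example):
            yield example


def add_length(example: Example) -> Example:
    # Returns only the new column, not the whole example.
    return {"length": len(str(example["text"]))}


stream = [{"text": "hello", "label": 0}, {"text": "hi", "label": 1}]

overwritten = list(map_overwrite(stream, add_length))
updated = list(map_update(stream, add_length))
kept = list(lazy_filter(map_update(stream, add_length),
                        lambda ex: ex["length"] > 2))

print(overwritten)  # original columns are lost
print(updated)      # original columns are kept, "length" is added
print(kept)         # only examples with length > 2 remain
```

Under the overwrite behaviour, a function that returns only new columns silently drops the original ones. Under the update behaviour the same function adds columns to the example, so code written for a non-streaming dataset is more likely to carry over to streaming mode unchanged.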

  • Impact by BIP!
    citations: 0. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
  • Usage by OpenAIRE UsageCounts
    views: 40