
This Research product is the result of merged Research products in OpenAIRE.

huggingface/datasets: 2.8.0

Authors: Lhoest, Quentin; Moral, Albert Villanova Del; Von Platen, Patrick; Wolf, Thomas; Šaško, Mario; Jernite, Yacine; Abhishek Thakur; +24 Authors

Abstract

Important

  • Removed YAML integer keys from class_label metadata by @albertvillanova in https://github.com/huggingface/datasets/pull/5277
    From now on, datasets pushed to the Hub that use ClassLabel store their feature types in a new YAML model. The new model uses strings instead of integers for the ids in the label name mapping (e.g. 0 -> "0"). This is due to Hub limitations. In a few months the Hub may stop allowing users to push the old YAML model. Old versions of datasets cannot reload datasets pushed with the new model, so we encourage everyone to update.

Datasets Features

  • Fix methods using IterableDataset.map that lead to features=None by @alvarobartt in https://github.com/huggingface/datasets/pull/5287
    Datasets in streaming mode now update their features after column renaming or removal.
  • Add num_proc to from_csv/generator/json/parquet/text by @lhoestq in https://github.com/huggingface/datasets/pull/5239
    Use multiprocessing to load multiple files in parallel.
  • Add features param to IterableDataset.map by @alvarobartt in https://github.com/huggingface/datasets/pull/5311
  • Sharded save_to_disk + multiprocessing by @lhoestq in https://github.com/huggingface/datasets/pull/5268
    Pass num_shards or max_shard_size to ds.save_to_disk() or ds.push_to_hub(). Pass num_proc to use multiprocessing.
  • Support for decoding Image/Audio types in map when format type is not default one by @mariosasko in https://github.com/huggingface/datasets/pull/5252
  • Support torch dataloader without torch formatting for IterableDataset by @lhoestq in https://github.com/huggingface/datasets/pull/5357
    You can now pass any dataset in streaming mode to a PyTorch DataLoader directly:

        from datasets import load_dataset
        from torch.utils.data import DataLoader

        ds = load_dataset("c4", "en", streaming=True, split="train")
        dataloader = DataLoader(ds, batch_size=32, num_workers=4)

Docs

  • Complete doc migration by @mishig25 in https://github.com/huggingface/datasets/pull/5248

General improvements and bug fixes

  • typo by @WrRan in https://github.com/huggingface/datasets/pull/5253
  • typo by @WrRan in https://github.com/huggingface/datasets/pull/5254
  • remove an unused statement by @WrRan in https://github.com/huggingface/datasets/pull/5257
  • fix wrong print by @WrRan in https://github.com/huggingface/datasets/pull/5256
  • Fix max_shard_size docs by @lhoestq in https://github.com/huggingface/datasets/pull/5267
  • Specify arguments as keywords in librosa.reshape to avoid future errors by @polinaeterna in https://github.com/huggingface/datasets/pull/5266
  • Change release procedure to use only pull requests by @albertvillanova in https://github.com/huggingface/datasets/pull/5250
  • Warn about checksums by @lhoestq in https://github.com/huggingface/datasets/pull/5279
  • Tweak readme by @lhoestq in https://github.com/huggingface/datasets/pull/5210
  • Save file name in embed_storage by @lhoestq in https://github.com/huggingface/datasets/pull/5285
  • Use correct dataset type in from_generator docs by @mariosasko in https://github.com/huggingface/datasets/pull/5307
  • Support streaming datasets with pathlib.Path.with_suffix by @albertvillanova in https://github.com/huggingface/datasets/pull/5294
  • Fix xjoin for Windows pathnames by @albertvillanova in https://github.com/huggingface/datasets/pull/5297
  • Fix xopen for Windows pathnames by @albertvillanova in https://github.com/huggingface/datasets/pull/5299
  • Ci py3.10 by @lhoestq in https://github.com/huggingface/datasets/pull/5065
  • Update Overview.ipynb google colab by @lhoestq in https://github.com/huggingface/datasets/pull/5211
  • Support xPath for Windows pathnames by @albertvillanova in https://github.com/huggingface/datasets/pull/5310
  • Fix description of streaming in the docs by @polinaeterna in https://github.com/huggingface/datasets/pull/5313
  • Fix Text sample_by paragraph by @albertvillanova in https://github.com/huggingface/datasets/pull/5319
  • [Extract] Place the lock file next to the destination directory by @lhoestq in https://github.com/huggingface/datasets/pull/5320
  • Fix loading from HF GCP cache by @lhoestq in https://github.com/huggingface/datasets/pull/5321
    This was affecting datasets like wikipedia or natural_questions.
  • Fix docs building for main by @albertvillanova in https://github.com/huggingface/datasets/pull/5328
  • Origin/fix missing features error by @eunseojo in https://github.com/huggingface/datasets/pull/5318
  • fix: 🐛 pass the token to get the list of config names by @severo in https://github.com/huggingface/datasets/pull/5333
  • Clarify imagefolder is for small datasets by @stevhliu in https://github.com/huggingface/datasets/pull/5329
  • Close stream in ArrowWriter.finalize before inference error by @mariosasko in https://github.com/huggingface/datasets/pull/5309
  • Use same num_proc for dataset download and generation by @mariosasko in https://github.com/huggingface/datasets/pull/5300
  • Set IterableDataset.map param batch_size typing as optional by @alvarobartt in https://github.com/huggingface/datasets/pull/5336
  • fix: dataset path should be absolute by @vigsterkr in https://github.com/huggingface/datasets/pull/5234
  • Clean up DatasetInfo and Dataset docstrings by @stevhliu in https://github.com/huggingface/datasets/pull/5340
  • Clean up docstrings by @stevhliu in https://github.com/huggingface/datasets/pull/5334
  • Remove tasks.json by @lhoestq in https://github.com/huggingface/datasets/pull/5341
  • Support topdown parameter in xwalk by @mariosasko in https://github.com/huggingface/datasets/pull/5308
  • Improve use_auth_token docstring and deprecate use_auth_token in download_and_prepare by @mariosasko in https://github.com/huggingface/datasets/pull/5302
  • Clean up Loading methods docstrings by @stevhliu in https://github.com/huggingface/datasets/pull/5350
  • Clean up remaining Main Classes docstrings by @stevhliu in https://github.com/huggingface/datasets/pull/5349
  • Clean up Dataset and DatasetDict by @stevhliu in https://github.com/huggingface/datasets/pull/5344
  • Clean up Table class docstrings by @stevhliu in https://github.com/huggingface/datasets/pull/5355
  • Raise error for .tar archives in the same way as for .tar.gz and .tgz in _get_extraction_protocol by @polinaeterna in https://github.com/huggingface/datasets/pull/5322
  • Clean filesystem and logging docstrings by @stevhliu in https://github.com/huggingface/datasets/pull/5356
  • ExamplesIterable fixes by @lhoestq in https://github.com/huggingface/datasets/pull/5366
  • Simplify skipping by @Muennighoff in https://github.com/huggingface/datasets/pull/5373
  • Release: 2.8.0 by @lhoestq in https://github.com/huggingface/datasets/pull/5375

New Contributors

  • @WrRan made their first contribution in https://github.com/huggingface/datasets/pull/5253
  • @eunseojo made their first contribution in https://github.com/huggingface/datasets/pull/5318
  • @vigsterkr made their first contribution in https://github.com/huggingface/datasets/pull/5234
  • @Muennighoff made their first contribution in https://github.com/huggingface/datasets/pull/5373

Full Changelog: https://github.com/huggingface/datasets/compare/2.7.0...2.8.0
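
The ClassLabel change described under Important (ids stored as strings instead of integers, e.g. 0 -> "0") can be pictured with a minimal sketch. It uses plain Python dicts only. The helper names and the label names are hypothetical, and the actual YAML layout written by datasets is not shown in these notes, so this illustrates the key conversion rather than the library's implementation.

```python
# Hypothetical helpers for illustration; these are NOT functions from the
# datasets library, and the real YAML layout may differ.

def ids_to_str_keys(names_by_id):
    """Old-style mapping (integer ids) -> new-style mapping (string ids)."""
    return {str(label_id): name for label_id, name in names_by_id.items()}


def str_keys_to_ids(names_by_str_id):
    """New-style mapping (string ids) -> integer ids, as a reader would need."""
    return {int(label_id): name for label_id, name in names_by_str_id.items()}


old_style = {0: "negative", 1: "positive"}  # made-up label names
new_style = ids_to_str_keys(old_style)
print(new_style)  # {'0': 'negative', '1': 'positive'}
```

A reader that expects integer keys would need a step like str_keys_to_ids before it could use the new mapping, which may be one way to picture why older versions of datasets cannot reload the new model.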
