hed-standard/hed-python: Release 1.1.0 April 19, 2026

Release 1.1.0 focuses on HED search performance and expanded search capabilities. The query engine (ExpressionAnd, ExpressionOr, ExpressionNegation) has been refactored to use set-based deduplication, yielding substantial speedups for complex queries. Two new internal modules, string_search and schema_lookup, enable schema-free HED search for use cases where loading a full schema is not practical. This release also adds Pandas 3.0 compatibility, a filename filter for the extract bids-sidecar CLI command, a get_task_names() method on BidsFileGroup, and deduplication of skip_cols in TabularSummary. A new benchmarks/ directory and docs/search_details.md page document the performance characteristics and design trade-offs of the three HED search implementations.

New features

Search engine performance improvements

ExpressionAnd, ExpressionOr, and ExpressionNegation in query_expressions.py now use set-based deduplication instead of O(n²) list scanning, giving a significant speedup for queries that produce many intermediate results. Supporting this, SearchResult gains __eq__ and __hash__ methods so instances can be stored in sets and dicts.

QueryHandler._expr_has_wildcard() replaces the fragile "?" in str(interior) string check with a proper recursive AST walk, eliminating false positives on queries whose string representations happen to contain a literal ?.

StringQueryHandler and schema_lookup (internal / experimental)

Two new internal modules support schema-free HED search:

- hed/models/string_search.py: StringQueryHandler subclasses QueryHandler and accepts a raw HED string instead of a parsed HedString, enabling query evaluation without loading a schema. StringNode duck-types HedGroup/HedTag so that existing Expression subclasses evaluate against it without modification.
- hed/models/schema_lookup.py: generate_schema_lookup(schema) builds a compact {short_tag: tag_terms} dict from a loaded schema that can be passed to StringQueryHandler.search() to enable ancestor-aware matching on short-form strings. save_schema_lookup() / load_schema_lookup() persist the table as JSON for offline use.

These modules are not part of the public API and may change in future releases.

HedGroup find-method documentation clarified

Docstrings for find_tags, find_wildcard_tags, find_exact_tags, and find_tags_with_term now document the exact comparison property each method uses (short_base_tag, short_tag, HedTag.__eq__, and tag_terms respectively) and explain the rationale for that choice.

Search benchmarks

A new benchmarks/ directory provides reproducible performance benchmarking tools: search_benchmark.py measures throughput across query types and string sizes, data_generator.py synthesizes realistic HED strings, and report.py generates Markdown and PNG reports. Pre-computed results are stored under benchmarks/results/ and benchmark figures under docs/_static/images/.

Search documentation

A new docs/search_details.md page covers all three HED search implementations (basic_search, QueryHandler, and StringQueryHandler): design trade-offs, query language reference, and measured performance characteristics with benchmark figures.

Pandas 3.0 compatibility

All pandas 3.0 breaking changes have been addressed, and the pandas version constraint in pyproject.toml has been updated from <3.0.0 to <4.0.0:

- Copy-on-Write (CoW): Chained df[col][mask] = ... assignments in df_util.py replaced with df.loc[mask, col] = ... to prevent silent no-ops and the new ChainedAssignmentError.
- drop() API: Removed redundant axis=1 argument when columns= is already specified in data_util.py (the two arguments conflict in pandas 3.0).
- NaN handling in schema loading: df2schema.py, df_util.py, and hed_id_util.py now check isinstance(value, str) before calling string methods such as .strip() and .startswith(), preventing AttributeError when empty cells are float NaN rather than "".
- StringDtype in _merge_dataframes: Fillna logic updated in schema_io/df_util.py to use pd.api.types.is_numeric_dtype() instead of dtype == "object", correctly handling pandas 3.0 StringDtype columns.
- Float64 column FutureWarning: assign_hed_ids_section in hed_id_util.py now casts all-NaN hedId columns from float64 to object before assigning string values, eliminating a pandas deprecation warning.

Added tests/test_pandas3_compat.py with 27 targeted tests covering all of the above fixes.

Filename filter for extract bids-sidecar

hedpy extract bids-sidecar and the underlying hed_extract_bids_sidecar script now accept a --filter / -fl option. Only files whose name contains the filter string are included in the sidecar extraction. Example:

    hedpy extract bids-sidecar /path/to/dataset --filter sub-01

BidsFileGroup.get_task_names()

BidsFileGroup now exposes a get_task_names() method that returns a sorted list of unique task names (the xxxx portion of task-xxxx BIDS entities) found across all sidecar and data files in the group.

TabularSummary deduplicates skip_cols

TabularSummary.__init__ now deduplicates the skip_cols list using dict.fromkeys, preserving order. Passing the same column name more than once no longer produces duplicate entries in skip_cols or in the "Skip columns" field of the summary metadata output. Functional behaviour (which columns are skipped) is unchanged.

Documentation

- Removed {index} placeholder annotations from README.md and examples/README.md.

CI/CD

- Bumped actions/configure-pages from 5 to 6.
- Bumped astral-sh/setup-uv from v7 to v8.0.0.
- Updated anthropics/claude-code-action to v1.0.97.
- Pinned all GitHub Actions steps to full SHA hashes for supply-chain security.
- Updated spec_tests/hed-examples, spec_tests/hed-schemas, and spec_tests/hed-tests submodules.

Full Changelog: https://github.com/hed-standard/hed-python/compare/1.0.0...1.1.0
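
As an illustration of the set-based deduplication noted under the search engine performance improvements, here is a minimal sketch of why defining __eq__ and __hash__ lets results be deduplicated in roughly linear time. The Match class and both helper functions are hypothetical stand-ins written for this example; they are not the library's SearchResult or Expression code.

```python
class Match:
    """Hypothetical stand-in for a search result (not hed's SearchResult)."""

    def __init__(self, group_id, tags):
        self.group_id = group_id
        self.tags = tuple(tags)

    def __eq__(self, other):
        if not isinstance(other, Match):
            return NotImplemented
        return (self.group_id, self.tags) == (other.group_id, other.tags)

    def __hash__(self):
        # Must agree with __eq__: equal objects hash to the same value.
        return hash((self.group_id, self.tags))


def dedupe_by_scan(results):
    """Quadratic: every membership test scans the list built so far."""
    unique = []
    for r in results:
        if r not in unique:
            unique.append(r)
    return unique


def dedupe_by_set(results):
    """Linear on average: set membership is a hash lookup; order is preserved."""
    seen = set()
    unique = []
    for r in results:
        if r not in seen:
            seen.add(r)
            unique.append(r)
    return unique


results = [
    Match(1, ["Sensory-event"]),
    Match(2, ["Agent-action"]),
    Match(1, ["Sensory-event"]),  # duplicate of the first entry
]
print(len(dedupe_by_set(results)))  # 2
```

Both helpers return the same list; the difference only shows up in running time once a query produces many intermediate results.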

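The NaN-handling fix listed under Pandas 3.0 compatibility comes down to guarding string methods with an isinstance check. The sketch below shows the pattern using only the standard library. The clean_cell helper and its choice to return "" for non-string cells are assumptions made for illustration; the notes do not say what the library substitutes in that case.

```python
def clean_cell(value):
    """Hypothetical helper: strip text cells, map anything else to ""."""
    if isinstance(value, str):
        return value.strip()
    # Empty spreadsheet cells can arrive as float NaN, which has no .strip().
    return ""


nan = float("nan")
cells = ["  Sensory-event ", nan, ""]

# Calling nan.strip() directly would raise AttributeError, because float
# objects do not have string methods.
print([clean_cell(c) for c in cells])  # ['Sensory-event', '', '']
```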
