pola-rs/polars: Rust Polars 0.54.4

🏆 Highlights
- Add LazyFrame.gather (#27501)
- Nested common subplan elimination (#27340)
- Stabilize streaming engine (#27497)
- Speed up parquet metadata decode with hand-written Thrift (#27427)
- Add streaming support for grouped AsOf join (#27293)

🚀 Performance improvements
- Eliminate filters with contradictory predicates (#27775)
- Update to new jemalloc (#27797)
- Do not materialize ScalarColumn in Column split_at (#27782)
- Avoid materializing broadcast in array.shift (#27740)
- Avoid materializing broadcast list in list.sample(n) and list.sample(frac) (#27679)
- Adaptive size dispatch to hashset or radix sort + capacity-aware reset in agg_n_unique (#27719)
- Dispatch {list,arr}.{unique,n_unique,reverse} to group_by engine (#27278)
- Improve in-memory grouped non-null count (#27702)
- Factor shared conjuncts out of OR-of-ANDs predicates (#27627)
- Skip downloading IPC batches exceeding slice bounds (#27683)
- Faster Series::is_sorted for logical / non-primitive types (#27567)
- Avoid materializing broadcast list in list.shift (#27628)
- Optimise json_decode Datetime string parsing (#27559)
- Speed up to_numpy C-order via cache-blocked transpose (#27522)
- Optimize select(len()) for non-strict horizontal concat (#27516)
- Pushdown slices to inputs on left/right/full join (#27508)
- Don't infer CSV schema if schema is set (#27507)
- Nested common subplan elimination (#27340)
- Make is_in row-group pruning precise on null-containing haystacks (#27495)
- Don't do fused-multiply-add on scalars (#27479)
- List full fast path (#27477)
- Make is_in row-group pruning precise on multi-value lists (#27475)
- Add streaming GatherNode (#27465)
- Lower non-elementwise FunctionExprIR to ColumnarFunctionNode (#27462)
- Speed up parquet metadata decode with hand-written Thrift (#27427)
- Skip validity mask processing in __array_ufunc__ when no inputs have nulls (#27358)
- Create IR slice from expr slice pushdown (#27200)
- Add streaming support for grouped AsOf join (#27293)
- Avoid unnecessary rechunk when sorting already sorted DataFrame (#27264)
- Lower basic over() to streaming primitives (#27303)
- Lower drop_{nulls,nans} in streaming group_by aggregations (#27296)
- Lower entropy to streaming reductions (#27174)
- Add native streaming interpolate (#27185)
- Streaming strptime with format=None (#27056)
- Lower skew / kurtosis to streaming aggregations (#27176)
- Post apply pyarrow filter in Polars' engine instead of pyarrow (#27192)
- Optimize drop_nulls().{first,last}() to {first,last}(ignore_nulls=True) (#27187)
- Always process pyarrow scan in batches (#27183)
- Make cut output Enum and mark as elementwise (#27173)
- Remove unused expression sorts (#27075)
- Use delta stats for mixed hive and non-hive predicate pushdown (#27102)
- Take into account size per row in join sampling (#27098)
- Streaming is_first_distinct and unique(maintain_order=True) (#27052)
- Streaming cov and corr (#27008)
- Add sorted unique node to streaming engine (#26990)
- Ensure Expr.append is lowered in streaming engine (#27022)
- Collapse consecutive Sort nodes (#26965)
- Drop maintain_order=True requirement in sink_delta (#27007)
- Lower index_of to streaming engine (#26923)
- Streaming native backward_fill (#26967)
- Native streaming forward_fill (#26922)
- Drop unused filter column above cache (#26955)
- Optimize .replace() from a single value (#26948)
- Add a streaming range-join (#26790)
- Lower arg_{min,max} to streaming engine (#26845)
- Additional IR slice pushdown after filter pushdown (#26815)
- Streaming first/last on Enum through physical (#26783)
- Fast filter for scalar predicates (#26745)
- Allow SimpleProjection in streaming engine to rename (#26709)
- Streaming cloud download for scan_csv (#26637)
- Drop columns only needed for predicates after the predicate is applied (#26703)
- Run projection pushdown after predicate pushdown (#26688)
- Comparison literal downcasting (#26663)
- Add dynamic predicates for TopK (#26495)
- Increase minimum default parquet row group prefetch to 8 (#26632)
- Partial predicate conversion to PyArrow (#26567)
- Streaming cloud download for scan_ndjson / scan_lines (#26563)
- Grab GIL fewer times during Object join materialization (#26587)
- Improve CSV and NDJSON cloud sink performance (#26545)
- Tune cloud writer performance (#26518)
- Allow parallel InMemorySinks in streaming engine (#26501)
- Add streaming AsOf join node (#26398)

✨ Enhancements
- Expose fixed-size rolling window expressions in Python visitor (#27108)
- Expose IR::Scan hive parts in the python node visitor (#27829)
- Expose IRFunctionExpr::DynamicPred in the python visitor (#27616)
- Fix SchemaError using lazy HConcat->Sink (#27770)
- Add pinning and queuing logic to polars-ooc (#27791)
- Add tiered multi-file parquet metadata resolver (#27720)
- Cache and shuffle DNS for cloud object_store (#27659)
- Update to new jemalloc (#27797)
- Allow deeper expressions (#27768)
- Add is_inherently_nondeterministic helper for AExpr (#27687)
- Use true division for the / operator in Polars SQL (#27391)
- Add Rust backend for Expr.has_nulls (#27590)
- Add block_in_place to Polars' async executor (#27612)
- Stabilize float16 (#27607)
- Add Expr.is_empty (#27583)
- Add support for the SQL FILTER clause for aggregate functions, and STRING_AGG (#27564)
- Make parquet FileMetadata prunable for IR-plan dispatch (#27535)
- Broadcast scalar input for list.slice (#27487)
- Add LazyFrame.gather (#27501)
- Add null_on_oob in {Expr/Series}.gather (#27327)
- Stabilize streaming engine (#27497)
- Process batched arr.eval on overflow boundaries (#27496)
- Process batched list.eval on overflow boundaries (#27483)
- Print SLICED UNION in LazyFrame explain (#27467)
- Cargo deny (#27363)
- Add maintain_order parameter to merge_sorted (#27263)
- Add ignore_nulls to {list,arr}.{any,all} (#27186)
- Lock-free memory manager with spill-to-disk and fully OOC multiplexer (#26774)
- Add is_unique to list/array dtypes (#27290)
- Add pl.merge_sorted operating on multiple frames (#27014)
- Add fast_alloc feature flag, remove default_alloc (#27206)
- Add a GPU slot to OptFlags so we can control CSE (similar to streaming) (#27026)
- Allow group_by() without key exprs (#27141)
- Collapse consecutive Sort nodes (#26965)
- Use UUIDv7 for sink_iceberg directory name generation (#26958)
- Truncate large binary/utf8 Parquet statistics values (#26764)
- Error if PartitionBy path provider returns absolute path that does not begin with base path, or contains '..' (#26894)
- Support Delta deletion vectors in scan_delta (#26867)
- Support Decimal32/64 in scan_parquet (#26941)
- Support casting Duration to String in ISO 8601 format (#26860)
- Add a streaming range-join (#26790)
- Support Expr for holidays in business day calculations (#26193)
- Parameter for pivot to always include value column name (#26730)
- Raise error in .collect_schema() when arr.get() is out-of-bounds (#26866)
- Extend Expr.reinterpret to all numeric types of the same size (#26401)
- Add missing_columns parameter to scan_csv (#26787)
- Clear no-op scan projections (#26858)
- Support nested datatypes for {min,max}_by (#26849)
- Support SQL ARRAY init from typed literals (#26622)
- Accept table identifier string in scan_iceberg() (#26826)
- Add a convenience make fresh command to the Makefile (#26809)
- Add unstable LazyFrame.sink_iceberg (#26799)
- Add maintain order argument on implode (#26782)
- Implement predicate pushdown for aliased groupby keys (#26597)
- Speed up casting primitive to bool by at least 2x (#26823)
- Enable rowgroup skipping for float columns (#26805)
- Add expression context to errors (#26716)
- Add Decimal support for product reduction (#26725)
- Support all Iceberg V2 arrow types in sink_parquet arrow_schema parameter (#26669)
- Re-work behavior of arrow_schema parameter on sink_parquet (#26621)
- Add contains_dtype() method for Schema (#26661)
- Implement truncate as a "to_zero" rounding mode (#26677)
- Expose AExpr::Rolling in the python visitor (#26715)
- More generic streaming GroupBy lowering (#26696)
- Add basic MemoryManager to track buffered dataframes for out-of-core support later (#26443)
- Add truncate Expression for numeric values (#26666)
- Better error messages for hex literal conversion issues in the SQL interface (#26657)
- Add SQL support for LPAD and RPAD string functions (#26631)
- Support SQL "FROM-first" SELECT query syntax (#26598)
- Speed up any() and all() for nulls (#26615)
- Bump Chrono to 0.4.24, enabling stricter parsing of %.3f/%.6f/%.9f specifiers (#26075)
- Expose unstable assert_schema_equal in py-polars (#24869)
- Allow parsing of compact ISO 8601 strings (#24629)
- Streaming cloud download for scan_ndjson / scan_lines (#26563)
- Configuration to cast integers to floats in cast_options for scan_parquet (#26492)
- Add escaping to quotes and newlines when reading JSON object into string (#26578)
- Standardise on RFC-5545 when doing datetime arithmetic on timezone-aware datetimes (#26425)
- Support sas_token in Azure credential provider (#26565)
- Expose HConcat options in the python node visitor (#26551)
- Relax SQL requirement for derived tables and subqueries to have aliases (#26543)
- Add polars-config and pl.Config.reload_env_vars() (#26524)
- Record path for object store error raised from sinks (#26541)
- Use CRC64NVME for checksum in aws sinks (#26522)
- Add get() for binary Series (#26514)
- Add streaming AsOf join node (#26398)

🐞 Bug fixes
- Fix skip_batches not handling negation of bool dtype with None values (#27452)
- Use block_in_place_on for calls which can come from executor thread (#27855)
- Mismatch in max_threads -> pipeline configuration (#27854)
- Keep maintain_order on sliced unique (#27852)
- Fix SchemaError using lazy HConcat->Sink (#27770)
- Fix incorrect projection height when selecting only literals (#27825)
- Fix rolling aggregations with window_size=0 (#27812)
- Select with expr slice and len gave incorrect len (#27824)
- Prevent import panic when environment variable set to unexpected value (#27831)
- Broken link to AI Policy corrected (#27793)
- Update to new jemalloc (#27797)
- Swap PlHashMap for PlIndexMap to make Multiplexer insertion order stable (#27785)
- Compare length for inline slice as usize (#27779)
- Raise length mismatch in multiple sort_by in group_by (#27772)
- Respect min_samples for rolling_by ops with nulls (#27706)
- Fix memory usage regression affecting TPCH Q22 (#27758)
- Add POLARS_ALLOW_NESTED_CSPE env var and make nested CSPE opt-in (#27765)
- Post-apply residual pyarrow predicates (#27764)
- Fix loss of precision for smaller floating types (#27662) (#27732)
- Filter at scan dropped in CSPE filter pushdown (#27763)
- Fix portstate assertion error on is_in (#27757)
- Fix incorrect when/then after forward fill / reverse in groupby (#27745)
- Accept empty Thrift list encoded as bare 0x00 byte in parquet metadata (#27754)
- Stabilize object store credentialprovider cache key (#27712)
- Panic in scan of empty IPC with slice (#27708)
- Persist object_store rebuild state in cache (#27707)
- Sort flag on GroupsType only applies to first element (#27684)
- Invalid unwrap_unchecked when length isn't exact (#27685)
- Logic error in async executor block_in_place (#27698)
- Don't unwrap channel send in streaming join_asof (#27688)
- Fix merge_sorted panic when List in frame (#27568)
- Put AsOf join buffered Morsels back the front of the deque if we cannot process them rn (#27658)
- Fix FixedRingBuffer allocation provenance (#27669)
- Fix skip_batches logic for NaN (#27673)
- Raise TypeError when calling next() directly on GroupBy objects (#27562)
- Data type comparison for extension types (#27632)
- Fix filter_scan_ir usize integer underflow (#27633)
- Share last-morsel split budget across files in streaming multi-scan (#27630)
- Reset the sort-options in Series::is_sorted() after row-encoding columns (#27614)
- Rayon deadlock with re-entrant io sources (#27600)
- Don't push negative-offset slices through HConcat (#27570)
- Logic error in streaming is_empty (#27602)
- Fix incorrect CSE with large is_in literal (#27575)
- AnonymousFunction can qualify as SQL aggregator (#26986)
- Fix CSPE panic in cloud (#27594)
- Set merge-join streaming node to Finished if its sending port is Done (#27572)
- Widen decimal precision on sum aggregation at runtime (#27579)
- Fix str.to_time was raising unnecessarily when input was all nulls (#27574)
- Prevent panic when switching from one extension dtype to another (#27566)
- Ensure json_decode doesn't fail for Date and Time string deserialization (#27554)
- Incorrect RUSTFLAGS passing in Makefile (#27555)
- Avoid panic on open-ended slice (#27550)
- Fix panic on reading IPC with 0-row compressed bitmap (#27551)
- Set HEAD_RESPONSE_SIZE_ESTIMATE to 0 (#27548)
- Fix lazy concat horizontal didn't raise on mismatching heights after projection pushdown (#27506)
- Prevent join panic when suffix="" and coalesce=True (#27376)
- Do not make a FastCount for csv if pre_slice is set (#27536)
- Support duplicate names in over (#27544)
- Reassign sequence numbers when distributing input morsels in streaming AsOf join node (#27538)
- Do not reverse dataframes when sorting with all-null key columns (#27517)
- Incorrect length check on streaming zip (#27505)
- Respect nulls_last for descending over(order_by) in group_by().agg() (#27486)
- Fix perf regression in scan_csv select(len()) when collected on streaming engine (#27504)
- Harden extend strictness (#27476)
- Prevent deadlock when using to_arrow() in a multithreaded context (#27472)
- Rebalance deep merge_sorted chains (#27065)
- Do not flatten sliced union (#27466)
- Prevent deadlock when using to_pandas() in multithreaded context (#27451)
- Struct rechunk bug and add Series::with_validity (#27446)
- Handle column indexing in read_parquet/read_csv with pyarrow reader (#27397)
- Export enum as ordered dictionary to arrow (#27432)
- Ensure index column is sorted in streaming rolling aggs (#27234)
- Ensure sample() respects shuffle=False (#27248)
- Return empty DataFrame from concat_list with lit and empty column (#27305)
- Read parquet MAP columns without LogicalType annotation (#27404)
- Raise DuplicateError on parquet files with duplicate column names (#27399)
- Honor having predicate in GroupBy iter (#27370)
- Use the physical dtype for NumUnorderedImplodeReducer arrow ListArray (#27375)
- Address bug in reduce_balanced for certain input length lists affecting pl.concat (#27352)
- Ensure list.sample() allows fraction > 1 when with_replacement=True (#27350)
- Ensure append() errors when upcast=False (#27346)
- Always rechunk sorts, prune sorts even in eager execution (#27356)
- Update groups to correct length for Implode (#27282)
- Fix scan_csv missing_columns='insert' overwrote existing data with NULLs (#27297)
- Raise on non-numeric inputs in pl.int_ranges (#27294)
- Do not skip nulls when enumerating over rows in grouped AsOf join (#27275)
- Fix pivot dropping data for null on values (#27273)
- Resolve multiple files deadlock in CSV async reader (#27073)
- Widen decimal precision on sum aggregation (#27270)
- Correct lf.remote type (#27261)
- Extend StructEval schema context in StackOptimizer (#27243)
- Prevent panic when casting Array to extension type with same inner type (#27220)
- Preserve nulls when casting from all-null Series to Struct (#27241)
- Off-by-one in lp.with_inputs length assertion (#27209)
- Fix scan_delta filter on empty dataframe (#27244)
- Prevent DataFrame creation panic on list[struct] with heterogenous types (#27217)
- Skip null group entries when collecting AsOf-by groups (#27215)
- Fix panic with empty order_by in over expression (#27088)
- Write field ID from sink_parquet (#27196)
- Fix statistics for Null columns in Parquet (#27021)
- Do not prune sort nodes containing slice with dyn predicate (#27140)
- Correct grouped Binary arg_min/arg_max and String single-element arg indices (#27172)
- Fix scalar handling in str.replace during streaming (#27182)
- Resolve multiple files deadlock in NDJSON async reader (#27204)
- Overflow panic in interpolate nearest (#27205)
- Using checked arithmetic in int96_to_i64_ns to prevent overflow panic (#27129)
- Don't trigger csv fast count if predicate is pushed down (#27190)
- Streaming sort by-expressions were lowered incorrectly (#27158)
- Reset IO metrics instead of consuming (#27156)
- Output SVG if output_path ends with '.svg' in show_graph (#27144)
- Skip extension types for min/max in describe (#27120)
- Fix incorrect IO metrics on multi-phase streaming execution (#27123)
- Use delta stats for mixed hive and non-hive predicate pushdown (#27102)
- Make the files used in docs available locally (#27121)
- Apply scalar bound in clip when the Series bound contains nulls (#27087)
- Ignore ddof parameter in rolling_corr and deprecate (#27104)
- Preserve casts for horizontal ops with untyped literals (#27011)
- Reject invalid input to sql_expr (#27084)
- Ensure SQL COUNT(<lit>) expressions return the correct value (#27085)
- Regression in replace_strict for enums (#27066)
- Make test_group_by_arg_max_boolean_26978 non-flaky for max_by ties (#27048)
- Null count for aggregated list inside count aggregation (#27032)
- Panic in streaming MergeSortedNode (#27024)
- Prevent panic in transpose() with mixed List and non-List columns (#27038)
- Set sorted flag for Boolean and Time (#27035)
- Missing src/ subdirectory to CI Python docs step (#27025)
- Resolve stack overflow on merge_sorted and union (#27018)
- Make pl.DataFrame.fill_null work on columns with Null dtype (#27020)
- Fix initial MutableBooleanArray::extend_constant(count, None) calls (#26813)
- Fix repeated word typos in comments (#26917)
- Covariance with constant is zero, not NaN (#27015)
- Don't remove set_sorted in projection pushdown (#27006)
- Infer nulls when df create from empty-struct (#26991)
- Correct suggestion in multi-expr filter error (#27003)
- Implement agg_arg_min/agg_arg_max for boolean data type (#26997)
- Raise error instead of panic for unsupported pivot aggregate (#26863)
- Validate fraction is between 0.0 and 1.0 in list.sample (#26964)
- Informative error for multi-quantile in group_by (#26957)
- Raise for duplicate columns in over() (#26968)
- Preserve height when unnesting empty struct columns (#26947)
- Support Decimal32/64 in scan_parquet (#26941)
- Follow-up on streaming range-join PR (#26944)
- Fix ColumnNotFound due to projection between filter/cache in CSPE (#26946)
- Fix panic on upsample() with group_by parameter on empty DataFrame (#26936)
- Fix the loop bounds in BitmapBuilder::extend_each_repeated_from_slice_unchecked (#26928)
- Default engine as streaming for collect_batches (#26932)
- Set stricter maintain_order in test_schema_row_index_cse (#26931)
- Fix error passing Series of dates to business functions (#26927)
- Propagate null in min_by / max_by for all-null by groups (#26919)
- Fix panic on lazy concat->filter->slice with CSPE (#26907)
- Handle empty rolling windows in streaming engine (#26903)
- Prevent Boolean arithmetic with integer literals producing Unknown type in streaming engine (#26878)
- Fix sink to partitioned S3 from Windows corrupted slashes (#26889)
- Remove outdated warning about List columns in unique() (#26295) (#26890)
- Restore pyarrow predicate conversion for is_in (#26811)
- Release GIL before df.to_ndarray() to avoid deadlock (#26832)
- Fix panic on CSV count_rows with FORCE_ASYNC (#26883)
- Add scalar comparisons for UInt128 series (#26886)
- Fix shape error not raised for 0 width inputs with non-0 height for streaming horizontal concat (#26877)
- Fix streaming zip-broadcast node did not raise shape mismatch on empty recv from ready port (#26871)
- Fix incorrect output list.eval with scalar expr, fix panic on list.agg with nulls (#26868)
- Incorrect arg_sort with descending+limit (#26839)
- Raise error in .collect_schema() when arr.get() is out-of-bounds (#26866)
- Return ComputeError instead of panicking in map_groups UDF (#26665)
- Issue PerformanceWarning in LazyFrame.__contains__ (#26734)
- Segfault in JoinExec on deep plan (#26796)
- Fix unary expressions on literal in over context (#26827)
- Fix {min,max}_by in streaming engine for Boolean full {min,max} value column (#26848)
- Fix debug panic on clip with nan bound (#26854)
- Support grouped {arg_,}_{min,max} for Categoricals (#26856)
- Throw an error if a string is passed to LazyFrame.pivot on_columns (#26852)
- Preserve input float precision in rolling_cov() and rolling_corr() with mixed input types (#26820)
- Preserve row count when converting zero-column DataFrame via arrow PyCapsule interface (#26835)
- Prevent infinite recursion in streaming group_by fallback (#26801)
- Use RowEncodingContext::Struct when determining D::Struct encoded item len (#26817)
- Incorrectly applied CSE on different map_batches functions (#26822)
- Fix duplicated query execution on todo panic when combining collect(engine='streaming') with POLARS_AUTO_NEW_STREAMING (#26792)
- Prevent predicate pushdown across Sort with baked-in slice (#26804)
- Fix panic on lazy sink_parquet created in pipe_with_schema (#26784)
- Support {column_name} and {index} placeholders in pl.format string (#26771)
- Do not use merge-join if nulls_last is unknown (#26778)
- Normalize float zeros in Parquet column statistics (#26776)
- Fix out-of-bounds for positive offset in windowed rolling (#26724)
- Raise error when .get() is out-of-bounds in group by context (#26752)
- Boolean bitwise_xor aggregation inverted when column contains nulls (#26749)
- Parameter nulls_last was ignored in over (#26718)
- Allow missing time in inexact strptime (#26714)
- Return NaN when using corr() with a literal and expr (#26697)
- Allow strict horizontal concat with empty df (#26345)
- Fix PoisonError panic caused by reentrant usage of file cache (#26627)
- Return null for int values exceeding 128-bit range with strict=False (#26674)
- Incorrect boolean min/max with nulls (#26671)
- Slice-slice pushdown for n_rows (#26673)
- Resolve panic in Enum struct slicing (#26643)
- Fix CSPE for group_by.map_groups (#26640)
- Remove non-existent parameter from SQLContext typing overloads (#26658)
- Replace panic with error when sorting object dtype columns (#26601)
- Fix to_pandas() on empty enum Series did not preserve enum dictionary (#26610)
- Rounding behaviour for f32 values with "HalfAwayFromZero" mode (#26624)
- Correct arg_(min|max) for scalar columns (#26609)
- Use monkeypatch.chdir in test_sink_path_slicing_utf8_boundaries_26324 (#26616)
- Materialize unknown scalar int/float literals in collect_dtype() (#26595)
- Return error when by= is nested type in min_by / max_by (#26593)
- Fix assert_frame_not_equal() did not raise on dtype mismatch (#26590)
- Respect SQL semantics for cumulative functions mapped via OVER clause (#26570)
- Fix incorrect multiplexer output ordering on source token stop request (#26561)
- Fix PyIceberg filter on boolean column (#26550)
- Fix *_range exprs incorrectly marked as row separable (#26549)
- Set dictionary_page_offset when dictionary encoding is used and point data_page_offset to the first data page (#26542)
- Prevent GPU engine panic on SinkMultiple nodes (#26537)
- Move query parameters to request body when retrieving Unity Catalog temporary credentials (#26539)
- Implement PhysicalExpr for MinBy/MaxBy nodes (#26506)
- Refactor row-encoding logic in IR join lowering into separate function (#26512)
- Correctly check for path extensions (#26513)
- Change AsOf join to be based on TotalOrd (#26497)
- Correctly raise error on failing nested strict casts (#26499)
- Prevent invalid type casts in replace_strict() (#26453)
- Return null when dividing literals by 0 (#26343)

📖 Documentation
- Bump to patched version (#27851)
- Replace Typeform sign-up URL with new enterprise link (#27838)
- Correct wrong head call (#27848)
- Add Polars On-Prem 0.5.0 release (#27849)
- Correct onprem license helm values (#27847)
- Update connecting Polars Cloud to AWS documentation (#27823)
- Broken link to AI Policy corrected (#27793)
- Add release dates to the On-Prem releases page (#27787)
- Improve on-prem docs (#27788)
- Add query profiler video to On-Prem user guide (#27786)
- Add EKS/AKS/GKE guides (#27774)
- Sync from Polars Cloud (#27751)
- Document Expr.list.__getitem__ (#27689)
- Add cloudpickle requirement (#27703)
- Clarify from_arrow schema ordering (#27493)
- Clarify schema column order (#27681)
- Update DataFrame construction docs for Column (#27541)
- Document all valid engine options on LazyFrame collect/sink/explain methods (#27374)
- Drop redundant Pattern 2 from Dagster integration page (#27581)
- Update to remove Dockerhub PAT references (#27582)
- Modernize Dagster integration example for Polars Cloud (#27560)
- Use Polars random seed in sample example (#27537)
- Make expressions operations RNG deterministic (#27494)
- Document struct field order (#27492)
- Add See Also sections for datetime docstrings (#27316)
- Polars On-Prem release (#27439)
- Rename to Polars On-Prem (#27435)
- Split out openlineage docs into guide and configuration (#27371)
- Add explanation on the observatory sqlite db file (#27354)
- Add documentation for openlineage on-premises (#27334)
- Release page (#27335)
- Update uv pip install polars-on-premises cmd (#27330)
- Fix outdated LazyGroupBy.map_groups docstring (#27292)
- Add deny_anonymous_users to scheduler config (#27287)
- Slurm documentation (#27259)
- Add link to concepts in index.md (#27077)
- Add docs entry for merge_sorted (#27224)
- Fix typo (#27212)
- Make the files used in docs available locally (#27121)
- Put first-time contribution requirements in its own linkable section (#27113)
- Change Polars Cloud API to 0.6.0 (#27005)
- Query Profiler addition to User Guide (#26623)
- Add documentation for on_columns for LazyFrame pivot (#26859)
- Mention ComputeContexts create ephemeral environments by default and hint at re-use (#26692)
- Remove confusing join validation note (#26795)
- Fix broken AI policy link (#26728)
- Create Polars Cloud Glossary (#26690)
- Additional SQL documentation (#26662)
- Include invalidate_caches in bisect instructions (#26641)
- Add git bisect guide to contributing docs (#26634)
- Updated Airflow orchestration documentation (#26585)
- Improve SQL docs for EXTRACT and DATE_PART functions (#26575)
- Remove reference to MutableStructArray in module doc (#26557)
- Fix docstring for bitwise_count_zeros method (#26519)
- Add get() for binary Series (#26514)

📦 Build system
- Also split debug info in debug-release (#27609)
- Use split-debuginfo on linux (#27608)
- Bump deltalake to 1.5.1 in CI (#27387)
- Really do not install pyiceberg-core 0.9.0 (#27017)
- Bump up numpy and pyo3 to 0.28 (#26743)

🛠️ Other improvements
- Add statistics to spill contexts (#27859)
- Include license file in polars-ooc crate (#27864)
- Changes needed for Rust 0.54.x (#27853)
- Use Vec instead of PlHashMap for ProjectionInfo.map (#27856)
- Reduce codegen-units (#27835)
- Deduplicate thrift field-walk loops (#27790)
- Harden against async blocking deadlocks (take 2) (#27767)
- Added jlumbroso/free-disk-space cleaning action where relevant (#27769)
- Update runtime edition to 2024 (#27746)
- Remove redundant DSL::AGG::Unique (#27718)
- Harden against async blocking deadlocks (#27653)
- Print Python traceback when POLARS_TIMEOUT_MS is exceeded (#27657)
- Remove last global static mut (#27704)
- Remove unused equal_element code (#27701)
- Remove unused suspect AsRef impl (#27699)
- Remove Box<dyn Iterator> IntoIterator for ChunkedArray (#27697)
- Remove trailing semicolons in fmt macros (#27705)
- Add dynamic slice to unoptimized dispatch (#27693)
- Format missed in previous PR (#27700)
- Bump pytest and remove codspeed (#27686)
- Store record batch row counts custom polars IPC metadata field (#27549)
- Remove client-side allow_local_scans option for prepare_cloud_plan (#27663)
- Remove superfluous test (#27676)
- Cleanup streaming flags (#27671)
- Expose unordered concatenation in python visitor (#27666)
- Bump deltalake and fix CI (#27660)
- Add impl IntoAExprBuilder for ExprIR (#27656)
- Update object_store patch repo (#27650)
- Bump up thiserror (#27648)
- Move async executor and primitives to polars-async (#27629)
- Add ImageVersion to rust-cache key (#27626)
- Rename POOL to RAYON (#27606)
- Use first_non_null for strptime infer (#27577)
- Add arg mapper to unoptimized dispatch (#27599)
- Fix is_empty test (#27597)
- Fix tz type difference pandas assert, take 2 (#27596)
- Fix CSPE panic in cloud (#27594)
- Fix tz type difference pandas assert (#27593)
- Add contributing note about conventional comments (#27543)
- Add AnonymousColumnsUdf to UnoptimizedOperation (#27513)
- Move Quantile to FunctionIRExpr (#27498)
- Nested common subplan elimination (#27340)
- Remove old projection pushdown code (#27499)
- Refactored projection pushdown with cache handling (#27422)
- Refactor CSPE (#27425)
- Deduplicate interns (#27470)
- Fix merge conflict in ColumnarFunction (#27464)
- Schema per port for PhysNode (#27302)
- Keep the schema ordered in scan projection pushdown (#27429)
- Remove redundant PhysNodeKind::AsOfJoin::{left_right}_by fields (#27400)
- Bump apache-avro version (#27419)
- Bump rustls-webpki (#27382)
- Disable debug symbols in macos coverage tests (#27361)
- Cargo deny (#27363)
- Add generic tree traversal with edge value propagation (#27249)
- Bump Python Polars version (#27315)
- Utility for identifying expr projection heights (#27198)
- Sink DSL and callback for Iceberg (#27258)
- Wait for morsel consumption in merge_sorted streaming node (#27288)
- Mark scan_ipc cache arguments as deprecated (#27216)
- Consolidate reordered compare functions (#27229)
- Add zip_eq to itertools (#27210)
- Remove unused attributes (#27191)
- Avoid unnecessary recompilation due to changing env vars (#27166)
- Update nightly Rust compiler version (#27145)
- Simplify pyarrow scan and process in batches (#26982)
- Make internal typing more precise (part ii) (#27117)
- Remove unused expression sorts (#27075)
- Add memory usage tracking to global allocator (#27103)
- Add sinked paths callback (#26995)
- Pin maturin due to compile time regression (#27062)
- Missing src/ subdirectory to CI Python docs step (#27025)
- Really do not install pyiceberg-core 0.9.0 (#27017)
- Naming for named scopes (#26999)
- Enable hypothesis tests when POLARS_AUTO_NEW_STREAMING=1 (#26818)
- Fix CI by excluding missing wheel version of pyiceberg (#27001)
- Replace clippy::never_loop with break on named scopes (#26983)
- Remove indirection in calling python scans (#26981)
- Polars versions (#26980)
- Polars version (#26971)
- Set stricter maintain_order in test_schema_row_index_cse (#26931)
- Bump build deps used in ARM64 Windows release pipeline (#26892)
- Use large linux-arm runner for release (#26898)
- Ensure .gitignore and .typos.toml exclude "_polars_runtime*" directories (#26842)
- Additional IR slice pushdown after filter pushdown (#26815)
- Add private _expand_paths scan function (#26798)
- Change Expr sortedness container to AExprSorted and add nulls_last to PyExpr.set_sorted() (#26781)
- Move stop_and_buffer_pipe_contents into joins/utils.rs (#26810)
- Replace iejoin is_supported_type macro with a closure in predicate_pushdown/join.rs (#26812)
- Fix first-time contributor auto-label (#26794)
- Move Series arrow export code from into.rs to arrow_export (#26775)
- Automatically add first-contribution label (#26780)
- Make contributing policy more strict (#26772)
- Add unused argument warning to ruff rules (#26720)
- Move shared streaming CSV/NDJSON code into shared mod (#26742)
- Undo pub removal of to_dyn_object_store (#26722)
- Remove unused proptest.rs DataFrame file (#26676)
- Add test for predicate before join (#26705)
- Fix file cache debug assertion failure (#26695)
- Put physical_plan join formatting code into a separate function (#26691)
- Remove PlanCallback from sql (#26686)
- Add dtype visitor (#26628)
- Bump Rust nightly compiler version (#26379)
- Remove unused problematic ArrayFromIter (#26639)
- Move more boolean code to polars_compute, reusing kernels (#26636)
- Move ? to assignment site and use extend() in StructEvalExpr (#26635)
- Cleanup assert_schema_equal (#26596)
- Replace some env var reading by polars-config (#26607)
- Use monkeypatch.chdir in test_sink_path_slicing_utf8_boundaries_26324 (#26616)
- Remove string allocation from polars_err!(Variant: "str") (#26579)
- Add wrapper for clippy so it continues on warnings (#26527)
- Add Buffer::split_at / Buffer::split_off (#26583)
- Use LazyFrame.clear to clear sql (#26562)
- Update docs (#26560)
- Add backtrace coloring (#26544)
- Evaluate sql process_except_intersect during IR (#26516)
- Reformat LICENSE (#26532)
- Add a pipeline in which we test with POLARS_IDEAL_MORSEL_SIZE=4 (#26420)
- Remove test_file and have tests create test.parquet in tmp_path (#26525)
- Refactor row-encoding logic in IR join lowering into separate function (#26512)
- Fix mypy pyiceberg expression errors (#26523)
- Make nix flake mostly work (#26517)
- Switch to custom cloud writer with IO sink metrics (#26494)
- Remove Default on DataType (#26511)
- Propagate object-store error information (#26406)
- Have parameterized series rechunk() if not allow_chunks (#26504)
- Remove dead code (RevMapping) (#26508)
- Rename Arena get_many_mut to get_disjoint_mut (#26491)

Thank you to all our contributors for making this release possible!
@0guban0v, @0xRozier, @BJohnBraddock, @BitWeaverDev, @ButteryPaws, @EndPositive, @HCYT, @JakubValtar, @Jesse-Bakker, @Kevin-Patyk, @Liyixin95, @MarcoGorelli, @Matt711, @NathanHu725, @NedJWestern, @NeejWeej, @NicoOhR, @RedZapdos123, @RenzoMXD, @Shoeboxam, @SuryaSunil1326, @TNieuwdorp, @Voultapher, @WaffleLapkin, @abhidotsh, @abishop1990, @alexander-beedie, @andyjessen, @ankane, @aryansri05, @ashler-herrick, @azimafroozeh, @borchero, @boris324, @cBournhonesque, @carnarez, @coastalwhite, @daizutabi, @debnathshoham, @dependabot[bot], @dpinol, @dsprenkels, @dydev012, @erandagan, @etiennebacher, @farouk-01, @florianvazelle, @gab23r, @gautamvarmadatla, @henryharbeck, @hutch3232, @ilya-pevzner, @itamarst, @jberg5, @joaquinhuigomez, @johalnes, @jonasdedden, @jonathanchang31, @jonathansergio, @jorenham, @junnythemarksman, @kanenorman, @kdn36, @leudz, @lukas-reining, @lun3x, @moktamd, @mqqz, @mroeschke, @mzjp2, @nameexhaustion, @nicholaslegrand102, @ohmdelta, @orlp, @pablogsal, @pragun-ananda, @qxzcode, @ritchie46, @spock-yh, @stakeswky, @tmimmanuel, @tolleybot, @toreerdmann, @toroleapinc, @uurl, @veeceey, @waamm, @wence-, @wmoss, @xenzh, @xronocode, @yangsong97, @yonatan-genai, @yuuuxt and dependabot[bot]
