Skip to content

DataFrame Workflow

Get a Polars DataFrame directly:

from filoma import probe_to_df
dfw = probe_to_df('.')  # filoma.DataFrame wrapper
print(dfw.head())

Wrap existing analysis:

from filoma import probe
analysis = probe('.')
# Prefer using probe_to_df() to get a DataFrame in one step. If you already
# have an analysis object, request a DataFrame explicitly when probing and
# access the raw Polars DataFrame via `wrapper.df` when needed.
analysis = probe('.', build_dataframe=True)
wrapper = analysis.to_df()
# access raw polars via wrapper.df

Enrichment helpers (chainable):

from filoma import probe_to_df
dfw = probe_to_df('.', enrich=True)  # depth, path parts, file stats

Manual enrichment:

from filoma import probe_to_df
dfw = probe_to_df('.', enrich=False)
from filoma.dataframe import DataFrame
wrapper = DataFrame(dfw.df)
wrapper = wrapper.add_depth_col().add_path_components().add_file_stats_cols()

Filtering & grouping:

wrapper.filter_by_extension('.py')
wrapper.extension_counts()
wrapper.directory_counts()

Export:

wrapper.save_csv('files.csv')
wrapper.save_parquet('files.parquet')

Convert to pandas:

pandas_df = probe_to_df('.', to_pandas=True)

Tips:

  • Use .add_file_stats_cols() sparingly on huge trees (it touches filesystem for each path). Pandas conversions and caching

filoma is Polars-first internally. For pandas interop use the following:

  • df.pandas — always returns a fresh pandas.DataFrame conversion.
  • df.pandas_cached or df.to_pandas(force=False) — returns a cached pandas conversion (converted once). Use for repeated reads.
  • df.to_pandas(force=True) — force reconversion and update the cache.
  • df.invalidate_pandas_cache() — clear the cached pandas object.

The wrapper automatically invalidates the cached pandas conversion on assignments (df[...] = ...) and when delegated Polars methods appear to mutate in-place (Polars commonly returns None or the same DataFrame object). For complex external mutations call invalidate_pandas_cache() or to_pandas(force=True) to ensure freshness.

  • Combine with Polars expressions for advanced analysis.