DataFrame Workflow¶
Get a Polars DataFrame directly:
from filoma import probe_to_df
dfw = probe_to_df('.') # filoma.DataFrame wrapper
print(dfw.head())
Wrap existing analysis:
from filoma import probe
analysis = probe('.')
# Prefer using probe_to_df() to get a DataFrame in one step. If you already
# have an analysis object, request a DataFrame explicitly when probing and
# access the raw Polars DataFrame via `wrapper.df` when needed.
analysis = probe('.', build_dataframe=True)
wrapper = analysis.to_df()
# access raw polars via wrapper.df
Enrichment helpers (chainable):
from filoma import probe_to_df
dfw = probe_to_df('.', enrich=True) # depth, path parts, file stats
Manual enrichment:
from filoma import probe_to_df
dfw = probe_to_df('.', enrich=False)
from filoma.dataframe import DataFrame
wrapper = DataFrame(dfw.df)
wrapper = wrapper.add_depth_col().add_path_components().add_file_stats_cols()
Filtering & grouping:
wrapper.filter_by_extension('.py')
wrapper.extension_counts()
wrapper.directory_counts()
Export:
wrapper.save_csv('files.csv')
wrapper.save_parquet('files.parquet')
Convert to pandas:
pandas_df = probe_to_df('.', to_pandas=True)
Tips:
- Use
.add_file_stats_cols()sparingly on huge trees (it touches filesystem for each path). Pandas conversions and caching
filoma is Polars-first internally. For pandas interop use the following:
df.pandas— always returns a fresh pandas.DataFrame conversion.df.pandas_cachedordf.to_pandas(force=False)— returns a cached pandas conversion (converted once). Use for repeated reads.df.to_pandas(force=True)— force reconversion and update the cache.df.invalidate_pandas_cache()— clear the cached pandas object.
The wrapper automatically invalidates the cached pandas conversion on
assignments (df[...] = ...) and when delegated Polars methods appear to
mutate in-place (Polars commonly returns None or the same DataFrame
object). For complex external mutations call invalidate_pandas_cache() or
to_pandas(force=True) to ensure freshness.
- Combine with Polars expressions for advanced analysis.