
```python
import polars as pl
import sklearn.datasets
```

scikit-learn no longer includes the Boston housing dataset. As alternatives, it suggests the California housing dataset and the Ames housing dataset, which can be loaded as follows:

```python
from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing()
```

for the California housing dataset, and:

```python
from sklearn.datasets import fetch_openml
housing = fetch_openml(name="house_prices", as_frame=True)
```

for the Ames housing dataset.
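```python
# Ames housing data from OpenML, returned as a dict-like Bunch object
ames_data = sklearn.datasets.fetch_openml("house_prices", as_frame=True)
ames_data.keys()
# dict_keys(['data', 'target', 'frame', 'categories', 'feature_names', 'target_names', 'DESCR', 'details', 'url'])
```

```python
# California housing data, also returned as a Bunch object
cal_data = sklearn.datasets.fetch_california_housing()
cal_data.keys()
# dict_keys(['data', 'target', 'frame', 'target_names', 'feature_names', 'DESCR'])
```

```python
# Build a polars DataFrame from the NumPy feature matrix, naming the columns after the features
df = pl.from_numpy(cal_data["data"], schema=cal_data["feature_names"])
# Append the target values as an extra column named after the target
df = df.with_columns(
    pl.Series(cal_data["target"]).alias(cal_data["target_names"][0]),
)
df
```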
Data selection
Columns
```python
df.select()
```
```python
df.with_columns()
```
Rows
```python
df.slice(2000)
```
```python
df.filter()
```
Combination
Manipulating data
Addendum
Data- or LazyFrame? Lazy operations?
Using square brackets
The recommended way to select and slice data is with expressions.
However, square brackets also work for selecting rows and columns:
```python
df["MedInc"]
```

```python
df[["MedInc", "Population"]]
```

```python
df[0:2, ["MedInc", "Population"]]
```
```python
df[pl.col("Population") > 350]
```

```
TypeError: cannot select columns using key of type 'Expr': <Expr ['[(col("Population")) > (dyn in…'] at 0x706C9B088C50>
```

Square brackets do not accept expressions. As the traceback shows, polars first tries to interpret the key as a row selector and then as a column selector, and both attempts raise a `TypeError`. To select rows with a boolean expression, use `df.filter()` instead.

Resources
- <https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.__getitem__.html#polars.DataFrame.__getitem__>
- <https://docs.pola.rs/user-guide/migration/pandas/#selecting-data>
- <https://typethepipe.com/vizs-and-tips/python-polars-selectors-select-multiple-columns/>
Images
- Photo by Igor Miske