Loading large data

Use datatable library

import datatable as dt
dt.fread("file").to_pandas()

Use DuckDB library

duckdb.sql(
    """
        select
         days_till_primary_close,
         days_till_final_close,
         loans_outstanding_balance,
         utilization,
         primary_close_flag, final_close_flag
           from df
        where primary_close_flag = 1 and final_close_flag = 0 limit 100
    """
).pl().sample(10)

where df is the dataframe and .pl() converts the dataframe into Polars dataframe

Convert float64 to float32

df = df.astype({c: np.float32 for c in df.select_dtypes(include='float64').columns})

Load pickle file

Last updated