Loading large data

Use the datatable library

import datatable as dt

# fread parses large CSVs quickly; convert to Pandas for downstream work
df = dt.fread("file").to_pandas()
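fread returns a datatable Frame, so you can also keep only the columns you need before converting; a minimal sketch, with placeholder column names:

import datatable as dt

DT = dt.fread("file")                       # datatable Frame
df = DT[:, ["col_a", "col_b"]].to_pandas()  # hypothetical column names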

Use the DuckDB library

import duckdb

duckdb.sql(
    """
    select
        days_till_primary_close,
        days_till_final_close,
        loans_outstanding_balance,
        utilization,
        primary_close_flag,
        final_close_flag
    from df
    where primary_close_flag = 1 and final_close_flag = 0
    limit 100
    """
).pl().sample(10)

Here df is an existing Pandas dataframe that DuckDB picks up by name, and .pl() converts the query result into a Polars dataframe.
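A self-contained sketch of the same pattern, using toy data instead of the loans dataframe above: DuckDB resolves the local variable df by name, and the result can come back as Polars via .pl() or as Pandas via .df().

import duckdb
import pandas as pd

# toy stand-in for the dataframe queried above
df = pd.DataFrame({
    "utilization": [0.2, 0.8, 0.5],
    "primary_close_flag": [1, 1, 0],
    "final_close_flag": [0, 1, 0],
})

rel = duckdb.sql(
    "select utilization from df where primary_close_flag = 1 and final_close_flag = 0"
)
rel.pl()  # Polars dataframe
rel.df()  # Pandas dataframe, if that's what downstream code expects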

Convert float64 to float32

import numpy as np

# downcast every float64 column to float32 to roughly halve its memory footprint
df = df.astype({c: np.float32 for c in df.select_dtypes(include='float64').columns})
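To see the saving, compare memory usage before and after the cast; a minimal sketch with random data:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(1_000_000, 4), columns=list("abcd"))
print(df.memory_usage(deep=True).sum() / 1e6, "MB")  # float64 columns

df = df.astype({c: np.float32 for c in df.select_dtypes(include="float64").columns})
print(df.memory_usage(deep=True).sum() / 1e6, "MB")  # roughly half the size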

Pickle a dataframe and load it back

import pickle
import datatable as dt

# parse the raw CSV once and save the Pandas dataframe as a pickle
train_file = '/kaggle/input/jane-street-market-prediction/train.csv'
with open('train.csv.pandas.pickle', 'wb') as f:
    pickle.dump(dt.fread(train_file).to_pandas(), f)

# load the pickle file (much faster than re-parsing the CSV)
train_pickle_file = '/kaggle/input/pickling/train.csv.pandas.pickle'
with open(train_pickle_file, 'rb') as f:
    train = pickle.load(f)
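If the round trip is always Pandas to Pandas, pandas' own pickle helpers are an equivalent, shorter alternative (same idea, hypothetical paths):

import datatable as dt
import pandas as pd

train = dt.fread("train.csv").to_pandas()
train.to_pickle("train.csv.pandas.pickle")         # same effect as pickle.dump above
train = pd.read_pickle("train.csv.pandas.pickle")  # same effect as pickle.load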