Snips & Tips
Pandas Apply Function

Consider the following data frame (only the first five of its 244 rows are shown):

total_bill  tip   sex     smoker  day  time    size
16.99       1.01  Female  No      Sun  Dinner  2
10.34       1.66  Male    No      Sun  Dinner  3
21.01       3.50  Male    No      Sun  Dinner  3
23.68       3.31  Male    No      Sun  Dinner  2
24.59       3.61  Female  No      Sun  Dinner  4
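The full frame has 244 rows, and its columns and category orders match the `tips` example dataset that `seaborn.load_dataset("tips")` fetches; treating it as that dataset is an assumption based on the column names and row count. To experiment offline, a minimal sketch can rebuild just the five rows shown, with the category orders copied from the `Categories (...)` lines in the printed output:

```python
import pandas as pd

# The five rows from the table, typed in by hand
# (assumption: the other 239 rows of the full frame are not reproduced here).
df = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
        "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
        "sex": ["Female", "Male", "Male", "Male", "Female"],
        "smoker": ["No", "No", "No", "No", "No"],
        "day": ["Sun", "Sun", "Sun", "Sun", "Sun"],
        "time": ["Dinner", "Dinner", "Dinner", "Dinner", "Dinner"],
        "size": [2, 3, 3, 2, 4],
    }
)

# Use the same category dtypes (and category order) as the printed output.
df["sex"] = pd.Categorical(df["sex"], categories=["Male", "Female"])
df["smoker"] = pd.Categorical(df["smoker"], categories=["Yes", "No"])
df["day"] = pd.Categorical(df["day"], categories=["Thur", "Fri", "Sat", "Sun"])
df["time"] = pd.Categorical(df["time"], categories=["Lunch", "Dinner"])

print(df.dtypes)
```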

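With axis=0 (the default), apply passes each column to the function as a Series, so the following code prints every column in turn. Long Series are printed in truncated form, which is why only a few of the 244 values per column appear below.

df.apply(lambda x: print(x), axis=0)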
    0      16.99
    1      10.34
    2      21.01
    3      23.68
    4      24.59
          ...
    Name: total_bill, Length: 244, dtype: float64
    0      1.01
    1      1.66
    2      3.50
    3      3.31
    4      3.61
          ...
    Name: tip, Length: 244, dtype: float64
    0      Female
    1        Male
    2        Male
    3        Male
    4      Female
            ...
    Name: sex, Length: 244, dtype: category
    Categories (2, object): ['Male', 'Female']
    0       No
    1       No
    2       No
    3       No
    4       No
          ...
    Name: smoker, Length: 244, dtype: category
    Categories (2, object): ['Yes', 'No']
    0       Sun
    1       Sun
    2       Sun
    3       Sun
    4       Sun
          ...
    Name: day, Length: 244, dtype: category
    Categories (4, object): ['Thur', 'Fri', 'Sat', 'Sun']
    0      Dinner
    1      Dinner
    2      Dinner
    3      Dinner
    4      Dinner
            ...
    Name: time, Length: 244, dtype: category
    Categories (2, object): ['Lunch', 'Dinner']
    0      2
    1      3
    2      3
    3      2
    4      4
          ..
    Name: size, Length: 244, dtype: int64
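The return value of apply matters as well. Since `print` returns `None`, the call above evaluates to a Series holding one `None` per column; a function that returns an actual value yields one result per column instead, indexed by the column names. A minimal sketch on a hand-built copy of the numeric sample columns (an assumption, not the full 244-row frame):

```python
import pandas as pd

# Hand-built copy of the numeric columns from the five sample rows.
df = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
        "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
        "size": [2, 3, 3, 2, 4],
    }
)

# print() returns None, so apply collects one None per column.
returned = df.apply(lambda col: print(col), axis=0)

# Returning a value gives one result per column, indexed by column name.
value_range = df.apply(lambda col: col.max() - col.min(), axis=0)
print(value_range)
```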

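With axis=1, apply passes each row to the function instead, as a Series indexed by the column names. Because a row mixes numbers and strings, each printed row has dtype object.

df.apply(lambda x: print(x), axis=1)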
total_bill     16.99
tip             1.01
sex           Female
smoker            No
day              Sun
time          Dinner
size               2
Name: 0, dtype: object
total_bill     10.34
tip             1.66
sex             Male
smoker            No
day              Sun
time          Dinner
size               3
Name: 1, dtype: object
...
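Inside a row-wise function, fields are looked up by column label, and `row.name` holds the row's index label (the `Name: 0` line in the printout). A minimal sketch on a hand-built subset of the sample rows; `describe` is a hypothetical helper written for illustration:

```python
import pandas as pd

# Hand-built subset of the sample rows (assumption: not the full frame).
df = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01],
        "tip": [1.01, 1.66, 3.50],
        "sex": ["Female", "Male", "Male"],
    }
)

def describe(row):
    # row is a Series indexed by column name; row.name is the row's index label
    return f"row {row.name}: {row['sex']}, tip {row['tip']}"

# One string per row, returned as a Series aligned with df.index
labels = df.apply(describe, axis=1)
print(labels)

# Mixing numeric and string columns makes every row an object-dtype Series
row_dtypes = df.apply(lambda row: str(row.dtype), axis=1)
```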

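There are two common ways to use apply to create new columns: return the whole row with the new fields added (apply then returns a DataFrame), or return a single value per row (apply then returns a Series that can be assigned to a column).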

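def multiply(row):
    # Each row arrives as a Series of 7 values: row.shape == (7,)
    row = row.copy()  # work on a copy instead of mutating the Series pandas passes in
    row["res"] = row["total_bill"] * row["size"]
    row["res2"] = row["total_bill"] * row["size"]  # a second new column (here identical to "res")
    return row

def bill_plus_size(row):
    # Returns one value per row; named so it does not shadow the built-in sum()
    return row["total_bill"] + row["size"]

# Returning the extended row makes apply build a new DataFrame with "res" and "res2" added
df = df.apply(multiply, axis=1)
# Returning a scalar makes apply produce a Series, assigned here as the "sum" column
df["sum"] = df.apply(bill_plus_size, axis=1)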
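For plain arithmetic like this, the same columns can also be computed without apply. Vectorized column operations combine whole columns at once instead of calling a Python function once per row, which is typically much faster on large frames. A minimal sketch using `DataFrame.assign` on a hand-built subset of the sample data (an assumption, not the full frame), checked against a row-wise apply:

```python
import pandas as pd

# Hand-built subset of the sample rows (assumption: not the full frame).
df = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
        "sex": ["Female", "Male", "Male", "Male", "Female"],
        "size": [2, 3, 3, 2, 4],
    }
)

# Row-wise version: one Python function call per row
via_apply = df.apply(lambda row: row["total_bill"] + row["size"], axis=1)

# Vectorized version: assign returns a new frame with the extra columns,
# leaving df itself unchanged
out = df.assign(
    res=df["total_bill"] * df["size"],
    sum=df["total_bill"] + df["size"],
)
print(out)
```

Row-wise apply remains the more convenient tool when the per-row logic involves branching or lookups that have no simple column-wise expression.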
Last updated 1 year ago
