Pandas axis parameter: what is it?
If you're just starting out with Python and data science, you may have come across the Pandas library and the axis parameter. The axis parameter is used in many Pandas functions, such as mean(), sum(), and concat(), to specify whether to apply the function across rows or columns. In this blog post, we'll dive deeper into what the axis parameter means and how to use it.
Introduction to the axis parameter
First, let's define what "axis" means. A DataFrame has two axes: axis 0 runs down the rows (the index) and axis 1 runs across the columns. By default, when you apply a function like mean() or sum() to a DataFrame, it uses axis=0 and computes one result per column. So if you have a DataFrame with 3 rows and 2 columns, calling mean() with the default axis gives you the mean of each of the 2 columns.
However, you can also compute one result per row by specifying axis=1. For example, if you have a DataFrame with 3 rows and 2 columns, applying mean(axis=1) gives you the mean of each of the 3 rows. This is important to keep in mind when each row, rather than each column, holds the values you want to summarize.
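As a minimal sketch of the 3-row, 2-column case described above (the column names "a" and "b" and the values are made up for illustration):

```python
import pandas as pd

# A small 3-row, 2-column DataFrame; names and values are illustrative only.
small = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

col_means = small.mean()        # default axis=0: one mean per column
row_means = small.mean(axis=1)  # axis=1: one mean per row

print(col_means)  # 2 values, labelled "a" and "b"
print(row_means)  # 3 values, labelled 0, 1, 2
```

The length of the result is a handy sanity check: two values means you got one per column, three values means you got one per row.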
To help visualize this concept, think of a DataFrame like a spreadsheet. The columns are like the vertical columns in a spreadsheet, while the rows are like the horizontal rows. When you specify axis=0, Pandas moves down the rows (i.e. vertically) and produces one result per column, while axis=1 moves across the columns (i.e. horizontally) and produces one result per row. In other words, for a function like mean() or sum(), axis names the direction that gets collapsed.
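One way to check which direction an axis value refers to is to look at the labels on the result. The sketch below assumes a hypothetical 3x2 DataFrame with row labels "r0" to "r2" so the two axes are easy to tell apart:

```python
import pandas as pd

# Hypothetical data; custom row labels make the two axes easy to distinguish.
grid = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=["r0", "r1", "r2"])

down = grid.sum(axis=0)    # moves down the rows, so the rows are collapsed
across = grid.sum(axis=1)  # moves across the columns, so the columns are collapsed

print(grid.shape)          # (3, 2)
print(list(down.index))    # the column labels survive
print(list(across.index))  # the row labels survive
```

With axis=0 the result is labelled by column name, and with axis=1 it is labelled by row label, which matches the spreadsheet picture of summing down a column versus summing along a row.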
It's also worth noting that you can pass axis as a string instead of 0 or 1. The names used in the Pandas documentation are "index" (for 0) and "columns" (for 1), and "rows" is also accepted as an alias for "index". For example, you can use mean(axis="rows") or mean(axis="columns") instead of mean(axis=0) or mean(axis=1). This can make your code more readable and understandable, especially when working with larger and more complex DataFrames.
Example
Here's an example code snippet that demonstrates how to use the axis parameter in pandas:
import pandas as pd
# create a DataFrame
data = {'Name': ['John', 'Paul', 'George', 'Ringo'],
        'Guitar': ['Rickenbacker', 'Hofner', 'Gretsch', 'Ludwig'],
        'Age': [23, 25, 22, 28],
        'Net Worth': [1000000, 2000000, 500000, 3000000]}
df = pd.DataFrame(data)
# calculate the mean of each numeric column
mean_by_col = df.mean(axis=0, numeric_only=True)
print(mean_by_col)
# calculate the mean of each row
mean_by_row = df.mean(axis=1, numeric_only=True)
print(mean_by_row)
# drop the 'Net Worth' column
df = df.drop('Net Worth', axis=1)
print(df)
# drop the second and third row
df = df.drop([1, 2], axis=0)
print(df)
In this example, we first create a DataFrame with made-up information about the members of The Beatles. We then calculate the mean of each numeric column (Age and Net Worth) using df.mean(axis=0, numeric_only=True), which returns a Series with the mean of each column. The numeric_only=True argument tells Pandas to skip the text columns, Name and Guitar.
Age               24.5
Net Worth    1625000.0
dtype: float64
Next, we calculate the mean of each row using df.mean(axis=1, numeric_only=True), which returns a Series with one mean per row, taken over that row's Age and Net Worth values.
0     500011.5
1    1000012.5
2     250011.0
3    1500014.0
dtype: float64
We then drop the 'Net Worth' column from the DataFrame using df.drop('Net Worth', axis=1), which drops the specified column (axis=1 means columns).
     Name        Guitar  Age
0    John  Rickenbacker   23
1    Paul        Hofner   25
2  George       Gretsch   22
3   Ringo        Ludwig   28
Finally, we drop the second and third rows from the DataFrame using df.drop([1, 2], axis=0), which drops the rows whose index labels are 1 and 2 (axis=0 means rows).
    Name        Guitar  Age
0   John  Rickenbacker   23
3  Ringo        Ludwig   28
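If the numeric axis values in drop() are hard to remember, the columns= and index= keywords of drop() express the same thing by name. The following is a small sketch, reusing a trimmed copy of the made-up band data under the hypothetical name band:

```python
import pandas as pd

# Trimmed copy of the made-up band data; the variable name "band" is illustrative.
band = pd.DataFrame({
    "Name": ["John", "Paul", "George", "Ringo"],
    "Age": [23, 25, 22, 28],
    "Net Worth": [1000000, 2000000, 500000, 3000000],
})

# Two equivalent ways to drop a column
by_axis = band.drop("Net Worth", axis=1)
by_keyword = band.drop(columns="Net Worth")

# Two equivalent ways to drop rows by index label
rows_by_axis = band.drop([1, 2], axis=0)
rows_by_keyword = band.drop(index=[1, 2])

print(by_axis.equals(by_keyword))            # True
print(rows_by_axis.equals(rows_by_keyword))  # True
print(band.shape)  # (4, 3): drop() returned new objects, band is unchanged
```

Note that drop() works on labels, not positions, and returns a new DataFrame, which is why the main example assigns the result back to df.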
These examples illustrate some common uses of the axis parameter in pandas, but there are many other operations where the axis parameter can be used to specify the direction of the operation along the DataFrame or Series.
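Two such operations are pd.concat() and DataFrame.apply(). The sketch below uses two small frames with made-up values (the names left and right are illustrative) to show how axis changes the result:

```python
import pandas as pd

# Two small frames with made-up values, sharing the same columns and row labels.
left = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
right = pd.DataFrame({"a": [5, 6], "b": [7, 8]})

stacked = pd.concat([left, right], axis=0)       # adds rows: shape (4, 2)
side_by_side = pd.concat([left, right], axis=1)  # adds columns: shape (2, 4)

# apply with axis=1 passes each row to the function as a Series
row_range = left.apply(lambda row: row.max() - row.min(), axis=1)

print(stacked.shape, side_by_side.shape)
print(row_range.tolist())  # [2, 2]
```

For concat(), axis names the direction in which the result grows, whereas for reductions such as mean() it names the direction that is collapsed. In both cases 0 refers to rows and 1 to columns.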
Conclusion
In summary, the Pandas axis parameter specifies whether to apply a function across rows or columns of a DataFrame. The default axis is 0, which applies the function to each column, while axis=1 applies the function to each row. You can also specify axis as "index" (or its alias "rows") or "columns" instead of 0 or 1. Understanding how to use the axis parameter is crucial for working with DataFrames and analyzing data in Pandas.
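As a quick check of the string spellings, the sketch below compares results computed with numbers and with names. It uses "index", the name the Pandas documentation uses for axis 0, alongside "columns"; the data is made up for illustration.

```python
import pandas as pd

# Made-up numbers, used only to compare the different spellings of axis.
frame = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

same_for_0 = frame.mean(axis=0).equals(frame.mean(axis="index"))
same_for_1 = frame.mean(axis=1).equals(frame.mean(axis="columns"))

print(same_for_0, same_for_1)  # True True
```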