Pandas axis parameter: what is it?


If you're just starting out with Python and data science, you may have come across the Pandas library and its axis parameter. Many Pandas functions and methods, such as mean(), sum(), drop(), and concat(), take an axis parameter that controls whether the operation works along the rows or along the columns. In this blog post, we'll look at what the axis parameter means and how to use it.

Introduction to the axis parameter

Firstly, let's define what "axis" means. In Pandas, an axis is one of the two dimensions of a DataFrame: axis 0 is the row index, which runs down the table, and axis 1 is the columns, which run across it. By default, when you apply a function like mean() or sum() to a DataFrame, it uses axis=0 and produces one result per column. So if you have a DataFrame with 3 rows and 2 columns, calling mean() with the default axis gives you the mean of each of the 2 columns.
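To make the default concrete, here is a minimal sketch. The DataFrame and its column names "a" and "b" are made up for illustration:

```python
import pandas as pd

# Hypothetical 3-row, 2-column DataFrame (values chosen for illustration)
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# No axis argument: the default axis=0 reduces down the rows,
# so the result has one entry per column
col_means = df.mean()
print(col_means)
# a     2.0
# b    20.0
# dtype: float64
```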

However, you can also apply these functions across rows by specifying axis=1. For example, with a DataFrame of 3 rows and 2 columns, mean(axis=1) gives you the mean of each of the 3 rows. This is useful when each row is the unit you want to summarize, for example one row per student and one column per test score.
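A row-wise mean might look like the sketch below. The student names and test scores are invented for this example:

```python
import pandas as pd

# Hypothetical data: one row per student, one column per test
scores = pd.DataFrame(
    {"test_1": [80, 90, 70], "test_2": [100, 70, 90]},
    index=["Ann", "Ben", "Cy"],
)

# axis=1 reduces across the columns, so the result has one entry per row
row_means = scores.mean(axis=1)
print(row_means)
# Ann    90.0
# Ben    80.0
# Cy     80.0
# dtype: float64
```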

To help visualize this concept, think of a DataFrame like a spreadsheet. The columns are like the vertical columns in a spreadsheet, while the rows are like the horizontal rows. When you specify axis=0, Pandas moves down the rows (i.e. vertically) and returns one result per column, while axis=1 moves across the columns (i.e. horizontally) and returns one result per row. Put differently, the axis you name is the one that gets collapsed by the calculation.
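One way to check the direction for yourself is to compare shapes before and after a calculation. In this small sketch (the data is made up), the axis passed to sum() is the dimension that disappears from the result:

```python
import pandas as pd

# Hypothetical DataFrame with 3 rows and 2 columns
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
print(df.shape)                # (3, 2)

# axis=0 collapses the 3 rows, leaving one value per column
down_rows = df.sum(axis=0)
print(down_rows.shape)         # (2,)

# axis=1 collapses the 2 columns, leaving one value per row
across_columns = df.sum(axis=1)
print(across_columns.shape)    # (3,)
```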

It's also worth noting that you can specify axis by name instead of by number: "index" for 0 and "columns" for 1 ("rows" is also accepted as an alias for "index"). For example, you can use mean(axis="index") or mean(axis="columns") instead of mean(axis=0) or mean(axis=1). This can make your code more readable, especially when working with larger and more complex DataFrames.

Example
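As a quick check that the names and the numbers are interchangeable, here is a small sketch using the documented names "index" and "columns". The DataFrame is made up for illustration:

```python
import pandas as pd

# Hypothetical DataFrame used only to compare the two spellings
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

by_index_name = df.mean(axis="index")      # same as axis=0
by_columns_name = df.mean(axis="columns")  # same as axis=1

# Each named call returns the same Series as its numeric counterpart
assert by_index_name.equals(df.mean(axis=0))
assert by_columns_name.equals(df.mean(axis=1))
```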

Here's an example code snippet that demonstrates how to use the axis parameter in pandas:

import pandas as pd

# create a DataFrame
data = {'Name': ['John', 'Paul', 'George', 'Ringo'],
        'Guitar': ['Rickenbacker', 'Hofner', 'Gretsch', 'Ludwig'],
        'Age': [23, 25, 22, 28],
        'Net Worth': [1000000, 2000000, 500000, 3000000]}
df = pd.DataFrame(data)

# calculate the mean of each numeric column
mean_by_col = df.mean(axis=0, numeric_only=True)
print(mean_by_col)

# calculate the mean of each row (numeric columns only)
mean_by_row = df.mean(axis=1, numeric_only=True)
print(mean_by_row)

# drop the 'Net Worth' column
df = df.drop('Net Worth', axis=1)
print(df)

# drop the second and third rows (index labels 1 and 2)
df = df.drop([1, 2], axis=0)
print(df)

In this example, we first create a DataFrame with made-up information about the members of The Beatles. We then calculate the mean of each numeric column (Age and Net Worth) using df.mean(axis=0, numeric_only=True), which returns a Series with the mean of each column. The numeric_only=True argument skips the text columns, Name and Guitar.

Age               24.5
Net Worth    1625000.0
dtype: float64

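Next, we calculate the mean of each row using df.mean(axis=1, numeric_only=True), which returns a Series with one value per row. Each value is the average of that row's Age and Net Worth, which mixes years with money and is shown only to illustrate the direction of the calculation.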

0     500011.5
1    1000012.5
2     250011.0
3    1500014.0
dtype: float64

We then drop the 'Net Worth' column from the DataFrame using df.drop('Net Worth', axis=1), which drops the specified column (axis=1 means columns).

     Name        Guitar  Age
0    John  Rickenbacker   23
1    Paul        Hofner   25
2  George       Gretsch   22
3   Ringo        Ludwig   28

Finally, we drop the second and third rows, which have index labels 1 and 2, using df.drop([1, 2], axis=0), which drops the specified rows (axis=0 means rows).

    Name        Guitar  Age
0   John  Rickenbacker   23
3  Ringo        Ludwig   28
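As an aside, drop() also accepts columns= and index= keywords, which some people find easier to read than a label plus an axis. The sketch below uses a cut-down copy of the DataFrame above and checks that both spellings give the same result:

```python
import pandas as pd

# Cut-down copy of the example DataFrame above
df = pd.DataFrame({"Name": ["John", "Paul", "George", "Ringo"],
                   "Age": [23, 25, 22, 28],
                   "Net Worth": [1000000, 2000000, 500000, 3000000]})

# columns= is an alternative to passing the label with axis=1
no_worth = df.drop(columns="Net Worth")
assert no_worth.equals(df.drop("Net Worth", axis=1))

# index= is an alternative to passing the labels with axis=0
outer_rows = df.drop(index=[1, 2])
assert outer_rows.equals(df.drop([1, 2], axis=0))
```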

These examples illustrate some common uses of the axis parameter in pandas, but many other operations, such as concat(), apply(), sort_index(), and dropna(), also take an axis parameter to set the direction in which they work.
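For instance, pd.concat() uses axis to decide whether DataFrames are stacked on top of each other or placed side by side. This is a minimal sketch with made-up frames named top, bottom, and extra:

```python
import pandas as pd

# Hypothetical frames for illustration
top = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
bottom = pd.DataFrame({"a": [5, 6], "b": [7, 8]})
extra = pd.DataFrame({"c": [9, 10]})

# axis=0 (the default) stacks the rows of one frame under the other
stacked = pd.concat([top, bottom], axis=0, ignore_index=True)
print(stacked.shape)        # (4, 2)

# axis=1 places the frames side by side, matching rows by index label
side_by_side = pd.concat([top, extra], axis=1)
print(side_by_side.shape)   # (2, 3)
```

Here ignore_index=True renumbers the stacked rows from 0 to 3 instead of repeating the labels 0 and 1.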

Conclusion

In summary, the Pandas axis parameter specifies whether an operation works along the rows or the columns of a DataFrame. For calculations like mean() and sum(), the default is axis=0, which gives one result per column, while axis=1 gives one result per row. You can also specify the axis by name ("index" or "rows" for 0, "columns" for 1) instead of by number. Understanding how to use the axis parameter is crucial for working with DataFrames and analyzing data in Pandas.