Taking Column-Slices of DataFrames in Pandas


In this tutorial, we will learn how to take column slices of DataFrames in pandas, a Python library for data manipulation and analysis. We will use pandas and NumPy to split a DataFrame into smaller DataFrames by column range.

Loading Data

First, let's assume you have loaded some machine learning data from a CSV file using Pandas. The first two columns are observations, and the remaining columns are features. Here's an example:

import pandas as pd
import numpy as np

data = pd.read_csv('mydata.csv')

So that the examples run without that CSV file, we'll create a DataFrame with random data instead:

data = pd.DataFrame(np.random.rand(10, 5), columns=list('abcde'))

Now, you want to slice this DataFrame into two DataFrames: one containing columns 'a' and 'b' (observations), and another containing columns 'c', 'd', and 'e' (features).

Slicing DataFrames with .loc

In pandas, the .loc indexer selects rows and columns by label, where the labels are the values of the index (for rows) or of the columns. Unlike ordinary Python slicing, a label slice with .loc includes its last element. Here's how to do it:

import numpy as np
import pandas as pd
data = pd.DataFrame(np.random.rand(10, 5), columns=list('abcde'))

observations = data.loc[:, 'a':'b']
features = data.loc[:, 'c':]

This will create two new DataFrames: observations containing columns 'a' and 'b', and features containing columns 'c', 'd', and 'e'.
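To make the inclusive endpoint concrete, here is a minimal sketch that checks which columns each slice returns and contrasts that with a plain Python list slice. The seeded generator and the helper names obs_cols, feat_cols, and list_slice are assumptions added for illustration; they are not part of the original example.

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch is reproducible (the seed value is arbitrary).
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.random((10, 5)), columns=list("abcde"))

observations = data.loc[:, "a":"b"]
features = data.loc[:, "c":]

obs_cols = list(observations.columns)   # ['a', 'b'], the end label 'b' is included
feat_cols = list(features.columns)      # ['c', 'd', 'e']

# For contrast, a positional slice on a plain list excludes the stop position.
list_slice = list("abcde")[0:1]         # ['a']

print(obs_cols, feat_cols, list_slice)
```

Both results keep all 10 rows, because the row selector is the bare colon.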

More Examples with .loc

The .loc indexer accepts the same start:stop:step slice notation that Python lists use, for both rows and columns, with one difference: start and stop are labels, and the stop label is included in the result. Here are some examples using our data DataFrame:

import numpy as np
import pandas as pd
data = pd.DataFrame(np.random.rand(10, 5), columns=list('abcde'))

# Slice from 'a' to 'c' by every 2nd column
data.loc[:, 'a':'c':2]

# Slice from the beginning to 'b'
data.loc[:, :'b']

# Slice from 'c' to the end by 3
data.loc[:, 'c'::3]

# Slice from 'c' to the end by 2 with the slice function
data.loc[:, slice('c', None, 2)]

# Select specific columns with a list (e.g., columns 'a', 'b', and 'd')
data.loc[:, ['a', 'b', 'd']]
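
As a quick check of what each expression above selects, the following sketch collects the resulting column labels. It assumes a DataFrame with the same columns 'a' through 'e' but filled with integers rather than random values; the dictionary name selected is hypothetical.

```python
import numpy as np
import pandas as pd

# Same column labels as the tutorial's DataFrame, filled with integers.
data = pd.DataFrame(np.arange(50).reshape(10, 5), columns=list("abcde"))

# Map each slice expression to the column labels it returns.
selected = {
    "'a':'c':2": list(data.loc[:, "a":"c":2].columns),                   # ['a', 'c']
    ":'b'": list(data.loc[:, :"b"].columns),                             # ['a', 'b']
    "'c'::3": list(data.loc[:, "c"::3].columns),                         # ['c']
    "slice('c', None, 2)": list(data.loc[:, slice("c", None, 2)].columns),  # ['c', 'e']
    "['a', 'b', 'd']": list(data.loc[:, ["a", "b", "d"]].columns),       # ['a', 'b', 'd']
}

for expression, columns in selected.items():
    print(expression, "->", columns)
```

Note that 'c'::3 returns only 'c', because the next column three positions later would fall past 'e'.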

You can also slice rows and columns at the same time. For instance, suppose a DataFrame has 5 rows labeled 'v', 'w', 'x', 'y', and 'z':

import pandas as pd

data = {'a': [1, 2, 3, 4, 5],
        'b': [6, 7, 8, 9, 10],
        'c': [11, 12, 13, 14, 15],
        'd': [16, 17, 18, 19, 20],
        'e': [21, 22, 23, 24, 25]}

df = pd.DataFrame(data, index=['v', 'w', 'x', 'y', 'z'])

# Rows 'w' through 'y'; columns 'a' through 'c', taking every 2nd column.
# Note: index the DataFrame df, not the dict data, which has no .loc.
df.loc['w':'y', 'a':'c':2]
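
The combined row and column slice can be verified with a short, self-contained sketch. It rebuilds the same labeled DataFrame and stores the selection in a variable named result, which is an assumption added here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5],
        "b": [6, 7, 8, 9, 10],
        "c": [11, 12, 13, 14, 15],
        "d": [16, 17, 18, 19, 20],
        "e": [21, 22, 23, 24, 25],
    },
    index=["v", "w", "x", "y", "z"],
)

# Rows 'w' through 'y' (inclusive); every 2nd column from 'a' through 'c'.
result = df.loc["w":"y", "a":"c":2]

print(result)
#    a   c
# w  2  12
# x  3  13
# y  4  14
```

Both end labels, 'y' for the rows and 'c' for the columns, appear in the result.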

Conclusion

In this tutorial, we learned how to take column slices of DataFrames in pandas using the .loc indexer: by label range, by label range with a step, and by an explicit list of column names. With these you can split a DataFrame into smaller DataFrames by column range, as we did with the observations and features.