iloc() function: Learn to extract rows and columns

What is the iloc() function in Python?

The pandas library in Python offers many tools for data manipulation and preprocessing. One of the most useful is the DataFrame, which arranges a dataset in tabular form. A DataFrame has three main components: data, index, and columns. The index and columns are what you use to access the data it holds.

A DataFrame can hold a very large dataset, and you rarely need every row and column at once. Many operations need only specific rows or columns. That is where loc() and iloc() come into the picture. Both select a group of rows and columns: loc() by label, and iloc() by integer position. Although this article follows common usage and calls them functions, both are indexers used with square brackets, as in df.iloc[0], rather than called with parentheses.

Is iloc() different from loc() function?

Both loc() and iloc() retrieve rows and columns from a DataFrame, but they work differently. loc() is label-based and also accepts a Boolean array. iloc(), in contrast, is purely integer-position-based: it selects rows, columns, or a single cell of a DataFrame by position.

loc() takes a single label or a list of labels as its argument, and it uses labels for slicing too. For example, we have created a DataFrame with ten rows and five columns and filled it with random numbers from NumPy's random module.

# Creating a DataFrame filled with random values from NumPy
import pandas as pd
import numpy as np
df = pd.DataFrame(data=np.random.randn(10, 5), index=list('ABCDEFGHIJ'), columns=['a', 'b', 'c', 'd', 'e'])
df

Output

          a         b         c         d         e
A -0.450453 -0.748244 -0.767587  1.358411  0.947180
B  1.664178 -1.866064 -1.563198  0.389666 -1.368218
C -0.114530  0.832652  1.395796  0.389280 -1.334556
D -1.250644  0.056873  1.144531  1.063658 -1.051090
E  1.393717  0.557422 -1.560048  0.756659  0.013160
F  0.360017  1.243927  1.684466 -0.761787  0.571070
G -0.989028  0.636875  1.554759 -0.043323 -0.421262
H -0.257132 -0.509199  1.406591 -0.614890  0.725658
I -1.116145  0.173992 -0.700085  0.712860 -1.584754
J  0.266349  1.155019  0.823797 -0.128998  0.260364

Now, a particular column or row can be accessed in the following way:

# accessing a row
df.loc['A']

Output

a   -0.450453
b   -0.748244
c   -0.767587
d    1.358411
e    0.947180
Name: A, dtype: float64

If you want to access a particular column, use the following code. Here, we fetch the column labelled 'a'.

# accessing a column by its label
print(df.loc[:, 'a'])

Output

A   -0.450453
B    1.664178
C   -0.114530
D   -1.250644
E    1.393717
F    0.360017
G   -0.989028
H   -0.257132
I   -1.116145
J    0.266349
Name: a, dtype: float64

You can also retrieve particular rows and columns together using the slicing concept.

# Accessing a block of rows and columns using label slices

df.loc['A':'F','a':'c']

Output

          a         b         c
A -0.450453 -0.748244 -0.767587
B  1.664178 -1.866064 -1.563198
C -0.114530  0.832652  1.395796
D -1.250644  0.056873  1.144531
E  1.393717  0.557422 -1.560048
F  0.360017  1.243927  1.684466

You might have noticed that with loc() we always used the row and column labels of the DataFrame. The iloc() function, by contrast, takes integer positions as arguments, irrespective of the labels. DataFrame.iloc is used in the following way:

# Accessing the second row (position 1; positions start at 0)
df.iloc[1]

Output

a    1.664178
b   -1.866064
c   -1.563198
d    0.389666
e   -1.368218
Name: B, dtype: float64

# Accessing all values of column a
df.iloc[:, 0]

Output

A   -0.450453
B    1.664178
C   -0.114530
D   -1.250644
E    1.393717
F    0.360017
G   -0.989028
H   -0.257132
I   -1.116145
J    0.266349
Name: a, dtype: float64
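
The label-versus-position distinction is easiest to see when the row labels are themselves integers that do not match the row order. The following is a minimal sketch; the frame and its "value" column are made up for illustration and are not the random DataFrame used above.

```python
import pandas as pd

# Hypothetical frame: integer row labels in a shuffled order
df_int = pd.DataFrame({"value": [10, 20, 30]}, index=[2, 0, 1])

by_label = df_int.loc[0, "value"]    # the row whose label is 0 (second row)
by_position = df_int.iloc[0, 0]      # the row at position 0 (first row)

print(by_label)      # 20
print(by_position)   # 10
```

With the same argument 0, loc looks up the label while iloc counts from the top of the frame.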

How to use the iloc() function in Pandas?

There are several ways to use DataFrame.iloc. Positions run from 0 to (length of the axis - 1). A scalar or list position outside that range raises an IndexError, whereas a slice that runs past the end is simply truncated. iloc() can be invoked in the following ways:

  • Using a Scalar Integer
  • Using a List of Integers
  • With a Slice Object
  • Using a Boolean Array

Let’s see the implementation of the above ways.

With a scalar integer, the iloc() method returns the row at the given position as a Series. For example, to access the fourth row, the statement would be df.iloc[3].

df.iloc[3]

Output

a   -1.250644
b    0.056873
c    1.144531
d    1.063658
e   -1.051090
Name: D, dtype: float64

If you want to access a single element instead, pass the column position along with the row position.

df.iloc[3,4]

Output

-1.051089741618703
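
Scalar positions can also be negative, counting back from the end of the axis, and a scalar outside the valid range raises an IndexError. Here is a small sketch of both behaviours. It assumes an illustrative frame filled with the integers 0 to 49 so the results are predictable, rather than the random values used above.

```python
import numpy as np
import pandas as pd

# Illustrative 10 x 5 frame holding 0..49 (not the article's random data)
df_demo = pd.DataFrame(
    np.arange(50).reshape(10, 5),
    index=list("ABCDEFGHIJ"),
    columns=list("abcde"),
)

last_row = df_demo.iloc[-1]        # negative positions count from the end
last_cell = df_demo.iloc[-1, -1]   # last row, last column

print(last_row.name)   # J
print(last_cell)       # 49

out_of_bounds = False
try:
    df_demo.iloc[10]               # valid row positions are 0..9
except IndexError as exc:
    out_of_bounds = True
    print("IndexError:", exc)
```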

Now, if a list of integers is passed, then a DataFrame is returned.

df.iloc[[0,1,2]]

Output

          a         b         c         d         e
A -0.450453 -0.748244 -0.767587  1.358411  0.947180
B  1.664178 -1.866064 -1.563198  0.389666 -1.368218
C -0.114530  0.832652  1.395796  0.389280 -1.334556
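
The return type depends on whether a scalar or a list is passed, even when the list holds a single position, and lists can be given for rows and columns at the same time. The sketch below illustrates this on an assumed frame holding the integers 0 to 49, chosen only to make the values easy to check.

```python
import numpy as np
import pandas as pd

# Illustrative 10 x 5 frame holding 0..49 (not the article's random data)
df_demo = pd.DataFrame(
    np.arange(50).reshape(10, 5),
    index=list("ABCDEFGHIJ"),
    columns=list("abcde"),
)

row_as_series = df_demo.iloc[3]         # scalar: the row comes back as a Series
row_as_frame = df_demo.iloc[[3]]        # one-element list: a one-row DataFrame
subset = df_demo.iloc[[0, 2], [1, 3]]   # rows 0 and 2, columns 1 and 3

print(type(row_as_series).__name__)     # Series
print(type(row_as_frame).__name__)      # DataFrame
print(subset.to_numpy().tolist())       # [[1, 3], [11, 13]]
```

Passing a one-element list is a handy way to keep a DataFrame when later steps expect two-dimensional data.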

A slice object also returns a DataFrame.

df.iloc[:3]

Output

          a         b         c         d         e
A -0.450453 -0.748244 -0.767587  1.358411  0.947180
B  1.664178 -1.866064 -1.563198  0.389666 -1.368218
C -0.114530  0.832652  1.395796  0.389280 -1.334556
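
Positional slices follow ordinary Python slicing rules: the stop position is excluded, a step is allowed, and a slice that runs past the end is truncated rather than raising an error. A label slice with loc(), by contrast, includes both endpoints. The following sketch shows these cases on an assumed frame of the integers 0 to 49, not the random data above.

```python
import numpy as np
import pandas as pd

# Illustrative 10 x 5 frame holding 0..49 (not the article's random data)
df_demo = pd.DataFrame(
    np.arange(50).reshape(10, 5),
    index=list("ABCDEFGHIJ"),
    columns=list("abcde"),
)

first_three = df_demo.iloc[:3]        # positions 0, 1, 2 (stop excluded)
same_by_label = df_demo.loc["A":"C"]  # label slice includes "C"
every_other = df_demo.iloc[::2]       # every second row
past_end = df_demo.iloc[8:100]        # stop beyond the last row is truncated

print(first_three.equals(same_by_label))  # True
print(list(every_other.index))            # ['A', 'C', 'E', 'G', 'I']
print(list(past_end.index))               # ['I', 'J']
```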

You can also pass a Boolean array to iloc(), which returns every row whose entry is True. The Boolean array must have the same length as the number of rows in the DataFrame; otherwise, an error is raised.

df.iloc[[True, True, True, True, True, False, False, True, True, False]]

Output

          a         b         c         d         e
A -0.450453 -0.748244 -0.767587  1.358411  0.947180
B  1.664178 -1.866064 -1.563198  0.389666 -1.368218
C -0.114530  0.832652  1.395796  0.389280 -1.334556
D -1.250644  0.056873  1.144531  1.063658 -1.051090
E  1.393717  0.557422 -1.560048  0.756659  0.013160
H -0.257132 -0.509199  1.406591 -0.614890  0.725658
I -1.116145  0.173992 -0.700085  0.712860 -1.584754
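
In practice the Boolean array is usually computed from the data rather than typed by hand. The sketch below builds one from a condition on column a and converts it to a plain NumPy array first, since iloc() works on positions rather than on the labels a Series carries. It also shows the failure for a mask of the wrong length. The frame of integers 0 to 49 is an assumption made for predictable output, and the exact exception type may vary by pandas version.

```python
import numpy as np
import pandas as pd

# Illustrative 10 x 5 frame holding 0..49 (not the article's random data)
df_demo = pd.DataFrame(
    np.arange(50).reshape(10, 5),
    index=list("ABCDEFGHIJ"),
    columns=list("abcde"),
)

# Column a holds 0, 5, ..., 45, so the mask is True for rows F to J
mask = (df_demo["a"] > 20).to_numpy()
selected = df_demo.iloc[mask]
print(list(selected.index))   # ['F', 'G', 'H', 'I', 'J']

# A mask with 3 entries does not match the 10 rows
length_error = False
try:
    df_demo.iloc[[True, False, True]]
except (IndexError, ValueError) as exc:
    length_error = True
    print("Error:", exc)
```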

These were the different ways to invoke the iloc() method while working with DataFrame objects. One more option is to pass iloc() a callable that takes one argument. For example, we can use iloc() with a lambda function in the following way.

df.iloc[lambda x: [0,2]]

Output

      a         b         c         d         e
A -0.450453 -0.748244 -0.767587  1.358411  0.947180
C -0.114530  0.832652  1.395796  0.389280 -1.334556

In the above code, x is the DataFrame being indexed. It is passed to the lambda function, which returns the positions to select, here the first and third rows. In this way, the DataFrame.iloc() method can be used to select and manipulate the data in a DataFrame.
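
A callable becomes more useful when it actually reads its argument, for example in a chain where the intermediate DataFrame has no name of its own. The following sketch filters an assumed frame of the integers 0 to 49 and then picks the first and last of the remaining rows, whatever their number turns out to be.

```python
import numpy as np
import pandas as pd

# Illustrative 10 x 5 frame holding 0..49 (not the article's random data)
df_demo = pd.DataFrame(
    np.arange(50).reshape(10, 5),
    index=list("ABCDEFGHIJ"),
    columns=list("abcde"),
)

# Keep rows where column a exceeds 20 (rows F to J), then take the first
# and last of those rows; x is the filtered frame, so len(x) is its size
ends = df_demo[df_demo["a"] > 20].iloc[lambda x: [0, len(x) - 1]]

print(list(ends.index))   # ['F', 'J']
```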