Reading .xlsx files with xlrd fails

xlrd no longer supports .xlsx files. Use openpyxl to read .xlsx files.

Written by prakash.jha

Last published at: May 12th, 2022

Problem

You have xlrd installed on your cluster and get the following error when you attempt to read a file in the Excel .xlsx format.

XLRDError: Excel xlsx file; not supported

Cause

xlrd 2.0.0 and above can only read .xls files.

Support for .xlsx files was removed from xlrd due to a potential security vulnerability.
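
To check whether this cause applies, you can look at the installed xlrd version. The following is a minimal sketch, not an official diagnostic: the helper names `installed_xlrd_version` and `xlrd_supports_xlsx` are hypothetical, and the check only compares the major version number against the 2.0.0 cutoff described above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_xlrd_version():
    """Return the installed xlrd version string, or None if xlrd is absent."""
    try:
        return version("xlrd")
    except PackageNotFoundError:
        return None


def xlrd_supports_xlsx(installed_version):
    """Hypothetical helper: True only for xlrd releases before 2.0.0."""
    major = int(installed_version.split(".")[0])
    return major < 2


found = installed_xlrd_version()
if found is None:
    print("xlrd is not installed")
elif xlrd_supports_xlsx(found):
    print(f"xlrd {found} can still read .xlsx files")
else:
    print(f"xlrd {found} reads .xls only; use openpyxl for .xlsx")
```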

Solution

Use openpyxl to open .xlsx files instead of xlrd.

  1. Install the openpyxl library on your cluster (AWS | Azure | GCP).
  2. Confirm that you are using pandas version 1.0.1 or above.
    %python
    
    import pandas as pd
    print(pd.__version__)
  3. Specify openpyxl when reading .xlsx files with pandas.
    %python
    
    import pandas
    df = pandas.read_excel("<name-of-file>.xlsx", engine="openpyxl")
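
If a notebook reads both .xls and .xlsx files, the engine can be chosen from the file extension instead of being hard-coded. This is a sketch under stated assumptions rather than a recommended implementation: `ENGINE_BY_SUFFIX`, `engine_for`, and `read_workbook` are hypothetical names, and the mapping assumes xlrd stays installed for legacy .xls files.

```python
from pathlib import Path

# Assumed mapping: openpyxl for .xlsx, xlrd for legacy .xls files.
ENGINE_BY_SUFFIX = {".xlsx": "openpyxl", ".xls": "xlrd"}


def engine_for(path):
    """Return the pandas engine name for an Excel file, based on its suffix."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"No engine configured for {suffix!r} files") from None


def read_workbook(path, **kwargs):
    """Read an Excel file with pandas, passing the engine chosen by engine_for."""
    import pandas as pd  # imported here so engine_for can be used without pandas

    return pd.read_excel(path, engine=engine_for(path), **kwargs)
```

Calling `read_workbook("<name-of-file>.xlsx")` is then equivalent to the `read_excel` call in step 3, while .xls files continue to go through xlrd.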

Refer to the openpyxl documentation for more information.