Reading .xlsx files with xlrd fails

Problem

You have xlrd installed on your cluster and get the following error when you try to read files in the Excel .xlsx format:

XLRDError: Excel xlsx file; not supported

Cause

xlrd 2.0.0 and above can only read .xls files.

Support for .xlsx files was removed from xlrd due to a potential security vulnerability.
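
To check whether the xlrd release on your cluster is affected, you can compare its version against 2.0.0. The following is a minimal sketch, not an official diagnostic: the helper names xlrd_is_xls_only and installed_xlrd_version are hypothetical, and the check assumes a plain numeric version string such as 2.0.1.

```python
from importlib import metadata


def xlrd_is_xls_only(version):
    """Return True for xlrd 2.0.0 and above, which read only .xls files."""
    major = int(version.split(".")[0])
    return major >= 2


def installed_xlrd_version():
    """Return the installed xlrd version string, or None if xlrd is absent."""
    try:
        return metadata.version("xlrd")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_xlrd_version()
    if version is None:
        print("xlrd is not installed")
    elif xlrd_is_xls_only(version):
        print(f"xlrd {version} cannot read .xlsx files")
    else:
        print(f"xlrd {version} predates the removal of .xlsx support")
```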

Solution

Use openpyxl to open .xlsx files instead of xlrd.

  1. Install the openpyxl library on your cluster.

  2. Confirm that you are using pandas version 1.0.1 or above.

    import pandas as pd
    print(pd.__version__)
    
  3. Specify openpyxl when reading .xlsx files with pandas.

    import pandas
    df = pandas.read_excel("<name-of-file>.xlsx", engine="openpyxl")
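
If your code reads both .xls and .xlsx files, one option is to choose the engine from the file extension. This is a sketch under stated assumptions rather than a pandas feature: ENGINE_BY_SUFFIX, pick_engine, and read_excel_file are hypothetical names, and it assumes that openpyxl (for .xlsx) and xlrd (for .xls) are both installed.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the pandas engine name.
ENGINE_BY_SUFFIX = {".xlsx": "openpyxl", ".xls": "xlrd"}


def pick_engine(path):
    """Return the engine name for an Excel file, based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in ENGINE_BY_SUFFIX:
        raise ValueError(f"Unsupported Excel extension: {suffix!r}")
    return ENGINE_BY_SUFFIX[suffix]


def read_excel_file(path, **kwargs):
    """Read an Excel file with pandas, passing the matching engine."""
    # Imported here so pick_engine can be used without pandas installed.
    import pandas as pd

    return pd.read_excel(path, engine=pick_engine(path), **kwargs)
```

Calling read_excel_file("<name-of-file>.xlsx") then behaves like the pandas.read_excel call in step 3, with the engine filled in for you.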
    

Refer to the openpyxl documentation for more information.