Reading .xlsx files with xlrd fails


You have xlrd installed on your cluster and get an error when you attempt to read files in the Excel .xlsx format.

XLRDError: Excel xlsx file; not supported


xlrd 2.0.0 and above can only read .xls files.

Support for .xlsx files was removed from xlrd due to a potential security vulnerability.
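
To check whether your cluster is affected, you can inspect the installed xlrd version. The following is a minimal sketch, not an official diagnostic: the helper name `xlrd_supports_xlsx` is hypothetical, and it assumes the version string starts with a numeric major version (for example, 1.2.0 or 2.0.1).

```python
from importlib.metadata import PackageNotFoundError, version


def xlrd_supports_xlsx(installed_version):
    """Return True if this xlrd version predates 2.0.0 (hypothetical helper)."""
    # Only the major version matters: .xlsx support was removed in 2.0.0.
    major = int(installed_version.split(".")[0])
    return major < 2


try:
    installed = version("xlrd")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("xlrd is not installed")
else:
    print(f"xlrd {installed} installed; reads .xlsx: {xlrd_supports_xlsx(installed)}")
```

If the check reports that .xlsx is not readable, the steps in the solution apply.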


Use openpyxl to open .xlsx files instead of xlrd.

  1. Install the openpyxl library on your cluster.

  2. Confirm that you are using pandas version 1.0.1 or above.

    import pandas as pd
    print(pd.__version__)

  3. Specify openpyxl when reading .xlsx files with pandas.

    import pandas
    df = pandas.read_excel('<name-of-file>.xlsx', engine='openpyxl')
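
If your code reads both .xls and .xlsx files, one option is to choose the engine from the file extension. This is a sketch under stated assumptions rather than a recommended implementation: `ENGINE_BY_SUFFIX`, `pick_excel_engine`, and `read_excel_any` are hypothetical names introduced for illustration, and the mapping assumes xlrd is still installed for .xls files.

```python
from pathlib import Path

# Hypothetical mapping for illustration; extend it if you handle other formats.
ENGINE_BY_SUFFIX = {
    ".xls": "xlrd",       # legacy binary format, still readable by xlrd 2.x
    ".xlsx": "openpyxl",  # Office Open XML workbook
    ".xlsm": "openpyxl",  # macro-enabled workbook
}


def pick_excel_engine(path):
    """Return the pandas engine name that matches the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in ENGINE_BY_SUFFIX:
        raise ValueError(f"Unsupported Excel extension: {suffix!r}")
    return ENGINE_BY_SUFFIX[suffix]


def read_excel_any(path, **kwargs):
    """Read an Excel file with pandas, passing the engine explicitly."""
    # Imported here so pick_excel_engine can be used without pandas installed.
    import pandas as pd

    return pd.read_excel(path, engine=pick_excel_engine(path), **kwargs)
```

Calling `read_excel_any('<name-of-file>.xlsx')` then behaves like the `read_excel` call in step 3, while .xls files continue to go through xlrd.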

Refer to the openpyxl documentation for more information.