Problem
You have xlrd installed on your cluster and get the following error when you attempt to read a file in the Excel .xlsx format:
XLRDError: Excel xlsx file; not supported
Cause
xlrd 2.0.0 and above can only read .xls files.
Support for .xlsx files was removed from xlrd due to a potential security vulnerability.
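To confirm that this is the cause, you can check which xlrd version is installed. The snippet below is a minimal sketch: the helper name `xlrd_reads_xlsx` is hypothetical, and it assumes the version string starts with a numeric major version (for example `1.2.0` or `2.0.1`).

```python
from importlib import metadata


def xlrd_reads_xlsx(version: str) -> bool:
    """Return True if this xlrd version predates 2.0.0 (hypothetical helper)."""
    # Assumes the version string begins with a numeric major component.
    major = int(version.split(".")[0])
    return major < 2


try:
    installed = metadata.version("xlrd")
except metadata.PackageNotFoundError:
    print("xlrd is not installed in this environment")
else:
    print(f"xlrd {installed} installed; can read .xlsx: {xlrd_reads_xlsx(installed)}")
```

If the check reports that xlrd cannot read .xlsx files, the error above is expected whenever xlrd is used as the engine for an .xlsx file.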
Solution
Use openpyxl to open .xlsx files instead of xlrd.
- Install the openpyxl library on your cluster (AWS | Azure | GCP).
- Confirm that you are using pandas version 1.0.1 or above.
%python
import pandas as pd
print(pd.__version__)
- Specify openpyxl when reading .xlsx files with pandas.
%python
import pandas
df = pandas.read_excel("<name-of-file>.xlsx", engine="openpyxl")
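If a notebook reads both .xls and .xlsx files, one option is to choose the engine from the file extension rather than hard-coding it. The following is a sketch under stated assumptions, not a definitive implementation: `ENGINE_BY_SUFFIX`, `pick_excel_engine`, and `read_excel_auto` are hypothetical names, and the mapping covers only the two formats discussed in this article.

```python
from pathlib import Path

# Assumed mapping from file suffix to pandas engine name (illustrative only).
ENGINE_BY_SUFFIX = {
    ".xlsx": "openpyxl",  # xlrd 2.0.0 and above cannot read this format
    ".xls": "xlrd",       # legacy format that xlrd still reads
}


def pick_excel_engine(path: str) -> str:
    """Return the pandas engine name for a path, based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in ENGINE_BY_SUFFIX:
        raise ValueError(f"Unsupported Excel extension: {suffix!r}")
    return ENGINE_BY_SUFFIX[suffix]


def read_excel_auto(path: str, **kwargs):
    """Read an Excel file with pandas, choosing the engine by extension."""
    import pandas as pd  # imported here so the helper above has no pandas dependency

    return pd.read_excel(path, engine=pick_excel_engine(path), **kwargs)
```

With this in place, `read_excel_auto("<name-of-file>.xlsx")` passes `engine="openpyxl"` to pandas, while a .xls path continues to use xlrd.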
Refer to the openpyxl documentation for more information.