Problem: Cluster Cancels Python Command Execution after Installing Bokeh

Problem

The cluster returns Cancelled in a Python notebook. Inspect the driver log (std.err) in the Cluster Configuration page for a stack trace and error message similar to the following:

log4j:WARN No appenders could be found for logger (com.databricks.conf.trusted.ProjectConf$).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See https://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
Traceback (most recent call last):
  File "/local_disk0/tmp/1551693540856-0/PythonShell.py", line 30, in <module>
    from IPython.nbconvert.filters.ansi import ansi2html
  File "/databricks/python/lib/python3.5/site-packages/IPython/nbconvert/__init__.py", line 6, in <module>
    from . import postprocessors
  File "/databricks/python/lib/python3.5/site-packages/IPython/nbconvert/postprocessors/__init__.py", line 6, in <module>
    from .serve import ServePostProcessor
  File "/databricks/python/lib/python3.5/site-packages/IPython/nbconvert/postprocessors/serve.py", line 29, in <module>
    class ProxyHandler(web.RequestHandler):
  File "/databricks/python/lib/python3.5/site-packages/IPython/nbconvert/postprocessors/serve.py", line 31, in ProxyHandler
    @web.asynchronous
AttributeError: module 'tornado.web' has no attribute 'asynchronous'

Cause

When you install the bokeh library, by default tornado version 6.0a1 is installed, which is an alpha release. The alpha release causes this error, so the solution is to revert to the stable version of tornado.

Solution

Follow the steps below to create a cluster-scoped init script. The init script removes the newer version of tornado and installs the stable version.

  1. If the init script does not already exist, create a base directory to store it:

    dbutils.fs.mkdirs("dbfs:/databricks/<directory>/")
    
  2. Create the following script:

    dbutils.fs.put("dbfs:/databricks/<directory>/tornado.sh","""
    #!/bin/bash
    pip uninstall --yes tornado
    rm -rf /home/ubuntu/databricks/python/lib/python3.5/site-packages/tornado*
    rm -rf /databricks/python/lib/python3.5/site-packages/tornado*
    /usr/bin/yes | /home/ubuntu/databricks/python/bin/pip install tornado==5.1.1
    """,True)
    
  3. Confirm that the script exists:

    display(dbutils.fs.ls("dbfs:/databricks/<directory>/tornado.sh"))
    
  4. Go to the cluster configuration page and click the Advanced Options toggle.

  5. At the bottom of the page, click the Init Scripts tab:

    no-alternative-text
  6. In the Destination drop-down, select DBFS, provide the file path to the script, and click Add.

  7. Restart the cluster.

For more information, see: