SSH to the cluster driver node

How to SSH to the Apache Spark cluster driver node in an Azure virtual network

Written by xin.wang

Last published at: December 8th, 2022

This article explains how to use SSH to connect to an Apache Spark driver node for advanced troubleshooting and installing custom software.

Warning

You can only use SSH if your workspace is deployed in an Azure Virtual Network (VNet) under your control. If your workspace is NOT VNet injected, the SSH option will not appear.

Configure an Azure network security group

The network security group associated with your VNet must allow SSH traffic. The default SSH port for the cluster is 2200 (not the standard SSH port 22). If you are using a custom port, make note of it before proceeding. You must also identify a traffic source, which can be a single IP address or an IP range that represents your entire office.

  1. In the Azure portal, find the network security group. The network security group name can be found in the public subnet.
  2. Edit the inbound security rules to allow connections to the SSH port. In this example, we are using the default port.
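
If you prefer the command line to the portal, the inbound rule from step 2 can be sketched with the Azure CLI. This is a minimal sketch and not the procedure this article documents: the function name build_nsg_ssh_rule, the rule name allow-ssh-inbound, the priority 300, and every value in the usage line are hypothetical placeholders. The function only prints the command so that you can review it before running it yourself.

```shell
# Sketch: print an `az network nsg rule create` command that allows inbound SSH.
# The rule name and default priority are illustrative assumptions.
build_nsg_ssh_rule() {
  local resource_group="$1"   # resource group that contains the NSG
  local nsg_name="$2"         # NSG attached to the public subnet
  local source_prefix="$3"    # single IP address or CIDR range allowed to connect
  local port="${4:-2200}"     # SSH port; 2200 is the default used in this article
  local priority="${5:-300}"  # assumption: an unused priority between 100 and 4096

  local cmd=(
    az network nsg rule create
    --resource-group "$resource_group"
    --nsg-name "$nsg_name"
    --name allow-ssh-inbound
    --priority "$priority"
    --direction Inbound
    --access Allow
    --protocol Tcp
    --source-address-prefixes "$source_prefix"
    --destination-port-ranges "$port"
  )
  # Print only; nothing is changed in Azure until you run the printed command.
  printf '%s\n' "${cmd[*]}"
}

# Usage (placeholder values):
#   build_nsg_ssh_rule my-resource-group my-nsg 203.0.113.0/24
```

Keeping the source range as narrow as possible (a single IP address or your office range) matches the traffic source guidance above.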

Info

Make sure that your computer and office firewall rules allow you to send TCP traffic on the port you are using for SSH. If the SSH port is blocked at your computer or office firewall, you cannot connect to the Azure VNet via SSH.

Generate SSH key pair

  1. Open a local terminal.
  2. Create an SSH key pair by running this command:
    ssh-keygen -t rsa -b 4096 -C "<comment>"

Info

When prompted, provide the file path where you want to save the private key. The public key is saved to the same path with the extension .pub.
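
If you want to skip the interactive prompts, the key generation step can be sketched as a small shell function. This is an illustration under stated assumptions, not part of the original procedure: the function name generate_ssh_key_pair and the default comment are hypothetical, and the sketch sets an empty passphrase with -N "" purely to stay non-interactive. Remove that option if you want ssh-keygen to prompt for a passphrase.

```shell
# Sketch: generate a 4096-bit RSA key pair without interactive prompts.
generate_ssh_key_pair() {
  local key_path="$1"                   # private key file; public key is "$key_path.pub"
  local comment="${2:-cluster-ssh-key}" # hypothetical default comment

  # Refuse to overwrite an existing key instead of letting ssh-keygen prompt.
  if [ -e "$key_path" ]; then
    echo "Refusing to overwrite existing file: $key_path" >&2
    return 1
  fi

  mkdir -p "$(dirname "$key_path")"
  # -f sets the output file, -N "" sets an empty passphrase, -q suppresses output.
  ssh-keygen -t rsa -b 4096 -C "$comment" -f "$key_path" -N "" -q || return 1
  # Restrict the private key to the current user.
  chmod 600 "$key_path"
}

# Usage (placeholder path):
#   generate_ssh_key_pair "$HOME/.ssh/cluster_rsa" "me@example.com"
```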

Configure a new cluster with your public key

  1. Copy the ENTIRE contents of the public key file.
  2. Open the cluster configuration page.
  3. Click Advanced Options.
  4. Click the SSH tab.
  5. Paste the ENTIRE contents of the public key into the Public key field.
  6. Continue with cluster configuration as normal.
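
A partially copied or line-wrapped key is an easy mistake to make when pasting. As a hedged sanity check before you paste, the sketch below confirms that a public key file contains exactly one non-empty line that starts with a recognized key type. The function name check_public_key_file is hypothetical, and the check is a heuristic only; it does not verify the key cryptographically.

```shell
# Sketch: heuristic check that a .pub file holds one complete OpenSSH public key line.
check_public_key_file() {
  local pub_file="$1"
  local line_count key_type key_body

  if [ ! -r "$pub_file" ]; then
    echo "Cannot read $pub_file" >&2
    return 1
  fi

  # A complete public key occupies exactly one non-empty line.
  line_count="$(grep -c . "$pub_file")"
  if [ "$line_count" != "1" ]; then
    echo "Expected 1 non-empty line, found $line_count" >&2
    return 1
  fi

  # Split the line into key type, base64 body, and optional comment.
  read -r key_type key_body _ <<< "$(grep . "$pub_file")"
  case "$key_type" in
    ssh-rsa|ssh-ed25519|ecdsa-sha2-*) ;;
    *) echo "Unrecognized key type: $key_type" >&2; return 1 ;;
  esac
  if [ -z "$key_body" ]; then
    echo "Key body is missing" >&2
    return 1
  fi
  echo "Looks complete: $key_type key"
}

# Usage (placeholder path):
#   check_public_key_file "$HOME/.ssh/cluster_rsa.pub"
```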

Configure an existing cluster with your public key

If you have an existing cluster and did not provide the public key during cluster creation, you can inject the public key from a notebook.

  1. Open any notebook that is attached to the cluster.
  2. Copy the following code into the notebook, updating it with your public key as noted:
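    %scala

    // Replace the placeholder with the entire contents of your .pub file.
    val publicKey = "<put your public key here>"

    // Append the key to the ubuntu user's authorized_keys file on the driver.
    def addAuthorizedPublicKey(key: String): Unit = {
      val fw = new java.io.FileWriter("/home/ubuntu/.ssh/authorized_keys", /* append */ true)
      try {
        // The leading newline keeps the key on its own line; the trailing newline ends the entry.
        fw.write("\n" + key.trim + "\n")
      } finally {
        fw.close() // release the file handle even if the write fails
      }
    }
    addAuthorizedPublicKey(publicKey)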
  3. Run the code block to inject the public key.
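
Running the notebook cell more than once appends the same key each time. As an alternative sketch, the shell function below appends a key only if it is not already present. It assumes you can run shell commands on the driver (for example in a shell notebook cell, which is an assumption about your environment), and the function name add_authorized_key is hypothetical. The default path is the same authorized_keys file used in the code above.

```shell
# Sketch: append a public key to authorized_keys only if it is not already there.
add_authorized_key() {
  local public_key="$1"
  local auth_file="${2:-/home/ubuntu/.ssh/authorized_keys}"

  # -x matches whole lines, -F treats the key as a fixed string, -q stays quiet.
  if grep -qxF -- "$public_key" "$auth_file" 2>/dev/null; then
    echo "key already present"
    return 0
  fi

  # The leading newline protects against a file that lacks a trailing newline;
  # blank lines in authorized_keys are ignored by sshd.
  printf '\n%s\n' "$public_key" >> "$auth_file"
  echo "key added"
}

# Usage (placeholder key):
#   add_authorized_key "ssh-rsa AAAA... me@example.com"
```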

SSH into the Spark driver

  1. Open the cluster configuration page.
  2. Click Advanced Options.
  3. Click the SSH tab.
  4. Note the Driver Hostname.
  5. Open a local terminal.
  6. Run the following command, replacing the hostname and private key file path:
    ssh ubuntu@<hostname> -p 2200 -i <private-key-file-path>
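
If you connect often, the same settings can be kept in your local SSH client configuration so that you do not retype them. The sketch below prints a Host entry that you can review and append to ~/.ssh/config. The function name print_ssh_config_entry and the alias spark-driver are hypothetical; the user ubuntu and port 2200 come from the command above.

```shell
# Sketch: print an ssh_config Host entry for the driver node.
print_ssh_config_entry() {
  local host_alias="$1"    # short name you will type after `ssh`
  local hostname="$2"      # Driver Hostname from the cluster's SSH tab
  local key_path="$3"      # path to your private key file
  local port="${4:-2200}"  # SSH port; 2200 unless you configured a custom port

  cat <<EOF
Host $host_alias
    HostName $hostname
    User ubuntu
    Port $port
    IdentityFile $key_path
    IdentitiesOnly yes
EOF
}

# Usage (placeholder values):
#   print_ssh_config_entry spark-driver <hostname> ~/.ssh/cluster_rsa >> ~/.ssh/config
#   ssh spark-driver
```

IdentitiesOnly yes tells the client to offer only the listed key, which avoids authentication failures when an SSH agent holds many keys.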