If you want to analyze the network traffic between nodes on a specific cluster, you can install tcpdump on the cluster and use it to dump the network packet details to pcap files. The pcap files can then be downloaded to a local machine for analysis.
Create the tcpdump init script
Run this sample script in a notebook on the cluster to create the init script.
%python
dbutils.fs.put("dbfs:/databricks/<path-to-init-script>/tcp_dump.sh",'''#!/bin/bash
DB_CLUSTER_ID=$(echo $HOSTNAME | awk -F '-' '{print$1"-"$2"-"$3}')
BASEDIR="/dbfs/databricks/tcpdump/${DB_CLUSTER_ID}"
sudo mkdir -p "${BASEDIR}"
MYIP=$(ip route get 10 | awk '{print $NF;exit}')
echo "initiating tcpdump"
sudo tcpdump -w ${BASEDIR}/trace_%Y_%m_%d_%H_%M_%S_${MYIP}.pcap -W 1000 -G 1800 -C 200 &
echo "initiated tcpdump"''', True)
Remember the path to the init script. You will need it when configuring your cluster.
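The init script derives the cluster ID from the node's hostname by keeping the first three dash-separated fields. The following is a minimal Python sketch of that awk expression, which you can use to check which folder name a given hostname maps to. The function name and the sample hostname are hypothetical and are used only for illustration. The actual hostname format on your cluster may differ.

```python
def cluster_id_from_hostname(hostname: str) -> str:
    """Mirror awk -F '-' '{print$1"-"$2"-"$3}': keep the first three dash-separated fields."""
    fields = hostname.split("-")
    # awk prints an empty string for a missing field, so pad to three fields.
    fields += [""] * (3 - len(fields))
    return "-".join(fields[:3])


# Hypothetical hostname, for illustration only.
sample_hostname = "0123-456789-abcde123-10-20-30-40"
print(cluster_id_from_hostname(sample_hostname))
```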
Configure the init script
Follow the documentation to configure a cluster-scoped init script (AWS | Azure | GCP).
Specify the path to the init script. Use the same path that you used in the sample script (dbfs:/databricks/<path-to-init-script>/tcp_dump.sh).
After configuring the init script, restart the cluster.
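If you manage clusters through a JSON cluster specification instead of the UI, the init script is listed under the init_scripts field. The following sketch shows one way to add the entry. It assumes that your workspace accepts DBFS destinations for cluster-scoped init scripts. The helper name and the cluster name are hypothetical. Check the cluster configuration documentation for your platform before relying on this shape.

```python
import json


def with_dbfs_init_script(cluster_spec: dict, script_path: str) -> dict:
    """Return a copy of cluster_spec with a DBFS init script entry appended."""
    # Guard against the common typo of a double slash after the scheme.
    if not script_path.startswith("dbfs:/") or script_path.startswith("dbfs://"):
        raise ValueError("expected a path of the form dbfs:/...")
    updated = dict(cluster_spec)
    scripts = list(updated.get("init_scripts", []))
    scripts.append({"dbfs": {"destination": script_path}})
    updated["init_scripts"] = scripts
    return updated


# Hypothetical cluster name; the path placeholder is the one used in the sample script.
spec = with_dbfs_init_script(
    {"cluster_name": "tcpdump-debug"},
    "dbfs:/databricks/<path-to-init-script>/tcp_dump.sh",
)
print(json.dumps(spec, indent=2))
```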
Locate pcap files
Once the cluster has started, tcpdump begins writing pcap files that contain the captured network traffic.
The pcap files are located in the folder dbfs:/databricks/tcpdump/<cluster-id>.
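Each file name encodes the capture start time and the IP address of the node that wrote it, following the trace_%Y_%m_%d_%H_%M_%S_${MYIP}.pcap pattern from the init script. The following sketch parses such a name so that you can sort or filter captures by node and time. The function name and the sample file name are hypothetical. The optional trailing digits are an assumption: tcpdump can append a sequence number when it rotates files by size.

```python
import re
from datetime import datetime

# Matches names such as trace_2023_01_15_10_30_00_10.0.0.5.pcap, optionally followed by digits.
TRACE_NAME = re.compile(
    r"^trace_(?P<ts>\d{4}_\d{2}_\d{2}_\d{2}_\d{2}_\d{2})_(?P<ip>[^_]+)\.pcap(?P<seq>\d*)$"
)


def parse_trace_name(name: str):
    """Return (start_time, node_ip, sequence_suffix), or None if the name does not match."""
    match = TRACE_NAME.match(name)
    if match is None:
        return None
    started = datetime.strptime(match.group("ts"), "%Y_%m_%d_%H_%M_%S")
    return started, match.group("ip"), match.group("seq")


# Hypothetical file name, for illustration only.
print(parse_trace_name("trace_2023_01_15_10_30_00_10.0.0.5.pcap"))
```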
Download pcap files
Download the pcap files to your local host for analysis.
There are multiple ways to download files to your local machine. One option is the Databricks CLI (AWS | Azure).
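As one possible approach, the following sketch builds a Databricks CLI command that recursively copies a cluster's pcap folder to a local directory. It assumes the CLI is installed and authenticated, and that your CLI version supports fs cp with the --recursive option. The function name, the cluster ID, and the local directory are hypothetical.

```python
import shlex


def build_download_command(cluster_id: str, local_dir: str) -> list:
    """Build (but do not run) a CLI command that copies the pcap folder locally."""
    source = f"dbfs:/databricks/tcpdump/{cluster_id}"
    return ["databricks", "fs", "cp", "--recursive", source, local_dir]


# Hypothetical cluster ID and target directory, for illustration only.
cmd = build_download_command("0123-456789-abcde123", "./pcaps")
print(shlex.join(cmd))
```

To run the copy from Python instead of pasting the printed command into a terminal, pass cmd to subprocess.run with check=True.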