Use tcpdump to create pcap files

Analyze network traffic between nodes on a specific cluster by using tcpdump to create pcap files.

Written by parth.sundarka

Last published at: December 23rd, 2024

If you want to analyze the network traffic between nodes on a specific cluster, you can install tcpdump on the cluster and use it to dump the network packet details to pcap files. The pcap files can then be downloaded to a local machine for analysis.

Create the tcpdump init script

  1. Use the workspace file browser to create a new file (AWS | Azure | GCP) in your home directory. Call it
  2. Open the file.
  3. Copy the sample script and paste it into the file.

Sample init script

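#!/bin/bash
# Derive the cluster ID from the first three dash-separated fields of the hostname.
DB_CLUSTER_ID=$(echo $HOSTNAME | awk -F '-' '{print$1"-"$2"-"$3}')
if [[ ! -d /dbfs/databricks/tcpdump/${DB_CLUSTER_ID} ]] ; then
  sudo mkdir -p /dbfs/databricks/tcpdump/${DB_CLUSTER_ID}
fi

# Directory that receives the pcap files for this cluster.
BASEDIR="/dbfs/databricks/tcpdump/${DB_CLUSTER_ID}"
mkdir -p ${BASEDIR}

# Local IP as reported by ip route (last field of the first output line); it is embedded in each file name.
MYIP=$(ip route get 10 | awk '{print $NF;exit}')
echo "initiating tcpdump"
# -G 1800 rotates the capture file every 1800 seconds; -C 200 also rotates once a file passes 200 million bytes.
# -W 1000 is tcpdump's file-count setting for rotation. The trailing & runs the capture in the background.
sudo tcpdump -w ${BASEDIR}/trace_%Y_%m_%d_%H_%M_%S_${MYIP}.pcap -W 1000 -G 1800 -C 200 &
echo "initiated tcpdump"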

Your init script is located at /Workspace/Users/<user-name>/

Remember the path to the init script. You will need it when configuring your cluster.
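
The sample script derives the cluster ID from the node's hostname. The sketch below isolates that step so it can be tried on any machine. The function name cluster_id_from_hostname and the example hostname are hypothetical, and the sketch assumes that node hostnames begin with the three dash-separated parts of the cluster ID, as the sample script implies.

```shell
# Hypothetical helper: print the first three dash-separated fields of a hostname.
# This mirrors the awk expression used in the sample init script.
cluster_id_from_hostname() {
  echo "$1" | awk -F '-' '{print $1"-"$2"-"$3}'
}

# Example with a made-up hostname (cluster ID followed by the node IP):
cluster_id_from_hostname "0123-456789-abcde-10-0-0-5"
# prints 0123-456789-abcde
```

On a running node you would pass "$HOSTNAME" instead of the made-up value.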

Configure the init script

Follow the documentation to configure a cluster-scoped init script (AWS | Azure | GCP).

You will need the path to the init script. Use the same path that you used in the sample script:


If you did not save the init script in your home directory, you will need to provide the full path to its location. For example, /Workspace/<path-to-init-script>/


After configuring the init script, restart the cluster.

Locate pcap files

After the cluster starts, the init script launches tcpdump, which begins writing pcap files that contain the captured network traffic.

The sample init script stores the pcap files in the folder dbfs:/databricks/tcpdump/<cluster-id>.
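
The sample script writes through the /dbfs mount on the node, while tools such as the CLI address the same folder with a dbfs:/ path. The following is a minimal sketch of that mapping. The helper name to_dbfs_uri is hypothetical, and the cluster ID shown is a made-up value.

```shell
# Hypothetical helper: convert a local /dbfs/... path into its dbfs:/... form.
to_dbfs_uri() {
  local path="$1"
  case "$path" in
    /dbfs/*)
      # Strip the /dbfs/ mount prefix and prepend the dbfs:/ scheme.
      printf 'dbfs:/%s\n' "${path#/dbfs/}"
      ;;
    *)
      echo "not a /dbfs path: $path" >&2
      return 1
      ;;
  esac
}

# Example with a made-up cluster ID:
to_dbfs_uri "/dbfs/databricks/tcpdump/0123-456789-abcde"
# prints dbfs:/databricks/tcpdump/0123-456789-abcde
```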

Download pcap files

Download the pcap files to your local host for analysis.

There are multiple ways to download files to your local machine. One option is to use the Databricks CLI (AWS | Azure | GCP).
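
As one possible approach, the sketch below wraps the Databricks CLI command fs cp in a small function that copies a cluster's pcap folder to a local directory. It assumes the Databricks CLI is installed and authenticated on your local machine. The function name download_pcaps, the default destination ./pcaps, and the example cluster ID are hypothetical.

```shell
# Hypothetical helper: copy all pcap files for one cluster to a local directory.
# Usage: download_pcaps <cluster-id> [local-destination]
download_pcaps() {
  local cluster_id="$1"
  local dest="${2:-./pcaps}"   # assumed default destination
  mkdir -p "$dest"
  # Recursively copy the folder written by the sample init script.
  databricks fs cp --recursive "dbfs:/databricks/tcpdump/${cluster_id}" "$dest"
}

# Example call (made-up cluster ID), left commented out so nothing runs by accident:
# download_pcaps "0123-456789-abcde" ./pcaps
```

After the copy completes, an individual capture can be inspected locally with tcpdump -r <file>.pcap or opened in a packet analyzer such as Wireshark.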