Calling the Databricks REST API programmatically is a great way to streamline workflows with scripts.
The API can be called from a variety of tools, including PowerShell. In this article, we look at an example DBFS put command using curl and then show you how to execute the same command using PowerShell.
The DBFS API 2.0 put command (AWS | Azure) limits the amount of data that can be passed in the contents parameter to 1 MB when the data is passed as a string. The same command can upload a file of up to 2 GB when the data is sent as a file in a multipart form post. The put endpoint is mainly used for streaming uploads, but it can also be used as a convenient single call for data upload.
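For contrast, here is a minimal sketch (not part of the examples below) of the string-based call that the 1 MB limit applies to. It assumes a small hypothetical file and placeholder workspace values; the file bytes are base64 encoded into the contents field of a JSON body.

# Minimal sketch: string-based put, where the contents parameter is capped at 1 MB.
# Replace the <> placeholders with values for your environment.
$headers = @{ Authorization = "Bearer <personal-access-token>" }

$bytes = [System.IO.File]::ReadAllBytes("<local_small_file_path>") # ex: /Users/foo/Desktop/small_file.txt
$body = @{
    path      = "/tmp/small_file.txt"
    contents  = [System.Convert]::ToBase64String($bytes)
    overwrite = $true
} | ConvertTo-Json

Invoke-RestMethod "https://<databricks-workspace-url>/api/2.0/dbfs/put" `
    -Method 'POST' -Headers $headers -Body $body -ContentType 'application/json'

To move files larger than 1 MB, use the multipart form approach shown in the examples that follow.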
Curl example
This example uses curl to send a simple multipart form post request to the API to upload a file up to 2 GB in size.
Replace all of the values in <> with appropriate values for your environment.
# Parameters
databricks_workspace_url="<databricks-workspace-url>"
personal_access_token="<personal-access-token>"
local_file_path="<local_file_path>" # ex: /Users/foo/Desktop/file_to_upload.png
dbfs_file_path="<dbfs_file_path>" # ex: /tmp/file_to_upload.png
overwrite_file="<true|false>"

curl --location --request POST https://${databricks_workspace_url}/api/2.0/dbfs/put \
  --header "Authorization: Bearer ${personal_access_token}" \
  --form contents=@${local_file_path} \
  --form path=${dbfs_file_path} \
  --form overwrite=${overwrite_file}
PowerShell example
This PowerShell example is longer than the curl example, but it sends the same multipart form post request to the API.
The script below can be used in any environment where PowerShell is supported.
To run the PowerShell script you must:
- Replace all of the values in <> with appropriate values for your environment. Review the DBFS API 2.0 put documentation for more information.
- Save the script as a .ps1 file. For example, you could call it upload_large_file_to_dbfs.ps1.
- Execute the script in PowerShell by running ./upload_large_file_to_dbfs.ps1 at the prompt.
##################################################
# Parameters
$DBX_HOST = "<databricks-workspace-url>"
$DBX_TOKEN = "<personal-access-token>"
$FILE_TO_UPLOAD = "<local_file_path>" # ex: /Users/foo/Desktop/file_to_upload.png
$DBFS_PATH = "<dbfs_file_path>" # ex: /tmp/file_to_upload.png
$OVERWRITE_FILE = "<true|false>"
##################################################

# Configure authentication
$headers = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"
$headers.Add("Authorization", "Bearer " + $DBX_TOKEN)

$multipartContent = [System.Net.Http.MultipartFormDataContent]::new()

# Local file path
$FileStream = [System.IO.FileStream]::new($FILE_TO_UPLOAD, [System.IO.FileMode]::Open)
$fileHeader = [System.Net.Http.Headers.ContentDispositionHeaderValue]::new("form-data")
$fileHeader.Name = $(Split-Path $FILE_TO_UPLOAD -leaf)
$fileHeader.FileName = $(Split-Path $FILE_TO_UPLOAD -leaf)
$fileContent = [System.Net.Http.StreamContent]::new($FileStream)
$fileContent.Headers.ContentDisposition = $fileHeader
$fileContent.Headers.ContentType = [System.Net.Http.Headers.MediaTypeHeaderValue]::Parse("text/plain")
$multipartContent.Add($fileContent)

# DBFS path
$stringHeader = [System.Net.Http.Headers.ContentDispositionHeaderValue]::new("form-data")
$stringHeader.Name = "path"
$stringContent = [System.Net.Http.StringContent]::new($DBFS_PATH)
$stringContent.Headers.ContentDisposition = $stringHeader
$multipartContent.Add($stringContent)

# File overwrite config
$stringHeader = [System.Net.Http.Headers.ContentDispositionHeaderValue]::new("form-data")
$stringHeader.Name = "overwrite"
$stringContent = [System.Net.Http.StringContent]::new($OVERWRITE_FILE)
$stringContent.Headers.ContentDisposition = $stringHeader
$multipartContent.Add($stringContent)

# Call Databricks DBFS REST API
$body = $multipartContent
$uri = 'https://' + $DBX_HOST + '/api/2.0/dbfs/put'
$response = Invoke-RestMethod $uri -Method 'POST' -Headers $headers -Body $body
$response | ConvertTo-Json
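As an optional follow-up (not part of the script above), you can confirm that the file landed in DBFS by calling the DBFS API 2.0 get-status endpoint with the same path. A short sketch that reuses the $DBX_HOST, $DBFS_PATH, and $headers values defined in the script:

# Optional check: ask DBFS for the uploaded file's metadata (path, is_dir, file_size, modification_time).
$statusUri = 'https://' + $DBX_HOST + '/api/2.0/dbfs/get-status?path=' + [uri]::EscapeDataString($DBFS_PATH)
Invoke-RestMethod $statusUri -Method 'GET' -Headers $headers | ConvertTo-Json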