A small Python snippet to recursively list all CSV files in a directory from a Databricks notebook. Given a directory path (S3, DBFS, or any other filesystem that dbutils.fs can access), it walks that directory and all of its subdirectories and returns a list containing the path of every file with a .csv extension. It can be pasted directly into a Python notebook cell.
def get_csv_files(directory_path):
    """Recursively list the paths of all .csv files under directory_path."""
    csv_files = []
    # Seed the work queue with the top-level directory listing.
    files_to_treat = dbutils.fs.ls(directory_path)
    while files_to_treat:
        path = files_to_treat.pop(0).path
        if path.endswith('/'):
            # A trailing slash marks a directory: queue its contents.
            files_to_treat += dbutils.fs.ls(path)
        elif path.endswith('.csv'):
            csv_files.append(path)
    return csv_files
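The dbutils.fs.ls call only exists inside a Databricks runtime, so the function above cannot be exercised locally. As a sketch, the same recursive traversal can be reproduced on a plain local filesystem with pathlib, which is handy for testing the logic outside Databricks (the function name get_csv_files_local and the sample file names are illustrative assumptions, not part of the original snippet):

```python
import pathlib
import tempfile

def get_csv_files_local(directory_path):
    """Local-filesystem analogue of get_csv_files: recursively
    list all .csv files under directory_path."""
    # rglob walks every subdirectory, matching only *.csv entries.
    return sorted(str(p) for p in pathlib.Path(directory_path).rglob("*.csv"))

# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    base = pathlib.Path(root)
    (base / "sub").mkdir()
    (base / "a.csv").write_text("x")
    (base / "sub" / "b.csv").write_text("y")
    (base / "sub" / "c.txt").write_text("z")  # non-CSV, should be skipped
    found = get_csv_files_local(root)
    assert [pathlib.Path(p).name for p in found] == ["a.csv", "b.csv"]
```

On Databricks itself you would call the dbutils-based version instead, e.g. get_csv_files("dbfs:/mnt/data/") (the mount path here is a placeholder).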