In Airflow, you can create a connection to S3 in order to, for instance, store logs in an S3 bucket.
To do so, open the Airflow web interface, go to the "Admin" menu, then the "Connections" submenu, and click on the
blue + sign.
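If you prefer the command line, recent Airflow versions also let you create the connection with the airflow connections add CLI command. This is only a sketch: the connection id, credentials, and region below are placeholders to adapt to your setup.

```shell
# Create an AWS connection named your_connection_id (all values are placeholders)
airflow connections add your_connection_id \
    --conn-type aws \
    --conn-login YOUR_AWS_ACCESS_KEY_ID \
    --conn-password YOUR_AWS_SECRET_ACCESS_KEY \
    --conn-extra '{"region_name": "eu-central-1"}'
```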
However, once you’ve created your connection, there is no easy way to check that it is working. Here is a small procedure to test a newly created S3 connection, provided you have SSH access to the server where Airflow is deployed:
- Connect to the machine where Airflow is deployed:

      ssh your_login@your_airflow_server
- Create a test.py file with the following content, replacing your_connection_id with the connection id you’ve just created and your_s3_bucket with the name of the bucket you want to connect to:
      from airflow.providers.amazon.aws.hooks.s3 import S3Hook

      remote_conn_id = 'your_connection_id'
      remote_location = 'your_s3_bucket'

      # transfer_config_args is forwarded to boto3; threads are disabled here
      hook = S3Hook(remote_conn_id, transfer_config_args={'use_threads': False})
      # Print the first 10 keys found in the bucket
      print(hook.list_keys(remote_location)[0:10])
- Execute this test.py script:

      python3 test.py
If your connection is working, you should get the list of the first 10 keys in your S3 bucket:
    [2021-10-11 15:33:12,934] {base_aws.py:368} INFO - Airflow Connection: aws_conn_id=your_connection_id
    [2021-10-11 15:33:12,972] {base_aws.py:179} INFO - No credentials retrieved from Connection
    [2021-10-11 15:33:12,973] {base_aws.py:82} INFO - Retrieving region_name from Connection.extra_config['region_name']
    [2021-10-11 15:33:12,973] {base_aws.py:84} INFO - Creating session with aws_access_key_id=None region_name=eu-central-1
    [2021-10-11 15:33:12,980] {base_aws.py:157} INFO - role_arn is None
    ['directory1/', 'directory1/file1.txt', 'directory1/file2.txt', 'directory1/file3.txt', 'directory1/file4.txt', 'directory1/file5.txt', 'directory1/file6.txt', 'directory1/file7.txt', 'directory1/file8.txt', 'directory1/file9.txt']
If you can’t connect to your S3 bucket, you will get a Python stack trace instead. For instance, if the bucket you are trying to connect to does not exist:
    [2021-10-11 15:39:45,558] {base_aws.py:368} INFO - Airflow Connection: aws_conn_id=your_connection_id
    [2021-10-11 15:39:45,588] {base_aws.py:179} INFO - No credentials retrieved from Connection
    [2021-10-11 15:39:45,588] {base_aws.py:82} INFO - Retrieving region_name from Connection.extra_config['region_name']
    [2021-10-11 15:39:45,588] {base_aws.py:84} INFO - Creating session with aws_access_key_id=None region_name=eu-central-1
    [2021-10-11 15:39:45,596] {base_aws.py:157} INFO - role_arn is None
    Traceback (most recent call last):
      File "test.py", line 7, in <module>
        print(hook.list_keys(remote_location)[0:10])
      File "/home/ubuntu/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 62, in wrapper
        return func(*bound_args.args, **bound_args.kwargs)
      File "/home/ubuntu/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 302, in list_keys
        for page in response:
      File "/home/ubuntu/.local/lib/python3.8/site-packages/botocore/paginate.py", line 255, in __iter__
        response = self._make_request(current_kwargs)
      File "/home/ubuntu/.local/lib/python3.8/site-packages/botocore/paginate.py", line 332, in _make_request
        return self._method(**current_kwargs)
      File "/home/ubuntu/.local/lib/python3.8/site-packages/botocore/client.py", line 357, in _api_call
        return self._make_api_call(operation_name, kwargs)
      File "/home/ubuntu/.local/lib/python3.8/site-packages/botocore/client.py", line 676, in _make_api_call
        raise error_class(parsed_response, operation_name)
    botocore.errorfactory.NoSuchBucket: An error occurred (NoSuchBucket) when calling the ListObjectsV2 operation: The specified bucket does not exist
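If you want to run this check from a script or a CI job rather than eyeballing the stack trace, you can wrap the call so that any failure becomes a False return value (and, from there, an exit code). check_s3_connection below is a hypothetical helper, assuming only that the hook object exposes list_keys the way S3Hook does:

```python
import sys

def check_s3_connection(hook, bucket, sample=10):
    """Try to list keys in the bucket; return True on success, False on any error."""
    try:
        keys = hook.list_keys(bucket)[0:sample]
        print(f"OK - first {len(keys)} keys: {keys}")
        return True
    except Exception as exc:  # e.g. NoSuchBucket, missing credentials
        print(f"FAILED - {exc}", file=sys.stderr)
        return False
```

In test.py you could then end with sys.exit(0 if check_s3_connection(hook, remote_location) else 1), so that other tools can rely on the exit status instead of parsing the output.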