tag: python

Python script to test Airflow’s S3 connection

on 2021-10-11

Test the S3 connection defined in Airflow with a small Python script that you can run on your Airflow server.
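As a rough idea of what such a script can look like, here is a minimal sketch. It assumes Airflow 2 with the Amazon provider package installed; the connection id `my_s3_conn` and the bucket name `my-bucket` are hypothetical placeholders, and the helper names are mine rather than anything from the post.

```python
def check_s3_connection(hook, bucket_name):
    """Return True if `bucket_name` is reachable through the given hook."""
    # check_for_bucket returns a boolean telling whether the bucket exists
    # and is accessible with the credentials of the connection.
    return bool(hook.check_for_bucket(bucket_name))


def build_hook(conn_id="my_s3_conn"):
    """Build an S3Hook for an Airflow connection id (hypothetical default)."""
    # Imported lazily so the file can be loaded where Airflow is not installed.
    from airflow.providers.amazon.aws.hooks.s3 import S3Hook

    return S3Hook(aws_conn_id=conn_id)


# On the Airflow server (placeholder names, adapt to your setup):
# print(check_s3_connection(build_hook("my_s3_conn"), "my-bucket"))
```

Passing the hook in as an argument keeps the check itself independent of Airflow, so it can be tried with a stand-in object before running it on the server.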

#airflow #python

Install Fiona on Windows using pip

on 2021-04-07

Install Fiona on Windows using pip, without the gdal-config-related hassle.

#python #pip #fiona

List all CSV files in a directory with Databricks in Python

on 2021-03-17

A small code snippet that recursively lists all CSV files in a directory from a Databricks notebook in Python.
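One possible shape for such a snippet is sketched below; it is not necessarily the one from the post. It assumes the listing function behaves like `dbutils.fs.ls`, returning entries with a `path` attribute and an `isDir()` method. The function name `list_csv_files` and the path `/mnt/data` are hypothetical.

```python
def list_csv_files(ls, root):
    """Recursively collect the paths of all .csv files under `root`.

    `ls` is a listing function such as `dbutils.fs.ls`: it takes a path and
    returns entries exposing `.path` and `.isDir()`.
    """
    csv_paths = []
    for entry in ls(root):
        if entry.isDir():
            # Descend into sub-directories.
            csv_paths.extend(list_csv_files(ls, entry.path))
        elif entry.path.lower().endswith(".csv"):
            csv_paths.append(entry.path)
    return csv_paths


# In a Databricks notebook (placeholder path):
# for path in list_csv_files(dbutils.fs.ls, "/mnt/data"):
#     print(path)
```

Taking the listing function as a parameter means the same code can be exercised outside a notebook, where `dbutils` is not defined.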

#databricks #python

Pyspark setup for IntelliJ IDEA

on 2021-01-24

A simple configuration of a new Python project in IntelliJ IDEA with a working PySpark setup. I was inspired by the "Pyspark on IntelliJ" blog post by Gaurav M Shah; I just removed all the parts about deep learning libraries. I assume that you have a working IntelliJ IDEA installation with the Python plugin, and Python 3 installed on your machine. We will create a Python project in IntelliJ IDEA, change its Python SDK to a virtualenv-based one, add the PySpark dependency to this virtualenv, install PySpark in it, and finally test it with a small PySpark hello world.

#pyspark #spark #python


Pyspark gotchas for Scala Spark developers

on 2021-01-22

Apache Spark is developed in Scala. However, the Python API is increasingly popular as Python becomes the main language of data science. Although the Python and Scala APIs are very close, some differences can prevent a developer used to one API from smoothly using the other. This article lists those small differences from the point of view of a Scala Spark developer who wants to use PySpark.

#pyspark #spark #scala #python
