Installing Apache Airflow on Windows | Easy & Fast Approach | DAGs
Guys, on my way to becoming a Data Engineer, my classmates and I were installing Apache Airflow by following some confusing documentation that had unnecessary steps.
So after that day, I decided to document the steps we followed during our session to install Apache Airflow on our Windows machines.
Before starting the installation process, let me tell you: what is Apache Airflow?
Apache Airflow is an open-source tool, originally built at Airbnb, for creating, scheduling, and monitoring workflows (such as ETL pipelines) easily.
Let’s dive into the installation. Here is what you will need:
- A computer or laptop (of course!) 😉
- Windows 11 or any up-to-date version of Windows
- Ubuntu (installed from the Microsoft Store)
- Python (3.9)
- Apache Airflow (current version)
STEP 1: TURNING THE WINDOWS FEATURE ON FOR LINUX SUBSYSTEM
a. Click the Start button
b. Type “Turn Windows features on or off” and open it
c. The Windows Features window will appear; look for “Windows Subsystem for Linux”
d. Check its box, and a prompt will ask you to restart your computer.
e. After the restart, head to the Microsoft Store and search for “Ubuntu”
f. Install it.
STEP 2: DOWNLOAD & INSTALL C++ BUILD TOOLS
To get Apache Airflow to work, you need to install the C++ Build Tools, which you can download from here.
Once the download is complete, leave it on your system and move on to the next step.
STEP 3: SELECT YOUR USERNAME & PASSWORD FOR UBUNTU
Now search for “Ubuntu” and open it.
The first time it opens, it completes its initial setup by asking for a username and password. I had already configured mine when I installed Apache Airflow earlier.
You can see my username is set up as “Spidey”; you guys can pick your own.
STEP 4: INSTALLING PIP INSIDE UBUNTU
Here are some commands you will be copying and pasting into Ubuntu, but do you guys know how to paste?
sudo apt-get install software-properties-common
sudo apt-add-repository universe
sudo apt-get update
sudo apt-get install python3-setuptools
sudo apt install python3-pip
sudo -H pip3 install --upgrade pip
Run these commands one by one.
A tip for pasting: after copying a command, right-click inside the Ubuntu window to paste it. (Thank me later 😉)
You can also verify your pip installation by running pip -V
STEP 5: INSTALLING DEPENDENCIES FOR APACHE AIRFLOW
We saved you some time by working this out beforehand; just copy and paste the commands below to install the dependencies in Ubuntu.
sudo apt-get install libmysqlclient-dev
sudo apt-get install libssl-dev
sudo apt-get install libkrb5-dev
sudo apt-get install libsasl2-dev
STEP 6: INSTALLING APACHE AIRFLOW
We once again saved you some time by giving you the command, and you already know what to do with it. 😉
sudo SLUGIFY_USES_TEXT_UNIDECODE=yes pip install apache-airflow
After installation, we need to make a few changes so that everything runs smoothly.
Add your user's local bin folder to the PATH so that the shell can find the airflow command. Replace <username> in the command below with your own username:
export PATH=$PATH:/home/<username>/.local/bin
For example, my username is “Spidey”, so:
export PATH=$PATH:/home/Spidey/.local/bin
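If you want to confirm the change took effect, here is a small Python sketch that checks whether a folder shows up in a PATH string. The function name dir_on_path is made up for this example, and it assumes the Linux separator ":" since we are working inside Ubuntu.

```python
import os


def dir_on_path(directory, path_value, separator=":"):
    """Return True if `directory` is one of the entries in a PATH-style string."""
    # A trailing slash does not change the meaning of a path entry,
    # so drop it on both sides before comparing.
    wanted = directory.rstrip("/")
    entries = [entry.rstrip("/") for entry in path_value.split(separator) if entry]
    return wanted in entries


if __name__ == "__main__":
    # expanduser turns "~" into the current user's home folder
    local_bin = os.path.expanduser("~/.local/bin")
    current_path = os.environ.get("PATH", "")
    print(local_bin, "is on PATH:", dir_on_path(local_bin, current_path))
```

Keep in mind that an export typed at the prompt only lasts for that terminal session, so a freshly opened Ubuntu window will need it again unless you add the line to your ~/.bashrc file.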
Yayyyy! 😎 You just installed Apache Airflow.
Now, open another instance of Ubuntu to run Airflow commands.
STEP 7: APACHE AIRFLOW COMMANDS W/ SETUP
First-time users will need to go through all the steps below, with their commands:
a. Command to initialize the database
airflow db init
Once done, all the necessary files will be created inside an airflow folder in your home directory. Next, we will make some changes to Airflow’s setup.
b. Commands to open config file
cd airflow
ls
sudo nano airflow.cfg
Now make the following changes:
dags_folder = /mnt/c/dags
base_log_folder = /mnt/c/dags/logs
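To double-check the edit after saving, here is a rough Python sketch that looks up both keys in config text. The helper name find_setting and the sample text are made up for illustration. Which section each key lives in can differ between Airflow versions, so the sketch searches every section rather than assuming one.

```python
import configparser

# Stand-in for the contents of airflow.cfg. The section names used here are
# an assumption and may differ in your Airflow version.
SAMPLE_CFG = """
[core]
dags_folder = /mnt/c/dags

[logging]
base_log_folder = /mnt/c/dags/logs
"""


def find_setting(cfg_text, key):
    """Return (section, value) for the first section containing `key`, else None."""
    # interpolation=None stops "%" characters in the file from being
    # treated as configparser placeholders.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    for section in parser.sections():
        if parser.has_option(section, key):
            return section, parser.get(section, key)
    return None


for key in ("dags_folder", "base_log_folder"):
    print(key, "->", find_setting(SAMPLE_CFG, key))
```

To try it on your real file, read the text of airflow.cfg (in the airflow folder you just opened) and pass it in place of SAMPLE_CFG.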
Note: The dags and log folder paths above map Airflow to your Windows C: drive. You will need to create two folders: one on your C: drive at C:\dags, and another inside it at C:\dags\logs.
You can change the location and specify the folders of your choice. I used the above directory as it is easy to locate and access. You will also avoid any potential permissions issues in this directory.
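If you pick a different folder, the Ubuntu side of the path follows the same pattern: the drive letter becomes a lowercase folder under /mnt. Here is a minimal Python sketch of that mapping; the function name windows_to_wsl_path is made up for this example, and it only handles plain drive-letter paths.

```python
from pathlib import PureWindowsPath


def windows_to_wsl_path(windows_path):
    """Map a drive-letter Windows path such as C:\\dags to its /mnt/c/dags form."""
    path = PureWindowsPath(windows_path)
    # Only simple drive-letter paths are handled; network shares are rejected.
    if len(path.drive) != 2 or path.drive[1] != ":":
        raise ValueError("expected a path that starts with a drive letter")
    drive_letter = path.drive[0].lower()
    # parts[0] is the drive itself (for example "C:\\"); the rest are folder names.
    return "/".join(["/mnt", drive_letter, *path.parts[1:]])


print(windows_to_wsl_path(r"C:\dags"))       # /mnt/c/dags
print(windows_to_wsl_path(r"C:\dags\logs"))  # /mnt/c/dags/logs
```

So if you kept your DAGs in D:\airflow\dags instead, the value to put in airflow.cfg would be /mnt/d/airflow/dags.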
Now run,
airflow db init
If you receive an error mentioning the psycopg2 package, run the following commands:
sudo apt-get update -y
sudo apt-get install -y libpq-dev
pip install psycopg2
Now run it again,
airflow db init
Hurrah! You did it. Now start up the webserver and the scheduler:
Open a new instance, run the first command, and leave it running. Then open another terminal window and run the second command.
airflow webserver -p 8080
airflow scheduler
Afterwards, open your browser and type:
localhost:8080
When you hit Enter, the Airflow web interface should come up.
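If the page does not load, it helps to know whether anything is listening on port 8080 at all. Here is a small Python sketch for that; the function name is_port_open is made up for this example, and it only tests that a TCP connection can be opened, not that Airflow itself is healthy.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("webserver reachable:", is_port_open("localhost", 8080))
```

If this reports False, check that the terminal running the webserver command is still open and has not printed an error.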
You can find a detailed guide on how to run Airflow here. This is my first guide after completing the journey, so stay tuned. There is lots of stuff in the pipeline that will be coming to YouTube.
If you need any help with this, comment and let me know!