Install Apache Spark in Ubuntu 24.04

In this article, we cover how to install Apache Spark on Ubuntu 24.04. Apache Spark is mainly used for large-scale data processing. At the time of writing, the latest stable release is 3.5.0.

The package isn’t available through the standard Ubuntu repository, so we need to download it from the official website first. The following operations also require administrative rights. If you don’t have the necessary rights to install packages, contact your system administrator for assistance.

Download Apache Spark

We can download the .tgz package file, spark-3.5.0-bin-hadoop3.tgz, from the official website.

On the download page, choose a Spark release (3.5.0 in our case) and a package type (Pre-built for Apache Hadoop 3.3 and later). We went ahead with this configuration; pick the version that fits your requirements.

The page then shows a download link in Step 3. Use that link to download the package file.
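If you prefer to download from the terminal, the following is a minimal sketch rather than the official procedure. It assumes that wget is installed and that the 3.5.0 release is available under archive.apache.org at the path built below; the link shown in Step 3 may point to a different mirror, in which case use that link instead. The variable and function names are our own.

```shell
# Assumed values: adjust them to the release and package type you picked.
SPARK_VERSION="3.5.0"
HADOOP_PROFILE="hadoop3"
SPARK_TGZ="spark-${SPARK_VERSION}-bin-${HADOOP_PROFILE}.tgz"

# Assumption: the release is kept under archive.apache.org at this path.
SPARK_URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${SPARK_TGZ}"

# Hypothetical helper: downloads the package into the current directory.
download_spark() {
    wget "$SPARK_URL"
}

# Print the URL so it can be compared with the link from Step 3.
echo "$SPARK_URL"
```

Run download_spark once the printed URL matches the release you want.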

Install Java in Ubuntu 24.04

Spark runs on the JVM, so Java must be installed. Open a terminal and install the JRE:

sudo apt update 
sudo apt install default-jre
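To confirm that Java is available before moving on, a quick check along these lines can help. It is only a sketch: JAVA_STATUS is a name we introduce for illustration, and the version string printed depends on which JRE the default-jre package installs on your system.

```shell
# Check whether the java command is on the PATH.
if command -v java >/dev/null 2>&1; then
    JAVA_STATUS="found"
    # java -version writes to stderr, so redirect it before piping.
    java -version 2>&1 | head -n 1
else
    JAVA_STATUS="missing"
    echo "java not found on PATH; re-run the install step above"
fi
```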

Install Apache Spark in Ubuntu 24.04

First, extract the downloaded package using the tar command-line utility:

tar -xvf spark-3.5.0-bin-hadoop3.tgz

This creates the spark-3.5.0-bin-hadoop3/ directory in the current directory. Move it to /opt/spark:

sudo mv spark-3.5.0-bin-hadoop3/ /opt/spark

Next, configure the environment variables. Open .bashrc:

nano ~/.bashrc

We used the Nano text editor to edit .bashrc. Append the following line to the end of the file:

export PATH=$PATH:/opt/spark/bin:/opt/spark/sbin

Reload the .bashrc file in the current terminal with:

source ~/.bashrc
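As an alternative to editing the file by hand, the line can be appended from the terminal. The sketch below uses a helper of our own, append_once (not part of Spark or Ubuntu), which adds a line only if the file does not already contain it, so running it twice does not duplicate the PATH entry.

```shell
# Hypothetical helper: append line $1 to file $2 unless it is already there.
append_once() {
    touch "$2"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$1" "$2" || printf '%s\n' "$1" >> "$2"
}

# Usage (single quotes keep $PATH unexpanded until .bashrc is read):
# append_once 'export PATH=$PATH:/opt/spark/bin:/opt/spark/sbin' "$HOME/.bashrc"
```

After running the usage line, reload .bashrc as before so the current terminal picks up the change.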

Start Spark Master Server

This can be done by running the following command in the terminal:

start-master.sh

Spark’s master web interface will be available at:

http://localhost:8080/

On this webpage, note down the master URL. It looks something like spark://ubunx:7077

Now, start a worker and connect it to the master:

start-worker.sh spark://ubunx:7077

Your master URL will be different, so change the command accordingly.
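The master URL has the form spark://host:port. As a rough sketch, it can be assembled in the terminal, assuming a default setup in which the master listens on port 7077 and uses the machine's fully qualified hostname. The master_url function is our own helper, and the URL shown on the web interface remains the authoritative value.

```shell
# Hypothetical helper: build a master URL from a host and a port.
# $1 = host (assumed default: output of hostname -f)
# $2 = port (assumed default: 7077)
master_url() {
    host="${1:-$(hostname -f)}"
    port="${2:-7077}"
    echo "spark://${host}:${port}"
}

# Usage, once the printed value matches the web interface:
# start-worker.sh "$(master_url)"
```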

Reload the master web interface to see the connected worker.

To start Spark shell:

spark-shell

To exit the Spark shell, type :q and press Enter.
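For a quick non-interactive check that the Spark commands are on the PATH, something like the following can be used. It is a sketch: SPARK_STATUS is a name we introduce for illustration, and the exact banner printed depends on the installed release.

```shell
# Check whether spark-submit is reachable through the PATH set in .bashrc.
if command -v spark-submit >/dev/null 2>&1; then
    SPARK_STATUS="found"
    # Prints the Spark version banner without starting the shell.
    spark-submit --version
else
    SPARK_STATUS="missing"
    echo "spark-submit not found; check the PATH line in ~/.bashrc"
fi
```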

In conclusion, we have covered how to install Apache Spark on Ubuntu 24.04.
