In this article, we cover how to install Apache Spark on Ubuntu 24.04. Apache Spark is mainly used for large-scale data processing. At the time of writing, the latest stable release is 3.5.0.
The package isn't available through the standard Ubuntu repository, so we need to download it from the official website first. The following operations also require administrative rights. If you don't have the necessary rights to install packages, contact your system administrator for assistance.
Download Apache Spark
We can download the .tgz package file, spark-3.5.0-bin-hadoop3.tgz, from the official Apache Spark website.
On the download page, choose a Spark release (3.5.0 in our case) and a package type (Pre-built for Apache Hadoop 3.3 and later). We went ahead with this configuration; pick the version that fits your requirements.
The page then shows the download link in Step 3. Use that link to download the package file.
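If you prefer to fetch the package from the terminal, the sketch below builds the file name and a download URL from the version number. It assumes the Apache archive layout (archive.apache.org/dist/spark/...), which is not the link shown on the download page, and the download_spark function name is ours. Treat it as an illustration rather than the only way to get the file.

```shell
#!/usr/bin/env bash
# Build the package name and a download URL for a given Spark release.
spark_version="3.5.0"
package="spark-${spark_version}-bin-hadoop3.tgz"

# Assumption: the Apache archive keeps releases under this base URL.
base_url="https://archive.apache.org/dist/spark"
url="${base_url}/spark-${spark_version}/${package}"

# Hypothetical helper: downloads the package into the current directory.
# It is only defined here; nothing is fetched until you call it.
download_spark() {
    wget "$url"
}

echo "Package: ${package}"
echo "URL:     ${url}"
```

Running download_spark afterwards in the same shell starts the download. Change spark_version if you picked a different release on the download page.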
Install Java in Ubuntu 24.04
Open a terminal and install the JRE:
sudo apt update
sudo apt install default-jre
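After the installation, it is worth confirming that the java command is actually on the PATH before moving on. The following is a minimal check; the have_cmd helper is a name we made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the named command can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd java; then
    # java prints its version banner on stderr, so redirect it to stdout.
    java -version 2>&1 | head -n 1
else
    echo "java not found on PATH; install default-jre first"
fi
```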
Install Apache Spark in Ubuntu 24.04
We first need to extract the downloaded package using the tar command-line utility:
tar -xvf spark-3.5.0-bin-hadoop3.tgz
This creates the spark-3.5.0-bin-hadoop3/ directory in the current directory. Move it to /opt/spark:
sudo mv spark-3.5.0-bin-hadoop3/ /opt/spark
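To confirm the move worked, you can check that the launcher scripts ended up where the later steps expect them. This is only a sketch: the looks_like_spark_home function and the SPARK_DIR override variable are hypothetical names introduced for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the directory contains the scripts
# used later in this article (spark-shell and start-master.sh).
looks_like_spark_home() {
    local dir="$1"
    [ -x "${dir}/bin/spark-shell" ] && [ -x "${dir}/sbin/start-master.sh" ]
}

# SPARK_DIR is an illustrative override; the default matches the article.
spark_dir="${SPARK_DIR:-/opt/spark}"

if looks_like_spark_home "$spark_dir"; then
    echo "Spark files found in ${spark_dir}"
else
    echo "No Spark distribution found in ${spark_dir}"
fi
```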
Configure the environment variables:
nano ~/.bashrc
We have used the Nano text editor to edit .bashrc. Append the following line to it:
export PATH=$PATH:/opt/spark/bin:/opt/spark/sbin
Reload the .bashrc file in the current terminal with the command:
source ~/.bashrc
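If you would rather script this step than edit the file by hand, one option is to append the line only when it is not already there, so that repeated runs don't pile up duplicate PATH entries. The sketch below shows the idea on a scratch file; append_once is a helper name we introduce here, not part of Spark or Ubuntu.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file only if that exact
# line is not already present.
append_once() {
    local line="$1"
    local file="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demonstration on a scratch file, so the real ~/.bashrc is left untouched.
scratch="$(mktemp)"
path_line='export PATH=$PATH:/opt/spark/bin:/opt/spark/sbin'

append_once "$path_line" "$scratch"
append_once "$path_line" "$scratch"   # second call changes nothing

cat "$scratch"
```

The scratch file ends up with a single copy of the line. To apply the same idea for real, pass ~/.bashrc as the second argument instead of the scratch file.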
Start Spark Master Server
This can be done by running the following command in the terminal:
start-master.sh
Spark’s master web interface will be available at:
http://localhost:8080/
On this webpage, note down the master URL. It will look something like spark://ubunx:7077
Now, to start a worker and connect it to the master:
start-worker.sh spark://ubunx:7077
The master URL will be different for you, so adjust the command accordingly.
Reload the page to see the connected worker.
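The master URL noted earlier usually follows the pattern spark://host:port. As a rough sketch, the script below guesses it from the machine's host name and the default port 7077, honoring the SPARK_MASTER_HOST and SPARK_MASTER_PORT environment variables if they are set. The default_master_url function is a name we made up, and the result is only a guess: the value shown on the master web interface is the one to trust.

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess the standalone master URL.
# Assumes the master was started with default settings.
default_master_url() {
    local host="${SPARK_MASTER_HOST:-$(hostname -f)}"
    local port="${SPARK_MASTER_PORT:-7077}"
    echo "spark://${host}:${port}"
}

master_url="$(default_master_url)"
echo "Master URL guess: ${master_url}"
echo "To attach a worker: start-worker.sh ${master_url}"
```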
To start Spark shell:
spark-shell
To exit the Spark shell, type :q and press Enter.
In conclusion, we have covered how to install Apache Spark on Ubuntu 24.04.