Split files in Linux distributions

split is a command-line utility used to split files on Linux. It is provided by the coreutils (GNU Core Utilities) package, which also contains the cat command used later in this article. With split, we can break a large file into smaller pieces and choose the size of those pieces.

How to Split larger files

Let’s say you have a file xyz-amd64-DVD-1.iso of size 3.6 GB and you want to split it into smaller pieces of 500M each (split treats the M suffix as units of 1,048,576 bytes). To do so, run the following command in a terminal:

split --bytes=500M xyz-amd64-DVD-1.iso split-file-

where,
split: the command-line utility,
--bytes=size: sets the size of the smaller pieces,
xyz-amd64-DVD-1.iso: the file that will be split, and
split-file-: the prefix for the names of the smaller pieces. If you don’t specify a prefix, the default prefix 'x' is used.

The general syntax for the command is:

split --bytes=size <file-to-be-split> <prefix-smaller-pieces>

When we run the above command, the 3.6 GB file is split into 8 pieces. Seven of those pieces are 500M in size, and the last one holds the remainder of about 100M. The generated files are named split-file-aa, split-file-ab, split-file-ac … split-file-ah.
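The piece count and naming can be checked on a small scale before working on a multi-gigabyte file. The following is a minimal sketch, assuming a throwaway 3500-byte file named sample.bin (a hypothetical stand-in for the ISO) and a piece size of 1000 bytes; -b is the short form of --bytes.

```shell
#!/bin/sh
# Work in a scratch directory so the demo leaves the current directory untouched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a 3500-byte sample file (hypothetical stand-in for the large ISO).
head -c 3500 /dev/zero > sample.bin

# Split into pieces of at most 1000 bytes, using the prefix "split-file-".
split -b 1000 sample.bin split-file-

# Count the pieces and measure the last one.
piece_count=$(ls split-file-* | wc -l | tr -d ' ')
last_piece=$(ls split-file-* | tail -n 1)
last_size=$(wc -c < "$last_piece" | tr -d ' ')

echo "pieces: $piece_count"
echo "last piece: $last_piece ($last_size bytes)"
```

With these sizes the script should report four pieces, the last one being split-file-ad at 500 bytes. This mirrors the ISO example, where seven full pieces are followed by one smaller remainder.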

So far, we have split our larger file into smaller pieces. Next, we will discuss how to join all the smaller pieces to get our file back.

Join smaller pieces

Once we have split our larger file, the smaller pieces are of no use until we join them together. To do so, we use the cat command:

cat split-file-* > xyz-amd64-DVD-1-combined.iso

This combines all the smaller pieces, i.e. split-file-aa, split-file-ab … split-file-ah, into the file xyz-amd64-DVD-1-combined.iso, which should be identical to our original file.
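The join works because the shell expands split-file-* in alphabetical order, which is the same order in which split wrote the pieces. Below is a minimal sketch on a small hypothetical file (sample.bin is an assumption for illustration, not the ISO from this article), using cmp to confirm that the rebuilt copy is byte-for-byte identical to the source.

```shell
#!/bin/sh
# Work in a scratch directory so the demo leaves the current directory untouched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a small sample file with non-uniform content (hypothetical stand-in).
seq 1 2000 > sample.bin

# Split into 1000-byte pieces, then join them back in glob (alphabetical) order.
split -b 1000 sample.bin split-file-
cat split-file-* > sample-combined.bin

# cmp -s exits with status 0 only if both files are byte-for-byte identical.
if cmp -s sample.bin sample-combined.bin; then
    join_result="identical"
else
    join_result="different"
fi
echo "original vs combined: $join_result"
```

Note that the name of the combined file must not itself match the split-file-* pattern, otherwise cat would try to read its own output.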

Verify data integrity using the md5sum

The md5sum command-line utility is also provided by the coreutils (GNU Core Utilities) package. It helps us verify data integrity using Message-Digest Algorithm 5 (MD5). MD5 generates a 128-bit hash, which can later be compared to verify file integrity. It is well suited to detecting accidental corruption, although it is no longer considered secure against deliberate tampering.

We need to compute the md5sum of both the original and the combined file. To do so, run the following in a terminal:

md5sum xyz-amd64-DVD-1.iso

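This returns output of the following form (the hash shown here is only a placeholder; a real MD5 hash consists of 32 hexadecimal characters):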

673khkj343klff98324klj324398u454 xyz-amd64-DVD-1.iso

Next, run the same command on the combined file:

md5sum xyz-amd64-DVD-1-combined.iso

This should return the same hash value:

673khkj343klff98324klj324398u454 xyz-amd64-DVD-1-combined.iso
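The comparison of the two hashes can also be scripted rather than done by eye. The following is a minimal sketch, assuming a small hypothetical file sample.bin that is split and rejoined in the same way as the ISO; the variable names are illustrative only.

```shell
#!/bin/sh
# Work in a scratch directory so the demo leaves the current directory untouched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample file, split and rejoined as in the earlier sections.
seq 1 2000 > sample.bin
split -b 1000 sample.bin split-file-
cat split-file-* > sample-combined.bin

# md5sum prints "<hash>  <filename>"; keep only the first field (the hash).
orig_sum=$(md5sum sample.bin | cut -d ' ' -f 1)
combined_sum=$(md5sum sample-combined.bin | cut -d ' ' -f 1)

# Compare the two hashes as plain strings.
if [ "$orig_sum" = "$combined_sum" ]; then
    verdict="match"
else
    verdict="mismatch"
fi
echo "md5sum check: $verdict"
```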

If the md5sum hashes of both files match, the combined file is intact. Otherwise, the combined file differs from the original and has lost its integrity. For more options of the split command, consult its man page, which is available on Linux distributions:

man split

Conclusion

In this article, we discussed how to split a file into smaller pieces and then combine them again. To check the integrity of the resulting file, we used the md5sum command-line utility.