Last Updated on July 22, 2020
Data compression is the process of storing data in a format that uses less space than the original representation. Compressing data is particularly useful in the field of communications, as it enables devices to transmit or store data in fewer bits. Besides reducing transmission bandwidth, compression increases the amount of information that can be stored on a hard disk drive or other storage device.
There are two main types of compression: lossy and lossless. Lossless compression allows the original data to be reconstructed exactly. Lossy compression is a data encoding method which reduces the size of a file by discarding certain information. When the file is uncompressed, not all of the original information will be recovered. Lossy compression is typically used for video, audio, images, and internet telephony. The fact that information is lost during compression will often be unnoticeable to most users. Lossy compression techniques are used in all DVDs, Blu-ray discs, and most multimedia available on the internet.
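To make the distinction concrete, here is a minimal Python sketch contrasting a lossless round trip with a lossy one. It is not how JPEG works: the "lossy" step simply throws away the low bits of each byte before compressing with zlib, which is an assumption chosen for brevity. The function names and the sample data are hypothetical.

```python
import zlib


def lossless_roundtrip(data: bytes) -> bytes:
    """Compress and decompress with zlib; every byte is recovered."""
    return zlib.decompress(zlib.compress(data))


def lossy_roundtrip(data: bytes, keep_bits: int = 4) -> bytes:
    """Discard the low bits of each byte, then compress and decompress.

    The discarded bits are gone for good: decompression returns the
    reduced data, not the original.
    """
    mask = (0xFF << (8 - keep_bits)) & 0xFF
    reduced = bytes(b & mask for b in data)
    return zlib.decompress(zlib.compress(reduced))


if __name__ == "__main__":
    # Hypothetical sample: a repeating ramp standing in for pixel values.
    sample = bytes(range(256)) * 4
    print("lossless identical:", lossless_roundtrip(sample) == sample)
    print("lossy identical:   ", lossy_roundtrip(sample) == sample)
```

Real lossy codecs are far more selective about what they discard, aiming to drop detail the eye or ear is least likely to miss.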
Images take up massive amounts of internet bandwidth because they often have large file sizes. They are the most popular resource type on the web. According to the HTTP Archive, 60% of the data transferred to fetch a web page is images composed of JPEGs, PNGs and GIFs. 45% of the images seen on sites crawled by HTTP Archive are JPEGs.
JPEG is an image file format that’s been around since the early 1990s, and it uses lossy compression. It’s best suited for photographs or images with a number of color regions. Ever since its introduction, we’ve seen improvements to its compression algorithms seeking to reduce file sizes, improve quality, and speed up encoding time.
Is your image collection consuming an inordinate amount of space? You’ve probably already identified and removed duplicate images from your collection. Whether you keep your collection stored locally and/or in the cloud, you’ll probably need to take further steps to prune the size of your photographs. While it’s relatively cheap to store your files on the cloud, especially when you use an infrequent access storage class, your monthly outlay can start to mount up when storing many thousands of photographs. Any compression software which allows you to minimize your outgoings is definitely worthy of investigation.
Step forward open source tools that are dedicated to JPEG compression. In this article, we’ll put three popular and very different open source tools through their paces. There are other alternatives available; it’s Linux after all.
We’ll first turn our attention to Guetzli, an open source JPEG encoder developed by Google. We’ll then examine MozJPEG, developed by Mozilla. MozJPEG is based on a separate project called libjpeg-turbo. Finally, we cover a very different tool called Lepton, developed by Dropbox. The tools have very different objectives, but they share one thing in common: they have saved petabytes in storage space and bandwidth.
I don’t offer compiling/installing instructions, as you just need to follow the respective project’s instructions. And the tools are available in popular Linux distribution repositories if you don’t fancy compiling.
Let’s begin with Guetzli.
Some years ago I ran tests to find out how to store scans of invoices and other enterprise documents efficiently in document management software without artifacts, even on very small letters. I later extended those tests to processing pictures for web sites. Here is what I found:
Using GIMP as the graphics program with:
* progressive .jpg encoding (for slow connections) enabled,
* the 4:4:4 subsampling method,
* floating-point as the DCT method,
* only EXIF data kept (no thumbnail or other orientation/dimensions data),
* and, most importantly, image colors indexed to an optimized 256-color palette,
the threshold not to cross to be absolutely sure there are no artifacts or color problems in the final picture (except, of course, with gradients, which do not suit this process) is 65 %.
My 2¢
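The export settings listed in the comment above can be roughly approximated outside GIMP. The following is only a sketch, assuming the third-party Pillow library is installed and that the 65 % threshold refers to the JPEG quality setting; the function names and paths are hypothetical. As far as I know Pillow does not expose a choice of DCT method, so that setting is left out, and Pillow's default palette quantizer may not match GIMP's optimized palette.

```python
from pathlib import Path


def jpeg_save_options(quality: int = 65) -> dict:
    """Pillow keyword arguments approximating the settings listed above."""
    return {
        "quality": quality,      # assumed meaning of the 65 % threshold
        "progressive": True,     # progressive encoding
        "subsampling": "4:4:4",  # no chroma subsampling
    }


def export_scan(src: Path, dst: Path, quality: int = 65) -> None:
    """Reduce src to a 256-color palette, then save it as a JPEG at dst."""
    from PIL import Image  # third-party: pip install Pillow

    with Image.open(src) as img:
        exif = img.info.get("exif")  # raw EXIF bytes, if the source has any
        # Index to a 256-color palette, then go back to RGB because
        # JPEG itself cannot store a palette.
        reduced = img.convert("RGB").quantize(colors=256).convert("RGB")
        options = jpeg_save_options(quality)
        if exif:
            options["exif"] = exif
        reduced.save(dst, "JPEG", **options)
```

A call such as `export_scan(Path("invoice.png"), Path("invoice.jpg"))` would write the reduced image; both file names here are placeholders.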