The ins and outs of downloading large files

Let’s unravel the methods used to download files from the internet.

No matter what you’re doing on the internet, you’re pretty much always downloading.

Sometimes these downloads are comparatively tiny, like the text in an email or on a search engine result, but other times file sizes are much larger, such as movies, software or video games.

Let’s break down the different networking systems that go into downloading these larger files. 

Data packets

The first thing to understand is that everything on the internet is broken down into the sending and receiving of data packets.

Whether you’re surfing the web, sending emails, streaming music or video, gaming, or downloading large files, your connected devices are transferring data packets, big or small.

These data packets are sent and received from your connected devices in a common networking language that is understood by the online service you’re trying to access.

For instance, to load a website, your device must first send a request to visit that website.

That request is received by the site, which sends a reply to your device to accept a connection, and the webpage loads.

The same thing happens when you download large files from the internet: a request and a reply happen before the download even commences.

Data is being transferred in the form of requests and approvals before the main download even kicks off. 
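
As a rough illustration of that request and reply, here is a minimal sketch of what the text of a web request and the first line of a server's reply look like. It uses the plain-text form of HTTP/1.1; the helper names, the host and the file path are made up for this example, and nothing is actually sent over a network.

```python
def build_request(host, path):
    """Return the bytes a client would send to ask for `path` on `host`."""
    lines = [
        f"GET {path} HTTP/1.1",   # what we want
        f"Host: {host}",          # which site we are asking
        "Connection: close",      # hang up when the reply is complete
        "",                       # a blank line marks the end of the request
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(reply):
    """Pull the status code and reason out of the first line of a reply."""
    status_line = reply.split(b"\r\n", 1)[0].decode("ascii")
    _version, code, reason = status_line.split(" ", 2)
    return int(code), reason


# Hypothetical host and path, purely for illustration.
print(build_request("example.com", "/big-file.zip").decode("ascii"))

# A made-up reply of the kind a server sends back before the file data.
sample_reply = b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
print(parse_status_line(sample_reply))  # (200, 'OK')
```

Only once the client has seen a positive status line like this does the actual file data start to matter.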

Downloading a file

Any file you download is made up of data packets, all of which must be transferred from the web service to your device. The file is only fully functional once every packet has arrived and been assembled into its final form on your side.

This is different to how streaming works.

If all the packets that make up a file aren’t properly transferred, that file will be corrupted on your device in some way and will, at best, display some oddities and, at worst, be unusable.

To minimise this, the transferred data packets that make up a file have identifying information that allows them to be counted, ensuring everything is downloaded. If something is missed, there will be a gap in the identifying information.

This identifying data is also used to put the data packets together on the device you’re using.
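
To make the counting and reassembly idea concrete, here is a small sketch. It assumes each packet carries a simple sequence number starting at 0, which is a simplification of what real protocols do, and the function names are invented for this example.

```python
def find_missing(received, total):
    """Return the sequence numbers that never arrived."""
    return [seq for seq in range(total) if seq not in received]


def reassemble(received, total):
    """Join the packets in sequence order, refusing if any are missing."""
    missing = find_missing(received, total)
    if missing:
        raise ValueError(f"missing packets: {missing}")
    return b"".join(received[seq] for seq in range(total))


# Packets arrive out of order, and packet 2 was lost on the way.
arrived = {0: b"Hel", 3: b"ld!", 1: b"lo "}
print(find_missing(arrived, total=4))   # [2]

# Once packet 2 has been sent again, the file can be put together.
arrived[2] = b"wor"
print(reassemble(arrived, total=4))     # b'Hello world!'
```

The gap in the numbering is what tells the device that something is missing, and the same numbers tell it where each packet belongs.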

The source of the download typically only sends a limited amount of data before waiting for the downloading device to confirm that it has received the earlier data blocks, which also means the device can request the re-transmission of a data block that’s missing or not identical to the one on the source.
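
The confirm-then-continue idea can be sketched as a simple loop. This is a deliberately simplified, one-block-at-a-time model (real protocols such as TCP usually keep several blocks in flight at once), and the `flaky_channel` function is a made-up stand-in for an unreliable connection.

```python
import zlib


def send_file(blocks, channel, max_tries=5):
    """Send blocks one at a time, re-sending any block that arrives damaged."""
    received = []
    for index, block in enumerate(blocks):
        checksum = zlib.crc32(block)
        for attempt in range(1, max_tries + 1):
            delivered = channel(index, attempt, block)
            # The receiver only confirms a block whose checksum matches.
            if delivered is not None and zlib.crc32(delivered) == checksum:
                received.append(delivered)
                break
        else:
            raise ConnectionError(f"block {index} failed after {max_tries} tries")
    return b"".join(received)


def flaky_channel(index, attempt, block):
    """Hypothetical link: loses block 1 once and corrupts block 2 once."""
    if index == 1 and attempt == 1:
        return None                      # lost in transit
    if index == 2 and attempt == 1:
        return b"?" + block[1:]          # damaged in transit
    return block


print(send_file([b"down", b"load", b"ing"], flaky_channel))  # b'downloading'
```

The second and third blocks each take two attempts, but the finished file is still identical to the original.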

For a large file to be downloaded in its entirety, it’s assumed that the downloading device has a constant and stable internet connection, and isn’t impacted by connectivity issues like packet loss. 

Ultimately, a file’s maximum download speed is determined by a combination of several factors including your internet bandwidth, the upload capacity of the place the file is being downloaded from, plus the round-trip time (latency) between your device and the location from which it’s downloading.

For instance, the farther away a downloading device is physically from the location where the file is stored, the more likely it is that the device will not reach the maximum download speed of its internet connection.
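
A back-of-the-envelope calculation shows why distance matters. The sketch below assumes a single connection that can only have a fixed amount of unconfirmed data (a "window") in transit at once; the 64 KiB window and the speeds used are illustrative figures, not measurements.

```python
def max_download_mbps(client_mbps, server_mbps, window_bytes, rtt_ms):
    """Rough upper bound on one connection's download speed, in Mbit/s."""
    # Only `window_bytes` can be unconfirmed at once, so each round trip
    # delivers at most one window of data.
    window_limit_mbps = (window_bytes * 8) / (rtt_ms / 1000) / 1_000_000
    # The slowest of the three limits decides the real speed.
    return min(client_mbps, server_mbps, window_limit_mbps)


# Same 100 Mbit/s connection, same server, same 64 KiB window.
nearby = max_download_mbps(100, 1000, window_bytes=65_536, rtt_ms=10)
faraway = max_download_mbps(100, 1000, window_bytes=65_536, rtt_ms=200)
print(round(nearby, 1))    # 52.4
print(round(faraway, 1))   # 2.6
```

In this simplified model, a server twenty times farther away (in round-trip time) delivers a twentieth of the speed, even though neither end's bandwidth has changed.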

Larger web services sometimes reduce this latency issue by mirroring their content in data centres around the globe, in what are called content delivery networks (CDNs).

Client-server downloads

When downloading from a website, or certain other web services, the transfer takes place on what’s called a ‘client-server networking protocol’.

The place you’re downloading from is the server, and the device you’re downloading to is the client.

A common form of the client-server networking protocol is ‘hypertext transfer protocol’, which you may recognise in its more familiar form as ‘HTTP’.

The server’s function is to host (store) data and transmit it, as data packets, to clients that request web pages or downloadable files.

Another type of the client-server networking protocol is file transfer protocol (FTP).

File transfer protocol may require a user to enter a username and password to connect to the stored content, or it may allow anonymous connection and download.

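An FTP server can usually be accessed via specialised FTP software. Web browsers used to be able to open FTP links too, but the major browsers have since dropped built-in FTP support.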

It’s different from a standard website (which uses HTTP or the newer ‘hypertext transfer protocol secure’, HTTPS) in that an FTP server’s primary purpose is to store and transmit files, while the primary function of HTTP and HTTPS is to transmit webpages.

This means FTP servers tend to take the form of folder and file directories, as opposed to a website that’s been designed with user-experience and other considerations in mind.

When engaging with FTP or HTTP client-server networking protocols, the exchange is usually one-way. Beyond the initial request for access, the client (your device) generally doesn’t need to do any more uploading; it’s all downloads. 

Peer-to-peer downloads

The other core networking protocol for downloading files is peer-to-peer (P2P).

Unlike the client-server networking protocol, for P2P connections there’s no central download server.

Instead of using a central server where files are stored, all compatible devices (clients) connected to a P2P network may simultaneously be downloading and sharing (uploading) files with each other.

In essence, when using a P2P networking protocol, users are acting as both client and server, and if one user disappears from the network, the files are still technically available from the other online users.

The more people that are connected to a P2P network and sharing the same files, the better the hypothetical speeds are for everyone downloading those files.

Peer-to-peer software, such as BitTorrent clients, has users upload the parts they already hold while they download the rest, to promote this kind of sharing synergy.

Files are still transferred in parts on P2P, as they are in client-server networking, but these parts can be downloaded out of order and assembled correctly later.
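
Here is a minimal sketch of out-of-order assembly. It is loosely modelled on the way BitTorrent-style software checks each part against a hash known in advance, but the tiny piece size, the choice of SHA-256 and the function names are assumptions made for this example.

```python
import hashlib

PIECE_SIZE = 4  # bytes; tiny so the example is easy to follow


def split_into_pieces(data):
    return [data[i:i + PIECE_SIZE] for i in range(0, len(data), PIECE_SIZE)]


def piece_hashes(data):
    """The list of per-piece hashes a downloader would know in advance."""
    return [hashlib.sha256(p).hexdigest() for p in split_into_pieces(data)]


def store_piece(buffer, have, hashes, index, piece):
    """Write a piece at its offset if its hash matches; report success."""
    if hashlib.sha256(piece).hexdigest() != hashes[index]:
        return False                      # damaged or fake piece: discard
    start = index * PIECE_SIZE
    buffer[start:start + len(piece)] = piece
    have.add(index)
    return True


original = b"peer-to-peer"
hashes = piece_hashes(original)
buffer, have = bytearray(len(original)), set()

# Pieces arrive from different peers in arbitrary order.
store_piece(buffer, have, hashes, 2, b"peer")
store_piece(buffer, have, hashes, 0, b"peer")
print(store_piece(buffer, have, hashes, 1, b"XXXX"))  # False
store_piece(buffer, have, hashes, 1, b"-to-")
print(bytes(buffer))  # b'peer-to-peer'
```

Because every piece has a known position and a known hash, it doesn't matter which peer it came from or when it arrived.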

Resuming downloads after pausing or stopping is generally a lot more straightforward on P2P than trying to resume with an HTTP or FTP networking protocol.
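
For comparison, this is roughly what resuming looks like over HTTP. The client has to work out how much of the file it already has and ask for the rest using a `Range` header, and it only works if the server supports range requests (it then answers with status 206, "Partial Content"). The URL and file name below are placeholders, and the sketch only builds the request without sending it.

```python
import os
import tempfile
import urllib.request


def build_resume_request(url, partial_path):
    """Build a request that asks only for the bytes not yet on disk."""
    already_have = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    request = urllib.request.Request(url)
    if already_have:
        # "bytes=N-" means "send everything from byte offset N onwards".
        request.add_header("Range", f"bytes={already_have}-")
    return request, already_have


# Pretend 1,000 bytes were downloaded before the connection dropped.
with tempfile.TemporaryDirectory() as folder:
    partial = os.path.join(folder, "movie.part")
    with open(partial, "wb") as handle:
        handle.write(b"\0" * 1000)
    request, offset = build_resume_request("https://example.com/movie.mp4", partial)
    print(request.get_header("Range"))  # bytes=1000-
```

If the server ignores the header and replies with a normal 200 status, the download has to start again from the beginning.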

Drawbacks of networking protocols

One of the main hurdles of a client-server networking protocol is bandwidth.

Bandwidth refers to the total capacity for transmitting data and, in this instance, would refer primarily to a server’s upload speed.

If that upload speed is maxed out, the speeds for everyone downloading files from that server will be slowed and/or new connections may be blocked to preserve bandwidth.
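
The arithmetic behind this is simple. The sketch below assumes the server splits its upload capacity evenly between everyone downloading, which real servers don't do exactly, and the figures are illustrative only.

```python
def per_client_mbps(server_upload_mbps, active_downloads, client_mbps):
    """Estimate each client's speed when a server's upload is shared evenly."""
    fair_share = server_upload_mbps / active_downloads
    # A client can never go faster than its own connection allows.
    return min(fair_share, client_mbps)


# A server with 1,000 Mbit/s of upload, clients on 100 Mbit/s connections.
for clients in (5, 50, 500):
    print(clients, per_client_mbps(1000.0, clients, client_mbps=100.0))
# 5 100.0
# 50 20.0
# 500 2.0
```

With only a handful of downloaders, each client's own connection is the limit; once the server is busy, the server's upload speed becomes the bottleneck for everyone.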

Files may also be removed from a server at any time, which makes them impossible to download, whereas files on P2P networks often only disappear if people stop sharing them.

Peer-to-peer networking can be limited by the number of users sharing (uploading) a file.

For less-popular or rarer files that aren’t widely shared on P2P, download speeds can be slower.

In some instances, there may be a collection of incomplete data packets that are being shared for a file or files that may never finish downloading if someone with the full collection of data packets doesn’t reconnect to the same P2P network.
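
Whether a download can ever finish comes down to whether the connected users hold every part between them. Here is a small sketch of that check; the peer names and part numbers are invented for the example.

```python
def missing_from_swarm(peers, total_pieces):
    """Return the piece numbers that no connected peer currently holds."""
    available = set().union(*peers.values())
    return sorted(set(range(total_pieces)) - available)


# Three peers are online, each holding only some of a five-piece file.
peers = {
    "alice": {0, 1, 2},
    "bob": {1, 2, 4},
    "carol": {0, 4},
}
print(missing_from_swarm(peers, total_pieces=5))  # [3]

# Nobody can finish until a peer holding piece 3 reconnects.
peers["dave"] = {3}
print(missing_from_swarm(peers, total_pieces=5))  # []
```

Until someone with the missing part comes back online, everyone else is stuck just short of a complete file.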

When it comes to downloading larger files on the internet, client-server and peer-to-peer networking protocols both have pros and cons, but understanding how they work can help streamline your interaction with the mostly-hidden networking systems. 
