Application Layer Protocols

Application layer protocols provide interfaces that allow applications to communicate over computer networks. The application layer is described in a similar way by both the TCP/IP and the OSI network models; their specifications differ only in details.

Hypertext Transfer Protocol

HTTP is the most important application layer protocol used on the Internet. It allows clients to request data from servers (for example, websites) and to send data to servers.

HTTP is as old as the World Wide Web itself. The first version of the protocol (HTTP/0.9) only allowed clients to request HTML (HyperText Markup Language) pages. Two new HTTP versions, HTTP/1.0 and HTTP/1.1, were published in 1996 and 1997 respectively. The latest documentation of HTTP/1.1 consists of six specifications: RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, and RFC 7235. The most widely used version of the protocol is HTTP/1.1. Some older or simpler browsers use HTTP/1.0. The earliest version, HTTP/0.9, is used very rarely (although it is still supported by many applications).

A newer version of the protocol, HTTP/2, was officially published as RFC 7540 in May 2015.

HTTP Overview

HTTP is a request-response protocol between a client and a server. Every HTTP session is initiated by creating a TCP connection to the server (usually to port 80) and then sending a request for the desired URL. The server returns the requested file (or other content); in HTTP/1.0 it then usually terminates the connection.

In HTTP/1.1 a single connection is, by default, kept open and reused to download a page and all other files needed to display its content (images, stylesheets, scripts). There is no need, as in the previous versions, to establish a separate connection for every file. Even so, HTTP does not rely on long-lived connections: when no more requests are expected (or the connection stays idle for too long), the connection is closed, and a new one can be opened for later requests.

What is more, HTTP is stateless. This means that the protocol itself does not require the client or the server to keep information about each other between different requests (applications can add state on top of it, for example with cookies). Finally, HTTP is media independent: any data can be transmitted as long as both the client and the server know how to handle it. The data type should be specified using an appropriate MIME type in the Content-Type header.

HTTP Protocol Design

The port most commonly used for the protocol is 80, and sometimes 8080. An HTTP server listens on this port for a client's request message. After receiving the request, the server sends back a response message, which begins with a status line saying whether the request succeeded. The response usually contains the requested file, though the server may also return an error or other information.

Messages in HTTP/0.9

In HTTP/0.9 the requests sent to the server were quite simple. They contained only the GET keyword, followed by the document path (optionally with a query string) and a terminating newline. The host name is not part of the request, because the client has already connected to the server. A simple HTTP/0.9 request sent to www.example.com may look like the one below:

GET /directory

The server should immediately return the requested document (ASCII text with HTML data, or plain text preceded by the <PLAINTEXT> tag) and then close the connection, which marks the end of the document. The server doesn't hold any state related to the browser.

Messages in HTTP/1.0 and HTTP/1.1

Both later protocol versions, HTTP/1.0 and HTTP/1.1, allow a client to specify additional data in a request. First of all, there are more methods available:

  • GET - has the same functionality as requests in HTTP/0.9; it is used to retrieve the resource identified by a given path,
  • HEAD - is similar to GET, but the response contains only the Status-Line (the protocol version, a Status-Code and a Reason-Phrase) and the response headers, without the message body,
  • POST - is used to send data to the server, for example the contents of an HTML form or an uploaded file,
  • PUT - replaces the resource pointed to by a given path with the uploaded content,
  • DELETE - removes the resource pointed to by a given path,
  • CONNECT (HTTP/1.1 and newer) - asks a proxy server to open a tunnel to the host and port given in the request; afterwards the proxy simply forwards data in both directions, which is how SSL/TLS connections are usually passed through proxies,
  • TRACE (HTTP/1.1 and newer) - is used for testing and debugging; the final recipient reflects the received request message back to the client, so the client can see how intermediaries along the way modified it,
  • OPTIONS (HTTP/1.1 and newer) - requests information about the capabilities of a server without requesting a resource.

Some of these methods (GET, HEAD, OPTIONS and TRACE) are considered safe: they are intended only for retrieving information from a server. On the other hand, a few methods can cause side effects on the server or even on external sites (POST, PUT, DELETE). In practice, however, safety is only a convention: nothing prevents a server application from changing its state in response to a GET request, so GET requests should not automatically be considered harmless either.

The first line of a request message contains the request method name, the URL path, and the protocol version. It may look similar to this one:

POST /first_directory/second_directory/doit.php HTTP/1.1

The first line can be followed by more lines called headers. Each header contains one name-value pair (the name and the value are separated by a colon). A few common request headers are included in most clients' requests:

  • Host - contains the host name (and optionally the port) of the server; this header is mandatory in HTTP/1.1,
  • User-Agent - identifies the client software, for example the user's browser,
  • Content-Type and Content-Length - describe the media type and the size in bytes of an optional payload attached to the message,
  • Accept - lists the supported MIME document types (for example audio/mpeg or image/jpeg),
  • Accept-Language - lists the supported languages,
  • Accept-Encoding - lists the encodings (for example compression methods) supported in response messages,
  • Referer (note the misspelling of "referrer", which is part of the standard) - contains the address of the page from which the request originated.

The headers are terminated by a single empty line, followed by an optional payload (which may contain, for example, form data). The rest of the request message from the previous example might look like this:

Host: www.server.com
User-Agent: Mozilla/5.0
Referer: http://www.server.com/first_directory/select_action.html
Content-Length: 12
Content-Type: text/plain

Do first job
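The request above can also be assembled programmatically. The following Python sketch uses a hypothetical helper, build_request (not part of any library), that writes the request line, the headers, the empty separator line and the body. Two details are worth noting: on the wire every line of the request line and headers ends with CR LF, and Content-Length counts the bytes of the body, which is why the 12-byte body "Do first job" yields Content-Length: 12.

```python
def build_request(method: str, path: str, headers: dict[str, str], body: bytes = b"") -> bytes:
    """Serialize an HTTP/1.1 request: request line, headers, empty line, body."""
    if body:
        # Content-Length is the size of the body in bytes, not in characters.
        headers = {**headers, "Content-Length": str(len(body))}
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF, and an empty line separates the headers from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


message = build_request(
    "POST",
    "/first_directory/second_directory/doit.php",
    {"Host": "www.server.com", "User-Agent": "Mozilla/5.0", "Content-Type": "text/plain"},
    b"Do first job",
)
print(message.decode("ascii"))
```

Computing Content-Length from the encoded body, rather than typing it by hand, avoids a common source of malformed messages.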

The server responds by sending its own message. The message starts with the Status-Line (that contains, as mentioned above, the protocol version, a numerical Status-Code, and a human-readable Reason-Phrase).

The first line can be followed by some response headers, for example:

  • Server - a server name,
  • Via - a list of proxies used for sending the response,
  • Connection - control options for the current connection (keep-alive if the server waits for more queries or close if the connection is closed after completing the last query),
  • Content-Encoding - the type of compression used on the data,
  • Content-Type, Content-Length, Content-Language - for content-related information.

The response headers are followed by a single empty line, which may be followed by the optional message body. The response message may be similar to this one:

HTTP/1.1 200 OK
Server: Apache/2.4.1 (Unix)
Content-Length: 19
Content-Type: text/plain
Connection: close

First job completed

When a web page is requested, the message body contains its HTML code. Below is an example of a message from a server that returns the website hello_world.html:

HTTP/1.1 200 OK
Server: Apache/2.4.1 (Unix)
Date: Sat, 27 Dec 2014 17:58:58 GMT
Last-Modified: Sat, 26 Jul 2014 13:19:25 GMT
Content-Type: text/html
Content-Length: 51
Connection: close

<html>
<body>
<h1>Hello World!</h1>
</body>
</html>
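A client reading responses like the two above has to split the Status-Line, the headers and the body. The Python sketch below does this for the simple case only; the function name parse_response is hypothetical, and the sketch assumes CR LF line endings, a status line that includes a Reason-Phrase, and a body delimited by Content-Length (it ignores chunked transfer encoding, folded headers and duplicate headers).

```python
def parse_response(raw: bytes) -> tuple[str, int, str, dict[str, str], bytes]:
    """Split an HTTP/1.x response into version, status code, reason phrase, headers and body."""
    # The first empty line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # Status-Line: protocol version, numerical Status-Code, human-readable Reason-Phrase.
    version, code, reason = status_line.split(" ", 2)
    headers: dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    if "content-length" in headers:
        # Keep exactly Content-Length bytes of the body.
        body = body[: int(headers["content-length"])]
    return version, int(code), reason, headers, body


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: Apache/2.4.1 (Unix)\r\n"
    b"Content-Length: 19\r\n"
    b"Content-Type: text/plain\r\n"
    b"Connection: close\r\n"
    b"\r\n"
    b"First job completed"
)
version, code, reason, headers, body = parse_response(raw)
print(code, reason, headers["content-type"], body)  # 200 OK text/plain b'First job completed'
```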

Security of HTTP

The biggest weakness of HTTP is the fact that the communication between the client and the server is not encrypted. All messages can be easily eavesdropped on their way from one computer to another. HTTPS is the most popular method of establishing secure HTTP connections.

An intruder can also use requests with the TRACE method, if it is enabled, to gather information about the attacked system, because the server echoes the whole received request back. It may therefore be recommended to disable TRACE, for example in Apache with the TraceEnable off directive or with mod_rewrite.

Security issues caused by different implementations

There are a few problems related to HTTP security caused by the fact that different server and client applications support older versions of the protocol to different degrees.

For example, the specification of HTTP/1.0 requires HTTP/1.0 clients to understand valid responses in both the HTTP/0.9 and the HTTP/1.0 formats. Although other documents describe different approaches, most browsers support HTTP/0.9 and are therefore vulnerable to attacks such as the following:

  1. The intruder prepares a GET request addressed to the attacked server, but to one of its ports that does not serve HTTP. The requested URL is fake and contains an <html> tag. The first line of the message could look like this: GET /<html><body>something HTTP/1.0, and the next line would contain the attacked server's host name and port (the Host header). The intruder then makes the victim's browser send this request, for example by luring the victim to a page containing a suitable link or form, so the message really comes from the victim's computer.
  2. Because the server doesn't expect HTTP messages on the selected port, it returns an error message. The service on that port has nothing to do with HTTP, so its response does not begin with a valid HTTP status line. However, the error message will probably quote the original request, explaining that it was incorrect. The browser can interpret such a message in only one way: as an HTTP/0.9 response, which consists only of a document. It therefore displays a page whose content depends on what the attacker placed after the <html> tag in the fake request. For the browser (and thus for the attacked user), this page looks like a genuine response from the server, and indeed it was sent by the server itself.
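The root of the attack above is the client's fallback rule. The Python sketch below models it in a deliberately simplified way: a response that does not start with "HTTP/" is treated as an HTTP/0.9 document. The function names and the echoing service are hypothetical, and real browsers apply their own, often stricter, rules.

```python
def looks_like_http1_response(raw: bytes) -> bool:
    # An HTTP/1.x response always starts with a status line such as "HTTP/1.1 200 OK".
    return raw.startswith(b"HTTP/")


def lenient_client_view(raw: bytes) -> str:
    """Model of a client that falls back to HTTP/0.9 when no status line is found."""
    if looks_like_http1_response(raw):
        return "HTTP/1.x response"
    # HTTP/0.9 fallback: the whole byte stream is treated as the (HTML) document.
    return "HTTP/0.9 document: " + raw.decode("latin-1")


def hypothetical_service(request: bytes) -> bytes:
    # A made-up non-HTTP service that quotes the unknown command in its error message.
    first_line = request.split(b"\r\n", 1)[0]
    return b"500 Unknown command: " + first_line + b"\r\n"


request = b"GET /<html><body>something HTTP/1.0\r\nHost: www.server.com:25\r\n\r\n"
reply = hypothetical_service(request)
print(lenient_client_view(reply))
```

The attacker-controlled markup ends up inside a "document" that the client attributes to the attacked server.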

Other kinds of problems are related to line terminators. HTTP/1.1 requires the start line and each header line to end with CRLF, but recommends that recipients also tolerate a single LF, and some implementations go further and treat a lone CR character as a newline as well. Most web servers (for example Apache or IIS) do not treat a single CR as a newline character, whereas several browsers (Internet Explorer, Safari, Opera) do. Therefore messages containing lone CR characters may be interpreted and handled in different ways by various browsers and servers.
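The disagreement can be reproduced with two toy line splitters. In this Python sketch (function names and the X-Note header are hypothetical) the server-style parser ends lines only at CRLF or LF, while the browser-style parser also splits at a lone CR. A header value that a server considers a single line can then look like two headers to a browser.

```python
import re


def split_lines_server(head: str) -> list[str]:
    # Server-like parser: CRLF (or a bare LF) ends a line; a lone CR stays inside the line.
    return re.split(r"\r?\n", head)


def split_lines_browser(head: str) -> list[str]:
    # Browser-like parser: a lone CR is accepted as a line break as well.
    return re.split(r"\r\n|\n|\r", head)


# A header value containing a lone CR, e.g. injected through an unfiltered parameter.
head = "X-Note: hello\rSet-Cookie: session=attacker"
print(split_lines_server(head))   # one line
print(split_lines_browser(head))  # two lines
```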

A similar issue is caused by multiline (folded) headers, which are allowed by both HTTP/1.0 and HTTP/1.1. The idea is that if a header line begins with whitespace, it should be treated as a continuation of the previous header line. Again, most web servers (including Apache or IIS) handle multiline headers while most browsers (including Internet Explorer, Safari, Opera) don't, so the same message with multiline headers can be interpreted in different ways.
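A minimal Python sketch of the two behaviours, using a hypothetical parse_headers function: with unfolding enabled, an indented line is appended to the previous header's value; without it, the same line is parsed as a separate header. Here one parser sees a Location header and the other does not.

```python
def parse_headers(lines: list[str], unfold: bool) -> list[tuple[str, str]]:
    """Parse header lines; if unfold is True, lines starting with whitespace continue the previous header."""
    headers: list[tuple[str, str]] = []
    for line in lines:
        if unfold and headers and line[:1] in (" ", "\t"):
            # Continuation line: append it to the value of the previous header.
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
        else:
            name, _, value = line.partition(":")
            headers.append((name.strip(), value.strip()))
    return headers


lines = ["Content-Type: text/plain", " Location: http://www.example.com/"]
print(parse_headers(lines, unfold=True))   # a single Content-Type header
print(parse_headers(lines, unfold=False))  # Content-Type and Location headers
```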

The HTTP specification does not fully define how web servers and client applications should treat duplicate or ambiguous headers. As one could expect, different programs handle them in different ways (accepting the first, accepting the last, or discarding all of them).
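The three strategies can be compared with a small Python sketch (pick_header and the policy names are hypothetical). With two conflicting Content-Length headers, each policy yields a different answer, so two programs handling the same message, for example a proxy and the server behind it, may disagree about where the body ends; this is the idea behind so-called request smuggling attacks.

```python
from typing import Optional


def pick_header(headers: list[tuple[str, str]], name: str, policy: str) -> Optional[str]:
    """Resolve a possibly duplicated header according to one of three hypothetical policies."""
    values = [v for n, v in headers if n.lower() == name.lower()]
    if not values:
        return None
    if policy == "first":
        return values[0]
    if policy == "last":
        return values[-1]
    if policy == "discard":
        # Conflicting duplicates are treated as invalid and ignored altogether.
        return values[0] if len(set(values)) == 1 else None
    raise ValueError(f"unknown policy: {policy}")


headers = [("Content-Length", "5"), ("Content-Length", "100")]
for policy in ("first", "last", "discard"):
    print(policy, pick_header(headers, "Content-Length", policy))
# first 5
# last 100
# discard None
```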

Security of HTTP, like security of the whole Internet, suffers from the way in which the Web and its ideas were developed: many vendors and producers, incompatible implementations, features often developed bottom-up, and a lack of concern for security and privacy in the early days of the Web.

Hypertext Transfer Protocol Secure

HTTPS allows users to browse the Internet using the HTTP protocol with additional encryption. The first version of HTTPS was created in 1994 by Netscape Communications.

All the HTTP content is encrypted by the SSL or TLS protocols, which also operate on the Application Layer but below HTTP itself. An attacker who overhears the communication can see only SSL or TLS records with encrypted data. The underlying SSL/TLS protocol typically uses long-term public and private keys to establish a symmetric key for each communication session.

Encryption

As mentioned before, HTTPS is in fact a combination of two protocols, HTTP and SSL/TLS, and it operates over a Transport Layer protocol, usually TCP. The TCP and IP headers are not encrypted, which means that the IP addresses and port numbers are visible to others. Thus, an attacker may be able to learn the domain name (for example from the IP address or from DNS queries), but won't be able to determine the full URL path, which is sent inside the encrypted HTTP request.

The strength of the encryption depends strictly on the underlying SSL/TLS protocol.

It is recommended to use key exchange algorithms that provide perfect forward secrecy. This means that if an attacker obtains one of the long-term private keys used to set up HTTPS sessions, they still won't be able to derive the short-term session keys and decrypt previously recorded messages. Not all asymmetric algorithms provide such security. Two well-known algorithms that do provide it are the ephemeral variants of the Diffie-Hellman (DHE) and Elliptic Curve Diffie-Hellman (ECDHE) key exchange algorithms.
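The following Python sketch shows the arithmetic of an ephemeral Diffie-Hellman exchange with a deliberately tiny toy group (the constants and function names are assumptions for illustration, not values used by TLS). Each side picks a fresh private exponent per session, only the public values cross the network, and both sides arrive at the same secret. Once the private exponents are deleted, a later theft of the server's long-term key does not help decrypt recorded traffic, because that key is only used to sign the exchanged values (the signing step is omitted here).

```python
import secrets

# Toy group for illustration only: P = 2**64 - 59 is prime, but far too small to be secure.
# Real deployments use standardised groups of 2048 bits or more, or elliptic curves.
P = 2**64 - 59
G = 5  # public base value, chosen arbitrarily for this toy example


def ephemeral_keypair() -> tuple[int, int]:
    # A fresh private exponent is generated for every session and discarded afterwards.
    private = secrets.randbelow(P - 3) + 2  # uniformly chosen from [2, P - 2]
    return private, pow(G, private, P)


client_private, client_public = ephemeral_keypair()
server_private, server_public = ephemeral_keypair()

# Only the public values travel over the network.
client_secret = pow(server_public, client_private, P)
server_secret = pow(client_public, server_private, P)
print(client_secret == server_secret)  # True
```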

Authentication

What is more, HTTPS provides website authentication, which protects against man-in-the-middle attacks. Web servers authenticate themselves by presenting certificates to web browsers. Web browsers come with the certificates of major certificate authorities (such as Symantec, Comodo, GoDaddy, etc.) pre-installed, which lets them check whether a server's certificate was issued by a trusted authority and thus verify the authenticity of the other side.

When a web browser fails to authenticate the web server it tries to connect to (for example, due to an invalid certificate), it will usually display a warning to the user. Also, encrypted connections are usually indicated by some kind of green lock icon located near the address bar.

HTTP vs. HTTPS

The HTTPS URL is the same as the HTTP one, aside from its scheme token (https:// vs. http://). As mentioned above, most web browsers indicate usage of HTTPS by showing an icon of a green padlock.

HTTPS uses a different port number than the ordinary HTTP protocol: 443 by default, as opposed to port 80 used by HTTP.

To make sure that communication using HTTPS is secure, all content of a web page (not only text but also images, scripts, etc.) must be loaded over HTTPS. To avoid surveillance and various types of attacks, no part of the page should be delivered over plain HTTP (such pages are said to contain mixed content).

Secure Socket Layer and Transport Layer Security

Both Transport Layer Security (TLS) and Secure Sockets Layer (SSL) refer to the same set of Application Layer protocols. They are used for protecting data exchanged by other Application Layer protocols.

SSL was originally developed by Netscape. There are three versions of the SSL protocol, created in 1994, 1995 and 1996 respectively. The first version of TLS was published in 1999 as an improvement of the existing SSL 3.0 protocol. After 1999, two further TLS versions were officially released: TLS 1.1 in 2006 and TLS 1.2 in 2008. The next version, TLS 1.3, was still being prepared at the time of writing (2016); it was eventually published as RFC 8446 in 2018. Generally, each new TLS and SSL version introduced new, more reliable cryptographic algorithms, while older and insecure ones were removed. The protocol name was changed from SSL to TLS, reportedly to avoid potential legal issues with Netscape.

Nowadays, the public and private keys used by TLS/SSL asymmetric algorithms can contain thousands of bits (RSA keys, for example, typically have 2048 bits or more). There exist a few popular implementations of the TLS/SSL protocol for major programming languages and operating systems. OpenSSL is perhaps the most popular one.

TLS/SSL protocols operate on the Application Layer, under other Application Layer protocols, and they are meant to protect the messages exchanged by the latter. Currently, TLS/SSL protocols are used to secure most major Internet services, HTTPS being the best-known example.

TLS/SSL usually cooperates with the reliable TCP protocol operating on the Transport Layer. However, there exist also implementations that work with other Transport Layer protocols, including the unreliable ones like UDP.

There are three main functionalities provided by TLS/SSL:

  • encryption,
  • authentication,
  • message integrity.

Each functionality is described below in more detail.

TLS/SSL protocol may be split into two sub-protocols, used in different phases of communication:

  • Handshake Protocol, which is used to negotiate all the connection parameters and establish a secure session.
  • Record Protocol, which protects all the messages later exchanged by the Application Layer protocol running on top of TLS/SSL (that is, the messages carrying the actually important data, like web content or email messages).

TLS messages are called records. Each record contains several control fields which describe the protocol version, the message type, the message length, etc. The control fields are followed by the actual data, and then by the (optional) message MAC and the (also optional) padding bytes.
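The control fields mentioned above form a fixed 5-byte header in SSL 3.0 and TLS 1.0 to 1.2: a one-byte content type, a two-byte protocol version and a two-byte length of the data that follows. The Python sketch below decodes only this header (the function and table names are hypothetical); the data, MAC and padding that follow it are not parsed.

```python
import struct

CONTENT_TYPES = {20: "change_cipher_spec", 21: "alert", 22: "handshake", 23: "application_data"}
VERSIONS = {(3, 0): "SSL 3.0", (3, 1): "TLS 1.0", (3, 2): "TLS 1.1", (3, 3): "TLS 1.2"}


def parse_record_header(data: bytes) -> tuple[str, str, int]:
    """Decode the 5-byte header that precedes every SSL 3.0 / TLS 1.0-1.2 record."""
    if len(data) < 5:
        raise ValueError("a record header is 5 bytes long")
    # Content type (1 byte), major and minor version (1 byte each), length (2 bytes, big-endian).
    content_type, major, minor, length = struct.unpack("!BBBH", data[:5])
    kind = CONTENT_TYPES.get(content_type, f"unknown ({content_type})")
    version = VERSIONS.get((major, minor), f"unknown ({major}.{minor})")
    return kind, version, length


header = bytes([0x16, 0x03, 0x01, 0x00, 0xC8])
print(parse_record_header(header))  # ('handshake', 'TLS 1.0', 200)
```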

Encryption

Before any application data is exchanged, both sides negotiate the encryption parameters during the so-called TLS handshake. They must agree which encryption algorithm will be used and derive the proper cryptographic keys. The encryption used later to protect all messages is symmetric, and the negotiated symmetric key is usually valid only for a single session.

The process of establishing the shared secret key is secure: an eavesdropper cannot obtain the key even by intercepting all the messages exchanged between the client and the server. What is more, the handshake protocol guarantees that an intruder cannot modify the negotiation without being detected, so the negotiated parameters can be trusted.

Provided that the server (or both sides) is authenticated, the whole process of establishing the secure connection is protected against man-in-the-middle attacks.

Authentication

Both sides may authenticate themselves before creating the session. The authentication is performed by using the digital certificates signed by trusted third parties and asymmetric encryption with public and private keys.

The authentication step is optional and one or both sides may not require it. Usually, for convenience reasons, only the server authenticates itself.

The client may authenticate the other side by using its public key (taken from the other side's certificate, which is signed by a trusted Certificate Authority) to decrypt some information that the other side encrypted earlier with the corresponding private key. If the information can be properly decrypted, the client can assume that the other side holds the private key and can be trusted.
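The principle described above (what the private key produces, only the matching public key can undo) can be illustrated with textbook RSA. The Python sketch below uses tiny, completely insecure numbers chosen for readability, and the function names are hypothetical; real TLS implementations use keys of thousands of bits together with standardised padding schemes.

```python
import hashlib

# Textbook RSA with tiny primes p = 61 and q = 53: completely insecure, for illustration only.
N = 61 * 53  # public modulus (3233)
E = 17       # public exponent, published in the certificate
D = 2753     # private exponent, known only to the server (E * D = 1 mod 3120)


def digest(message: bytes) -> int:
    # Reduce a SHA-256 hash into the range of the toy modulus.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    # Server side: "encrypt" the digest with the private key.
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    # Client side: "decrypt" with the public key and compare with its own digest.
    return pow(signature, E, N) == digest(message)


challenge = b"random data chosen by the client"  # hypothetical challenge
signature = sign(challenge)
print(verify(challenge, signature))            # True
print(verify(challenge, (signature + 1) % N))  # False
```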

Message Integrity

The whole communication protected by TLS/SSL is reliable and the protocol itself checks the integrity of all received messages.

The integrity checks are based on message authentication codes (MACs) attached to all messages. They protect the messages against accidental corruption and deliberate alteration.

Similarly to other TLS/SSL functionalities, message integrity may also be provided by various different cryptographic algorithms, depending on the client and server capabilities.
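The Python sketch below shows the idea with HMAC-SHA256 from the standard library. TLS 1.0 to 1.2 compute the record MAC over a sequence number, some record header fields and the data; this simplified, hypothetical version covers only a sequence number and the payload. Including the sequence number means that a replayed or reordered record fails the check as well as a modified one.

```python
import hashlib
import hmac

TAG_SIZE = 32  # length of an HMAC-SHA256 tag in bytes


def protect(key: bytes, seq: int, payload: bytes) -> bytes:
    """Append a MAC computed over a sequence number and the payload."""
    tag = hmac.new(key, seq.to_bytes(8, "big") + payload, hashlib.sha256).digest()
    return payload + tag


def check(key: bytes, seq: int, record: bytes) -> bytes:
    """Return the payload if its MAC is valid, otherwise raise ValueError."""
    payload, tag = record[:-TAG_SIZE], record[-TAG_SIZE:]
    expected = hmac.new(key, seq.to_bytes(8, "big") + payload, hashlib.sha256).digest()
    # compare_digest runs in constant time, so timing does not reveal how many bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return payload


key = b"hypothetical session MAC key"
record = protect(key, 0, b"Do first job")
print(check(key, 0, record))  # b'Do first job'

tampered = b"Do last job!" + record[-TAG_SIZE:]
try:
    check(key, 0, tampered)
except ValueError as error:
    print(error)  # integrity check failed
```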

Handshake Protocol

The handshake procedure begins just after the sides have agreed to use TLS. The client and the server choose all the parameters of the secure connection they are going to create.

  1. First, the client sends a list of supported ciphers and hash functions.
  2. Then, the server selects the ones that it supports as well, and notifies the client of its decision.
  3. Usually (though optionally), the server identifies itself by presenting a valid digital certificate, which contains information such as the name of the server and its public key. The client uses the certificate to check the server's validity.
  4. The client may use the server's public key to encrypt a random number and send it to the server. Only the server can decrypt this number, so both sides can use it to establish the secret session key.
  5. Alternatively, a better approach is to establish the symmetric key with a key exchange algorithm that has the property of perfect forward secrecy. Two such algorithms are ephemeral Diffie-Hellman (DHE) and ephemeral Elliptic Curve Diffie-Hellman (ECDHE). They ensure that the symmetric keys established for each session remain secure even if the long-term public and private keys used during the handshake protocol are compromised later.

If any of the steps described above fails (on either side), the connection is aborted, and the second phase of communication, the record protocol, is never started.
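Steps 1 and 2, together with the failure rule, can be sketched in Python as follows. The sketch assumes that the server applies its own preference order (servers can often be configured to honour the client's order instead); the function name is hypothetical, while the cipher suite strings are real registered names used here only as sample data.

```python
def choose_cipher_suite(client_offer: list[str], server_preference: list[str]) -> str:
    """Return the first suite in the server's preference order that the client also offered."""
    offered = set(client_offer)
    for suite in server_preference:
        if suite in offered:
            return suite
    # No common suite: the handshake fails and the record protocol never starts.
    raise ConnectionError("handshake failure: no shared cipher suite")


client_offer = [
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_RSA_WITH_AES_128_CBC_SHA",
]
server_preference = [
    "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_RSA_WITH_AES_128_CBC_SHA",
]
print(choose_cipher_suite(client_offer, server_preference))
# TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
```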

Because negotiating a session with an asymmetric encryption algorithm is a rather expensive procedure, instead of creating a new symmetric key the client may ask to resume a previously established session. If the server accepts, both sides reuse the secret negotiated for the previous session and skip the expensive asymmetric operations.
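On the server side, resumption can be pictured as a lookup table from session IDs to previously negotiated secrets. The Python class below is a hypothetical sketch of that idea: it issues random 32-byte session IDs after a full handshake and returns the stored secret when a client presents a known ID. In real TLS, fresh session keys are still derived from the stored secret and new random values, and cached entries expire after a while.

```python
import secrets
from typing import Optional


class SessionCache:
    """Hypothetical server-side store mapping session IDs to previously negotiated secrets."""

    def __init__(self) -> None:
        self._sessions: dict[bytes, bytes] = {}

    def store(self, master_secret: bytes) -> bytes:
        # After a full handshake, remember the secret under a fresh random session ID.
        session_id = secrets.token_bytes(32)
        self._sessions[session_id] = master_secret
        return session_id

    def resume(self, session_id: bytes) -> Optional[bytes]:
        # Abbreviated handshake: reuse the stored secret if the ID is known;
        # None means the sides must fall back to a full (expensive) handshake.
        return self._sessions.get(session_id)


cache = SessionCache()
session_id = cache.store(b"secret from the full handshake")
print(cache.resume(session_id))    # b'secret from the full handshake'
print(cache.resume(b"\x00" * 32))  # None
```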

Security of TLS/SSL

The secure TLS/SSL connection may be configured to use various underlying symmetric and asymmetric encryption algorithms. The strength of the protection depends strongly on the selected cipher and its implementation.

The first two SSL protocol versions are generally considered insecure. SSL 3.0, which is very close to TLS 1.0, has also been considered insecure since the POODLE attack in 2014 and was deprecated by RFC 7568 in 2015. By contrast, the newer TLS versions are much more refined and provide much better security. Although there exist several attacks targeting various TLS implementations, TLS is considered a strong and efficient tool for providing security while communicating over computer networks.

It is recommended to create session keys with algorithms which provide perfect forward secrecy. That guarantees that compromising a long-term private key (for example, the key of a server certificate) will not compromise the privacy of past communications protected by the session keys negotiated with it. This matters in practice: long-term keys and the certificate infrastructure are frequent targets, and Certificate Authority organisations have been hit by attacks which compromised many digital certificates.

Internet Relay Chat

IRC is an application layer protocol which allows users to exchange text messages.

The protocol was created in 1988 by a Finnish software engineer, Jarkko Oikarinen. It was designed mainly for group communication in discussion forums called channels, but it also allows users to send and receive private messages or data.

IRC Overview

IRC uses a client/server model. First, every user has to install a client application. Using it, the user sends text messages to an IRC server, which forwards them to other clients. Servers are connected together and form larger networks, so they can also exchange messages between themselves.

There are several IRC services that provide some additional functionalities, like bots (sending messages generated by computer programs to channels) or bouncers (daemon processes that provide IRC communication to offline users or to computers without any IRC client installed).

The image below presents an example of an IRC network:

IRC Network Model

IRC Protocol Design

IRC usually runs over TCP. The official TCP port assigned to IRC is 194; however, to avoid having to run the server application with root privileges (which ports below 1024 require on Unix-like systems), the most common port is 6667/TCP, together with a few other ports nearby (6660-6669 and 7000).

The IRC specification is covered by several documents: RFC 1459 and a few later ones, RFC 2810, RFC 2811, RFC 2812 and RFC 2813. However, most client and server applications don't follow the specification strictly.

IRC was originally used only for sending text messages. The protocol treats each character as an 8-bit byte and does not specify any character encoding. This could cause problems when conversing users used different encodings. At present, UTF-8 is the most popular encoding used in IRC messages, and it is supported by most IRC applications.
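Because the protocol carries raw bytes, a client has to guess how to decode each received line. A common heuristic, sketched below in Python with a hypothetical function name, is to try UTF-8 first and fall back to a legacy 8-bit encoding; Latin-1 is an assumed choice here, and some clients let the user configure the fallback instead.

```python
def decode_irc_line(raw: bytes) -> str:
    """Decode a received IRC line, preferring UTF-8 and falling back to Latin-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this fallback never fails,
        # although it only guesses the sender's real encoding.
        return raw.decode("latin-1")


print(decode_irc_line("Hyvää päivää".encode("utf-8")))    # Hyvää päivää
print(decode_irc_line("Hyvää päivää".encode("latin-1")))  # Hyvää päivää
```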

IRC users communicate with the server and with other users by sending simple text commands. Commands that deliver messages specify the recipient (a server, a channel or another user) and additional parameters, like the text of the message.
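In RFC 1459, every message is one line ending with CR LF (at most 512 characters): an optional prefix starting with ":", the command, and its parameters, where the last parameter may be introduced by ":" so that it can contain spaces. The Python sketch below parses and builds such lines; the function names, the channel #linux and the nickname alice are hypothetical, and edge cases such as empty lines are not handled.

```python
from typing import Optional


def parse_irc_message(line: str) -> tuple[Optional[str], str, list[str]]:
    """Split a raw IRC line into (prefix, command, parameters), following the RFC 1459 layout."""
    line = line.rstrip("\r\n")
    prefix = None
    if line.startswith(":"):
        # Optional prefix naming the origin: a server name or nick!user@host.
        prefix, _, line = line[1:].partition(" ")
    # The last parameter may be introduced by " :" and then contains spaces.
    head, separator, trailing = line.partition(" :")
    command, *params = head.split()
    if separator:
        params.append(trailing)
    return prefix, command.upper(), params


def privmsg(target: str, text: str) -> bytes:
    # The recipient (a channel or a nickname) comes first; the text is the trailing parameter.
    return f"PRIVMSG {target} :{text}\r\n".encode("utf-8")


print(privmsg("#linux", "Hello everyone"))
print(parse_irc_message(":alice!alice@host.example PRIVMSG #linux :Hello everyone\r\n"))
# ('alice!alice@host.example', 'PRIVMSG', ['#linux', 'Hello everyone'])
```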

Security of IRC

The original design of IRC is insecure. Most servers don't require users to register an account, and people can usually choose their nicknames just before connecting to the channels.

Any change of the network structure is usually problematic and may cause various issues (for example, several users ending up with the same nickname, not necessarily with the same privileges). Also, servers are assumed to trust one another when exchanging messages, so a single server that behaves incorrectly can cause problems for the whole network.

In the early 2000s, some IRC networks were frequently hit by DDoS and other, more sophisticated attacks. This caused many users to migrate to different IRC networks or to abandon this way of communicating completely.

The limitations of the protocol are well known, and therefore improvements are often introduced in modern implementations. Many IRC servers already support secure SSL/TLS connections.

IRC Today

IRC was at its most popular in 2003, when it was estimated to be used by over one million people on hundreds of thousands of channels. By 2014, the number of users had decreased to less than half a million. The reasons why people use IRC applications have also changed.

At the beginning, IRC networks were used for social networking, but now websites like Facebook or Twitter have taken over these functions. People used to use IRC networks to broadcast unofficial or illegal news and information; at present, there are much better ways to do it (like Tor). IRC channels were also used to exchange information about pirated software and warez. Nowadays, bad guys prefer to look for such information in other places, like P2P networks.

Due to commercialization of the Internet, a lot of companies have decided to invest money in their own products and to create their own ways of communication instead of using publicly available IRC. On the other hand, there are several IRC-based commercial or open source projects that are widely used by development teams and various firms and organizations for internal and external communication.

IRC is a very old protocol that has been in use for many years, and the way it is used has changed over that time. One may predict that IRC technology will still be used in various applications and services, at least over the next several years.