RC4
 Description
 Algorithm
 Block diagrams
 Mathematical functions
 Implementation
RC4 is one of the most popular ciphers. It is widely used in popular protocols, for example to protect Internet traffic  TLS (Transport Layer Security) or to protect wireless networks  WEP (Wired Equivalent Privacy).
RC4 is a stream symmetric cipher. It operates by creating long keystream sequences and adding them to data bytes.
RC4 encrypts data by adding it XOR byte by byte, one after the other, to keystream bytes. The whole RC4 algorithm is based on creating keystream bytes. The keystream is received from a 1d table called the T table.
Creating the Table
The T table is 256byte long, and is created based on the secret key. It is created as a first step of both encryption and decryption. The following operations must be performed in order to create the table:
 Every cell in the table is filled with a number equal to its position. The positions of the table are numbered from 0 to 255.
 A new temporary helper variable is created and set to 0.
 For each element in the array the two following operations are performed (note, that the values are from 0 to 255):
Encryption and Decryption
During encryption and decryption the keystream bytes are constantly generated. They are added XOR to message bytes. The keystream bytes are produced based on the T table. The following steps are performed:
 Two helper variables p_{1} and p_{2} are created and set to 0.
 The variable p_{1} is increased by 1 and the result is modulo divided by 256.
 The variable p_{2} is increased by the value in the array T at the position determined by the temporary variable p_{1} (T[p_{1}]). Then, the result is divided modulo by 256.
 The value in the array at the position p_{1} is swapped with the value in the array at the position p_{2}.
 The value in the array at the position p_{1} is added to the value in the array at the position p_{2}. Then, the result is modulo divided by 256 and assigned to the new helper variable p_{3}.
 The value in the array at the position p_{3} is a new keystream byte.
 If more keystream bytes are needed, all the steps from the point II onwards should be repeated.
Speed of RC4
The RC4 algorithm is designed especially to be used in software solutions because it only manipulates single bytes. Unlike many other stream ciphers, it doesn't use LFSR registers, which can be implemented optimally in hardware solutions but they are not so fast in applications.
Security of RC4
The cipher was created quite long time ago and it has some weaknesses which have been improved in modern stream ciphers. It is possible to find keystream byte values that are slightly more likely to occur than other combinations. In fact, over the last 20 years, several bytes like that have been found. Some attacks based on this weakness were discovered.
Probably the most important weakness of RC4 cipher is the insufficient key schedule. Because of that issue, it is possible to obtain some information about the secret key based on the first bytes of keystream. It is recommended to simply discard a number of first bytes of the keystream. This improvement is known as RC4dropN, where N is usually a multiple of 256.
RC4 does not take a separate nonce alongside the key for every encryption. Therefore, the cryptosystem must take care of unique values of keystream and specify how to combine the nonce with the original secret key. The best idea would be to hash the nonce and the key together to generate the base for creating the RC4 keystream. Unfortunately, many applications simply concatenate key and nonce, which make them vulnerable to so called related key attacks. This weakness of RC4 was used in Fluhrer, Mantin and Shamir (FMS) attack against WEP, published in 2001.

Keystream Initialisation
Initialisation a T table, used for generation of keystream bytes. K is the secret key, that is an array of length k_len.
for i from 0 to 255
T[i] := i
endfor
x_temp := 0
for i from 0 to 255
x_temp := (x_temp + T[i] + K[i mod k_len]) mod 256
swap(T[i], T[x_temp])
endfor
Keystream Generation
For keystream bytes generation, the loop below is executed as long as new bytes are needed.
p1 := 0
p2 := 0
while GeneratingOutput
p1 := (p1 + 1) mod 256
p2 := (p2 + T[p1]) mod 256
swap(T[p1], T[p2])
send(T[(T[p1] + T[p2]) mod 256])
endwhile