## Salsa20

- Key length = 32 bytes

Salsa20 is a modern and efficient stream symmetric cipher. It was designed in 2005 by Daniel Bernstein, research professor of Computer Science at the University of Illinois at Chicago.

### Usage

Salsa20 is a cipher that was submitted to eSTREAM project, running from 2004 to 2008, which was supposed to promote development of stream ciphers. It is considered to be a well-designed and efficient algorithm. There aren't any known and effective attacks on the family of Salsa20 ciphers.

### Algorithm

Salsa20 is a stream cipher that works on data blocks of size of 64 bytes.

#### Encryption

For each 64-byte data block, the algorithm uses the Salsa20 expansion function. The input to the function is the secret key (which can have either 32 or 16 bytes) and an 8-byte long nonce concatenated with an additional block number, which values change from 0 to 2^{64}-1 (it is also stored on 8 bytes). Every call to the expansion function increases the block number by one.

The core of Salsa20 encryption algorithm is a hash function which receives the 64-byte long input data from the Salsa20 expansion function, mixes it, and eventually returns the 64-byte long output. The Salsa20 hash function works on the received sequence of bytes, which consists of:

- the secret key.
- the nonce with the block number.
- four constant vectors received from the expansion function, which values depend on the size of the secret key.

The hash function operates on data divided into words. Every word contains 4 bytes and can have values from 0 to 2^{32}-1. Therefore, the input data is 16-word long, a key contains 8 or 4 words, and the nonce has 2 words.

The output from the Salsa20 expansion function is added XOR to the 64-byte block of data. The result is a 64-byte block of ciphertext.

#### Decryption

The same algorithm should be used during decryption. The data should be divided into parts of the same size.

The output from the Salsa20 expansion function should be added XOR to the 64-byte block of ciphertext. The result is a 64-byte block of plaintext.

#### Other Salsa20 ciphers

There are also some other ciphers, which are based on the Salsa20 algorithm but differ in details.

- Salsa20/8 and Salsa20/12 - they work exactly as the original Salsa20 algorithm but instead of 10 doublerounds inside the hash function, they perform 4 or respectively 6 doublerounds.
- ChaCha family - published by Bernstein in 2008. They provide better security than the original Salsa20 cipher, by using slightly better hash functions. The hash function input data was rearranged, to allow to implement the algorithm in a more efficient way.

### Block Diagram of Salsa20 Algorithm

### Maths:

The operations for Salsa20 algorithm are presented in the order from the low-level functions, to the more complex functions, which use the functions described above them.

#### Sum of two words

The addition of two 4-byte words is denoted in the algorithm description as a + b. The result is divided modulo by 2^{32} (so, by the maximum value that can be stored in one word).

The sum of two words a and b is equal to a+b mod 2^{32}. The result is a valid 4-byte long word.

#### Exclusive-Or of two words

The Exclusive-Or of two 4-byte words is denoted in the algorithm description as a XOR b.

To perform XOR addition, two words are compared bit by bit, and XOR addition is performed for each pair. The result is also a valid proper 4-byte long word.

#### Binary Left Rotation

The a-bit left rotation of a 4-byte word w is denoted in the algorithm as w <<< a. The leftmost a bits move to the rightmost positions.

A rotation of bits to the left w <<< a of a 4-byte word w may be represented as the multiplication:

2^{a}·w mod(2^{32}-1)

The result is a valid 4-byte long word.

#### Quarterround Function

The Quarterround Function takes 4 words as input and returns another 4-word sequence.

If x is a 4-word input:

x = (x_{0}, x_{1}, x_{2}, x_{3})

then the function can be defined as follow:

quarterround(x) = (y_{0}, y_{1}, y_{2}, y_{3})

where:

y_{1} = x_{1} XOR ((x_{0} + x_{3}) <<< 7)

y_{2} = x_{2} XOR ((y_{1} + x_{0}) <<< 9)

y_{3} = x_{3} XOR ((y_{2} + y_{1}) <<< 13)

y_{0} = x_{0} XOR ((y_{3} + y_{2}) <<< 18)

The Quarterround Function can be performed in place, without the need of allocating any additional memory. First, x_{1} changes to y_{1}, then x_{2} changes to y_{2}, next x_{3} changes to y_{3}, then x_{0} changes to y_{0}. The Quarterround Function is invertible because all the modifications above are invertible.

#### Rowround Function

The Rowround Function takes 16 words as input, transforms them, and returns 16-word sequence.

This function is very similar to the Columnround Function but it operates on the words in a different order.

If x is a 16-word input:

x = (x_{0}, x_{1}, x_{2}, ..., x_{15})

then the function can be defined as follow:

rowround(x) = (y_{0}, y_{1}, y_{2}, ..., y_{15})

where:

(y_{0}, y_{1}, y_{2}, y_{3}) = quarterround(x_{0}, x_{1}, x_{2}, x_{3})

(y_{5}, y_{6}, y_{7}, y_{4}) = quarterround(x_{5}, x_{6}, x_{7}, x_{4})

(y_{10}, y_{11}, y_{8}, y_{9}) = quarterround(x_{10}, x_{11}, x_{8}, x_{9})

(y_{15}, y_{12}, y_{13}, y_{14}) = quarterround(x_{15}, x_{12}, x_{13}, x_{14})

The 16-word input can be presented as a square matrix:

x_{0} | x_{1} | x_{2} | x_{3} |

x_{4} | x_{5} | x_{6} | x_{7} |

x_{8} | x_{9} | x_{10} | x_{11} |

x_{12} | x_{13} | x_{14} | x_{15} |

The rows in the matrix can be changed in parallel. Each of them is modified by the Quarterround Function.

In the first row, the words are changed in the following order:

- x
_{1} - x
_{2} - x
_{3} - x
_{0}

In the second row, the words are changed in the following order:

- x
_{6} - x
_{7} - x
_{4} - x
_{5}

In the third row, the words are modified in the order:

- x
_{11} - x
_{8} - x
_{9} - x
_{10}

Finally, in the last, fourth row, the words are changed in the order:

- x
_{12} - x
_{13} - x
_{14} - x
_{15}

#### Columnround Function

The Columnround Function takes 16 words as input and returns 16-word sequence.

This function is very similar to the Rowround Function but operates on the words in different order.

If x is a 16-word input:

x = (x_{0}, x_{1}, x_{2}, ..., x_{15})

then the function can be defined as follow:

columnround(x) = (y_{0}, y_{1}, y_{2}, ..., y_{15})

where:

(y_{0}, y_{4}, y_{8}, y_{12}) = quarterround(x_{0}, x_{4}, x_{8}, x_{12})

(y_{5}, y_{9}, y_{13}, y_{1}) = quarterround(x_{5}, x_{9}, x_{13}, x_{1})

(y_{10}, y_{14}, y_{2}, y_{6}) = quarterround(x_{10}, x_{14}, x_{2}, x_{6})

(y_{15}, y_{3}, y_{7}, y_{11}) = quarterround(x_{15}, x_{3}, x_{7}, x_{11})

The 16-word input can be presented as a square matrix:

x_{0} | x_{1} | x_{2} | x_{3} |

x_{4} | x_{5} | x_{6} | x_{7} |

x_{8} | x_{9} | x_{10} | x_{11} |

x_{12} | x_{13} | x_{14} | x_{15} |

The columns in the matrix can be changed in parallel. Each of them is modified by the Quarterround Function.

In the first column, the words are changed in the following order:

- x
_{4} - x
_{8} - x
_{12} - x
_{0}

In the second column, the words are modified in the following order:

- x
_{9} - x
_{13} - x
_{1} - x
_{5}

In the third column, the words are changed in the order:

- x
_{14} - x
_{2} - x
_{6} - x
_{10}

In the last fourth column, the words are modified in the order:

- x
_{3} - x
_{7} - x
_{11} - x
_{15}

#### Doubleround Function

The Doubleround Function takes 16 words as input and returns 16-word sequence.

If x is a 16-word input, then the Doubleround Function can be defined as follow:

doubleround(x) = rowround(columnround(x))

#### Littleendian Function

The Littleendian Function changes the order of a 4-byte sequence.

If b is a sequence of four bytes:

b = (b_{0}, b_{1}, b_{2}, b_{3})

then the function is defined as:

littleendian(b) = b_{0} + 2^{8}b_{1} + 2^{16}b_{2} + 2^{24}b_{3}

The Littleendian Function is invertible. It just simply changes the order of bytes in a word.

#### Salsa20 Hash Function

The Salsa20 Hash Function takes 64 bytes as input and returns a 64-byte sequence.

First, the Hash Function creates 16 words from the received 64-byte input. If input is a sequence of 64 bytes:

input =(b_{0}, b_{1}, b_{2}, ..., b_{63})

then 16 words are created as below:

w_{0} = littleendian(b_{0}, b_{1}, b_{2}, b_{3})

w_{1} = littleendian(b_{4}, b_{5}, b_{6}, b_{7})

[...]

w_{15} = littleendian(b_{60}, b_{61}, b_{62}, b_{63})

Then, all 16 words are modified by 10 iterations of the Doubleround Function:

(x_{0}, x_{1}, ..., x_{15}) = doubleround^{10}(w_{0}, w_{1}, ..., w_{15})

Finally, the 16 words received as input are added (as described above) to the modified 16 words and changed to 64 new bytes using the Littleendian Function. The bytes are output from the Salsa20 Hash Function:

output = littleendian^{-1}(x_{0}+w_{0}) + littleendian^{-1}(x_{1}+w_{1}) + ... + littleendian^{-1}(x_{15}+w_{15})

#### Salsa20 Expansion Function

The Salsa20 Expansion Function takes two sequences of bytes. The first sequence can have either 16 or 32 bytes and the second sequence (n) is always 16-byte long. The function returns another sequence of 64 bytes.

If the first sequence is 32-bytes long, then it is divided into two shorter sequences of 16 bytes (k_{0} and k_{1}). The Salsa20 Expansion Function is defined by using the Salsa20 Hash Function, as shown below:

Salsa20Expansion_{k0, k1}(n) = Salsa20Hash(a_{0}, k_{0}, a_{1}, n, a_{2}, k_{1}, a_{3})

where:

a_{0} = (101, 120, 112, 97)

a_{1} = (110, 100, 32, 51)

a_{2} = (50, 45, 98, 121)

a_{3} = (116, 101, 32, 107)

If the first sequence is 16-bytes long (k), then the Salsa20 Expansion Function is defined by using the Salsa20 Hash Function as below:

Salsa20Expansion_{k}(n) = Salsa20Hash(b_{0}, k, b_{1}, n, b_{2}, k, b_{3})

where:

b_{0} = (101, 120, 112, 97)

b_{1} = (110, 100, 32, 49)

b_{2} = (54, 45, 98, 121)

b_{3} = (116, 101, 32, 107)

The constant values of the vector (a_{0}, a_{1}, a_{2}, a_{3}) mean 'expand 32-byte k' in ASCII code. Similarly, the constant values of the second vector (b_{0}, b_{1}, b_{2}, b_{3}) mean 'expand 16-byte k' in ASCII.