Encryption Series: The Vigenère Cipher

Encryption Series: The Vigenère Cipher
Photo by Europeana / Unsplash

I have always had a particular fondness for this cipher. When I first embarked upon a serious exploration of cyber security, I found the introductory Caesar and Rot13 ciphers, ubiquitous staples of cryptographic tutorials, rather tiresome. They seemed closer to simple encoding exercises than true encryption methods.

The Vigenère cipher, however, captivated my interest by presenting a genuine intellectual challenge. Unlike its predecessors, it was no trivial matter for a human to decipher by hand. Indeed, it exploited the inherent limitations of human pattern recognition to such an extent that it remained resilient for nearly three centuries, though, of course, it was not impervious.

How does the Vigenère Cipher work

In our previous exploration of the monoalphabetic substitution cipher, we addressed a specific weakness of Caesar's cipher: that deciphering just a single letter would immediately expose the entire key. Yet, both the Caesar and substitution ciphers still exhibit substantial vulnerabilities. For example, patterns remain readily discernible, allowing the plaintext to be reconstructed incrementally by solving each letter individually. The Vigenère cipher remedies this particular weakness by preventing a letter-by-letter solution.

The Vigenère cipher introduces complexity through a cyclic shift mechanism governed by a keyword, repeated as necessary to match the length of the plaintext. Consider, for instance, encrypting the phrase: "As far as the eye can see"
using the keyword "brain".

To execute this, we write the plaintext on one line and repeat the keyword beneath it, thus aligning the cyclic shifts:

Plainttext: as far as the eye can see
Shift Key:  br ain br ain bra inb rai

Ciphertext: bj fie bj tpr fpe kno jem

Note: The keyword encoding starts with 0, so A = 0, B = 1... Z = 25. 

In essence, the Vigenère cipher applies a distinct Caesar shift to each letter, dictated by the keyword character currently in use. Historically, encryption and decryption were conducted manually via a tabular device known as the tabula recta, or Vigenère square: a 26×26 grid containing the alphabet written out repeatedly, each row cyclically shifted by one character relative to the preceding row.

Vigenère square (or tabula recta), a 26×26 table containing the alphabet written out 26 times in rows, each one shifted cyclically to the left compared to the one above

Fortunately, owing to the pioneering efforts of Charles Babbage, Ada Lovelace, Alan Turing, and numerous other luminaries, we now have computers to spare us from laborious letter-grid scrutiny. We shall, therefore, rely upon mathematics.

Mathematics of the Vigenère Cipher

Formally, let us define:

  • \(P = P_{0}P_{1} \ldots P_{n}\) be the plaintext
  • \(K = K_{0}K_{1} \ldots K_{m}\) be the keyword (repeated to match the length of the plaintext)
  • \(C = C_{0}C_{1} \ldots C_{n}\) be the ciphertext

Each character is mapped to its alphabetic index, where A = 0, B = 1, ..., Z = 25.

Encryption is defined mathematically as:

Cᵢ = (Pᵢ + Kᵢ) mod 26

Correspondingly, decryption is naturally the inverse operation:

Pᵢ = (Cᵢ - Kᵢ + 26) mod 26

We add 26 to ensure the result remains a positive integer within the set of natural numbers (\(\N\)), before applying modulus 26 to constrain the output within the bounds of the English alphabet.

Implementing the Vigenère Cipher in Rust

Now, let us illustrate this concept in practical terms, using Rust.

The astute reader will recall my previous remark: the Vigenère cipher can be understood as the repeated application of the Caesar cipher, each letter with a unique shift determined by the keyword. Conveniently, we already possess suitable Caesar cipher code from our previous exploration: A Civilised Introduction to the Caesar Cipher.

If your Caesar cipher implementation currently resides in main.rs, move and rename it to caesar.rs within your project's lib directory. Ensure the module is exposed via your lib.rs by adding the following line:

pub mod caesar;

Then, within caesar.rs, prefix your encrypt and decrypt functions with pub, commenting out any additional functions such as main() or brute-force methods:

pub fn caesar_encrypt(text: &str, shift: u8) -> String {
  ...
}

pub fn caesar_decrypt(text: &str, shift: u8) -> String {
    ...
}

// Comment out anything else like fn main() or fn brute_force_caesar()

Assuming your package is named encryption , within the main module, you can now import your Caesar cipher functions:

use encryption::caesar::{caesar_encrypt, caesar_decrypt};

The actual implementation of the Vigenère cipher itself, while sophisticated in concept, remains quite straightforward in practice. We iterate through the keyword, applying Caesar shifts letter-by-letter:

// src/main.rs

use encryption::caesar::{caesar_encrypt, caesar_decrypt};

/// Encrypts `plaintext` with Vigenère by applying a Caesar shift
/// for each letter of the (cycled) keyword.  Non-alphabetic
/// characters are passed through unchanged.
pub fn vigenere_encrypt(plaintext: &str, keyword: &str) -> String {
    // Cycle the keyword letters endlessly
    let mut key_iter = keyword.chars().cycle();

    plaintext
        .chars()
        .map(|c| {
            if c.is_ascii_alphabetic() {
                // Determine shift 0..25 from this keyword letter
                let k = key_iter.next().unwrap();
                let shift = k.to_ascii_uppercase() as u8 - b'A';

                // Delegate to your existing Caesar encrypt on a one-char string
                let s = &c.to_string();
                caesar_encrypt(s, shift)
                    .chars()
                    .next()
                    .expect("caesar_encrypt always returns at least one char")
            } else {
                c
            }
        })
        .collect()
}

/// Decrypts `ciphertext` by reversing each per-letter Caesar shift
/// according to the (cycled) keyword.
pub fn vigenere_decrypt(ciphertext: &str, keyword: &str) -> String {
    let mut key_iter = keyword.chars().cycle();

    ciphertext
        .chars()
        .map(|c| {
            if c.is_ascii_alphabetic() {
                let k = key_iter.next().unwrap();
                let shift = k.to_ascii_uppercase() as u8 - b'A';

                // Delegate to your existing Caesar decrypt on a one-char string
                let s = &c.to_string();
                caesar_decrypt(s, shift)
                    .chars()
                    .next()
                    .expect("caesar_decrypt always returns at least one char")
            } else {
                c
            }
        })
        .collect()
}

Finally, we test this cipher implementation with the following code snippet:

fn main() {
    let plaintext  = "as far as the eye can see";
    let keyword    = "BRAIN";
    let ciphertext = vigenere_encrypt(plaintext, keyword);

    println!("Plaintext : {}", plaintext);
    println!("Keyword   : {}", keyword);
    println!("Cipher    : {}", ciphertext);

    let recovered = vigenere_decrypt(&ciphertext, keyword);
    println!("Decrypted : {}", recovered);
}

Execute your Rust program with cargo run, and you should witness successful encryption and decryption:

vscode ➜ /workspaces/vscode-remote-try-rust (main) $ cargo run
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.06s
     Running `target/debug/encryption`
Plaintext : as far as the eye can see
Keyword   : BRAIN
Cipher    : bj fie bj tpr fpe kno jem
Decrypted : as far as the eye can see

Thus, we have neatly implemented the Vigenère cipher.

Breaking Vigenère

The Vigenère cipher, as we have seen, was considered secure enough to resist practical cryptanalysis for several centuries. Today, however, computer-assisted methods make short work of its security, though not by the simple expedient of frequency analysis alone. Instead, cryptanalysis of the Vigenère cipher relies upon subtler forms of pattern recognition.

One of the enduring fascinations of encryption (and its counterpart, cryptanalysis) is its unique blend of technical ingenuity and practical exploitation of human behaviour. Often, weaknesses in cryptographic methods stem less from their theoretical implementation than from their practical application. Until now, we have only encrypted English text. Consequently, although we utilise 26 characters for shifting or substitution, linguistic patterns invariably remain.

The keyword in a Vigenère cipher amplifies this vulnerability. Should the keyword itself be a common English word rather than a random assortment of letters, we can employ a dictionary-based brute-force approach against the ciphertext. Even when the keyword appears to be a random sequence, patterns will inevitably emerge that offer opportunities to gradually decode the text. Indeed, the strongest possible keyword in the context of a Vigenère cipher is one that matches the plaintext in length precisely, thereby precluding repetition.

Let us now apply these principles to a practical demonstration, attempting to cryptanalyse an actual ciphertext. I shall initially avoid simplistic dictionary brute-forcing or incremental keyword guessing; while these methods have a certain cinematic charm, they lie beyond the educational scope of this exercise.

Below is the ciphertext I have prepared for analysis:

Mhp Eatpbminey Xrjbnp lnl rr irpxrgwlhnd autxhoec xb hvlziyegx eqrtsmaz Mw vay hb pldmegie pi ngoh lbp xr hroie bx wh ppvshvp Bt nea ysoeoh eatpblid fhm mw aad rb iszxr zj ngxlviaegbrj tnj eatpbminey kiotttsal su mrfxul Mwl pcsibrfx id xb twvbse yf ms ptktrt tzdbllfyx aktt hi nki derpeqr efjulmamig piel

Ciphertext encrypted using Vigenère

This ciphertext has been deliberately composed at length, to increase the likelihood of identifiable patterns emerging. A shorter ciphertext may yield insufficient statistical evidence, hampering our analysis. Currently, we lack knowledge of the key length; our initial task, therefore, is to infer this critical detail. The simplest method involves seeking out repeated ciphertext words. Should a word appear multiple times and be identically encrypted on each occurrence, this strongly suggests that the keyword has repeated itself at corresponding intervals. Measuring the distance between these repetitions will typically yield multiples of the keyword length.

To identify such repetitions, one could write a short script to automate the process. Such a script is straightforward enough to implement, thus I leave it as an exercise for the reader. For reference, here is what my own script identified (case-insensitive positions given):

Word            | Positions
----------------+-----------
xb              | [50, 236]
eatpbminey      | [4, 189]
mw              | [72, 153]

Our script indicates that the words xb, eatpbminey, and mw repeat in the ciphertext at certain intervals. The presence of eatpbminey is particularly notable, as a repeated 10-letter word strongly suggests that the keyword's repetition aligns exactly between its occurrences.

However, my script positions account for spaces, complicating distance measurements. Thus, we must count the exact number of ciphertext letters between repetitions to ascertain their separation more precisely:

Word            | Positions            | Distances
----------------+----------------------+----------
mw              | [72, 153]            | [64]
eatpbminey      | [4, 189]             | [152]
xb              | [50, 236]            | [152]

Notably, we find commonality among the distances: specifically, 64 and 152 share common factors. Analysing these distances:

The prime factors of 152 are \(2 \times 2 \times 2 \times 19\).

The prime factors of 64 are \(2 \times 2 \times 2 \times 2 \times 2 \times 2\).

Both distances share divisors of 2, 4, and crucially 8. While smaller factors such as 2 or 4 might divide the distances, the presence of a larger common divisor, such as 8, strongly suggests a keyword length of precisely 8 characters. Larger common factors are statistically less likely to arise purely by coincidence.

Assuming, therefore, an 8-character keyword, we may advance to the next step in cryptanalysis. We are particularly fortunate here: the repeated word eatpbminey exceeds our inferred keyword length, thus guaranteeing a repeated shift pattern within the word itself.

Considering the ciphertext word eatpbminey in detail, we know:

  • The first letter and the penultimate letter share the same shift (both encrypting to 'e').
  • The second letter and the final letter share the same shift, but produce distinct ciphertext letters (a and y), indicating they differ in plaintext by 24 positions (since 'a' is position 0 and 'y' is 24 in the alphabet).

Armed with this pattern, we search our dictionary for English words matching these constraints:

  • 10 letters in length.
  • First and penultimate letters identical.
  • Second and last letters differing by 24 alphabetic positions.

Two plausible dictionary candidates immediately emerge:

opposition
analytical

Using these words as potential plaintext candidates, we derive two possible keywords:

qlebjepf
entedtal

We test each keyword to decrypt the ciphertext accordingly:

wwl drpawwxjdp tcelcl keh cm sglwichgrcz zlpicyty ws dggjxudxt plbiolrv xr fpu gs lwywtchv lt iqdd ksl im rgkhv xi rr elujdgk li jdr udjodd drpawvxz eyi xr kpz qs eduhg vi ecigfxwdxxce dcf drpawwxjdp gtjdiprrh dp wgbwlh xrv eyrzxcah xz ws phqlha xw id kdzpqk pkylahept lfdi dh egt yogldhn patjhlritb zxak

Decrypt with qlebjepf

iuw axapqivuav eryxaw hks rg eewtonwadak wraxwkrj ty ovavvfade efngzixg ml rnf dy wlsirneb wi ccbo hyw xg deveb ix ld cwrpove xg uax fsdabo axapqhvk bet ml wnk ny psote gf knxarvhadiry paq axapqivuav ridpgaoxs sj iemtrs mlh cjofirut vk ty awkxfl uc ts epxanq azsxysbve azpg oe kri saewany eufhsixtiv lvlh

Decrypt with entedtal

Disappointingly, neither yields meaningful English plaintext. However, recall that these plaintext words might not occur at the very start of the ciphertext. Observing again, we notice a three-letter shift before the repeating ciphertext word occurs. If we adjust our keywords accordingly, shifting three letters, we yield two new possibilities:

epfqlebj
talented

One can readily discern which of these candidates is more promising.

Decrypting once more:

isk opposition tqaxyk vch qi eckhgcvcdyy kjpwykpx hq ducvttovt dhnenwpv ln rlt rq lkuipbst lh eczc vql wi dcjst xw nd akfhdug xe iop urfazc opposhty pwi ln wly bq erqtc ut ccwcrtvovxqa pye opposition ghfpeocph rl icahjh lnh axcxxqwt ty hq pvmxdz iu ir gpvobi pyuxwgpnt zbpe cs cgh uackofn dwffgwpihx ltzv

Decrypt with epfqlebj

the analytical engine has no pretensions whatever to originate anything it can do whatever we know how to order it to perform it can follow analysis but it has no power of anticipating any analytical relations or truths its province is to assist us to making available what we are already acquainted with

Decrypt with talented

As anticipated, the adjusted keyword talented (aligned with analytical) reveals coherent English text:

“The Analytical Engine has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform. It can follow analysis; but it has no power of anticipating any analytical relations or truths. Its province is to assist us to making available what we are already acquainted with.
[Describing Charles Babbage's machine.]”
― Ada King Lovelace

We have succeeded in breaking the ciphertext. Admittedly, fortune favoured us here: the repetition of the distinctive word analytical, combined with an eight-letter keyword, greatly facilitated our analysis. Cryptanalysis, therefore, often hinges upon the nature of the text encrypted as much as the cipher itself.

I shall leave the dictionary-based brute-force approach as an exercise for the keen reader or as the subject of a future article. For now, our exploration concludes here, having demonstrated both the elegance and limitations of the Vigenère cipher.