Cracking a Monoalphabetic cipher
Imagine that we have some encrypted text. It was encrypted with a mono-alphabetic cipher, but we do not have the key.
How can we get into it?
“Shi hzu, vqhjcqfqh,” wsti i’Smjsvhsh, utjpzxj wjzkkthv jz qekcsth ptw yzhixyj jz Kzmjpzw, “Scc gzm zhq, zhq gzm scc--jpsj tw zxm fzjjz, tw tj hzj?”
“Shi dqj--” wsti Kzmjpzw.
“Pzci zxj dzxm pshi shi wuqsm!” ymtqi Sjpzw shi Smsftw sj zhyq.
Zoqmyzfq nd qesfkcq, vmxfncthv jz ptfwqcg, hqoqmjpqcqww, Kzmjpzw wjmqjypqi zxj ptw pshi, shi jpq gzxm gmtqhiw mqkqsjqi utjp zhq oztyq jpq gzmfxcs ityjsjqi nd i’Smjsvhsh:
“Scc gzm zhq, zhq gzm scc.”
“Jpsj’w uqcc! Hzu cqj xw qoqmdzhq mqjtmq jz ptw zuh pzfq,” wsti i’Smjsvhsh, sw tg pq psi izhq hzjpthv nxj yzffshi scc ptw ctgq; “shi sjjqhjtzh! Gzm gmzf jptw fzfqhj uq smq sj gqxi utjp jpq ysmithsc.”
First, let’s check if we have a Caesar Cipher. After all, a Caesar cipher is a form of a mono-alphabetic cipher, just not a very jumbled one.
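We’ll take some text (I’ve used the fourth paragraph), and under each letter write out the rest of the alphabet, wrapping from Z back to A. If this were a Caesar cipher, one of the 26 lines would read as English. Instead, each line seems to contain gibberish: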
00 : ZOQMYZFQ ND QESFKCQ, VMXFNCTHV JZ PTFWQCG
01 : APRNZAGR OE RFTGLDR, WNYGODUIW KA QUGXRDH
02 : BQSOABHS PF SGUHMES, XOZHPEVJX LB RVHYSEI
03 : CRTPBCIT QG THVINFT, YPAIQFWKY MC SWIZTFJ
04 : DSUQCDJU RH UIWJOGU, ZQBJRGXLZ ND TXJAUGK
05 : ETVRDEKV SI VJXKPHV, ARCKSHYMA OE UYKBVHL
06 : FUWSEFLW TJ WKYLQIW, BSDLTIZNB PF VZLCWIM
07 : GVXTFGMX UK XLZMRJX, CTEMUJAOC QG WAMDXJN
08 : HWYUGHNY VL YMANSKY, DUFNVKBPD RH XBNEYKO
09 : IXZVHIOZ WM ZNBOTLZ, EVGOWLCQE SI YCOFZLP
10 : JYAWIJPA XN AOCPUMA, FWHPXMDRF TJ ZDPGAMQ
11 : KZBXJKQB YO BPDQVNB, GXIQYNESG UK AEQHBNR
12 : LACYKLRC ZP CQERWOC, HYJRZOFTH VL BFRICOS
13 : MBDZLMSD AQ DRFSXPD, IZKSAPGUI WM CGSJDPT
14 : NCEAMNTE BR ESGTYQE, JALTBQHVJ XN DHTKEQU
15 : ODFBNOUF CS FTHUZRF, KBMUCRIWK YO EIULFRV
16 : PEGCOPVG DT GUIVASG, LCNVDSJXL ZP FJVMGSW
17 : QFHDPQWH EU HVJWBTH, MDOWETKYM AQ GKWNHTX
18 : RGIEQRXI FV IWKXCUI, NEPXFULZN BR HLXOIUY
19 : SHJFRSYJ GW JXLYDVJ, OFQYGVMAO CS IMYPJVZ
20 : TIKGSTZK HX KYMZEWK, PGRZHWNBP DT JNZQKWA
21 : UJLHTUAL IY LZNAFXL, QHSAIXOCQ EU KOARLXB
22 : VKMIUVBM JZ MAOBGYM, RITBJYPDR FV LPBSMYC
23 : WLNJVWCN KA NBPCHZN, SJUCKZQES GW MQCTNZD
24 : XMOKWXDO LB OCQDIAO, TKVDLARFT HX NRDUOAE
25 : YNPLXYEP MC PDREJBP, ULWEMBSGU IY OSEVPBF
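A table like this need not be written out by hand. The following is a minimal sketch of how it could be generated; the function names caesar_shift and shift_table are my own, not part of the code at the bottom of this post.

```python
import string

ALPHABET = string.ascii_uppercase

def caesar_shift(text, shift):
    """Shift every letter forward by `shift` places, wrapping Z round to A."""
    shifted = []
    for ch in text.upper():
        if ch in ALPHABET:
            shifted.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            shifted.append(ch)  # spaces and punctuation pass through unchanged
    return "".join(shifted)

def shift_table(text):
    """Return one row per shift, formatted like the table above."""
    return [f"{shift:02d} : {caesar_shift(text, shift)}" for shift in range(26)]

if __name__ == "__main__":
    for row in shift_table("Zoqmyzfq nd qesfkcq, vmxfncthv jz ptfwqcg"):
        print(row)
```

If one of the 26 rows read as English, the row number would give the shift to undo.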
So, it looks like a real jumble; nothing jumps out as it would for a Caesar shift. What can we do?
In any language, not all letters are used equally. In English, the most common letters are E, T, A, R, I, O and N, and the most common pair of letters is TH, followed by HE. We can use clues like this to begin to decipher our text.
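Counting pairs of letters is easy to automate. This is a minimal sketch, assuming we only care about pairs that sit inside a single word; count_pairs is a name I have made up for illustration.

```python
from collections import Counter

def count_pairs(text):
    """Count adjacent letter pairs within words, ignoring case."""
    pairs = Counter()
    previous = ""
    for ch in text.lower():
        if "a" <= ch <= "z":
            if previous:
                pairs[previous + ch] += 1
            previous = ch
        else:
            previous = ""  # a space or punctuation mark breaks the pair
    return pairs

if __name__ == "__main__":
    sample = "utjp jpq ysmithsc, jpsj tw zxm fzjjz"
    print(count_pairs(sample).most_common(3))
```

The most frequent pairs in the ciphertext are then candidates for TH and HE, though a short text will not always follow the averages.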
Ordering the letters in our text by how common they are, we get J as the most common. The order is: JQZSHMTIPWCFGXYUKVDNOEABLR.
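An ordering like this can be produced in a couple of lines. The sketch below is not the code I used (that is at the bottom of the post); it assumes ties are broken alphabetically, so letters with equal counts may come out in a different order from the string above.

```python
from collections import Counter
import string

def letter_order(text):
    """Return all 26 letters, most common first; ties fall back to alphabetical order."""
    counts = Counter(ch for ch in text.upper() if ch in string.ascii_uppercase)
    return "".join(sorted(string.ascii_uppercase, key=lambda ch: (-counts[ch], ch)))

if __name__ == "__main__":
    print(letter_order("Scc gzm zhq, zhq gzm scc."))
```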
We could naively use this to help us - we might get lucky. In this case we do not:
sjjqhjtzh! Gzm gmzf jptw fzfqhj uq smq sj gqxi utjp jpq”
becomes
reetie.ai! .a. ..a. e... .a.tie .t r.t re .t.. ..e. e.t”
This seems to be a dead end. What could reetie.ai be? Maybe our letters don’t exactly match up to what we might expect for English. It’d be a surprise if they matched perfectly!
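For anyone who wants to repeat the naive attempt, here is a minimal sketch. The output quoted above appears consistent with matching only the five most common ciphertext letters, JQZSH, to ETARI, so that is what the example assumes. The names naive_key and substitute are hypothetical.

```python
def naive_key(cipher_order, english_order):
    """Pair the most common ciphertext letters with the most common English letters."""
    return dict(zip(cipher_order.lower(), english_order.lower()))

def substitute(text, key):
    """Replace each ciphertext letter using key; letters with no guess become '.'."""
    out = []
    for ch in text.lower():
        if "a" <= ch <= "z":
            out.append(key.get(ch, "."))
        else:
            out.append(ch)  # keep spaces and punctuation
    return "".join(out)

key = naive_key("JQZSH", "ETARI")
print(substitute("sjjqhjtzh! Gzm gmzf jptw", key))
# prints: reetie.ai! .a. ..a. e...
```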
What next?
Looking at the whole text, I note that the three-letter words ‘jpq’ and ‘shi’ both appear, so I will guess one is ‘and’ and the other is ‘the’. In our ciphertext, the most common letters are J and Q, so I will guess that ‘jpq’ represents ‘the’.
If this guess is right, our message has ‘T’ as the most common, followed by ‘E’ (instead of ‘E’ then ‘T’). This is not so bad. Let’s go with it.
This technique is called ‘using a crib’. ‘AND’ and ‘THE’ are ‘cribs’. If you have any reason to suspect some word appears, you can try it out and see if it makes other things make sense.
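Spotting candidate words for a crib can also be done mechanically. This is a small sketch, assuming a word is any run of letters; common_words is a name of my own choosing.

```python
import re
from collections import Counter

def common_words(text, length=3):
    """Count the words of a given length, ignoring case."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(word for word in words if len(word) == length)

if __name__ == "__main__":
    sample = "Jpq shi jpq, utjp shi jpq"
    print(common_words(sample).most_common(2))
    # prints: [('jpq', 3), ('shi', 2)]
```

Frequent three-letter words are natural candidates for ‘the’ and ‘and’, but the counts only suggest a guess; they do not confirm it.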
If I apply the crib to a few paragraphs, I get the following:
.t.et.hed ..t h.. hand, and the .... ...end. .e.eated ..th .ne ....e the ......a d..tated .. d’A.ta.nan:
“That’. .e..! N.. .et .. e.e...ne .et..e t. h.. ..n h..e,” .a.d d’A.ta.nan, a. .. he had d.ne n.th.n. ..t ....and a.. h.. ...e; “and attent..n! ... .... th.. ...ent .e a.e at .e.d ..th the .a.d.na..”
This is encouraging: words are starting to appear. There is a weird d’A, which is of some concern, but it is good to see ‘That’ and ‘hand’ appear.
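Applying a crib amounts to building a partial key and substituting only the letters it covers. The sketch below shows one way to do that while keeping capital letters; key_from_cribs and apply_key are illustrative names, not functions from the code at the end of this post.

```python
def key_from_cribs(cribs):
    """Build a partial key from (ciphertext word, plaintext word) pairs."""
    key = {}
    for cipher_word, plain_word in cribs:
        for c, p in zip(cipher_word.lower(), plain_word.lower()):
            key[c] = p
    return key

def apply_key(text, key):
    """Decrypt what the key covers, keep case, and show unknown letters as '.'."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            plain = key.get(ch.lower(), ".")
            out.append(plain.upper() if ch.isupper() else plain)
        else:
            out.append(ch)  # punctuation and spaces are left alone
    return "".join(out)

key = key_from_cribs([("jpq", "the"), ("shi", "and")])
print(apply_key("sjjqhjtzh! Jpsj", key))
# prints: attent..n! That
```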
Now I can see further clues elsewhere in the text.
sjjqhjtzh! translates to attent..n!.
Let us guess this is ‘attention!’. If we apply this through the text, we soon see that hzjpthv translates to nothin., so let us try the guess that ‘v’ represents ‘g’.
“And no., gent.e.en,” .aid d’A.tagnan, .itho.t .to..ing to e...ain hi.
.ond..t to .o.tho., “A.. .o. one, one .o. a..--that i. o.. .otto, i. it
not?”
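Each new guess extends the partial key, and it is easy to contradict an earlier guess by accident. The sketch below adds guesses one at a time and refuses any that clash with the key so far. It is an illustration under my own naming (add_guess), and the starting key assumes the cribs jpq = the and shi = and.

```python
def add_guess(key, cipher_letter, plain_letter):
    """Record a guess, refusing one that clashes with the key so far."""
    c, p = cipher_letter.lower(), plain_letter.lower()
    if key.get(c, p) != p:
        raise ValueError(f"{c} is already decrypted as {key[c]}")
    for other, plain in key.items():
        if plain == p and other != c:
            raise ValueError(f"{p} is already produced by {other}")
    key[c] = p

key = dict(zip("jpqshi", "theand"))  # from the cribs jpq=the, shi=and
for c, p in [("t", "i"), ("z", "o"), ("v", "g")]:
    add_guess(key, c, p)
print("".join(key.get(ch, ".") for ch in "hzjpthv"))
# prints: nothing
```

Because a monoalphabetic cipher maps each letter to exactly one other letter, two ciphertext letters can never decrypt to the same plaintext letter, which is what the second check enforces.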
If we continue to build like this, it looks like “Shi hzu, vqhjcqfqh,” becomes “And now, gentlemen”. This feels like we are on the right track: if our earlier guesses were wrong, we would expect gibberish rather than recognisable words.
“Shi hzu, vqhjcqfqh,” wsti i’Smjsvhsh, utjpzxj wjzkkthv jz qekcsth
becomes
“And now, gentlemen,” .aid d’A.tagnan, witho.t .to..ing to e..lain
I note that wjzkkthv has a double k, which looks like it could be the double p in ‘stopping’. Could this translate to “said d’Artagnan, without stopping to explain…”?
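The double-letter observation can be turned into a simple test: a candidate word fits only if its letters repeat in the same places as the ciphertext word and it agrees with every letter already in the key. The following is a sketch with hypothetical names (letter_pattern, fits) and a hand-picked list of candidates.

```python
def letter_pattern(word):
    """Number each letter by first appearance: 'stopping' -> (0, 1, 2, 3, 3, 4, 5, 6)."""
    seen = {}
    return tuple(seen.setdefault(ch, len(seen)) for ch in word.lower())

def fits(cipher_word, candidate, key):
    """True if candidate repeats letters in the same places and agrees with the key."""
    if letter_pattern(cipher_word) != letter_pattern(candidate):
        return False
    used = set(key.values())
    for c, p in zip(cipher_word.lower(), candidate.lower()):
        if c in key:
            if key[c] != p:
                return False  # contradicts a letter we have already guessed
        elif p in used:
            return False      # plaintext letter already belongs to another ciphertext letter
    return True

key = dict(zip("jpqshitzv", "theandiog"))  # the guesses made so far
for candidate in ["shopping", "stepping", "stopping"]:  # hand-picked candidates
    print(candidate, fits("wjzkkthv", candidate, key))
```

Of these three candidates only ‘stopping’ passes: ‘shopping’ clashes with j = t and ‘stepping’ clashes with z = o.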
If we make some further (increasingly easy) guesses, we eventually get the following:
“And now, gentlemen,” said d’Artagnan, without stopping to explain his conduct to Porthos, “All for one, one for all--that is our motto, is it not?”
“And yet--” said Porthos.
“Hold out your hand and swear!” cried Athos and Aramis at once.
Overcome by example, grumbling to himself, nevertheless, Porthos stretched out his hand, and the four friends repeated with one voice the formula dictated by d’Artagnan:
“All for one, one for all.”
“That’s well! Now let us everyone retire to his own home,” said d’Artagnan, as if he had done nothing but command all his life; “and attention! For from this moment we are at feud with the cardinal.”
The hardest part with a monoalphabetic cipher is making a start. Once you’re on the right track, it gets quicker and quicker.
I’ve put code at the bottom of this post.
Extension
This is the code that I used to help crack the cipher. It is not elegant: for instance, it could crash if you mistype, and it will not stop you from assigning the same plaintext letter to more than one ciphertext letter. It does nothing automatic for you except deal with the substitutions. It doesn’t even make the output pretty.
For best results - use a monospaced font.
A more sophisticated piece of code would look not only at common letters, but common pairings and triplets, and try to make a match.
To clear a bad guess, replace the letter with a full stop.
The text to be decrypted should be in a text file called ciphertext.txt
def display(lines,final):
    # Print each line of ciphertext (unless final) followed by its decryption so far
    blankline=""
    output=blankline
    for line in lines:
        if not final:
            output+=line
        newline=blankline
        for letter in line:
            asc=ord(letter)
            # only letters are substituted; everything else passes through unchanged
            if (64 < asc < 91) or ( 96 < asc < 123):
                if (asc > 96):
                    base=97
                else:
                    base=65
                lett=asc-base
                subst=cipher[lett]
                if subst=='.':
                    letter='.'
                else:
                    letter=chr(base+subst)
            newline+=letter
        output+=newline
    print(output)
def analyse(text):
    # Count each letter (ignoring case) and return the letters, most common first
    analysis=[]
    for i in range(26):
        analysis.append([chr(65+i),0])
    for line in text:
        for letter in line:
            asc=ord(letter)
            if (64 < asc < 91) or ( 96 < asc < 123):
                if (asc > 96):
                    base=97
                else:
                    base=65
                lett=asc-base
                analysis[lett][1]+=1
    # bubble sort into descending order of count
    is_sorted=False
    while not is_sorted:
        is_sorted=True
        for i in range(25):
            j=analysis[i][1]
            k=analysis[i+1][1]
            if k>j:
                is_sorted=False
                temp=analysis[i]
                analysis[i]=analysis[i+1]
                analysis[i+1]=temp
    res=''
    for i in range(26):
        res+=analysis[i][0]
    return(res)
#initialise cipher: entry i is the guess for ciphertext letter i (0 = A), or '.' if there is none yet
cipher=[]
for i in range(26):
    cipher.append('.')
# Read the input, and put each line in a list of lines!
with open("ciphertext.txt","r") as cipherfile:
    lines=cipherfile.readlines()
done=False
common=analyse(lines)
while not done:
    display(lines,False)
    print('Frequencies :',common)
    print()
    print('Type = to finish')
    # there is no input validation: an empty or multi-character reply makes ord() crash
    repl=ord(input('What letter to replace? '))
    if repl==61:
        done=True
    elif (64 < repl < 91) or ( 96 < repl < 123):
        if (repl > 96):
            base=97
        else:
            base=65
        ct=repl-base
        print('Type . to clear the letter')
        repl=ord(input('What to replace with? '))
        if (64 < repl < 91) or ( 96 < repl < 123):
            if (repl > 96):
                base=97
            else:
                base=65
            cipher[ct]=repl-base
        elif repl==46:
            # a full stop clears the guess
            cipher[ct]='.'
print()
display(lines,True)
Extra
The image used in the title is public domain, from the Sherlock Holmes story “The Adventure of the Dancing Men”