Skip to content

Instantly share code, notes, and snippets.

@Samuirai
Created June 14, 2012 20:59
Show Gist options
  • Save Samuirai/2932918 to your computer and use it in GitHub Desktop.
Save Samuirai/2932918 to your computer and use it in GitHub Desktop.
G-WAN Captcha Decode

G-WAN is a new free web server. They seem to be very proud of it, or at least just want to make a lot of money. Well anyway, in almost every sentence they write, they claim that they are 20% cooler than anything else. It feels a bit arrogant. I have to admit, I don't know a lot about web servers, so I can't speak to how good they are.

However, then I saw their Captcha example. I also don't know much about machine learning algorithms, OCR, and stuff like that, but I do know how to read pixels. I also know how to compare values with python :P

demo

They say the following about their Captcha:

[...] difficult or even completely impossible for robots.

Wait wat? If this is true, this is something really outstanding and maybe an alternative to reCaptcha...

But then I was like:

are you kidding me?

So I wrote this basic stupid pixel by pixel reading and comparing code, to decode the captcha.

smrrd$ python crack_captcha.py
GIF Image
---------
R0lGODlhGAAZAJEAAP///9//v4SkZAAAACH5BAEAAAAALAAAAAAYABkAAAJfhI+pGB0rmHuGAmtEPJj7E23VYlmbeDnMB2guu44J2lWqQi/6Drl0k7hlSKwSiHeBgV5BTK2FNOKIsmQVJekIkdzgTEOVIERY4ApDPoczTOvzCbVtq/G6kt4CK+BdRQEAOw==

Captcha Data Matrix
-------------------
                                                
  1 1 1 1               2         1 1 1 1 1     
  1       1           2 2         1             
  1       1         2   2         1             
  1       1       2     2         1 1 1 1       
  1       1       2 2 2 2 2       1             
  1       1             2         1             
  1 1 1 1               2         1             
                                                
  2 2 2 2 2         1 1 1           1 1 1       
  2               1       1       1       1     
  2                       1       1             
  2 2 2 2             1 1         1             
  2                       1       1             
  2               1       1       1       1     
  2                 1 1 1           1 1 1       
                                                
        1           2 2 2               1       
      1 1         2       2           1 1       
    1   1         2       2         1   1       
        1           2 2 2 2       1     1       
        1                 2       1 1 1 1 1     
        1         2       2             1       
        1           2 2 2               1       
                                            
color | pixel count
-------------------
    0 |        472
    1 |         81
    2 |         44

color   1 | color   2
---------------------
        3 |         4
        1 |         9
        4 |          
---------------------
        8 |        13

I also don't understand, what they think this means and why they are so excited about it:

The two sums are: 13 and 8... for the same Captcha image!
By just changing the HTML background color [...]

In the end, this was the first time I tried to solve a Captcha. I think this is the best example of how not to implement it.

YouTube Video Demo

kind regards,
samuirai

personal Website http://www.smrrd.de
I'm a member of the Stuttgart Hackerspace - shackspace

edit: to see really cool stuff with reCaptcha, check out what they did: http://www.dc949.org/projects/stiltwalker/

import base64, sys, io, Image, urllib2, re
bg = 0 # background color
cw = 8 # character width
# get the new captcha
url = urllib2.urlopen("http://62.75.175.163:8080/?captcha.c")
html = url.read().replace('\n','').replace('\r','')
url.close()
# get the base64 gif image code
# <img src="data:image/gif;base64,R0lGODlhG...AAADs=" alt="A tree" width="48" height="50" />
# => R0lGODlhG...AAADs=
regex = re.compile('.*base64,(.*)" alt')
gif_b64 = regex.match(html).group(1)
print "GIF Image"
print "---------"
print gif_b64
# load the string as image
f = io.BytesIO(base64.b64decode(gif_b64))
img = Image.open(f)
pix = img.load()
# print and analyse the pixels
print "Captcha Data Matrix"
print "-------------------"
pixels = {}
for y in xrange(0,25):
for x in xrange(0,24):
# collect data
if pix[x,y] not in pixels: pixels[pix[x,y]]=0
else: pixels[pix[x,y]]+=1
# print pixels
if pix[x,y]!=bg: print pix[x,y],
else: print ' ',
print ''
# print the analyse - total useless, but looks cool
print "color | pixel count"
print "-------------------"
for color in pixels:
print "%5d | %10d" % (color,pixels[color])
# define all characters
charset = {
0:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 1, 1, 0, 0],
[0, 1, 0, 1, 0, 1, 0, 0],
[0, 1, 1, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0]],
1:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0]],
2:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 1, 0, 0]],
3:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0]],
4:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 1, 0, 0, 0],
[0, 1, 0, 0, 1, 0, 0, 0],
[0, 1, 1, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0]],
5:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 1, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0]],
6:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0]],
7:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0]],
8:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0]],
9:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0]],
}
def find_character(tmp_char):
highest_char = ['X',0,0,]
for char in xrange(0,10):
match = 0
fails = 0
for x in range(len(tmp_char)):
for y in range(len(tmp_char[x])):
#print str(tmp_char[x][y])+str(charset[char][x][y]),
if (charset[char][x][y] != 0 and tmp_char[x][y] != 0) or (charset[char][x][y] == tmp_char[x][y]):
match+=1
else: fails+=1
#print
#print [char,match,fails,]
if match>highest_char[1]:
highest_char=[char,match,fails,]
return highest_char
def get_color(tmp_char):
color = {}
for x in range(len(tmp_char)):
for y in range(len(tmp_char[x])):
if tmp_char[x][y]!=bg:
if tmp_char[x][y] not in color:
color[tmp_char[x][y]] = 0
else:
color[tmp_char[x][y]] += 1
return color.keys()
# analyse each character
color = {}
for yl in xrange(0,3):
for xl in xrange(0,3):
tmp_char = []
for y in xrange(yl*cw,yl*cw+cw):
tmp_line = []
for x in xrange(xl*cw,xl*cw+cw):
tmp_line.append(pix[x,y])
#print pix[x,y],
#print
tmp_char.append(tmp_line)
match = find_character(tmp_char)
col = get_color(tmp_char)
if not match[2]:
if col[0] not in color:
color[col[0]] = [match[0]]
else:
color[col[0]].append(match[0])
# print the sums
print ""
if len(color.keys())>=2:
print "color %3d | color %3d" % (color.keys()[0],color.keys()[1])
else:
print "color %3d | " % (color.keys()[0])
print "---------------------"
left_list = color[color.keys()[0]]
if len(color.keys())>=2:
right_list = color[color.keys()[1]]
else:
right_list = []
longest_length = 0
if len(left_list)>len(right_list): longest_length = len(left_list)
else: longest_length = len(right_list)
for row in xrange(0,longest_length):
left_val = ''
right_val = ''
if len(left_list)>row: left_val = str(left_list[row])
if len(right_list)>row: right_val = str(right_list[row])
print "%9s | %9s" % (left_val,right_val)
print "---------------------"
print "%9s | %9s" % (sum(left_list),sum(right_list))
@nixtoshi
Copy link

I'm reading this after 13 years in 2025.

CAPTCHAs have evolved and with the popularization and advancements in AI it is fair to say that most CAPTCHAs of the past and even current CAPTCHAs, even Google's, can be beat using AI.

I'm only recently finding out about the GWAN server and find it VERY interesting for high performance web applications, I'm definitely a believer in using C modules for performance, like Node.JS. I frankly hope GWAN keeps being developed and becomes more popular. Making its code openly auditable, not necessarily open source (different license), would probably help a lot since trusting non-auditable code for web servers is always a risk. There are several closed source web servers, but generally speaking they haven't been very successful, with the exception of Microsoft's IIS, and I mainly think it's because Microsoft as a company is trusted.

Regarding this thread, I can't help but think that it is immature not to accept that GWAN's captcha generating module was beat by @Samuirai's code when this was posted. Samuirai's code is interesting, easy to understand and achieves what it claims. We could say that Samuirai proved, in practice, that it wasn't a very secure or hard to beat CAPTCHA, that's that, we should just accept when our security is beat in order to improve said security. Many CAPTCHAs have come and gone, like the ones that were generated via PHP.

Public scrutiny of security methods and the publishing of exploits is in the end, good. It helps our industry reach better standards. Like what happened with cryptography, which is now a solid cornerstone of modern computing and society.

Many standard crypto algorithms are secure against any programmer, academic, scientist, mathematician, or powerful attacker if the source of randomness is unpredictable enough, and this has only been possible because people have collectively tried to break hundreds of encryption methods until we were left with the unbreakable ones. Every iteration of these, even failed ones, are a success, because they were theorized, built and tested in the field, all of them got us closer to the current standards.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment