Earlier today I went online to check the status of my tax refund. (Yes, I take the appropriate withholding allowances. It's unbelievable to me how quick online commenters are to roast people for "giving the government a free loan" without understanding anything about their tax situation.) It turns out that unlike the California or Federal systems, the New York State tax refund procedure requires you to fill out a captcha before going anywhere. Not just any captcha – check out this beauty:
It had never occurred to me to try to break a captcha before, but this looked like the easiest possible target. Five minutes later I was done. Three of those minutes involved setting up dependencies: Tesseract, Pytesseract, and Pillow. Modern packaging tools (thanks, brew and wheel) made this painless and fast.
$ brew install tesseract
$ pip install pytesseract pillow
(OK, I actually created a new directory and virtualenv for this, just like I do with all of my projects, but you get the idea.)
Breaking the captcha was then as easy as saving it (as captcha.jpg, the name the script opens) and running the following Python code:
import pytesseract as tes
from PIL import Image

# Load the saved captcha and let Tesseract read the text out of it.
img = Image.open('captcha.jpg')
print('captcha =', tes.image_to_string(img))
$ python ocr.py
captcha = 740120
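If the output ever came back noisier than this (stray spaces, a trailing newline, a misread character), a slightly more defensive version might look like the sketch below. It is not what I ran. The helper names clean_digits and read_captcha are made up for illustration, and the six-digit length is an assumption based on the single captcha above. The --psm 7 flag and the tessedit_char_whitelist variable are Tesseract options passed through pytesseract's config argument, though whitelist support reportedly varies between Tesseract versions.

```python
import re


def clean_digits(raw_text, expected_length=6):
    """Keep only ASCII digits from OCR output; return None if the count is off.

    expected_length=6 is an assumption taken from the one captcha shown above.
    """
    digits = re.sub(r"[^0-9]", "", raw_text)
    return digits if len(digits) == expected_length else None


def read_captcha(path, expected_length=6):
    """OCR the image at `path` and return the cleaned digit string, or None."""
    # Imported lazily so clean_digits() works even without the OCR stack installed.
    import pytesseract
    from PIL import Image

    # Convert to grayscale: an optional preprocessing step that may or may not help.
    img = Image.open(path).convert("L")

    # --psm 7 asks Tesseract to treat the image as a single line of text;
    # the whitelist asks it to emit digits only (support depends on the version).
    config = "--psm 7 -c tessedit_char_whitelist=0123456789"
    raw_text = pytesseract.image_to_string(img, config=config)
    return clean_digits(raw_text, expected_length)
```

Calling read_captcha('captcha.jpg') would then return either a six-digit string or None, which makes it easy to retry with a fresh captcha when the read looks wrong.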
Two logical steps – kind of incredible. Further proof that there's no point in using a captcha if it isn't made by Google. Sure, older versions of captcha were susceptible to automated humans, but how can you defeat ARTIFICIAL INTELLIGENCE?