Remember back in your college AI class when the professor made you use Lisp to program your AI experiments? Or maybe you were in a programming language class and you had to use Lisp. Remember the pain and suffering imposed by the use of ( and )))))))))))))))))))? I do.. and as such I've always had a particular dislike for Lisp and any other languages like Lisp... namely, Scheme.
I was working on a project recently that required me to reverse engineer a particular binary responsible for handling a key piece of data manipulation for the system under review. So I loaded the binary up in IDA and gdb and off I went to see what was under the hood. No sooner than I traced the one and only call from the main(...) did it occur to me that something was very, very wrong. Of the nearly 1000 functions in the binary, about 99.5% of them ended in either
call eax
or
call eax+4
and then the function would end. That's it, just end. Not a
retn in sight. So obviously this was frustrating. Moreover, the eax in question was the result of some obscure function call that passed a variable that never seemed to be assigned a value. So how the hell was I supposed to trace this program? Or more importantly, how the hell did it even execute?
As a reverse engineer with years of x86 experience, there are certain things I expect to see in a function.. a stack, a jmp, a call to a defined function, a function that returns. You know, the basics. So what was going on with this program?
Digging around a bit more I found that this program called a library named chicken. Seriously, it calls a library called chicken. Digging around the web lead me to the source of this mysterious library...
www.call-with-current-continuation.org.
As it turns out the binary I was reversing wasn't originally written in C or assembly or even Fortran for that matter.. it was written in Scheme (a dialect of Lisp). My blood immediately turned cold. My hard feelings toward the language of the ( and )))) had come back to haunt me. So not only did I feel that the language was evil in its original form, but this compiled version did nothing more than to strengthen my dislike of it.
Well, I had to reverse this part of the system.. I had no choice in that matter. How the hell was I going to do this? The program was used as a CGI script so I could execute it through the web and watch how it interacted. That was an option, but I'm old school in that I love doing static analysis. I love it! Give me a binary, ask me how it works down to the level of how the structures are defined and leave me alone for a while, and I'll come back to you with a report on the insane details of the program. Its what I do, it's what I enjoy. But static analysis was out of the question it looked like, so with gdb fired up and ready to go, I ran the program single stepping all the way. This painful process only deepened my hatred of Scheme and this bastardized compiled version of it.
I quickly realized that single stepping through this program would be a waste of my time and the few remaining threads of sanity I had were quickly being eroded by this program. Time for a different technique....
A few weeks passed and a lot of research was done. I read the website for CHICKEN, I read the wiki on CHICKEN. I learned a lot, I forgot even more. I found a paper on the wiki that described the process by which Scheme is changed to C by CHICKEN and then compiled. You heard me, CHICKEN takes Scheme code, converts it to C and then compiles it (typically with gcc). In this paper they gave two examples of how the conversion works, neither of them were easy to follow for someone who didn't understand Scheme (which, I will admit, I still don't). But the paper did give me enough insight into what is happened to start making some educated guesses. As it turns out, CHICKEN converts Scheme into C by changing the way the program is executed. Instead of a standard stack being used to hold local variables for a function and return addresses for functions to, well, return to, CHICKEN converts the Scheme code into "continuous passing style" ... in a nutshell this means that each function simply calls the next function and passes along a data block for local variables the function will use and a data block to be used (at some point in the future) to call after that. There is a lot more to it, trust me.. but that'll give you an idea.
This was great... I was looking at a program that malware authors would send back saying "um, yeah, even we don't make programs that hard to follow." And the web, the web was useless when it came to trying to find resources for reverse engineering CHICKEN applications.
But never fear... long story (and this was a long story) short, I come to you with a paper on how to reverse engineer CHICKEN applications.. ta-da (this is where the trumpets would start sounding). After a few long weeks of reversing, researching and experimenting, I have finally figured out how to read CHICKEN applications. I can't convert them back to Scheme (and really, why would anyone want to?) but I can now give you the information you need to reverse these horrible little programs yourself.
I hope my suffering can help you in the future. As a friend of mine said when I told him I was writing a paper on this topic once told me "I'm sure the other 3 guys on the Internet trying to reverse CHICKEN will thank you."
greg.
The paper to date:
Reverse Engineering CHICKEN with IDA