What is a while(2) loop in Hex-Rays?
2021-03-04 08:59:27 Author: www.msreverseengineering.com

Hex-Rays uses while(1) to represent infinite loops in the output. However, sometimes you might see while(2) loops in the output instead.

Logically, while(2) behaves the same as while(1) -- both loops are infinite -- but I wondered where they came from, what they meant, and why Hex-Rays produces them. Given that somebody asked me about it on Twitter, it's clear that I'm not the only one who's had this question. I recently learned the answer, so I decided to document it for posterity.
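In C, a loop condition counts as true whenever it compares unequal to zero, which is why the constant 2 works just as well as 1. A minimal sketch of that equivalence follows; both functions are hypothetical, hand-written illustrations rather than real decompiler output, and they differ only in the constant used as the loop condition.

```c
#include <stddef.h>

/* Hypothetical example: return the index of the first zero in a[0..n),
   or -1 if there is none. Written with while ( 1 ). */
int first_zero_while1(const int *a, size_t n)
{
    size_t i = 0;
    while ( 1 )
    {
        if ( i >= n )
            return -1;
        if ( a[i] == 0 )
            return (int)i;
        ++i;
    }
}

/* Identical body, but with while ( 2 ). Any nonzero constant is "true",
   so the loop is just as infinite as the one above. */
int first_zero_while2(const int *a, size_t n)
{
    size_t i = 0;
    while ( 2 )
    {
        if ( i >= n )
            return -1;
        if ( a[i] == 0 )
            return (int)i;
        ++i;
    }
}
```

Both functions return the same value for every input; the constant carries no meaning for the compiled program.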

Answering this question requires some discussion of Hex-Rays internals. The decompiler operates in two major phases, known internally as "microcode" and "ctree". The microcode phase covers the core decompilation logic, such as: translating the assembly instructions into an intermediate representation; applying compiler-esque transformations such as data flow analysis, constant propagation, forward substitution, dead store elimination, and so on; analyzing function calls; and more. To learn more, I'd recommend reading Ilfak's blog entry and installing the Lucid microcode explorer.

The ctree analysis phases, on the other hand, are more focused on the listing that gets presented to the user. The ctree phase contains relatively little code that resembles standard compiler optimizations -- some pattern transformations are close -- whereas much of the code in the microcode phase resembles standard compiler analysis. The major differences between the two are that the microcode does not have high-level control flow structures such as loops (it uses goto statements and assembly-like conditional branches instead), and that type information plays a relatively minor role in the microcode phase, whereas it plays a major role in the ctree phase.

Between the microcode and ctree phases, there is a brief phase known internally as hxe_structural, which performs so-called "structural analysis". This phase operates on the final microcode, after all analysis and transformation, but before the ctree has been constructed. Its role is to determine which high-level control flow structures should be presented to the user in the ctree listing. That is, the information generated by this phase is used during ctree generation to create if, if/else, while, do/while, switch, and goto statements.
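To make the job of structural analysis more concrete, here is a hand-written sketch of the same computation in two shapes. The first uses only labels, gotos, and conditional branches, which is roughly the shape of control flow the microcode has; the second uses the structured loop that a ctree listing would show. This is ordinary C with hypothetical function names, not real microcode or real Hex-Rays output.

```c
/* Sum of 1..n written in goto style: only labels, conditional
   branches, and unconditional jumps, no loop statement. */
int sum_goto(int n)
{
    int total = 0;
    int i = 1;
loop_head:
    if ( i > n )
        goto loop_exit;
    total += i;
    ++i;
    goto loop_head;
loop_exit:
    return total;
}

/* The same computation after the backward jump has been recognized
   as a loop and expressed with a while statement. */
int sum_structured(int n)
{
    int total = 0;
    int i = 1;
    while ( i <= n )
    {
        total += i;
        ++i;
    }
    return total;
}
```

Recovering the second form from the first is the kind of decision this phase has to make for every branch in the function.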

After ctree generation is complete, Hex-Rays applies two sets of transformations (known internally as CMAT_TRANS1 and CMAT_TRANS2) to the decompilation listing, to clean up common patterns of suboptimal output. For example, given the following code:

Hex-Rays will convert this into simply if(cond) return 1;. These phases might also transform while loops into for loops, rewrite while loops as do-while loops or vice versa, convert while(1) { if(cond) break; /* ... */ } to while(!cond) { /*...*/ }, and so on.
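As an illustration of the last rewrite mentioned above, the sketch below shows a while(1) loop that begins with if(cond) break; next to the equivalent while(!cond) loop. The functions are hypothetical and hand-written; here cond is x == 0.

```c
/* Number of bits needed to represent x, written as an infinite loop
   whose first statement is a conditional break. */
int bit_length_break(unsigned x)
{
    int bits = 0;
    while ( 1 )
    {
        if ( x == 0 )
            break;
        x >>= 1;
        ++bits;
    }
    return bits;
}

/* The same loop with the break condition negated and moved into
   the loop header. */
int bit_length_cond(unsigned x)
{
    int bits = 0;
    while ( !(x == 0) )
    {
        x >>= 1;
        ++bits;
    }
    return bits;
}
```

The two forms test the condition at exactly the same points, so the rewrite changes only how the listing reads.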

There is one transformation in the CMAT_TRANS2 phase that is capable of creating a loop where none previously existed. Namely, it looks for a label that is followed, after some intervening code, by a goto back to that same label, i.e. label: /* ... */ goto label;.

Importantly, the label and the goto must be in the same scope (i.e. inside of the same set of curly braces { }). If this pattern matches -- and the code inside satisfies a few technical restrictions -- Hex-Rays will create a while(2) { /* ... */ }.
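The shape of this rewrite can be sketched with a small hand-written example. The first function contains a label and a backward goto in the same scope; the second wraps the code between them in while(2). Both functions are hypothetical, and whether Hex-Rays would rewrite this exact code depends on the technical restrictions mentioned above, which are not spelled out here.

```c
/* Before: a label and a goto to that label, both directly inside
   the same pair of curly braces. */
int gcd_goto(int a, int b)
{
    int t;
again:
    if ( b == 0 )
        return a;
    t = a % b;
    a = b;
    b = t;
    goto again;
}

/* After: the code between the label and the goto becomes the body
   of a while ( 2 ) loop, and the label and goto disappear. */
int gcd_while2(int a, int b)
{
    int t;
    while ( 2 )
    {
        if ( b == 0 )
            return a;
        t = a % b;
        a = b;
        b = t;
    }
}
```

The loop body is the original code unchanged; only the jump back to the top is now implied by the loop.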

while(2) loops are created at CMAT_TRANS2, whereas the other loop transformations mentioned previously take place at CMAT_TRANS1, which has already finished by that point. Therefore, while(2) loops will not be rewritten as a while loop with a conditional, or a do/while loop, or a for loop. If Hex-Rays creates a while(2) loop internally, there will always be a while(2) loop in the final output.

So the short answer to the question is: it doesn't matter what a while(2) loop is. When you see one, you have no reason to think of it any differently from a while(1) loop. Hex-Rays is not trying to communicate anything special to you as a user. while(2) loops are introduced to improve the decompilation listing for common patterns that the structuring algorithms are unable to recognize, for whatever reason. (This is a common pattern in compiler design -- having a set of rewrite rules that can clean up suboptimal results of previous phases. For example, peephole optimizers in compilers operate on a similar principle.)


Source: https://www.msreverseengineering.com/blog/2021/3/3/what-is-a-while2-loop-in-hex-rays