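IDA's F5 (the Hex-Rays decompiler) is famously powerful for reverse engineering, but it has limits. Its output can be messy: piles of forced type casts, pointer offsets whose meaning is unclear, and so on.
This post uses a CTF reversing challenge built around a virtual machine resembling 32-bit x86 to show how to infer and reconstruct a struct type from the available information, and how to tidy up the F5 output so the code is more intuitive and readable.
Download the related files: https://github.com/Inv0k3r/pwnable_files/raw/master/mvze.zip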
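D: change the data size; db is a byte, dw is a word (2 bytes), dd is a doubleword (4 bytes), dq is a quadword (8 bytes)
Numpad *: create an array
Y: change the type; place the cursor on a function name to edit the function prototype, or on a variable to edit the variable's type
N: rename a function or variable
U: undefine; turns bytes that were recognized as code or as a function back into raw bytes
C: define as code; converts bytes that were not recognized as code into instructions
P: create a function; C only produces disassembly, and F5 needs an actual function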
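Start by locating the main function, main.
The strings inside it show that the program expects a binary file as input, so it is reasonable to guess that sub_27BE processes that file. Stepping into it reveals two functions.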
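Judging by its contents, the first one performs initialization.
The second one reads the instruction-sequence file.
Of the two functions in the while loop below them, one is clearly the instruction-fetch function. The other opens onto a large switch and is the function that executes the VM instructions.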
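The fetch-then-execute shape of that while loop can be sketched in a few lines of Python. This is only a toy model, not the challenge's interpreter: it borrows the 12-byte instruction format and three opcode numbers (3, 52, 54) that are worked out later in this post, and the helper names run and ins are made up here.

```python
import struct

INSTRUCTION_SIZE = 12  # assumption: three little-endian 32-bit fields per instruction


def run(program: bytes) -> str:
    """Toy fetch/execute loop that models only three opcodes."""
    regs = [0, 0, 0, 0]
    output = []
    rip = 0  # modelled as a byte offset into the program, not a real pointer
    while rip + INSTRUCTION_SIZE <= len(program):
        # Fetch: read one fixed-width instruction and advance rip.
        opcode, a, b = struct.unpack_from("<III", program, rip)
        rip += INSTRUCTION_SIZE
        # Execute: the real binary uses a switch over many more cases.
        if opcode == 3:        # mov regs[b], immediate a
            regs[b] = a
        elif opcode == 52:     # output register 0 as a character
            output.append(chr(regs[0] & 0xFF))
        elif opcode == 54:     # halt
            break
        else:
            raise NotImplementedError(f"opcode {opcode} is not modelled")
    return "".join(output)


def ins(opcode: int, a: int = 0, b: int = 0) -> bytes:
    """Hypothetical helper that assembles one instruction."""
    return struct.pack("<III", opcode, a, b)


demo = ins(3, ord("h")) + ins(52) + ins(3, ord("i")) + ins(52) + ins(54)
print(run(demo))  # prints "hi"
```

Keeping fetch and execute as separate steps mirrors the two functions called inside the loop, which is what makes the instruction format the natural first thing to recover.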
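For any moderately complex binary, recovering the data structures is very important. Below I describe how to rebuild a struct step by step from what IDA shows.
What we already know is that the program reads instructions and executes them, so the first thing to pin down is the instruction format.
Look at the first function in the while loop.
The variable v6 clearly holds the instruction.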
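Every instruction appears to be 12 bytes long, i.e. the instructions are fixed-length. The location a1+12000 should hold the address of the next instruction, i.e. the rip register. The returned value v2 is **(a1+12000): an address is first loaded from a1+12000, and then the content at that address is returned.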
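Rename these variables according to their meaning, and display 12000 in hexadecimal.
Then create a new struct and make the field at offset 0x2ee0 the rip register.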
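Next, split the struct up according to the initialization function seen earlier; ignore the field names for now. Note that a1 in the initialization function is a QWORD pointer (8-byte elements), so the later +500 and +1000 are in units of 8 bytes, i.e. byte offsets 0xfa0 and 0x1f40.
The caller also shows that the struct is at least 12064 bytes (3015*4+4).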
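The offset arithmetic above is easy to double-check with a few lines of Python. This is just a sketch; the constant names and dictionary keys are labels chosen here, not identifiers from the binary.

```python
# Sanity-check the offsets discussed above (all names are illustrative).
QWORD_SIZE = 8  # a1 is indexed as 8-byte elements in the initialization function
INT_SIZE = 4    # the caller indexes the same object as 4-byte ints

offsets = {
    "a1 + 500 (QWORD index)": 500 * QWORD_SIZE,
    "a1 + 1000 (QWORD index)": 1000 * QWORD_SIZE,
    "rip field (byte offset 12000)": 12000,
    "int index 3015 in the caller": 3015 * INT_SIZE,
}

for label, offset in offsets.items():
    print(f"{label:32} -> {offset:6} = {offset:#x}")

# The last int touched by the caller ends 4 bytes after it starts.
min_struct_size = 3015 * INT_SIZE + INT_SIZE
print("minimum struct size:", min_struct_size)  # 12064
```

Converting everything to hexadecimal byte offsets up front makes it easier to match these values against the field_XXXX names that IDA generates.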
There is a loop that calls malloc four times, which suggests an array.
Result after the fix:
Then look at this function:
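Roughly speaking, it treats argument 1 as a pointer to 4-byte elements and returns the address of the element indexed by argument 2. The first call is equivalent to returning &(a1->field_fa0[0]). Change every a1 to the newly created struct type, then look at the file-reading function:
What gets read into fa0 is the instruction sequence, so name that field code. This function returns the number of instructions, so the field at 2f1c in the caller holds the instruction byte count; name it binary_size.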
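The purpose of field_0 is not yet clear, so for now just declare it as an array of size 0x2ee0.
The program is 64-bit and int is 32 bits, so the opcode should be 4 bytes long. Judging by the arguments of case 0, the high 4 bytes of the fetched value are passed as argument 2 and the low 4 bytes are the opcode itself (see https://www.cnblogs.com/goodhacker/p/7692443.html). An instruction is 12 bytes in total, so the remaining 4 bytes are the argument a3. Call it param2 for now and create a struct for the instruction: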
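The instruction layout described above (4-byte opcode, then two 4-byte operands) can be modelled in Python as follows. This is a sketch under two assumptions: the fields are little-endian, and the name param1 for the middle field is a label chosen here (param2 is the name used in the text).

```python
import struct
from typing import NamedTuple

INSTRUCTION_SIZE = 12  # fixed-length instructions


class Instruction(NamedTuple):
    opcode: int  # low 4 bytes of the fetched value
    param1: int  # high 4 bytes of the fetched value (hypothetical name)
    param2: int  # remaining 4 bytes, passed as a3


def decode(raw: bytes) -> Instruction:
    """Split one 12-byte instruction into its three 32-bit fields."""
    if len(raw) != INSTRUCTION_SIZE:
        raise ValueError(f"expected {INSTRUCTION_SIZE} bytes, got {len(raw)}")
    return Instruction(*struct.unpack("<III", raw))


# Hand-written sample bytes: opcode 3, param1 0x25, param2 1.
sample = bytes.fromhex("030000002500000001000000")
print(decode(sample))  # Instruction(opcode=3, param1=37, param2=1)
```

In IDA the equivalent is a 12-byte struct with three 4-byte members, which is what lets the switch in the execute function show field names instead of raw offsets.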
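Then fix up the types of the function parameters:
At this point the program is already very clear. field_2ef8 holds the four 4-byte heap blocks allocated by the earlier malloc loop, and it is indexed by the instruction's second operand, so it is presumably something like a set of general-purpose registers; name it regs. Also note that every function takes the same first argument: this is evidently a C++ class, and the first argument is this.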
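Step into the first case:
Hmm, that doesn't look right. Switch to the disassembly, undefine the function (hotkey U), and recreate it (hotkey C, then P). If once is not enough, repeat it a few times. IDA's reanalysis did not work for me, possibly because of a bug: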
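This time the output is fine. Judging by the caller that executes the instructions, the return value is unused, so this instruction simply does field_2ee8+4.
Apply the same method to fix the types of all the other functions, then infer from behavior what each struct field means, along with each function's parameters and return value, and manually adjust the F5 output accordingly.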
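One thing worth pointing out: statements prefixed by a forced cast of the form (xxxx), like those in the screenshots above, usually mean that the declared type does not match how the data is used. The cast itself tells us how to correct the type. For example:
First, this function's return value is clearly unused, so change it to a void function.
Then change the type of regs in the struct. The cast shows unsigned int*, whereas I had declared it as int, so switch it to unsigned int and see:
Now look at the other functions:
This shows that the parameter types chosen earlier were also not quite right, so change them all to unsigned int based on this information. Final result: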
Next, reverse each instruction in turn.
case0:
It increments first and then stores, so this is presumably push param. 0x2ee8 is therefore the stack-top register: rename it rsp and name the function push_int.
case1:
Similar to the previous one; name it push_reg.
case2:
This one pops from the stack: pop_reg.
case3:
This one assigns an immediate to a register: mov_reg_int.
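The behavior of cases 0 to 3 can be summarized with a small Python model. It is a sketch rather than a reconstruction of the real handlers: rsp is modelled as an index starting at 0, pop_reg is assumed to mirror push (read, then decrement), the stack size of 1000 ints comes from later in this post, and the class name MiniVM is made up here.

```python
MASK32 = 0xFFFFFFFF


class MiniVM:
    """Toy model of cases 0-3 (push_int, push_reg, pop_reg, mov_reg_int)."""

    def __init__(self):
        self.stack = [0] * 1000  # 1000 ints at the start of the struct
        self.regs = [0] * 4      # four general-purpose registers
        self.rsp = 0             # assumption: index into stack, starting at 0

    def push_int(self, value):
        # case 0: increment first, then store
        self.rsp += 1
        self.stack[self.rsp] = value & MASK32

    def push_reg(self, idx):
        # case 1: same as push_int, but the value comes from a register
        self.push_int(self.regs[idx])

    def pop_reg(self, idx):
        # case 2: assumed mirror image of push (read, then decrement)
        self.regs[idx] = self.stack[self.rsp]
        self.rsp -= 1

    def mov_reg_int(self, idx, value):
        # case 3: load an immediate into a register
        self.regs[idx] = value & MASK32


vm = MiniVM()
vm.mov_reg_int(1, 7)
vm.push_reg(1)
vm.push_int(42)
vm.pop_reg(0)  # regs[0] == 42
vm.pop_reg(2)  # regs[2] == 7
print(vm.regs, vm.rsp)  # [42, 7, 7, 0] 0
```

Note that in this model the stack grows toward higher indices, the opposite of a real x86 stack, because the handler increments rsp before storing.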
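Instructions whose purpose is unclear can be skipped for now; move on to the others.
This one sets a register according to how its operands compare. That register can be treated as a condition (flags) register, and the instruction is cmp.
This one reads one character into register 0.
This one outputs the contents of register 0.
At 1f40 there is an int array of 1000 elements, presumably used for storing data; call it data.
This one shows that 2ef0 should be rbp; the instruction is mov rbp, rsp.
This one shows that the stack is indexed by rsp, so the 1000 ints at the start of the struct should be the stack. The instruction is leave.
This should be the ret instruction.
This one is a bit more involved. In effect it converts the address in rbp to a string, converts that back into an address in the variable i, uses i to find the corresponding index in the stack, and pushes that index onto the stack. It can be read as push rbp. The other instructions are similar. Final result: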
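Putting the offsets from this post together, the recovered struct can be approximated with ctypes to confirm that the fields line up. This is a sketch under stated assumptions: the class name VM is made up, the size of code is inferred from the gap between 0xfa0 and 0x1f40, rip, rsp, rbp and the four regs entries are taken to be 8-byte pointers, and the 4 bytes at 0x2f18 are not identified in this post, so they are left as a placeholder.

```python
import ctypes


class VM(ctypes.Structure):
    """Approximate layout of the VM context struct (names partly assumed)."""
    _fields_ = [
        ("stack", ctypes.c_uint32 * 1000),   # 0x0000: VM stack, indexed via rsp
        ("code", ctypes.c_uint8 * 4000),     # 0x0fa0: instruction bytes (size inferred)
        ("data", ctypes.c_uint32 * 1000),    # 0x1f40: data array
        ("rip", ctypes.c_uint64),            # 0x2ee0: pointer to the next instruction
        ("rsp", ctypes.c_uint64),            # 0x2ee8: stack-top pointer
        ("rbp", ctypes.c_uint64),            # 0x2ef0: frame pointer
        ("regs", ctypes.c_uint64 * 4),       # 0x2ef8: four malloc'ed register cells
        ("unknown_2f18", ctypes.c_uint32),   # 0x2f18: placeholder, purpose not identified
        ("binary_size", ctypes.c_uint32),    # 0x2f1c: size of the loaded program
    ]


for name, _ in VM._fields_:
    print(f"{getattr(VM, name).offset:#06x}  {name}")
print("sizeof(VM) =", ctypes.sizeof(VM))  # 12064
```

The total comes out at 12064 bytes, which matches the minimum size derived from the caller, so the layout is at least self-consistent.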
With these results we can write a script that converts the binary file into assembly:
# Each instruction is 12 bytes: opcode, operand 1, operand 2 (little-endian 32-bit each).
with open("binary", "rb") as f:
    codes = f.read()

reg_list = {
    0: 'rax',
    1: 'rbx',
    2: 'rcx',
    3: 'rdx'
}

for i in range(len(codes) // 12):
    code = codes[i*12:i*12+12]
    instrument = int.from_bytes(code[0:4], 'little')
    data1 = int.from_bytes(code[4:8], 'little')
    data2 = int.from_bytes(code[8:12], 'little')
    # Print the index and the raw instruction as 24 hex digits, then the mnemonic.
    print("{:3}. {} -> ".format(i, hex(int.from_bytes(code, 'little'))[2:].rjust(24, '0')), end='')
    if instrument == 0:
        print("push {}".format(data1), end='')
    if instrument == 1:
        print("push {}".format(reg_list[data1]), end='')
    if instrument == 2:
        print("pop {}".format(reg_list[data1]), end='')
    if instrument == 3:  # mov reg, imm (also show the immediate as a character)
        if data1 >= 32 and data1 <= 126:
            char = chr(data1)
        else:
            char = '\\x' + hex(data1)[2:].rjust(2, '0')
        print("mov {}, {}({})".format(reg_list[data2], hex(data1), char), end='')
    if instrument == 4:
        print("mov {}, data[{}]".format(reg_list[data2], data1), end='')
    if instrument == 5:
        print("add {}, {}".format(reg_list[data2], reg_list[data1]), end='')
    if instrument == 6:
        print("sub {}, {}".format(reg_list[data2], reg_list[data1]), end='')
    if instrument == 7:
        print("mul {}, {}".format(reg_list[data2], reg_list[data1]), end='')
    if instrument == 8:
        print("div {}, {}".format(reg_list[data2], reg_list[data1]), end='')
    if instrument == 9:
        print("xor {}, {}".format(reg_list[data2], reg_list[data1]), end='')
    if instrument == 10:
        print("mov {}, {}".format(reg_list[data2], reg_list[data1]), end='')
    if instrument == 11:  # mov data[imm], reg
        print("mov data[{}], {}".format(data2, reg_list[data1]), end='')
    if instrument == 12:
        print("mov data[{}], {}".format(reg_list[data2], reg_list[data1]), end='')
    if instrument == 13:
        print("inc {}".format(reg_list[data1]), end='')
    if instrument == 14:
        print("dec {}".format(reg_list[data1]), end='')
    if instrument == 15:
        print("cmp {}, {}".format(data2, reg_list[data1]), end='')
    if instrument == 16:
        print("cmp {}, {}".format(reg_list[data2], reg_list[data1]), end='')
    if instrument == 17:
        print("jl code[{}]".format(data1), end='')
    if instrument == 18:
        print("push {}; call code[{}]".format(data2, data1), end='')
    if instrument == 19:
        print("push rbp", end='')
    if instrument == 20:
        print("mov rbp, rsp", end='')
    if instrument == 21:
        print("mov rsp, rbp", end='')
    if instrument == 22:
        print("pop rbp", end='')
    if instrument == 23:
        print("pop rip, ret", end='')
    if instrument == 24:  # data1 is a negative 32-bit offset; print its magnitude
        print("mov [rsp-{}], {}".format(((~data1+1)&0xffffffff), reg_list[data2]), end='')
    if instrument == 25:
        print("add {}, {}".format(reg_list[data2], data1), end='')
    if instrument == 26:
        print("sub {}, {}".format(reg_list[data2], data1), end='')
    if instrument == 27:
        print("mov {}, data[{}]".format(reg_list[data2], reg_list[data1]), end='')
    if instrument == 28:
        print("mov data[{}], {}".format(reg_list[data2], reg_list[data1]), end='')
    if instrument == 29:
        print("jne code[{}]".format(data1), end='')
    if instrument == 30:
        print("jmp code[{}]".format(data1), end='')
    if instrument == 50:
        print("in rax(int)", end='')
    if instrument == 51:
        print("out rax(int)", end='')
    if instrument == 52:
        print("out rax(char)", end='')
    if instrument == 53:
        print("in rax(char)", end='')
    if instrument == 54:
        print("hlt", end='')
    print()
This produces the following output:
0. 000000000000008f00000003 -> mov rax, 0x8f(\x8f)
1. 00000000000000000000000b -> mov data[0], rax
2. 000000000000007700000003 -> mov rax, 0x77(w)
3. 00000001000000000000000b -> mov data[1], rax
4. 000000000000003300000003 -> mov rax, 0x33(3)
5. 00000002000000000000000b -> mov data[2], rax
6. 000000000000006e00000003 -> mov rax, 0x6e(n)
7. 00000003000000000000000b -> mov data[3], rax
8. 000000000000006c00000003 -> mov rax, 0x6c(l)
9. 00000004000000000000000b -> mov data[4], rax
10. 000000000000003100000003 -> mov rax, 0x31(1)
11. 00000005000000000000000b -> mov data[5], rax
12. 000000000000006e00000003 -> mov rax, 0x6e(n)
13. 00000006000000000000000b -> mov data[6], rax
14. 000000000000006700000003 -> mov rax, 0x67(g)
15. 00000007000000000000000b -> mov data[7], rax
16. 000000000000006c00000003 -> mov rax, 0x6c(l)
17. 00000008000000000000000b -> mov data[8], rax
18. 000000000000006f00000003 -> mov rax, 0x6f(o)
19. 00000009000000000000000b -> mov data[9], rax
20. 000000000000007600000003 -> mov rax, 0x76(v)
21. 0000000a000000000000000b -> mov data[10], rax
22. 000000000000003300000003 -> mov rax, 0x33(3)
23. 0000000b000000000000000b -> mov data[11], rax
24. 000000000000006a00000003 -> mov rax, 0x6a(j)
25. 0000000c000000000000000b -> mov data[12], rax
26. 000000000000006300000003 -> mov rax, 0x63(c)
27. 0000000d000000000000000b -> mov data[13], rax
28. 000000000000006800000003 -> mov rax, 0x68(h)
29. 0000000e000000000000000b -> mov data[14], rax
30. 000000000000006500000003 -> mov rax, 0x65(e)
31. 0000000f000000000000000b -> mov data[15], rax
32. 000000000000003100000003 -> mov rax, 0x31(1)
33. 00000010000000000000000b -> mov data[16], rax
34. 000000000000003400000003 -> mov rax, 0x34(4)
35. 00000011000000000000000b -> mov data[17], rax
36. 000000000000003300000003 -> mov rax, 0x33(3)
37. 00000012000000000000000b -> mov data[18], rax
38. 000000000000009e00000003 -> mov rax, 0x9e(\x9e)
39. 00000013000000000000000b -> mov data[19], rax
40. 00000000000000c000000003 -> mov rax, 0xc0(\xc0)
41. 00000014000000000000000b -> mov data[20], rax
42. 00000000000000cd00000003 -> mov rax, 0xcd(\xcd)
43. 00000015000000000000000b -> mov data[21], rax
44. 00000000000000cb00000003 -> mov rax, 0xcb(\xcb)
45. 00000016000000000000000b -> mov data[22], rax
46. 000000000000008500000003 -> mov rax, 0x85(\x85)
47. 00000017000000000000000b -> mov data[23], rax
48. 000000000000008400000003 -> mov rax, 0x84(\x84)
49. 00000018000000000000000b -> mov data[24], rax
50. 000000000000009800000003 -> mov rax, 0x98(\x98)
51. 00000019000000000000000b -> mov data[25], rax
52. 000000000000008e00000003 -> mov rax, 0x8e(\x8e)
53. 0000001a000000000000000b -> mov data[26], rax
54. 000000000000009d00000003 -> mov rax, 0x9d(\x9d)
55. 0000001b000000000000000b -> mov data[27], rax
56. 000000000000008300000003 -> mov rax, 0x83(\x83)
57. 0000001c000000000000000b -> mov data[28], rax
58. 000000000000008e00000003 -> mov rax, 0x8e(\x8e)
59. 0000001d000000000000000b -> mov data[29], rax
60. 000000000000008e00000003 -> mov rax, 0x8e(\x8e)
61. 0000001e000000000000000b -> mov data[30], rax
62. 00000000000000d200000003 -> mov rax, 0xd2(\xd2)
63. 0000001f000000000000000b -> mov data[31], rax
64. 00000000000000fb00000003 -> mov rax, 0xfb(\xfb)
65. 00000020000000000000000b -> mov data[32], rax
66. 000000000000001a00000003 -> mov rax, 0x1a(\x1a)
67. 00000021000000000000000b -> mov data[33], rax
68. 000000000000005700000003 -> mov rax, 0x57(W)
69. 00000022000000000000000b -> mov data[34], rax
70. 000000000000005200000003 -> mov rax, 0x52(R)
71. 00000023000000000000000b -> mov data[35], rax
72. 00000000000000ef00000003 -> mov rax, 0xef(\xef)
73. 00000024000000000000000b -> mov data[36], rax
74. 000000000000006900000003 -> mov rax, 0x69(i)
75. 000000000000000000000034 -> out rax(char)
76. 000000000000006e00000003 -> mov rax, 0x6e(n)
77. 000000000000000000000034 -> out rax(char)
78. 000000000000007000000003 -> mov rax, 0x70(p)
79. 000000000000000000000034 -> out rax(char)
80. 000000000000007500000003 -> mov rax, 0x75(u)
81. 000000000000000000000034 -> out rax(char)
82. 000000000000007400000003 -> mov rax, 0x74(t)
83. 000000000000000000000034 -> out rax(char)
84. 000000000000002000000003 -> mov rax, 0x20( )
85. 000000000000000000000034 -> out rax(char)
86. 000000000000007900000003 -> mov rax, 0x79(y)
87. 000000000000000000000034 -> out rax(char)
88. 000000000000006f00000003 -> mov rax, 0x6f(o)
89. 000000000000000000000034 -> out rax(char)
90. 000000000000007500000003 -> mov rax, 0x75(u)
91. 000000000000000000000034 -> out rax(char)
92. 000000000000007200000003 -> mov rax, 0x72(r)
93. 000000000000000000000034 -> out rax(char)
94. 000000000000002000000003 -> mov rax, 0x20( )
95. 000000000000000000000034 -> out rax(char)
96. 000000000000006600000003 -> mov rax, 0x66(f)
97. 000000000000000000000034 -> out rax(char)
98. 000000000000006c00000003 -> mov rax, 0x6c(l)
99. 000000000000000000000034 -> out rax(char)
100. 000000000000006100000003 -> mov rax, 0x61(a)
101. 000000000000000000000034 -> out rax(char)
102. 000000000000006700000003 -> mov rax, 0x67(g)
103. 000000000000000000000034 -> out rax(char)
104. 000000000000003a00000003 -> mov rax, 0x3a(:)
105. 000000000000000000000034 -> out rax(char)
106. 000000000000000a00000003 -> mov rax, 0xa(\x0a)
107. 000000000000000000000034 -> out rax(char)
108. 000000010000002500000003 -> mov rbx, 0x25(%)
109. 000000000000000000000035 -> in rax(char)
110. 00000001000000000000000c -> mov data[rbx], rax
111. 00000000000000010000000d -> inc rbx
112. 00000036000000010000000f -> cmp 54, rbx
113. 000000000000006d00000011 -> jl code[109]
114. 000000010000000000000003 -> mov rbx, 0x0(\x00)
115. 000000000000000100000001 -> push rbx
116. 000000020000000000000004 -> mov rcx, data[0]
117. 000000000000000200000001 -> push rcx
118. 00000077000000dd00000012 -> push 119; call code[221]
119. 000000000000000200000002 -> pop rcx
120. 000000000000000100000002 -> pop rbx
121. 00000000000000000000000b -> mov data[0], rax
122. 00000000000000010000000d -> inc rbx
123. 00000012000000010000000f -> cmp 18, rbx
124. 00000000000000730000001d -> jne code[115]
125. 000000000000001300000004 -> mov rax, data[19]
126. 000000010000002500000004 -> mov rbx, data[37]
127. 000000010000000000000010 -> cmp rbx, rax
128. 00000000000000d20000001d -> jne code[210]
129. 000000000000001400000004 -> mov rax, data[20]
130. 000000010000002600000004 -> mov rbx, data[38]
131. 000000010000000000000010 -> cmp rbx, rax
132. 00000000000000d20000001d -> jne code[210]
133. 000000000000001500000004 -> mov rax, data[21]
134. 000000010000002700000004 -> mov rbx, data[39]
135. 000000010000000000000010 -> cmp rbx, rax
136. 00000000000000d20000001d -> jne code[210]
137. 000000000000001600000004 -> mov rax, data[22]
138. 000000010000002800000004 -> mov rbx, data[40]
139. 000000010000000000000010 -> cmp rbx, rax
140. 00000000000000d20000001d -> jne code[210]
141. 000000000000001700000004 -> mov rax, data[23]
142. 000000010000002900000004 -> mov rbx, data[41]
143. 000000010000000000000010 -> cmp rbx, rax
144. 00000000000000d20000001d -> jne code[210]
145. 000000000000001800000004 -> mov rax, data[24]
146. 000000010000002a00000004 -> mov rbx, data[42]
147. 000000010000000000000010 -> cmp rbx, rax
148. 00000000000000d20000001d -> jne code[210]
149. 000000000000001900000004 -> mov rax, data[25]
150. 000000010000002b00000004 -> mov rbx, data[43]
151. 000000010000000000000010 -> cmp rbx, rax
152. 00000000000000d20000001d -> jne code[210]
153. 000000000000001a00000004 -> mov rax, data[26]
154. 000000010000002c00000004 -> mov rbx, data[44]
155. 000000010000000000000010 -> cmp rbx, rax
156. 00000000000000d20000001d -> jne code[210]
157. 000000000000001b00000004 -> mov rax, data[27]
158. 000000010000002d00000004 -> mov rbx, data[45]
159. 000000010000000000000010 -> cmp rbx, rax
160. 00000000000000d20000001d -> jne code[210]
161. 000000000000001c00000004 -> mov rax, data[28]
162. 000000010000002e00000004 -> mov rbx, data[46]
163. 000000010000000000000010 -> cmp rbx, rax
164. 00000000000000d20000001d -> jne code[210]
165. 000000000000001d00000004 -> mov rax, data[29]
166. 000000010000002f00000004 -> mov rbx, data[47]
167. 000000010000000000000010 -> cmp rbx, rax
168. 00000000000000d20000001d -> jne code[210]
169. 000000000000001e00000004 -> mov rax, data[30]
170. 000000010000003000000004 -> mov rbx, data[48]
171. 000000010000000000000010 -> cmp rbx, rax
172. 00000000000000d20000001d -> jne code[210]
173. 000000000000001f00000004 -> mov rax, data[31]
174. 000000010000003100000004 -> mov rbx, data[49]
175. 000000010000000000000010 -> cmp rbx, rax
176. 00000000000000d20000001d -> jne code[210]
177. 000000000000002000000004 -> mov rax, data[32]
178. 000000010000003200000004 -> mov rbx, data[50]
179. 000000010000000000000010 -> cmp rbx, rax
180. 00000000000000d20000001d -> jne code[210]
181. 000000000000002100000004 -> mov rax, data[33]
182. 000000010000003300000004 -> mov rbx, data[51]
183. 000000010000000000000010 -> cmp rbx, rax
184. 00000000000000d20000001d -> jne code[210]
185. 000000000000002200000004 -> mov rax, data[34]
186. 000000010000003400000004 -> mov rbx, data[52]
187. 000000010000000000000010 -> cmp rbx, rax
188. 00000000000000d20000001d -> jne code[210]
189. 000000000000002300000004 -> mov rax, data[35]
190. 000000010000003500000004 -> mov rbx, data[53]
191. 000000010000000000000010 -> cmp rbx, rax
192. 00000000000000d20000001d -> jne code[210]
193. 000000000000002400000004 -> mov rax, data[36]
194. 000000010000003600000004 -> mov rbx, data[54]
195. 000000010000000000000010 -> cmp rbx, rax
196. 00000000000000d20000001d -> jne code[210]
197. 000000000000007200000003 -> mov rax, 0x72(r)
198. 000000000000000000000034 -> out rax(char)
199. 000000000000006900000003 -> mov rax, 0x69(i)
200. 000000000000000000000034 -> out rax(char)
201. 000000000000006700000003 -> mov rax, 0x67(g)
202. 000000000000000000000034 -> out rax(char)
203. 000000000000006800000003 -> mov rax, 0x68(h)
204. 000000000000000000000034 -> out rax(char)
205. 000000000000007400000003 -> mov rax, 0x74(t)
206. 000000000000000000000034 -> out rax(char)
207. 000000000000000a00000003 -> mov rax, 0xa(\x0a)
208. 000000000000000000000034 -> out rax(char)
209. 00000000000000dc0000001e -> jmp code[220]
210. 000000000000007700000003 -> mov rax, 0x77(w)
211. 000000000000000000000034 -> out rax(char)
212. 000000000000007200000003 -> mov rax, 0x72(r)
213. 000000000000000000000034 -> out rax(char)
214. 000000000000006f00000003 -> mov rax, 0x6f(o)
215. 000000000000000000000034 -> out rax(char)
216. 000000000000006e00000003 -> mov rax, 0x6e(n)
217. 000000000000000000000034 -> out rax(char)
218. 000000000000006700000003 -> mov rax, 0x67(g)
219. 000000000000000000000034 -> out rax(char)
220. 000000000000000000000036 -> hlt
221. 000000000000000000000013 -> push rbp
222. 000000000000000000000014 -> mov rbp, rsp
223. 00000003fffffffd00000018 -> mov [rsp-3], rdx
224. 000000030000002500000019 -> add rdx, 37
225. 00000002000000030000001b -> mov rcx, data[rdx]
226. 00000003fffffffd00000018 -> mov [rsp-3], rdx
227. 000000030000000100000019 -> add rdx, 1
228. 00000000000000030000001b -> mov rax, data[rdx]
229. 00000003fffffffd00000018 -> mov [rsp-3], rdx
230. 000000020000000300000005 -> add rcx, rdx
231. 000000000000000200000009 -> xor rax, rcx
232. 00000003fffffffe00000018 -> mov [rsp-2], rdx
233. 000000000000000300000009 -> xor rax, rdx
234. 00000003fffffffd00000018 -> mov [rsp-3], rdx
235. 000000030000002500000019 -> add rdx, 37
236. 00000003000000000000001c -> mov data[rdx], rax
237. 000000000000000000000015 -> mov rsp, rbp
238. 000000000000000000000016 -> pop rbp
239. 000000000000000000000017 -> pop rip, ret
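The assembly is short enough that the logic can be followed directly. It is essentially a classic XOR scheme. Based on the assembly, write the decoding script: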
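# Key bytes stored by the program in data[1..18]
data = [119, 51, 110, 108, 49, 110, 103, 108, 111, 118, 51, 106, 99, 104, 101, 49, 52, 51]
# Expected ciphertext stored in data[19..36]
data2 = [158, 192, 205, 203, 133, 132, 152, 142, 157, 131, 142, 142, 210, 251, 26, 87, 82, 239]
flag = ''
temp = 0x8f  # initial chaining value, stored in data[0]
for i in range(18):
    flag += chr(((data2[i]) ^ data[i] ^ temp) - i)
    temp = data2[i]  # each ciphertext byte chains into the next one
print(flag)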
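As a cross-check, the forward transform can be written out as well and round-tripped against the stored ciphertext. This sketch reflects my reading of the routine at code[221]: the instructions printed as mov [rsp-3], rdx are interpreted as loading the pushed arguments (the index i and the previous value) into rdx, which is an assumption rather than something the listing states. The names KEY, TARGET, SEED, encrypt and decrypt are chosen here.

```python
KEY = [119, 51, 110, 108, 49, 110, 103, 108, 111, 118, 51, 106, 99, 104, 101, 49, 52, 51]
TARGET = [158, 192, 205, 203, 133, 132, 152, 142, 157, 131, 142, 142, 210, 251, 26, 87, 82, 239]
SEED = 0x8F  # initial value of data[0]


def encrypt(plain):
    """Forward transform as read from the listing: c[i] = key[i] ^ (p[i] + i) ^ c[i-1]."""
    out = []
    prev = SEED
    for i, (p, k) in enumerate(zip(plain, KEY)):
        c = (k ^ (p + i) ^ prev) & 0xFFFFFFFF
        out.append(c)
        prev = c  # the result is written back to data[0] and chained
    return out


def decrypt(cipher):
    """Inverse transform; the same computation as the decoding script above."""
    out = []
    prev = SEED
    for i, (c, k) in enumerate(zip(cipher, KEY)):
        out.append((c ^ k ^ prev) - i)
        prev = c
    return out


recovered = decrypt(TARGET)
# Re-encrypting the recovered input must reproduce the stored ciphertext.
print(encrypt(recovered) == TARGET)
```

If the round trip holds, the decoding script and the disassembly agree on the algorithm, which is a cheap way to catch a misread opcode before submitting a flag.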
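Recovering structs and data types effectively during reversing makes the decompiled code far more readable and speeds up understanding of what the program does.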