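Practice:
Analysis target: Tai-e's taint analysis currently requires a program entry point to be specified; without one, it only runs the Soot frontend to generate Jimple three-address code.
Approach: Java EE here means web applications. A Java EE application contains multiple Servlets/Controllers, so the methods behind each route have to be added to Tai-e's analysis entry points (EntryPoint), which are also the entries of the call graph and the ICFG.
One distinctive feature of Tai-e is its analysis plugin system: by interacting with the pointer analysis, new analyses can be added to the framework in a modular way. The relevant directory is pascal/taie/analysis/pta/plugin.
How the plugin system works:
Tai-e runs the analysis by calling the pointer analysis class's method pascal/taie/analysis/pta/PointerAnalysis#runAnalysis():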
private PointerAnalysisResult runAnalysis(HeapModel heapModel,
                                          ContextSelector selector) {
    AnalysisOptions options = getOptions();
    Solver solver = new DefaultSolver(options,
            heapModel, selector, new MapBasedCSManager());
    // The initialization of some Plugins may read the fields in solver,
    // e.g., contextSelector or csManager, thus we initialize Plugins
    // after setting all other fields of solver.
    setPlugin(solver, options);
    solver.solve();
    return solver.getResult();
}
Step into the plugin setup method pascal/taie/analysis/pta/PointerAnalysis#setPlugin(), which creates a composite plugin and loads the plugins into it:
private static void setPlugin(Solver solver, AnalysisOptions options) {
    CompositePlugin plugin = new CompositePlugin();
    // add builtin plugins
    // To record elapsed time precisely, AnalysisTimer should be added at first.
    plugin.addPlugin(
            new AnalysisTimer(),
            new EntryPointHandler(),
            new ClassInitializer(),
            new ThreadHandler(),
            new NativeModeller(),
            new ExceptionAnalysis()
    );
    ...
    plugin.setSolver(solver);
    solver.setPlugin(plugin);
}
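The solver holds the environment information and the plugins for the pointer analysis, and returns the pointer analysis result once the analysis has run.
Look at the interface file pascal/taie/analysis/pta/core/solver/Solver.java alongside the code of the pointer analysis class pascal/taie/analysis/pta/PointerAnalysis.java.
The Solver interface is reproduced below with a comment on each of its main members: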
public interface Solver {

    // analysis options
    AnalysisOptions getOptions();

    // class hierarchy
    ClassHierarchy getHierarchy();

    // type system, used to look up types
    TypeSystem getTypeSystem();

    // heap model
    HeapModel getHeapModel();

    // manager of context-sensitive elements
    CSManager getCSManager();

    // context selector, which determines the context-sensitivity variant
    ContextSelector getContextSelector();

    // call graph, made up of method nodes and call edges
    CallGraph<CSCallSite, CSMethod> getCallGraph();

    // points-to set of a pointer, i.e. the set of objects it may point to
    PointsToSet getPointsToSetOf(Pointer pointer);

    // creates an empty points-to set
    PointsToSet makePointsToSet();

    // sets the plugin of the solver
    void setPlugin(Plugin plugin);

    // starting point of the solver
    void solve();

    // adds a points-to set to a pointer
    void addPointsTo(Pointer pointer, PointsToSet pts);

    // adds a context-sensitive object to a pointer
    void addPointsTo(Pointer pointer, CSObj csObj);

    // adds an abstract object under a heap context to a pointer
    void addPointsTo(Pointer pointer, Context heapContext, Obj obj);

    // adds an abstract object under the empty context to a pointer
    default void addPointsTo(Pointer pointer, Obj obj) {
        addPointsTo(pointer, getContextSelector().getEmptyContext(), obj);
    }

    // adds points-to relations for a variable under a context
    void addVarPointsTo(Context context, Var var, PointsToSet pts);

    void addVarPointsTo(Context context, Var var, CSObj csObj);

    void addVarPointsTo(Context context, Var var, Context heapContext, Obj obj);

    default void addVarPointsTo(Context context, Var var, Obj obj) {
        addVarPointsTo(context, var, getContextSelector().getEmptyContext(), obj);
    }

    void addPointerFilter(Pointer pointer, Predicate<CSObj> filter);

    // adds an edge to the pointer flow graph
    default void addPFGEdge(Pointer source, Pointer target, FlowKind kind) {
        addPFGEdge(source, target, kind, Identity.get());
    }

    // adds a pointer flow graph edge that filters objects by type
    default void addPFGEdge(Pointer source, Pointer target, FlowKind kind, Type type) {
        addPFGEdge(source, target, kind, new TypeFilter(type, this));
    }

    void addPFGEdge(Pointer source, Pointer target, FlowKind kind, Transfer transfer);

    // adds an entry point
    void addEntryPoint(EntryPoint entryPoint);

    // adds a call edge
    void addCallEdge(Edge<CSCallSite, CSMethod> edge);

    // adds a context-sensitive method
    void addCSMethod(CSMethod csMethod);

    void addStmts(CSMethod csMethod, Collection<Stmt> stmts);

    void addIgnoredMethod(JMethod method);

    void initializeClass(JClass cls);

    // returns the pointer analysis result
    PointerAnalysisResult getResult();
}
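Open pascal/taie/analysis/pta/core/solver/DefaultSolver.java and find the entry method solve(), which runs the pointer analysis algorithm:
@Override
public void solve() {
    initialize();
    analyze();
}
solve() first calls the initialization method initialize() and then the analysis method analyze().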
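Tai-e's plugins carry out the various algorithmic tasks of the pointer analysis. The interface pascal/taie/analysis/pta/plugin/Plugin.java defines several lifecycle callbacks.
Reading the default solver DefaultSolver shows that initializing the solver runs the onStart() method of every plugin.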
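The way a composite plugin fans a lifecycle callback out to its children can be pictured with a small standalone model. This is only a sketch under my own assumptions: MiniPlugin and MiniCompositePlugin are hypothetical stand-ins, not Tai-e's Plugin or CompositePlugin, and the callback set is reduced to two methods for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a composite plugin; all names here are hypothetical.
final class PluginLifecycleDemo {

    // Stand-in for a plugin interface with two lifecycle callbacks.
    interface MiniPlugin {
        default void onStart() {}
        default void onFinish() {}
    }

    // Forwards every callback to its children in registration order.
    static final class MiniCompositePlugin implements MiniPlugin {
        private final List<MiniPlugin> children = new ArrayList<>();

        void addPlugin(MiniPlugin... plugins) {
            for (MiniPlugin p : plugins) {
                children.add(p);
            }
        }

        @Override
        public void onStart() {
            children.forEach(MiniPlugin::onStart);
        }

        @Override
        public void onFinish() {
            children.forEach(MiniPlugin::onFinish);
        }
    }

    // Runs a toy "solver" and returns the order in which callbacks fired.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        MiniPlugin timer = new MiniPlugin() {
            @Override public void onStart() { log.add("timer.onStart"); }
            @Override public void onFinish() { log.add("timer.onFinish"); }
        };
        MiniPlugin entries = new MiniPlugin() {
            @Override public void onStart() { log.add("entries.onStart"); }
        };
        MiniCompositePlugin composite = new MiniCompositePlugin();
        composite.addPlugin(timer, entries);

        composite.onStart();   // initialization phase
        log.add("analyze");    // stand-in for the analysis itself
        composite.onFinish();  // after the analysis
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this model the timer plugin is registered first so that its onStart() fires before any other plugin does work, which mirrors the comment in setPlugin() about adding AnalysisTimer first.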
Look at the setPlugin() method of the main pointer analysis class pascal/taie/analysis/pta/PointerAnalysis.java: the built-in plugins it loads include EntryPointHandler.
plugin.addPlugin(
        new AnalysisTimer(),
        new EntryPointHandler(),
        new ClassInitializer(),
        new ThreadHandler(),
        new NativeModeller(),
        new ExceptionAnalysis()
);
pascal/taie/analysis/pta/plugin/EntryPointHandler.java has only two methods, setSolver() and onStart():
@Override
public void setSolver(Solver solver) {
    this.solver = solver;
}

@Override
public void onStart() {
    // process program main method
    JMethod main = World.get().getMainMethod();
    if (main != null) {
        solver.addEntryPoint(new EntryPoint(main,
                new DeclaredParamProvider(main, solver.getHeapModel(), 1)));
    }
    // process implicit entries
    if (solver.getOptions().getBoolean("implicit-entries")) {
        for (JMethod entry : World.get().getImplicitEntries()) {
            solver.addEntryPoint(new EntryPoint(entry, EmptyParamProvider.get()));
        }
    }
}
For a quick look at the entry point data structure, find the addEntryPoint() method in pascal/taie/analysis/pta/core/solver/DefaultSolver.java, which takes an EntryPoint object representing a program entry point:
@Override public void addEntryPoint(EntryPoint entryPoint) {...}
Look at the record class pascal/taie/analysis/pta/core/solver/EntryPoint.java: an entry point is defined as a method together with a provider for its parameters:
public record EntryPoint(JMethod method, ParamProvider paramProvider) {...}
With the entry point data structure in mind, go back to pascal/taie/analysis/pta/plugin/EntryPointHandler.java. The key code that adds a pointer analysis entry point from the main class and its main method is:
solver.addEntryPoint(new EntryPoint(main, new DeclaredParamProvider(main, solver.getHeapModel(), 1)));
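Printing main in a debugger gives the method signature <Server: void main(java.lang.String[])>.
Entry methods can be added directly in the EntryPointHandler plugin. For example, to make a Servlet method of some Java web system an entry of the pointer analysis, inspecting its argument instances gives the method signature: <DecryptApplicationService2: void service(com.oreilly.servlet.MultipartWrapper, javax.servlet.http.HttpServletResponse)>
In practice, an entry point is created with new EntryPoint(JMethod, ParamProvider), so the method has to be supplied as a JMethod object rather than as a signature string. Tracing how main gets its JMethod value leads to
the method soot.Scene.java#getMainMethod(), shown below: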
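Both signatures above follow the same shape, <DeclaringClass: ReturnType name(ParamType,...)>. A minimal sketch of splitting such a string into its parts, assuming only that shape; MethodSignatureDemo is a hypothetical helper written for this note and is not part of Tai-e or Soot:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Tai-e or Soot.
final class MethodSignatureDemo {
    final String declaringClass;
    final String returnType;
    final String name;
    final List<String> paramTypes;

    private MethodSignatureDemo(String declaringClass, String returnType,
                                String name, List<String> paramTypes) {
        this.declaringClass = declaringClass;
        this.returnType = returnType;
        this.name = name;
        this.paramTypes = paramTypes;
    }

    // Splits "<Class: ret name(p1,p2)>" into its four parts.
    static MethodSignatureDemo parse(String signature) {
        String s = signature.trim();
        if (!s.startsWith("<") || !s.endsWith(">")) {
            throw new IllegalArgumentException("Not a method signature: " + signature);
        }
        String body = s.substring(1, s.length() - 1);
        int colon = body.indexOf(": ");
        int open = body.indexOf('(');
        int close = body.lastIndexOf(')');
        if (colon < 0 || open < colon || close < open) {
            throw new IllegalArgumentException("Not a method signature: " + signature);
        }
        String declaringClass = body.substring(0, colon);
        // Between ": " and "(" sit the return type and the method name.
        String[] retAndName = body.substring(colon + 2, open).trim().split("\\s+");
        if (retAndName.length != 2) {
            throw new IllegalArgumentException("Not a method signature: " + signature);
        }
        List<String> params = new ArrayList<>();
        String paramText = body.substring(open + 1, close).trim();
        if (!paramText.isEmpty()) {
            for (String p : paramText.split(",")) {
                params.add(p.trim());
            }
        }
        return new MethodSignatureDemo(declaringClass, retAndName[0], retAndName[1], params);
    }

    public static void main(String[] args) {
        MethodSignatureDemo m = parse("<Server: void main(java.lang.String[])>");
        System.out.println(m.declaringClass + " / " + m.returnType
                + " / " + m.name + " / " + m.paramTypes);
    }
}
```

The sketch only takes the string apart. It does not resolve anything against a class hierarchy, which is why a real entry point still needs a JMethod.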
public SootMethod getMainMethod() {
    if (!hasMainClass()) {
        throw new RuntimeException("There is no main class set!");
    }

    SootMethod mainMethod = Options.v().src_prec() != Options.src_prec_dotnet
            ? mainClass.getMethodUnsafe("main",
                    Collections.singletonList(ArrayType.v(RefType.v("java.lang.String"), 1)),
                    VoidType.v())
            : mainClass.getMethodUnsafe("Main",
                    Collections.singletonList(ArrayType.v(RefType.v(DotnetBasicTypes.SYSTEM_STRING), 1)),
                    VoidType.v());
    if (mainMethod == null) {
        throw new RuntimeException("Main class declares no main method!");
    }
    return mainMethod;
}
The reference material shows that Tai-e's plugins obtain the argument objects for new EntryPoint(JMethod, ParamProvider) through helper methods, for example in the onStart() method of pascal/taie/analysis/pta/plugin/ThreadHandler.java:
@Override
public void onStart() {
    if (!solver.getOptions().getBoolean("implicit-entries")) {
        return;
    }
    TypeSystem typeSystem = solver.getTypeSystem();
    HeapModel heapModel = solver.getHeapModel();

    // setup system thread group
    JMethod threadGroupInit = requireNonNull(
            hierarchy.getJREMethod("<java.lang.ThreadGroup: void <init>()>"));
    ClassType threadGroup = typeSystem.getClassType(ClassNames.THREAD_GROUP);
    Obj systemThreadGroup = heapModel.getMockObj(Descriptor.ENTRY_DESC,
            "<system-thread-group>", threadGroup);
    solver.addEntryPoint(new EntryPoint(threadGroupInit,
            new SpecifiedParamProvider.Builder(threadGroupInit)
                    .addThisObj(systemThreadGroup)
                    .build()));
    ...
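While reading the code I came across src/test/java/pascal/taie/analysis/pta/CustomEntryPointPlugin.java, a test class for custom entry points.
Create a plugin directory and plugin class, pascal/taie/analysis/pta/plugin/pumpkin/TomcatEntry.java, and write the plugin modeled on CustomEntryPointPlugin.
The code of CustomEntryPointPlugin is quite complete; only the class name, method name, and parameter types need to be filled in.
First get the class that holds the custom entry method. Here it is the class under test: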
@Override
public void onStart() {
    JClass clz = hierarchy.getClass("com.esafenet.servlet.service.smartsec.DecryptApplicationService2");
    assert clz != null;
    ...
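The parameters of an entry method fall into three cases. First call clz.getDeclaredMethod(method) to get the entry method, then supply its parameters according to the case, as in the code below.
If the method should receive no parameter objects, use EmptyParamProvider.get() as the ParamProvider.
For declared parameters, pass the method and the heap model: new DeclaredParamProvider(declaredParam1, heapModel, 1) works from the declared parameter types. To see exactly what it does, trace into DeclaredParamProvider and go to pascal/taie/analysis/pta/core/solver/DeclaredParamProvider#generateObjs().
For specified parameters, build the objects by hand with SpecifiedParamProvider.Builder.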
// Declared parameters: emptyParam, declaredParam
JMethod emptyParam = clz.getDeclaredMethod("entryWithEmptyParam");
assert emptyParam != null;
solver.addEntryPoint(new EntryPoint(emptyParam, EmptyParamProvider.get()));

JMethod declaredParam1 = clz.getDeclaredMethod("entryWithDeclaredParam1");
assert declaredParam1 != null;
solver.addEntryPoint(new EntryPoint(
        declaredParam1, new DeclaredParamProvider(declaredParam1, heapModel, 1)));
...
// Specified parameters: specifiedParam
JMethod specifiedParam = clz.getDeclaredMethod("entryWithSpecifiedParam");
assert specifiedParam != null;
SpecifiedParamProvider.Builder paramProviderBuilder =
        new SpecifiedParamProvider.Builder(specifiedParam);
Obj thisObj = heapModel.getMockObj(Descriptor.ENTRY_DESC, "MethodParam{this}",
        clz.getType(), specifiedParam);
Obj p0 = heapModel.getMockObj(Descriptor.ENTRY_DESC, "MethodParam{0}",
        specifiedParam.getParamType(0), specifiedParam);
Obj p1 = heapModel.getMockObj(Descriptor.ENTRY_DESC, "MethodParam{1}",
        specifiedParam.getParamType(1), specifiedParam);
Obj stringObj = heapModel.getMockObj(Descriptor.ENTRY_DESC, "MethodParam{0}.s1",
        typeSystem.getType(ClassNames.STRING), specifiedParam);
Obj param1Obj = heapModel.getMockObj(Descriptor.ENTRY_DESC, "MethodParam{1}[*]",
        typeSystem.getType("Param1"), specifiedParam);
JField s1Field = hierarchy.getField("<Param1: java.lang.String s1>");
paramProviderBuilder.addThisObj(thisObj)
        .addParamObj(0, p0)
        .addFieldObj(p0, s1Field, stringObj)
        .addParamObj(1, p1)
        .addArrayObj(p1, param1Obj)
        .setDelegate(new DeclaredParamProvider(specifiedParam, heapModel));
solver.addEntryPoint(new EntryPoint(specifiedParam, paramProviderBuilder.build()));
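The three cases can be pictured with a toy model. This is a deliberately simplified sketch under my own assumptions, not Tai-e's implementation: MiniParamProvider and the "mock:" strings are hypothetical stand-ins, the real providers create heap objects rather than strings, and the fall-back-to-delegate behavior is simply how I read setDelegate(...) in the snippet above.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Toy model of the three parameter-provider cases; all names are hypothetical.
final class ParamProviderDemo {

    // Maps a parameter index to descriptions of the mock objects bound to it.
    interface MiniParamProvider {
        List<String> objectsFor(int index);
    }

    // Case 1: bind no objects to any parameter.
    static MiniParamProvider empty() {
        return index -> Collections.emptyList();
    }

    // Case 2: one mock object per parameter, typed by the declared type.
    static MiniParamProvider declared(List<String> declaredTypes) {
        return index -> Collections.singletonList("mock:" + declaredTypes.get(index));
    }

    // Case 3: hand-picked types, with a delegate for indexes left unspecified
    // (assumed fallback behavior).
    static MiniParamProvider specified(Map<Integer, String> chosenTypes,
                                       MiniParamProvider delegate) {
        return index -> chosenTypes.containsKey(index)
                ? Collections.singletonList("mock:" + chosenTypes.get(index))
                : delegate.objectsFor(index);
    }

    public static void main(String[] args) {
        MiniParamProvider fallback = declared(java.util.Arrays.asList(
                "javax.servlet.http.HttpServletRequest",
                "javax.servlet.http.HttpServletResponse"));
        MiniParamProvider provider = specified(
                Collections.singletonMap(0, "javax.servlet.http.HttpServletRequestWrapper"),
                fallback);
        System.out.println(provider.objectsFor(0));
        System.out.println(provider.objectsFor(1));
    }
}
```

The point of the third case is control: when the declared parameter type is an interface, hand-picking a concrete type for the mock object is what lets calls on that parameter resolve to real method bodies.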
Following CustomEntryPointPlugin.java, write the custom plugin TomcatEntry.java:
package pascal.taie.analysis.pta.plugin.pumpkin;

import pascal.taie.World;
import pascal.taie.analysis.pta.core.heap.Descriptor;
import pascal.taie.analysis.pta.core.heap.HeapModel;
import pascal.taie.analysis.pta.core.heap.Obj;
import pascal.taie.analysis.pta.core.solver.DeclaredParamProvider;
import pascal.taie.analysis.pta.core.solver.EntryPoint;
import pascal.taie.analysis.pta.core.solver.Solver;
import pascal.taie.analysis.pta.core.solver.SpecifiedParamProvider;
import pascal.taie.analysis.pta.plugin.Plugin;
import pascal.taie.language.classes.ClassHierarchy;
import pascal.taie.language.classes.JClass;
import pascal.taie.language.classes.JMethod;
import pascal.taie.language.type.TypeSystem;

public class TomcatEntry implements Plugin {

    private Solver solver;

    private ClassHierarchy hierarchy;

    private TypeSystem typeSystem;

    private HeapModel heapModel;

    @Override
    public void setSolver(Solver solver) {
        this.solver = solver;
        this.hierarchy = solver.getHierarchy();
        this.typeSystem = solver.getTypeSystem();
        this.heapModel = solver.getHeapModel();
    }

    @Override
    public void onStart() {
        JClass clz = hierarchy.getClass("com.esafenet.servlet.service.smartsec.DecryptApplicationService2");
        assert clz != null;

        JMethod specifiedParam = clz.getDeclaredMethod("service");
        assert specifiedParam != null;

        JClass reqWrapper = World.get().getClassHierarchy().getClass("javax.servlet.ServletRequestWrapper");
        JClass respWrapper = World.get().getClassHierarchy().getClass("javax.servlet.http.HttpServletResponseWrapper");

        SpecifiedParamProvider.Builder paramProviderBuilder =
                new SpecifiedParamProvider.Builder(specifiedParam);
        Obj thisObj = heapModel.getMockObj(Descriptor.ENTRY_DESC, "MethodParam{this}",
                clz.getType(), specifiedParam);
        Obj p0 = heapModel.getMockObj(Descriptor.ENTRY_DESC, "MethodParam{0}",
                reqWrapper.getType(), specifiedParam);
        Obj p1 = heapModel.getMockObj(Descriptor.ENTRY_DESC, "MethodParam{1}",
                respWrapper.getType(), specifiedParam);

        paramProviderBuilder.addThisObj(thisObj)
                .addParamObj(0, p0)
                .addParamObj(1, p1)
                .setDelegate(new DeclaredParamProvider(specifiedParam, heapModel));

        System.out.println("build对象");
        System.out.println(paramProviderBuilder.build().getParamObjs(0));
        solver.addEntryPoint(new EntryPoint(specifiedParam, paramProviderBuilder.build()));
    }
}
Edit the run configuration file options.yml: add the plugin option and put the project's jar files on the class path.
appClassPath:
- java-benchmarks/general/javax.servlet-api-4.0.0.jar
- java-benchmarks/yisaitong/CDGServer3/WEB-INF/lib/JSPSmart.jar
- java-benchmarks/yisaitong/CDGServer3/WEB-INF/lib/Rijndael.jar
- java-benchmarks/yisaitong/CDGServer3/WEB-INF/lib/UserAuth.jar
- ...
mainClass:
inputClasses: []
...
analyses:
  pta: cs:ci;plugins:[pascal.taie.analysis.pta.plugin.pumpkin.TomcatEntry];
The run below confirms that the entry point com.esafenet.servlet.service.smartsec.DecryptApplicationService2 added by the plugin was found; the pta pass took 4.58s.
[Pointer analysis] elapsed time: 3.99s
Detected 0 taint flow(s):
TFGDumper starts ...
Source nodes:
Sink nodes:
Dumping /.../Tai-e-master/output/taint-flow-graph.dot
TFGDumper finishes, elapsed time: 0.18s
-------------- Pointer analysis statistics: --------------
#var pointers: 8,2402 (insens) / 8,2402 (sens)
#objects: 5345 (insens) / 5345 (sens)
#var points-to: 118,0422 (insens) / 118,0422 (sens)
#static field points-to: 1806 (sens)
#instance field points-to: 14,4745 (sens)
#array points-to: 1,1705 (sens)
#reachable methods: 8067 (insens) / 8067 (sens)
#call graph edges: 4,3095 (insens) / 4,3096 (sens)
----------------------------------------
pta finishes, elapsed time: 4.58s
Tai-e finishes, elapsed time: 114.80s
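The entry point was added successfully, but Source nodes stayed empty in every test, which means req was never matched against the req.getParameter() method listed in sources.
The source method configured below does resolve in the analyzed program, so the problem must lie in how req is set up.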
sources:
- { kind: call, method: "<javax.servlet.ServletRequestWrapper: java.lang.String getParameter(java.lang.String)>", index: result }
First, go to the plugin code and print the entry point information:
System.out.println("build对象");
System.out.println(paramProviderBuilder.build().getParamObjs(0));
The object bound to the first parameter req is shown below, and it looks fine.
[EntryPointObj{alloc=MethodParam{0},type=javax.servlet.ServletRequestWrapper in <com.esafenet.servlet.service.smartsec.DecryptApplicationService2: void service(javax.servlet.http.HttpServletRequest,javax.servlet.http.HttpServletResponse)>}]
Could javax.servlet.ServletRequestWrapper itself be the problem?
Change the plugin code to use the type javax.servlet.http.HttpServletRequestWrapper instead:
JClass reqWrapper = World.get().getClassHierarchy().getClass("javax.servlet.http.HttpServletRequestWrapper");
This time the taint objects and their data flows finally show up!
Detected 25 taint flow(s):
TaintFlow{<com.esafenet.servlet.service.smartsec.DecryptApplicationService2: void service(javax.servlet.http.HttpServletRequest,javax.servlet.http.HttpServletResponse)>[3@L82] $r2 = invokeinterface request.getParameter(%stringconst0)/result -> <java.io.FileInputStream: void <init>(java.lang.String)>[2@L93] invokespecial $r2.<init>(r1)/0}
...
TFGDumper starts ...
Source nodes:
VarNode{<com.esafenet.servlet.service.smartsec.DecryptApplicationService2: void service(javax.servlet.http.HttpServletRequest,javax.servlet.http.HttpServletResponse)>/$r2}
Sink nodes:
VarNode{<java.io.FilePermission: void init(int)>/$r30}
VarNode{<java.util.zip.ZipFile: void <init>(java.io.File,int,java.nio.charset.Charset)>/$r9}
...
Why does it not work when the mock parameter is javax.servlet.ServletRequestWrapper?
In the code under test, com.opensymphony.module.sitemesh.parser.PageRequest extends HttpServletRequestWrapper. Using that class as the mock parameter also yields taint flows.
[EntryPointObj{alloc=MethodParam{0},type=com.opensymphony.module.sitemesh.parser.PageRequest in <com.esafenet.servlet.service.smartsec.DecryptApplicationService2: void service(javax.servlet.http.HttpServletRequest,javax.servlet.http.HttpServletResponse)>}]
[Pointer analysis] elapsed time: 5.29s
Detected 25 taint flow(s):
TaintFlow{<com.esafenet.servlet.service.smartsec.DecryptApplicationService2: void service(javax.servlet.http.HttpServletRequest,javax.servlet.http.HttpServletResponse)>[3@L82] $r2 = invokeinterface request.getParameter(%stringconst0)/result -> <java.io.FileInputStream: void <init>(java.lang.String)>[2@L93] invokespecial $r2.<init>(r1)/0}
TaintFlow{<com.esafenet.servlet.service.smartsec.DecryptApplicationService2: void service(javax.servlet.http.HttpServletRequest,javax.servlet.http.HttpServletResponse)>[3@L82] $r2 = invokeinterface request.getParameter(%stringconst0)/result -> <java.io.FileOutputStream: void <init>(java.lang.String)>[2@L101] invokespecial $r2.<init>(r1)/0}
...
Source nodes:
VarNode{<com.esafenet.servlet.service.smartsec.DecryptApplicationService2: void service(javax.servlet.http.HttpServletRequest,javax.servlet.http.HttpServletResponse)>/$r2}
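The conclusion: Source nodes are only produced when the mock parameter object is an implementation class of the interface declared in the entry method's signature (here HttpServletRequest), or a subclass of such an implementation class. javax.servlet.ServletRequestWrapper implements only ServletRequest, not HttpServletRequest, so it does not qualify.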
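The subtyping fact behind this conclusion can be checked with stand-in types. The sketch below mirrors the shape of the javax.servlet hierarchy with hypothetical names (Req for ServletRequest, HttpReq for HttpServletRequest, and so on). It only demonstrates the Java subtyping relation; whether Tai-e rejects the mock object through exactly this kind of check is my assumption.

```java
// Stand-in types mirroring the shape of the servlet hierarchy; names are hypothetical.
final class MockParamTypeDemo {

    public interface Req {
        String getParameter(String name);
    }

    public interface HttpReq extends Req {}

    // Plays the role of ServletRequestWrapper: implements only the base interface.
    public static class ReqWrapper implements Req {
        @Override
        public String getParameter(String name) {
            return null;
        }
    }

    // Plays the role of HttpServletRequestWrapper.
    public static class HttpReqWrapper extends ReqWrapper implements HttpReq {}

    // Plays the role of PageRequest, a subclass of the HTTP wrapper.
    public static class PageReq extends HttpReqWrapper {}

    // True when an object of mockType can be bound to a parameter declared as declaredType.
    static boolean fitsDeclaredType(Class<?> declaredType, Class<?> mockType) {
        return declaredType.isAssignableFrom(mockType);
    }

    public static void main(String[] args) {
        System.out.println(fitsDeclaredType(HttpReq.class, ReqWrapper.class));     // false
        System.out.println(fitsDeclaredType(HttpReq.class, HttpReqWrapper.class)); // true
        System.out.println(fitsDeclaredType(HttpReq.class, PageReq.class));        // true
    }
}
```

The first result matches the failed run: an object whose type is not a subtype of the declared parameter type cannot legitimately flow into that parameter.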
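Next, the getParameter(str) method,
in relation to the entry method being analyzed, <com.esafenet.servlet.service.smartsec.DecryptApplicationService2: void service(javax.servlet.http.HttpServletRequest,javax.servlet.http.HttpServletResponse)>.
Neither the class of the entry method nor the classes of its parameter objects may be interfaces, so the only option is to use javax/servlet/http/HttpServletRequestWrapper, an implementation class of HttpServletRequest, as the entry method's parameter.
With this configuration in place, test different source settings.
The top-level interface ServletRequest, which declares getParameter(str): no Source nodes found.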
- { kind: call, method: "<javax.servlet.ServletRequest: java.lang.String getParameter(java.lang.String)>", index: result }
The implementation class javax.servlet.ServletRequestWrapper, which implements getParameter(str): Source nodes found:
VarNode{<com.esafenet.servlet.service.smartsec.DecryptApplicationService2: void service(javax.servlet.http.HttpServletRequest,javax.servlet.http.HttpServletResponse)>/$r2}
- { kind: call, method: "<javax.servlet.ServletRequestWrapper: java.lang.String getParameter(java.lang.String)>", index: result }
The interface javax.servlet.http.HttpServletRequest, which neither declares nor overrides getParameter(str) itself: no Source nodes found.
- { kind: call, method: "<javax.servlet.http.HttpServletRequest: java.lang.String getParameter(java.lang.String)>", index: result }
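The implementation class javax.servlet.http.HttpServletRequestWrapper, which neither declares nor overrides getParameter(str) itself and is also the type of the entry method's actual argument: no Source nodes found.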
- { kind: call, method: "<javax.servlet.http.HttpServletRequestWrapper: java.lang.String getParameter(java.lang.String)>", index: result }
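The conclusion: the class named in a source method signature must satisfy two conditions. Going by the four tests above, (1) it is a concrete class rather than an interface, and (2) it declares or overrides the source method itself instead of merely inheriting it.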
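A way to see which class that is: ask which type actually declares the method that runs for a given receiver class. The sketch below does this with reflection on stand-in types (hypothetical names mirroring the servlet hierarchy). It assumes that a call source is matched against the method the call resolves to, which is what the four tests suggest.

```java
import java.lang.reflect.Method;

// Stand-in types mirroring the shape of the servlet hierarchy; names are hypothetical.
final class SourceSignatureDemo {

    public interface Req {
        String getParameter(String name);
    }

    public interface HttpReq extends Req {}

    // Plays the role of ServletRequestWrapper: the only type with a method body.
    public static class ReqWrapper implements Req {
        @Override
        public String getParameter(String name) {
            return "value-of-" + name;
        }
    }

    // Plays the role of HttpServletRequestWrapper: inherits getParameter unchanged.
    public static class HttpReqWrapper extends ReqWrapper implements HttpReq {}

    // Simple name of the type that declares the getParameter found for receiverClass.
    static String declaringTypeOfGetParameter(Class<?> receiverClass)
            throws NoSuchMethodException {
        Method m = receiverClass.getMethod("getParameter", String.class);
        return m.getDeclaringClass().getSimpleName();
    }

    public static void main(String[] args) throws NoSuchMethodException {
        // The receiver is an HttpReqWrapper, yet the body that runs lives in ReqWrapper.
        System.out.println(declaringTypeOfGetParameter(HttpReqWrapper.class));
    }
}
```

Under that assumption, a source written against HttpServletRequestWrapper names a method that class never declares, while one written against ServletRequestWrapper names the method the call really lands on.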
Convert the taint analysis result from .dot to an svg file:
dot -Tsvg -o taint-flow-graph.svg taint-flow-graph.dot
These are my study notes. Corrections and pointers on anything I got wrong are very welcome and much appreciated.