April 20, 2021 in Yara sigs
Writing Yara rules is easy. Writing good Yara rules is … testing – both as an adjective and a verb.
There is a class of Yara rules – those that rely on actual machine code – that we can now do better.
How?
The typical approach to writing code-based Yara sigs relies on byte streams of machine code extracted from analyzed programs – usually a very specific code sequence of interest (e.g. an RC4 algo, a Luhn check routine, etc.). We then ‘patch’ (i.e. wildcard) the offsets in jumps, calls, etc. to account for their variability.
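To make the ‘patching’ step concrete, here is a minimal sketch that turns a byte stream into a Yara hex string and wildcards a chosen range of bytes. The helper name to_yara_hex, the sample bytes, and the choice of which offsets to mask are assumptions made for illustration; in practice you would take the offsets from your disassembler.

```python
def to_yara_hex(code, wildcard_ranges):
    """Render `code` as a Yara hex string, replacing the bytes covered by
    each (start, length) pair in `wildcard_ranges` with `??`."""
    masked = set()
    for start, length in wildcard_ranges:
        masked.update(range(start, start + length))
    tokens = ["??" if i in masked else f"{b:02X}" for i, b in enumerate(code)]
    return "{ " + " ".join(tokens) + " }"


# Illustrative bytes: push ebp; mov ebp, esp; call rel32; pop ebp; ret
code = bytes.fromhex("55 8B EC E8 10 20 30 00 5D C3")

# The 4-byte relative target of the call (offsets 4..7) varies per build,
# so it is wildcarded; the opcode byte E8 stays.
hex_string = to_yara_hex(code, [(4, 4)])
print(hex_string)  # { 55 8B EC E8 ?? ?? ?? ?? 5D C3 }
```

The resulting string can be pasted into the strings section of a rule as a hex string.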
Such Yara rules are common and pretty handy. They work most of the time, but there is a caveat. Compiler behavior and malware authors’ tricks may shift machine code around, so the same logic can end up as a different byte sequence. As a result, a pretty decent Yara rule based on very specific program code may fail on newer samples.
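A small illustration of why byte-exact rules are brittle: x86 allows two encodings of xor eax, eax (31 C0 and 33 C0), and different compilers or assemblers may pick either. The sketch below only emulates Yara hex-string matching with Python's re module; it is not how Yara works internally, and the three-byte "samples" are made up for the example.

```python
import re

# Two valid encodings of `xor eax, eax`, each followed by `ret` (C3).
variant_a = bytes.fromhex("31 C0 C3")
variant_b = bytes.fromhex("33 C0 C3")

# Rough equivalent of the Yara hex string { 31 C0 C3 }
strict = re.compile(rb"\x31\xC0\xC3", re.DOTALL)
# Rough equivalent of { (31 | 33) C0 C3 }
relaxed = re.compile(rb"[\x31\x33]\xC0\xC3", re.DOTALL)

strict_hits = [bool(strict.search(v)) for v in (variant_a, variant_b)]
relaxed_hits = [bool(relaxed.search(v)) for v in (variant_a, variant_b)]

print("strict :", strict_hits)   # [True, False]
print("relaxed:", relaxed_hits)  # [True, True]
```

The strict pattern misses the second variant even though both snippets do exactly the same thing.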
To improve the effectiveness of code-based Yara signatures, we can now use capa.
You may be laughing now – capa itself is a detection engine. Given a bunch of samples, we could just run our capa rules over them and get the detections we need. The first problem is speed. The second problem is that while Yara is supported by nearly everything that blinkenlights, capa is not.
The best approach is therefore to analyze the code and write a good capa signature, then use it to test your Yara rules. Your Yara rule must detect the very same sample set that capa hits on. This is an iterative process, but it lets you cherry-pick variants and subtle differences in implementation, which can then lead you to improve your Yara sigs. Moreover, if you have other ways to detect samples as belonging to a certain malware family, you can correlate those against your family-specific capa and Yara rulesets and highlight missing Yara rules. Using the capa output you could auto-generate Yara rules as well (although this is a bit silly w/o manual oversight; if blindly automated, it would literally be like hashing).
The task of correlating the capa and yara detections/rules can be delegated to existing Python libraries – something along these lines:
import yara
import capa.main
import capa.rules
from capa.features import ARCH_X32, ARCH_X64, String
from capa.features.insn import Number, Offset
...
# compile the Yara rule and run it over the sample
yr = yara.compile(filepath='foo.yar')
fm = yr.match(filename)
if fm:
    ... fm[0] ...
# load the capa rule and run it over the same sample
cr = capa.main.get_rules('foo.yml', disable_progress=True)
cr = capa.rules.RuleSet(cr)
ex = capa.main.get_extractor(filename, "auto", disable_progress=True)
capabilities, ccn = capa.main.find_capabilities(cr, ex, disable_progress=True)
try:
    ... capabilities.keys() ...
    ... <print output, match, whatever> ...
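Once both engines have been run over the sample set, the correlation itself is plain set arithmetic. The sketch below assumes you have already collected the names of the samples each engine matched; the correlate helper and the sample names are hypothetical and only show the bookkeeping, not the Yara or capa calls.

```python
def correlate(yara_hits, capa_hits):
    """Split sample identifiers into those matched by both engines,
    by capa only, and by Yara only."""
    yara_hits, capa_hits = set(yara_hits), set(capa_hits)
    return {
        "both": sorted(yara_hits & capa_hits),
        "capa_only": sorted(capa_hits - yara_hits),
        "yara_only": sorted(yara_hits - capa_hits),
    }


# Hypothetical results gathered from earlier Yara and capa runs.
yara_hits = {"sample_a.bin", "sample_b.bin"}
capa_hits = {"sample_a.bin", "sample_b.bin", "sample_c.bin"}

report = correlate(yara_hits, capa_hits)
for name in report["capa_only"]:
    # capa sees the capability here, but the Yara rule misses it:
    # a candidate for the next iteration of the rule.
    print("Yara rule misses:", name)
```

Samples that land in capa_only are the variants worth a closer look; samples in yara_only may point to a Yara rule that is too loose or a capa rule that is too strict.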