Detections should adapt to changes in the monitored environments. As organizations modify their infrastructure – through migrations, network reconfigurations, the introduction of new systems or software, or the deprecation of existing ones – detections require constant refinement to remain effective and to keep the alert queue at a manageable level. Creating new detection rules involves researching, learning and understanding new tactics and techniques, problem-solving, and creative thinking to address the never-ending emerging threats. In contrast, tuning detections is a repetitive process that can feel mundane and less engaging by comparison.

In Part 7, we showcased how we can leverage automation to continuously monitor the performance and trigger rate of our deployed detections. In this part, we investigate how we can introduce automation and utilize continuous deployment pipelines to streamline the tedious task of tuning our detections. We’ll provide examples from Microsoft Sentinel, particularly its Watchlists functionality, but the concepts presented here apply to other SIEM platforms as well, since most of them come with similar features.
Watchlists are custom data sets you can upload and use to enrich your detections, hunting, and investigations. For instance, if you have lists of high-privileged accounts, computer assets or IOCs, you can upload them as a CSV file and reference them in your hunting rules or analytics. In this blog we will show how they can be used for tuning and filter-outs.
Under the hood, watchlist items are JSON objects that are added to a dedicated Sentinel table, “Watchlist”. The table is cached for query performance. You retrieve the items in a watchlist using the _GetWatchlist() function.
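For example, retrieving the items of a (hypothetical) watchlist with the alias “HighPrivilegedAccounts” is as simple as:
_GetWatchlist('HighPrivilegedAccounts')
| project SearchKey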
There are some limitations you need to be aware of. Watchlists are not intended for large data volumes: the maximum number of active watchlist items (all watchlists combined) is 10 million, and file uploads (a single watchlist) are limited to 3.8 MB. To put that in perspective: if you have a watchlist of network ranges, one item would be about 40 bytes (an IP address in CIDR notation and a range name), meaning you can store roughly 100K network ranges – that is a lot of ranges. For our tuning purposes, these limits are sufficient. If you need bigger files, you can upload them to Azure Storage, but this is outside the scope of this blog; refer to the MS documentation if you need this.1
There is a retention period of 28 days. This can be confusing – it does not mean your watchlist items are deleted after 28 days; it means deleted items are purged after 28 days. To keep the remaining items active, there is an automatic refresh interval of 12 days.
Creating watchlists can be done via the Sentinel web interface. The process is straightforward: you provide a name, an alias and a search key, and you point to the CSV file containing the data. The alias is the name you will use in your queries to reference the watchlist; the name can differ from it. We keep them identical, but there are scenarios where different names make sense. For instance, if you are an MSSP, you will deal with multiple watchlists, so it can be convenient to have specific names per watchlist, e.g. Servers_CustomerA, Servers_CustomerB, … while in the rules you need a general reference, which would then be “Servers”.

The SearchKey is the name of the column that you intend to use in joins or lookups; it is optimized for that purpose, making such queries more performant. Keep this in mind when defining a watchlist: most of the time, the SearchKey will be the value that you want to match on in your query.
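As a quick sketch of such a lookup – the watchlist alias and the SigninLogs source are illustrative:
let PrivilegedAccounts = _GetWatchlist('HighPrivilegedAccounts')
| project SearchKey;
SigninLogs
| where UserPrincipalName in~ (PrivilegedAccounts)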
Editing watchlists can also be done via the web interface; this is a straightforward process.
As this blog is about Detection-as-Code, we manage our watchlists programmatically. There are two APIs: one to create and edit watchlists2, and one to create and edit watchlist items3.
A small warning here – it might be tempting to go for an approach where you simply delete and re-upload a watchlist after it has changed. That seems simple indeed: you do not have to care about the different possible operations (update, delete or create) and it keeps your integration very straightforward. However, Microsoft advises against this because there is a 5-minute SLA for data ingestion. This means that when you delete a watchlist and recreate it, you might see both versions active, but more importantly, it is also possible that no watchlist items are present at all during that window. As a consequence, any rule running during that window will fail: it will miss bad stuff (if you use the watchlist for blocklisting) or generate false positives (if you use it for allowlisting).

A possible workaround is to “hack” the _GetWatchlist() function so that it also retrieves items that were deleted in the last 5 minutes, but we did not go that route. Instead, to keep the watchlists stored in our repository in sync with the watchlists on Sentinel, we built a script (watchlist_mgmt.py) that leverages Sentinel’s watchlist API to sync the watchlist items, without the need to delete and re-create watchlists or modify the _GetWatchlist() function. We will use that script later in an Azure DevOps pipeline to introduce some automation to the tuning process, but for now we will go through its functionality.
A screenshot of the API consumer code is provided below:
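In outline, such an API consumer could look like the sketch below. The helper names are ours, and the endpoint paths and api-version follow Microsoft’s watchlist REST APIs – treat this as an illustration rather than the exact script:
import uuid
import requests

API_VERSION = "2023-02-01"  # assumption: any recent stable api-version works

def watchlist_url(workspace_id: str, alias: str) -> str:
    """Build the ARM URL of a watchlist. workspace_id is the full workspace resource ID:
    /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.OperationalInsights/workspaces/<ws>"""
    return (f"https://management.azure.com{workspace_id}"
            f"/providers/Microsoft.SecurityInsights/watchlists/{alias}")

def upsert_watchlist(session: requests.Session, workspace_id: str, alias: str, metadata: dict) -> dict:
    """Create or update a watchlist; the same PUT endpoint handles both.
    `session` is assumed to carry a valid Azure Resource Manager bearer token."""
    resp = session.put(f"{watchlist_url(workspace_id, alias)}?api-version={API_VERSION}",
                       json={"properties": metadata})  # displayName, itemsSearchKey, provider, source, ...
    resp.raise_for_status()
    return resp.json()

def upsert_watchlist_item(session: requests.Session, workspace_id: str, alias: str,
                          item: dict, item_id: str = None) -> dict:
    """Create or update a single watchlist item (a JSON object of column -> value)."""
    item_id = item_id or str(uuid.uuid4())
    resp = session.put(f"{watchlist_url(workspace_id, alias)}/watchlistItems/{item_id}"
                       f"?api-version={API_VERSION}",
                       json={"properties": {"itemsKeyValue": item}})
    resp.raise_for_status()
    return resp.json()

def list_watchlist_items(session: requests.Session, workspace_id: str, alias: str) -> list:
    """Retrieve all items currently in the watchlist."""
    resp = session.get(f"{watchlist_url(workspace_id, alias)}/watchlistItems?api-version={API_VERSION}")
    resp.raise_for_status()
    return resp.json().get("value", [])

def delete_watchlist_item(session: requests.Session, workspace_id: str, alias: str, item_id: str):
    """Delete a single watchlist item by its ID."""
    resp = session.delete(f"{watchlist_url(workspace_id, alias)}/watchlistItems/{item_id}"
                          f"?api-version={API_VERSION}")
    resp.raise_for_status()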

The action “SyncWatchlistItems” of the script watchlist_mgmt.py processes watchlists defined within the “filters” directory. For each watchlist name provided, it reads the watchlist’s metadata from its JSON file and its contents from the CSV file. The script then creates or updates the watchlist (same API endpoint) in Sentinel using this information. Following this, it retrieves the existing watchlist items from Sentinel and compares them against the repository watchlist. If any items in Sentinel are absent from the repository, they are deleted from the platform’s watchlist. This ensures that the Sentinel watchlist remains consistent with the repository’s configuration, removing any unnecessary entries.
The screenshot below shows the implementation of the above logic.
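Building on the helpers sketched above, the sync logic could be approximated as follows (the filters/&lt;name&gt;.json and filters/&lt;name&gt;.csv layout follows the repository convention just described; itemsSearchKey is assumed to be part of the watchlist metadata):
import csv
import json

def sync_watchlist_items(session, workspace_id: str, name: str):
    """Mirror a repository watchlist (filters/<name>.json + filters/<name>.csv) to Sentinel."""
    with open(f"filters/{name}.json") as f:
        metadata = json.load(f)
    with open(f"filters/{name}.csv", newline="") as f:
        repo_items = list(csv.DictReader(f))

    # Create or update the watchlist itself (same API endpoint for both).
    upsert_watchlist(session, workspace_id, name, metadata)

    # Push the repository items, then delete the Sentinel items that are
    # no longer present in the repository.
    search_key = metadata["itemsSearchKey"]
    repo_keys = {item[search_key] for item in repo_items}
    for item in repo_items:
        # A real implementation would first match existing items on the
        # SearchKey to avoid creating duplicates; omitted here for brevity.
        upsert_watchlist_item(session, workspace_id, name, item)
    for remote in list_watchlist_items(session, workspace_id, name):
        if remote["properties"]["itemsKeyValue"].get(search_key) not in repo_keys:
            delete_watchlist_item(session, workspace_id, name, remote["name"])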

There are multiple ways of organizing your watchlists, and you need to think this through; otherwise, it can quickly become an organizational mess. We decided to organize our filter-outs into two kinds of watchlists: content pack watchlists and global watchlists.
Content pack watchlists contain variables that are only used by detections in a particular content pack (the concept of content packs is explained in Part 2). They do not require a lot of upfront thinking: if you create a rule that requires a specific variable, you add the variable and its value to the watchlist, and you are ready.
There is a specific technical requirement: because you need to find the key-value pairs for a particular detection rule, you need a reference to that rule in both the rule and the watchlist. As an example, say we have a content pack “windows_security” with a detection rule for DCSync, and we want to allow specific accounts that are permitted to execute the sync. The rule will look like this; it contains a reference to itself, which is used to filter for the variables that are applicable to this rule only.
let rule_uuid = "28748697-a290-4367-b67c-57f923e21848";
let AllowedAccounts = _GetWatchlist("windows_security")
| where column_ifexists("uuid", "") == rule_uuid
| where column_ifexists("Variable", "") == "AllowedAccounts"
| project SearchKey;
SecurityEvent
| where <DETECTION LOGIC>
| where Account !in (AllowedAccounts)
The example watchlist contains filter-outs for 2 different rules, identified by the UUID.
| uuid | Variable | Value (*) | Tags |
| --- | --- | --- | --- |
| 28748697-a290-4367-b67c-57f923e21848 | AllowedAccounts | CONTOSO\aad_sync | Ticket-12345 |
| 28748697-a290-4367-b67c-57f923e21848 | AllowedAccounts | CONTOSO\sp_sync | Ticket-12345 |
| 806fb458-5eaa-45d7-a8a9-78750c637744 | AnotherVariable | SomeValue | Ticket-12346 |
Other columns we have created are uuid (the reference to the detection rule), Variable (the variable name used by the rule) and Tags (for storing e.g. a service ticket reference). The Value column (*) serves as the SearchKey.
If you run _GetWatchlist("windows_security"), the entire watchlist looks like this:
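Roughly, the result contains the CSV columns plus the system columns Sentinel adds – most notably SearchKey, which here mirrors the Value column:
| SearchKey | uuid | Variable | Value | Tags |
| --- | --- | --- | --- | --- |
| CONTOSO\aad_sync | 28748697-a290-4367-b67c-57f923e21848 | AllowedAccounts | CONTOSO\aad_sync | Ticket-12345 |
| CONTOSO\sp_sync | 28748697-a290-4367-b67c-57f923e21848 | AllowedAccounts | CONTOSO\sp_sync | Ticket-12345 |
| SomeValue | 806fb458-5eaa-45d7-a8a9-78750c637744 | AnotherVariable | SomeValue | Ticket-12346 |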

Through the extra conditions in the rule, only the relevant values are returned. The result is stored in a single-column table “AllowedAccounts”, which can then be used in the exclusion filter. Note that we use “column_ifexists” – this prevents the rule from failing in case the watchlist does not exist.
If you feel that the code in the detection rule is a bit clunky, especially since you have to repeat it in every rule, you can store it as a function “GetWatchlistValue”.
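The function definition could look as follows – a sketch, saved in Sentinel as a function named “GetWatchlistValue” with parameters watchlist_name, rule_uuid and variable:
_GetWatchlist(watchlist_name)
| where column_ifexists("uuid", "") == rule_uuid
| where column_ifexists("Variable", "") == variable
| project SearchKey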

The rule would then look like this. Much nicer, indeed.
let rule_uuid = "28748697-a290-4367-b67c-57f923e21848";
let AllowedAccounts = GetWatchlistValue("windows_security", rule_uuid, "AllowedAccounts");
SecurityEvent
| where <DETECTION LOGIC>
| where Account !in (AllowedAccounts)
Global watchlists contain data that can be used by multiple content packs: typically lists of assets such as company domains, firewalls, scanners, servers, etc. If you know a particular IP is a scanner, you can reference it in your firewall content pack, IDS content pack, WAF content pack, Windows content pack, … without having to maintain it in multiple places.
Because global watchlist items are referenced by multiple rules across multiple content packs, you need to define them properly and perform the necessary schema validations. For instance, if you have a watchlist containing servers, you might want to add the server role, like DC, DNS or Fileserver, or whatever you like – but make sure you use a consistent typology, or you might break the filters in your rules.
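Such a validation can run as a pipeline step. The sketch below checks a hypothetical Servers watchlist CSV; the IP and Role column names, the “;” role separator and the allowed roles are all assumptions:
import csv
import ipaddress
import sys

ALLOWED_ROLES = {"DC", "DNS", "Fileserver"}  # hypothetical role typology

def validate_servers(path: str) -> list:
    """Basic schema checks for a Servers watchlist CSV."""
    errors = []
    with open(path, newline="") as f:
        # start=2 because line 1 of the CSV holds the header
        for line_no, row in enumerate(csv.DictReader(f), start=2):
            try:
                ipaddress.ip_address(row["IP"])
            except (KeyError, ValueError):
                errors.append(f"line {line_no}: missing or invalid IP")
            roles = set(filter(None, row.get("Role", "").split(";")))
            if not roles <= ALLOWED_ROLES:
                errors.append(f"line {line_no}: unknown role(s): {', '.join(roles - ALLOWED_ROLES)}")
    return errors

if __name__ == "__main__":
    problems = validate_servers(sys.argv[1])
    print("\n".join(problems))
    sys.exit(1 if problems else 0)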
Secondly, in your rules, you need to know what kind of object you are dealing with. For instance, user accounts can have multiple formats depending on the log you are processing: sometimes you see the sAMAccountName (jdoe), sometimes the UPN (jdoe@contoso.com), sometimes with the domain (CONTOSO\jdoe) or without: just the username, like john.doe. Think about this when you organize the watchlist and create your query filters.
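For example, if a watchlist stores bare usernames, a rule can normalize the event’s account field before the lookup. A sketch, where the ServiceAccounts alias and the column names are illustrative and depend on the log source:
let AllowedAccounts = _GetWatchlist('ServiceAccounts')
| project SearchKey;
SecurityEvent
| extend AccountNoDomain = tostring(split(Account, "\\")[-1]) // CONTOSO\jdoe -> jdoe
| extend NormalizedAccount = tolower(tostring(split(AccountNoDomain, "@")[0])) // jdoe@contoso.com -> jdoe
| where NormalizedAccount !in (AllowedAccounts) // assumes the watchlist stores lowercase usernames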
Thirdly, you need to know what data type you are working with. The Kusto operator you use to filter on server roles will differ from the one for network names, as the former is an array (a server can have multiple roles) and the latter is typically a label. As for naming conventions, we adhere to the entities4 used in Sentinel: we are in that ecosystem anyway, and we already use them in our rules. Finally, as in the content pack watchlists, we also use tags that an analyst or a detection engineer can use to store a service ticket reference when adding a watchlist item.
Some examples of such watchlists are Servers, Networks, Scanners and Domains.
A perceptive reader will notice that if you combine these watchlists with an asset database or user directory, this approach becomes even more powerful. We do recommend building automations that populate (parts of) the watchlists, but this is outside the scope of this blog.
Now for some examples of using global watchlists. Below is a very simple detection for lateral movement. The idea is that there is usually no traffic to SMB ports on clients, so if we have a watchlist containing the IP addresses of domain controllers, file servers and other servers we expect SMB traffic to, we can use the watchlist like this:
let ServersAllowList = _GetWatchlist('Servers')
| project SearchKey;
DeviceNetworkEvents
| where RemotePort == 445
| where RemoteIP !in (ServersAllowList)
If you want to limit the filter-out to particular server roles only, you simply add a filter:
let ServersAllowList = _GetWatchlist('Servers')
| where column_ifexists("Role","") has_any ("DC", "Fileserver")
| project SearchKey;
DeviceNetworkEvents
| where RemotePort == 445
| where RemoteIP !in (ServersAllowList)
If you want to work with ranges instead of single IP addresses, the query will be different. For this rule, instead of using the Servers watchlist, you could leverage the Networks watchlist and filter for the Servers range.
let ServersAllowList = toscalar(_GetWatchlist('Networks')
| where column_ifexists("Name","") in~ ("Servers")
| project SearchKey
| summarize make_set(SearchKey)
);
DeviceNetworkEvents
| where RemotePort == 445
| where not(ipv4_is_in_any_range(RemoteIP, ServersAllowList))
Here we use the ipv4_is_in_any_range function, which expects a dynamic array; hence the use of summarize make_set() and toscalar(). By the way, you could use this code block for individual IP addresses too, as you do not have to use CIDR notation (/24); just the IP address would do.
To best leverage watchlists in a DaC pipeline, we are going to follow the workflow below. A detection engineer adds a new value to the watchlist and creates a pull request, which triggers a contextualization pipeline. The contextualization pipeline identifies how many alerts and incidents are associated with the entry we added to the watchlist and enriches the PR with those results. This will give us an estimation of the impact our tuning has on the generated alerts. Then, a fellow detection engineer reviews and approves or rejects the changes. Once approved, the update to the watchlist launches a CD pipeline that pushes the changes to Sentinel.

Following the workflow above, we will enrich the pull request with context using build validations and KQL. This offers the detection engineer detailed information about the requested allowlisting, so they can estimate the noise generated by the new entries in the allow list before deciding whether to approve or reject the pull request.
Our first step is to identify the changes introduced by the PR. Since the build validation will be on the main branch, we can get the changes by issuing the following command. This Git command is used to identify file names that have changed between the main branch and the PR branch. By specifying --name-only, it lists only the file names without displaying the content changes. The --pretty="" option ensures that no additional commit log information is shown, focusing solely on the file names.
git diff --name-only --pretty="" origin/main..HEAD
Then, for each file, we will execute the following command to display the differences in the file filters/scanners.csv between the main branch and the PR branch. By using --no-pager, the output is displayed directly in the terminal without pagination. The --unified=0 option specifies that the diff output should have zero lines of context, showing only the lines that have changed.
git --no-pager diff --unified=0 origin/main..HEAD -- filters/scanners.csv
Then, for the changes of each file, we parse the output using regular expressions and identify which values were added to each watchlist in the repository.
If an entry is both added and removed in the watchlist, like 9.9.9.9 above, that simply means its order in the list changed.
The next step is to contextualize the pull request based on the information identified above, i.e. the values added to the watchlist. To calculate an estimation, we search Sentinel incidents and attempt to identify how many of them include the allowlisted values in their entities4. In Sentinel, an entity represents an element that is related to an incident or alert; entities are used to provide additional context and details about the security incidents being analyzed. Common types of entities include IP addresses, user accounts, hostnames, URLs, files and others.
To determine whether any of the allow-listed properties added to the watchlist appear as entities in incidents generated by rules managed in our repository, we will use the following KQL query. The query examines Azure Sentinel incidents over a 30-day period, correlating them with their associated alert IDs, as each incident may contain multiple alerts. We then filter the alerts with an entity containing any of the defined values in the property_values variable and summarize the findings by counting the occurrences of each entity value across incidents and alerts.
let property_values = dynamic([]);
let lookback_time = 30d;
SecurityIncident
| where TimeGenerated > ago(lookback_time)
| where ProviderName == "Azure Sentinel"
| project IncidentNumber, IncidentName, Title, RelatedAnalyticRuleIds, AlertIds
| mv-expand AlertId = AlertIds
| extend AlertId = tostring(AlertId)
| join kind = inner (
SecurityAlert
| where TimeGenerated > ago(lookback_time)
| where ProductName == "Azure Sentinel" and ProductComponentName == "Scheduled Alerts"
| project SystemAlertId, AlertName, Entities, AlertType
| extend AnalyticRuleId = split(AlertType, "_")[-1]
| mv-expand Entities = todynamic(Entities)
| mv-expand Entities = todynamic(Entities)
| mv-expand kind=array key = bag_keys(Entities)
| extend
PropertyName = tostring(key),
PropertyValue = tostring(Entities[tostring(key)])
| where not(PropertyName startswith "$") and PropertyName != ""
| where PropertyName != "Type"
| where PropertyValue in (property_values)
) on $left.AlertId == $right.SystemAlertId
| summarize
IncidentCount = count_distinct(IncidentNumber),
AlertCount = count_distinct(SystemAlertId),
AnalyticRuleIds = make_set(AnalyticRuleId),
IncidentNumbers=make_set(IncidentNumber),
SystemAlertIds=make_set(SystemAlertId),
AlertNames = make_set(AlertName)
by PropertyName, PropertyValue, Title
| project-reorder PropertyName, PropertyValue, IncidentCount, Title, AlertCount, AlertNames, AnalyticRuleIds, IncidentNumbers, SystemAlertIds
Suppose that we wanted to allowlist the IP 10.16.5.90 from the scanners watchlist. In that case, we would add the IP to the property_values dynamic list variable and run the query:
let property_values = dynamic(['10.16.5.90']);
A sample output of the query is shown below. We identified the allowlisted value “10.16.5.90” in the Address entity field of 3 rules. The entity value is present in 21 incidents and 309 alerts.

We put everything together in the script below, which identifies the changes introduced by a PR in a watchlist and then crafts the KQL query using a Jinja6 template. The KQL query is saved in a pipeline variable to be used by subsequent steps in the pipeline.
import re
import subprocess

from jinja2 import Environment, FileSystemLoader


def run_command(command: list) -> str:
    """Executes a shell command and returns the output."""
    try:
        # print(f"[R] Running command: {' '.join(command)}")
        output = subprocess.check_output(command, text=True, encoding="utf-8", errors="replace").strip()
        return output
    except subprocess.CalledProcessError as e:
        print(f"##vso[task.logissue type=error]Error executing command: {' '.join(command)}")
        print(f"##vso[task.logissue type=error]Error message: {str(e)}")
        return ""
    except UnicodeDecodeError as e:
        print(f"##vso[task.logissue type=error]Unicode decode error: {e}")
        return ""


def get_pr_modified_files() -> list:
    """Get the PR modified files"""
    return run_command(["git", "diff", "--name-only", "--pretty=", "origin/main..HEAD"]).splitlines()


def get_pr_modified_file_diff_lines(file: str) -> list:
    """Get the PR modified file diff"""
    return run_command(["git", "--no-pager", "diff", "--unified=0", "origin/main..HEAD", "--", file]).splitlines()


def get_watchlist_added_values(watchlist_diff: list) -> list:
    added_regex = r"^\+(?!\+\+)\s*(.*)$"
    removed_regex = r"^\-(?!\-\-)\s*(.*)$"
    added_values = []
    removed_values = []
    for line in watchlist_diff:
        match_added = re.search(added_regex, line)
        values = match_added.group(1).split(",") if match_added else []
        added_values += [v.strip() for v in values]
        match_removed = re.search(removed_regex, line)
        values = match_removed.group(1).split(",") if match_removed else []
        removed_values += [v.strip() for v in values]
    # Adding and removing an entry in the same file means just moving it around in the watchlist
    added_values_final = []
    removed_values_final = []
    added_copy = added_values[:]
    removed_copy = removed_values[:]
    for val in added_values:
        if val in removed_copy:
            # cancel out one occurrence from removed
            removed_copy.remove(val)
        else:
            added_values_final.append(val)
    for val in removed_values:
        if val in added_copy:
            # cancel out one occurrence from added
            added_copy.remove(val)
        else:
            removed_values_final.append(val)
    print(f"Added values: {','.join(added_values_final)}")
    print(f"Removed values: {','.join(removed_values_final)}")
    return added_values_final


def identify_watchlist_changes():
    all_values = []
    pr_modified_files = get_pr_modified_files()
    print(f"Modified Files:\n{', '.join(pr_modified_files)}")
    for pr_modified_file in pr_modified_files:
        if pr_modified_file.startswith("filters") and pr_modified_file.endswith(".csv"):
            print(f"Checking file: {pr_modified_file}")
            pr_modified_file_diff = get_pr_modified_file_diff_lines(pr_modified_file)
            all_values += get_watchlist_added_values(pr_modified_file_diff)
    env = Environment(loader=FileSystemLoader("pipelines/scripts/templates"))
    template = env.get_template("contextualization_query.jinja")
    kql_query = template.render(entity_values=all_values)
    print(f"KQL Query: \n{kql_query}")
    flat_query = kql_query.replace("\n", " ")
    print(f"##vso[task.setvariable variable=kql_query]{flat_query}")


def main():
    identify_watchlist_changes()


if __name__ == "__main__":
    main()
The Jinja template is as follows:
let entity_values = dynamic({{ entity_values }});
let lookback_time = 30d;
SecurityIncident
| where TimeGenerated > ago(lookback_time)
| where ProviderName == "Azure Sentinel"
| project IncidentNumber, IncidentName, Title, RelatedAnalyticRuleIds, AlertIds
| mv-expand AlertId = AlertIds
| extend AlertId = tostring(AlertId)
| join kind = inner (
SecurityAlert
| where TimeGenerated > ago(lookback_time)
| where ProductName == "Azure Sentinel" and ProductComponentName == "Scheduled Alerts"
| project SystemAlertId, AlertName, Entities, AlertType
| extend AnalyticRuleId = split(AlertType, "_")[-1]
| mv-expand Entities = todynamic(Entities)
| mv-expand Entities = todynamic(Entities)
| mv-expand kind=array key = bag_keys(Entities)
| extend
EntityName = tostring(key),
EntityValue = tostring(Entities[tostring(key)])
| where not(EntityName startswith "$") and EntityName != ""
| where EntityName != "Type"
| where EntityValue in (entity_values)
)
on $left.AlertId == $right.SystemAlertId
| summarize
IncidentCount = count_distinct(IncidentNumber),
AlertCount = count_distinct(SystemAlertId),
AnalyticRuleIds = make_set(AnalyticRuleId),
IncidentNumbers=make_set(IncidentNumber),
SystemAlertIds=make_set(SystemAlertId),
AlertNames = make_set(AlertName)
by EntityName, EntityValue, Title
| project-reorder
EntityName,
EntityValue,
IncidentCount,
Title,
AlertCount,
AlertNames,
AnalyticRuleIds,
IncidentNumbers,
SystemAlertIds
After saving the KQL query as a pipeline variable, we can use the detection monitoring script that we created in Part 7 to query the environment and identify incidents that include the values we added to the watchlist as entities. This will give us some sense of the impact this filter-out will have on the environment.
name: Contextualize Watchlist Change
trigger: none
jobs:
  - job: ContextualizeWatchlistChange
    displayName: "Contextualize Watchlist Change"
    steps:
      - checkout: self
        fetchDepth: 0
        path: 's/$(Build.Repository.Name)'
      - script: |
          python $(Pipeline.Workspace)/s/$(Build.Repository.Name)/pipelines/scripts/identify_watchlist_changes.py
        displayName: 'Run Identify Watchlist Changes'
      - script: |
          pip install -r $(Pipeline.Workspace)/s/$(Build.Repository.Name)/pipelines/scripts/requirements.txt
        displayName: 'Python Dependencies Installation'
      - bash: |
          python $(Pipeline.Workspace)/s/$(Build.Repository.Name)/pipelines/scripts/detection_monitoring.py --tenant 'QA' --platform 'sentinel' --detection-compare-field 'AnalyticRuleIds'
        env:
          QUERY: $(kql_query)
        displayName: "Run Detection Monitoring Script"
We add the pipeline in the Build Validation of the main branch. This time though, we set the Policy Requirement to Optional, as we do not want potential failures of the run to block our ability to merge the pull request.

To test our implementation, we are going to add the IP 10.16.5.90 to the scanners watchlist. The SOC analyst would create a pull request adding the IP to the filters/scanners.csv file.

The Build Validation would run the pipeline, identify the changes in the watchlist introduced by the pull request and create the KQL query from the Jinja template.

The query would then be run on the target platform and identify incidents and alerts where the added value(s) appear in the entities.

Following the creation and approval of the pull request, another pipeline is triggered upon a successful merge to the main branch; it identifies which watchlists have been updated and syncs them to the target platform. But first, we will go through some git commands needed to implement this logic. We use a similar approach to Part 6 for automatic deployments triggered by repository updates.
The first command obtains the hash of the most recent commit on the main branch. The -1 option restricts the output to just the latest commit, ensuring that only one commit is displayed, and --pretty=format:%H customizes the output to show only the full commit hash.
git log main -1 --pretty=format:%H
We then use the following command to display the commit message, using the commit hash returned by the previous command:
git show --pretty=format:%s f88594f5acbb0e0d9bc5652249dd44897ed23b40
Azure DevOps Repos automatically created this commit (“Merged PR <pr-number>“) when we merged our pull request. We then execute the following Git command to fetch the commit’s parents.
git show --pretty=format:%P f88594f5acbb0e0d9bc5652249dd44897ed23b40

Then, we fetch the list of commits between the parent commits by executing the command below. The range <3b9a0c…>..<f767e8…> defines the starting and ending commits, allowing us to see all the commits that fall between these two hashes on the main branch. By specifying the --pretty=format:%H option, the output is customized to show only the commit hashes.
git log main --pretty=format:%H 3b9a0c0d926bae8fa6295186e06d7e17033c0ebe..f767e8edadd6783a542de5ed3f53c1c79ed2575b
For each commit, we execute git diff-tree, which examines the differences introduced by the commit. The --no-commit-id flag omits the commit ID from the output, focusing solely on the file changes. The --name-status option provides a summary of the changes, showing us the status of each file (e.g., added, modified, deleted) along with the file names.
git diff-tree --no-commit-id --name-status -r f767e8edadd6783a542de5ed3f53c1c79ed2575b
Depending on the Git client version you are using, you might need to verify that the output matches what is displayed in the screenshots. Also note that the output shown for the above commands assumes a no-fast-forward merge; the git commands might need to be modified if another merge type is used.
The next step is to put everything together into a script. The script retrieves the last commit hash, checks if the commit message indicates a merged pull request, and identifies the start and end commits of the merge to list all commits involved. It then checks each commit for modifications in the filters directory. Modified or added filenames in that directory are stored in the watchlist_names variable to be used by the pipeline at a later step.
import os
import subprocess

base_paths = ["filters/*.csv", "filters/*.json"]


def run_command(command: list) -> str:
    """Executes a shell command and returns the output."""
    try:
        # print(f"[R] Running command: {' '.join(command)}")
        output = subprocess.check_output(command, text=True, encoding="utf-8", errors="replace").strip()
        return output
    except subprocess.CalledProcessError as e:
        print(f"##vso[task.logissue type=error] Error executing command: {' '.join(command)}")
        print(f"##vso[task.logissue type=error] Error message: {str(e)}")
        return ""
    except UnicodeDecodeError as e:
        print(f"##vso[task.logissue type=error] Unicode decode error: {e}")
        return ""


def get_last_commit() -> str:
    """Retrieve the most recent commit hash on the main branch."""
    return run_command(["git", "log", "main", "-1", "--pretty=format:%H"])


def get_commit_message(commit_hash: str) -> str:
    """Retrieve a commit message"""
    return run_command(["git", "show", "--pretty=format:%s", commit_hash])


def get_commit_parents(commit_hash: str) -> list:
    """Retrieve the commit parents"""
    return run_command(["git", "show", "--pretty=format:%P", commit_hash]).split(" ")


def get_commit_list(start_commit: str, end_commit: str) -> list:
    """Retrieve a commit list"""
    return run_command(["git", "log", "main", "--pretty=format:%H", f"{start_commit}..{end_commit}"]).splitlines()


def get_commit_modified_files(commit_hash: str) -> list:
    """Get a list of modified files in the commit, along with their status"""
    return run_command(["git", "diff-tree", "--no-commit-id", "--name-status", "-r", commit_hash, "--"] + base_paths).splitlines()


def identify_filters():
    filters = []
    input_commit_hash = get_last_commit()
    print(f"Last commit ID: {input_commit_hash}")
    commit_message = get_commit_message(input_commit_hash)
    print(f"Commit message: {commit_message}")
    if commit_message.startswith("Merged PR"):
        print("PR merge commit identified. Identifying changes...")
        commit_parents = get_commit_parents(input_commit_hash)
        if len(commit_parents) == 2:
            start_commit = commit_parents[0]
            end_commit = commit_parents[1]
            print(f"Start commit:{start_commit}..End commit:{end_commit}")
            commit_list = get_commit_list(start_commit, end_commit)
            print(f"Commit list:\n {', '.join(commit_list)}")
            for commit_hash in commit_list:
                print(f"Processing commit:{commit_hash}")
                commit_modified_files = get_commit_modified_files(commit_hash)
                for commit_modified_file in commit_modified_files:
                    status, filepath = commit_modified_file.split("\t")
                    if filepath.startswith("filters/") and (status in ["A", "M"]):
                        print(f"Filter watchlist {filepath} {'created' if status == 'A' else 'modified'}.")
                        filters.append(os.path.basename(filepath.removesuffix(".json").removesuffix(".csv")))
        else:
            print(f"##vso[task.logissue type=error]Could not identify parents of {input_commit_hash}")
    filters = list(set(filters))
    print(f"Filter watchlists identified for deployment: {', '.join(filters)}")
    print(f"##vso[task.setvariable variable=watchlist_names]{', '.join(filters)}")
    return


def main():
    identify_filters()


if __name__ == "__main__":
    main()
The pipeline is triggered by changes in the main branch within the filters/* directory. The pipeline checks out the latest code from the main branch, installs the necessary Python dependencies, and runs the script above to identify changes in watchlists. It then synchronizes watchlist items based on the identified changes in the repository watchlists.
name: Automatic Watchlist Deployment Triggered By Repo Changes
trigger:
  branches:
    include:
      - main
  paths:
    include:
      - "filters/*"
jobs:
  - job: IdentifyFilterWatchlistChanges
    displayName: "Identify Filter Watchlist Changes"
    condition: eq(variables['Build.SourceBranchName'], 'main')
    steps:
      - checkout: self
        fetchDepth: 0
      - script: |
          git fetch origin main
          git checkout -b main origin/main
        displayName: "Fetch Branches Locally"
      - script: |
          pip install -r pipelines/scripts/requirements.txt
        displayName: 'Python Dependencies Installation'
      - script: |
          python pipelines/scripts/identify_filter_changes.py
        displayName: 'Run Identify Filter Changes Script'
      - script: |
          python pipelines/scripts/watchlist_mgmt.py --tenant '<Tenant Name>' --action 'SyncWatchlistItems' --names '$(watchlist_names)'
        displayName: "Watchlist Management Script Run"
As an example, we will make the following two changes in the scanners filter file in our repository. We will add one entry (10.16.5.90) and remove another one (1.1.1.1).

Upon merging the changes above, the identify_filter_changes.py script is run and identifies the modified watchlist.

Then, watchlist_mgmt.py syncs the watchlist to Sentinel.

Wrapping up, we’ve explored how continuous deployment pipelines can streamline the tuning of detections through the use of watchlists. This process helps us reduce manual workload and scale our detection library more efficiently.

Kristof Baute
Kristof is a member of the Threat Detection Engineering team at NVISO’s CSIRT & SOC and is mainly involved in Use Case research and development.

Stamatis Chatzimangou
Stamatis is a member of the Threat Detection Engineering team at NVISO’s CSIRT & SOC and is mainly involved in Use Case research and development.