Validate your Windows Audit Policy Configuration with KQL

2024-09-05 15:00:00 Author: blog.nviso.eu

Defining an audit policy in Windows is crucial for making sure that the appropriate security events are logged and monitored. A well-defined audit policy facilitates the detection of security incidents, improves incident response capabilities and ensures compliance with regulatory requirements. There is an abundance of best-practice guides and documentation available for configuring the Windows audit policy.

However, simply configuring the audit policy is not enough. You should also verify that the configuration is being applied correctly and consistently in your environment so that the correct events will be available when required. An improper implementation of the audit policy configuration can lead to visibility gaps and can potentially result in missing critical security incidents. Conversely, auditing more events than required can lead to unnecessary noise during investigations and unwanted side effects for your tools, like overloading the SIEM, increasing the cost of storage and requiring more computational power for correlation.

Verifying that the audit policy is applied consistently throughout the environment can be a challenge, especially if you are approaching the problem from the perspective of an MSSP, or if you work in an internal SOC with limited access to the environment you are monitoring. Tools like auditpol are available, but running them at scale and presenting the output can be difficult. Even then, auditpol displays the configuration of the system, not what is actually being received by the SIEM. In this blog post we look into leveraging the capabilities of Kusto Query Language (KQL) in Microsoft Sentinel to provide a quick and practical way of identifying discrepancies in the audit policy.

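In order to assess whether a Windows audit policy is applied correctly in a monitored environment, we focus on three vectors: status (whether events are received for each audit policy subcategory as expected), volume (how many events and how much billed data each subcategory generates) and coverage (which of the monitored computers generate events for each subcategory).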

These 3 vectors should provide us with enough insight to begin troubleshooting discrepancies in the application of the defined audit policy.

We will use the following query to examine all of these vectors. For testing purposes, the query is parameterized based on Microsoft's audit policy recommendations for Windows servers [1][2].

The section “Query Customization” at the very end of this blog post explains how to customize the query to match your organization’s defined audit policy. First, we will break down the output of the query and provide some information on how to interpret it.

// LookbackTime of the queries.
let LookbackTime = 30d;
// As of July 2024 the SecurityEvent table contains Audit Success and Audit Failure (https://rodtrent.substack.com/p/microsoft-sentinel-updated-securityevent). 
let KeywordsPopulated = true;
// The defined/applied/expected audit policy of the environment. This variable should be customized to match the configuration of the environment the query is run against.
let AppliedAuditPolicy = datatable(
    Task: int,
    Category: string,
    Subcategory: string,
    AuditSuccess: bool,
    AuditFailure: bool,
    Comment: string
) [
    14336, "Account Logon", "Credential Validation", true, true, "",
    14339, "Account Logon", "Kerberos Authentication Service", true, true, "This subcategory makes sense only on domain controllers.",
    14337, "Account Logon", "Kerberos Service Ticket Operations", true, true, "This subcategory makes sense only on domain controllers.",
    14338, "Account Logon", "Other Account Logon Events", false, false, "Should not contain any events. Reserved for future usage.",
    13828, "Account Management", "Application Group Management", false, false, "Application Group Management subcategory events may not exist because Authorization Manager is very rarely in use and it is deprecated starting from Windows Server 2012.",
    13825, "Account Management", "Computer Account Management", true, false, "This subcategory generates events only on domain controllers. This subcategory doesn’t have Failure events.",
    13827, "Account Management", "Distribution Group Management", false, false, "This subcategory generates events only on domain controllers. This subcategory doesn’t have Failure events.",
    13829, "Account Management", "Other Account Management Events", true, false, "This subcategory doesn’t have Failure events.",
    13826, "Account Management", "Security Group Management", true, false, "This subcategory doesn’t have Failure events.",
    13824, "Account Management", "User Account Management", true, true, "",
    13314, "Detailed Tracking", "DPAPI Activity", false, false, "It’s mainly used for DPAPI troubleshooting.",
    13316, "Detailed Tracking", "PnP Activity", false, false, "This subcategory doesn’t have Failure events.",
    13312, "Detailed Tracking", "Process Creation", true, false, "This subcategory doesn’t have Failure events.",
    13313, "Detailed Tracking", "Process Termination", false, false, "This subcategory doesn’t have Failure events.",
    13315, "Detailed Tracking", "RPC Events", false, false, "Events in this subcategory occur rarely.",
    13317, "Detailed Tracking", "Token Right Adjusted", false, false, "This subcategory doesn’t have Failure events.",
    14083, "DS Access", "Detailed Directory Service Replication", false, false, "This subcategory makes sense only on domain controllers.",
    14080, "DS Access", "Directory Service Access", true, true, "This subcategory makes sense only on domain controllers.",
    14081, "DS Access", "Directory Service Changes", true, false, "This subcategory makes sense only on domain controllers. This subcategory doesn’t have Failure events.",
    14082, "DS Access", "Directory Service Replication", false, false, "This subcategory makes sense only on domain controllers.",
    12546, "Logon/Logoff", "Account Lockout", false, true, "This subcategory doesn’t have Success events.",
    12554, "Logon/Logoff", "Group Membership", false, false, "This subcategory doesn’t have Failure events.",
    12550, "Logon/Logoff", "IPsec Extended Mode", false, false, "",
    12547, "Logon/Logoff", "IPsec Main Mode", false, false, "",
    12549, "Logon/Logoff", "IPsec Quick Mode", false, false, "",
    12545, "Logon/Logoff", "Logoff", true, false, "This subcategory doesn’t have Failure events.",
    12544, "Logon/Logoff", "Logon", true, true, "",
    12552, "Logon/Logoff", "Network Policy Server", false, false, "",
    12551, "Logon/Logoff", "Other Logon/Logoff Events", false, false, "",
    12548, "Logon/Logoff", "Special Logon", true, false, "This subcategory doesn’t have Failure events.",
    12553, "Logon/Logoff", "User / Device Claims", false, false, "This subcategory doesn’t have Failure events.",
    12806, "Object Access", "Application Generated", false, false, "",
    12805, "Object Access", "Certification Services", false, false, "",
    12811, "Object Access", "Detailed File Share", false, false, "",
    12808, "Object Access", "File Share", false, false, "",
    12800, "Object Access", "File System", false, false, "",
    12810, "Object Access", "Filtering Platform Connection", false, false, "Success auditing for this subcategory typically generates a very high volume of events, for example, one event for every connection that was made to the system. It is much more important to audit Failure events (blocked connections, for example).",
    12809, "Object Access", "Filtering Platform Packet Drop", false, false, "Success events in this subcategory rarely occur.",
    12807, "Object Access", "Handle Manipulation", false, false, "There is no recommendation to enable this subcategory for Success or Failure auditing, unless you know exactly what you need to monitor in Object’s Handles level.",
    12802, "Object Access", "Kernel Object", false, false, "There is no recommendation to enable this subcategory, unless you know exactly what you need to monitor at the Kernel objects level.",
    12804, "Object Access", "Other Object Access Events", false, false, "",
    12801, "Object Access", "Registry", false, false, "",
    12812, "Object Access", "Removable Storage", false, false, "",
    12803, "Object Access", "SAM", false, false, "",
    12813, "Object Access", "Central Access Policy Staging", false, false, "This subcategory doesn’t have Failure events.",
    13568, "Policy Change", "Audit Policy Change", true, false, "This subcategory doesn’t have Failure events.",
    13569, "Policy Change", "Authentication Policy Change", true, false, "This subcategory doesn’t have Failure events.",
    13570, "Policy Change", "Authorization Policy Change", false, false, "This subcategory doesn’t have Failure events.",
    13572, "Policy Change", "Filtering Platform Policy Change", false, false, "",
    13571, "Policy Change", "MPSSVC Rule-Level Policy Change", true, false, "",
    13573, "Policy Change", "Other Policy Change Events", false, false, "",
    13057, "Privilege Use", "Non Sensitive Privilege Use", false, false, "",
    13058, "Privilege Use", "Other Privilege Use Events", false, false, "This auditing subcategory doesn’t have any informative events inside.",
    13056, "Privilege Use", "Sensitive Privilege Use", false, false, "",
    12291, "System", "IPsec Driver", false, false, "There is no recommendation for this subcategory in this document, unless you know exactly what you need to monitor at IPsec Driver level.",
    12292, "System", "Other System Events", false, false, "",
    12288, "System", "Security State Change", true, false, "This subcategory doesn’t have Failure events.",
    12289, "System", "Security System Extension", true, false, "This subcategory doesn’t have Failure events.",
    12290, "System", "System Integrity", true, true, ""
];
// Identify all computers in the Security Event Log and total number of events for the lookback time.
let EnvironmentInformation = SecurityEvent
    | project TimeGenerated, Computer, Channel, EventSourceName, _BilledSize, Keywords
    | where TimeGenerated > ago(LookbackTime)
    | where Channel == "Security" and EventSourceName == "Microsoft-Windows-Security-Auditing"
    | where isempty(Keywords) != KeywordsPopulated
    | summarize
        TotalComputers = make_set(Computer),
        TotalEventCount = count(),
        TotalBilledSizeBytes = sum(_BilledSize);
// List of computers that appear in the logs for the queried LookbackTime.
let TotalComputers = toscalar(EnvironmentInformation
    | project TotalComputers);
let TotalEventCount = toscalar(EnvironmentInformation
    | project TotalEventCount);
let TotalBilledSizeBytes = toscalar(EnvironmentInformation
    | project TotalBilledSizeBytes);
let AllKeywords = datatable(Keywords: string)["0x8020000000000000","0x8010000000000000",""];
let AuditKeywords = AllKeywords | where isempty(Keywords) != KeywordsPopulated;
// Start Query
AppliedAuditPolicy
| project
    Task,
    AuditPolicySubCategory = strcat(Category, ".", Subcategory),
    AuditSuccess,
    AuditFailure,
    AuditStatus = (AuditSuccess or AuditFailure),
    Comment
| extend placeholder = 1 // Cross-join the tables https://learn.microsoft.com/en-us/kusto/query/join-operator?view=microsoft-fabric#cross-join.
| join kind=inner (AuditKeywords
    | extend placeholder = 1)
    on placeholder
| extend ExpectedEnabledStatus = case(Keywords == "0x8020000000000000", AuditSuccess, Keywords == "0x8010000000000000", AuditFailure, AuditStatus)
| project-away placeholder, placeholder1, AuditSuccess, AuditFailure, AuditStatus
| join kind=fullouter (SecurityEvent
    | project
        TimeGenerated,
        Computer,
        Channel,
        EventSourceName,
        Task,
        Keywords,
        EventID,
        _BilledSize,
        _IsBillable 
    | where TimeGenerated > ago(LookbackTime)
    | where Channel == "Security" and EventSourceName == "Microsoft-Windows-Security-Auditing"
    | where isempty(Keywords) != KeywordsPopulated
    | summarize
        minTimeGenerated = min(TimeGenerated),
        maxTimeGenerated = max(TimeGenerated),
        IdentifiedEventIDs = make_set(EventID),
        ComputersWithAuditPolicy = make_set(Computer),
        EventCount = count(),
        BilledSizeBytes = sum(_BilledSize),
        IsBillable = make_set(_IsBillable)
        by Task, Keywords)
    on $left.Task == $right.Task and $left.Keywords == $right.Keywords
| project-away Task1, Keywords1
| extend Keywords = case(Keywords == "0x8020000000000000", "Audit Success", Keywords == "0x8010000000000000", "Audit Failure", isempty(Keywords), "N/A", Keywords) // Beautify values.
| extend IdentifiedEventIDs = iif(isempty(IdentifiedEventIDs), todynamic("[]"), IdentifiedEventIDs) // Set default value to [] if empty.
| extend ComputersWithAuditPolicy = iif(isempty(ComputersWithAuditPolicy), todynamic("[]"), ComputersWithAuditPolicy) // Set default to [] if empty.
| extend EventCount = iif(isempty(EventCount), 0, EventCount) // Set default value to 0 if empty.
| extend BilledSizeBytes = iif(isempty(BilledSizeBytes), 0.0, BilledSizeBytes)  // Set default value to 0.0 if empty
| extend BilledSize = format_bytes(BilledSizeBytes) // Format output to user readable units (KB, MB, GB etc...).
| extend TotalBilledSize = format_bytes(TotalBilledSizeBytes) // Format output to user readable units.
| extend BilledSizePercentage = round(100.0 * BilledSizeBytes / TotalBilledSizeBytes, 3) 
| extend IsBillable = iif(isempty(IsBillable), "N/A", IsBillable)  // Set default value to N/A.
| extend minTimeGenerated = iif(isempty(minTimeGenerated), "N/A", tostring(format_datetime(minTimeGenerated, 'yyyy-MM-dd HH:mm:ss'))) // Set default value to N/A if empty.
| extend maxTimeGenerated = iif(isempty(maxTimeGenerated), "N/A", tostring(format_datetime(maxTimeGenerated, 'yyyy-MM-dd HH:mm:ss'))) // Set default value to N/A if empty.
| extend ComputersMissingAuditPolicy = set_difference(TotalComputers, ComputersWithAuditPolicy) // Identify computers with a missing audit policy by calculating the difference between TotalComputers and ComputersWithAuditPolicy.
| extend TotalComputerCount = array_length(TotalComputers) // Get the total number of computers in the environment for the query timespan.
| extend ComputersWithAuditPolicyCount = array_length(ComputersWithAuditPolicy) // Calculate total number of computers with the audit policy enabled.
| extend PercentageCoverageInEnvironment = round(100.0 * ComputersWithAuditPolicyCount / TotalComputerCount, 3) // Calculate the percentage of computers that have the audit policy enabled.
| extend EventPercentage = round(100.0 * EventCount / TotalEventCount, 3) // Calculate the percentage of events generated by the audit policy.
| extend Enabled = iif(EventCount == 0, false, true) // Set to true or false depending on whether events were identified for the audit policy subcategory. Kept as a bool so that it can be compared with ExpectedEnabledStatus.
| extend Verdict = case(
                       ExpectedEnabledStatus == Enabled,
                       "OK",
                       ExpectedEnabledStatus != Enabled and Enabled == true,
                       "Warning: Audit policy is enabled, although it is expected to be disabled.",
                       ExpectedEnabledStatus != Enabled and Enabled == false,
                       "Warning: No event IDs identified for audit policy for the queried time even though it is expected to be enabled. Consider increasing the LookbackTime of the query or reviewing your audit policy.",
                       "N/A"
                   ) // Calculate the verdict according to the ExpectedEnabledStatus and Enabled status of each audit policy subcategory.
| project
    AuditPolicySubCategory,
    Keywords,
    ExpectedEnabledStatus,
    Enabled,
    Verdict,
    Comment,
    IdentifiedEventIDs,
    minTimeGenerated,
    maxTimeGenerated,
    EventCount,
    TotalEventCount,
    EventPercentage,
    BilledSize,
    BilledSizeBytes,
    TotalBilledSize,
    TotalBilledSizeBytes,
    BilledSizePercentage,
    IsBillable,
    ComputersMissingAuditPolicy,
    ComputersWithAuditPolicy,
    ComputersWithAuditPolicyCount,
    TotalComputerCount,
    PercentageCoverageInEnvironment
| sort by AuditPolicySubCategory asc, Keywords desc

Kusto

But how would you interpret those results correctly? You can verify that logs are being ingested for each configured audit policy subcategory setting by comparing the values of the ExpectedEnabledStatus and Enabled columns. A mismatch has to be judged in the context of your environment: event IDs may simply not have been generated for that subcategory during the queried timeframe, or the underlying actions may be rare in your environment. It helps to note such observations in the Comment field of the AppliedAuditPolicy variable, so that in subsequent runs of this query you do not start troubleshooting from scratch. The minTimeGenerated and maxTimeGenerated columns can help you identify possible interruptions in logging at the start or end of the lookback period (interruptions that fall between these two timestamps cannot be detected by this query). Also pay attention to audit policy subcategories that are expected to be disabled but for which related event IDs have been identified. The Verdict column provides a short description of these issues.
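Because minTimeGenerated and maxTimeGenerated only mark the first and last event, a complementary check can count the days on which a subcategory produced no events at all. The following is a minimal sketch rather than part of the original query; it reuses the same filters, and the one-day bin size and the column names `DaysWithEvents`, `DaysInRange` and `DaysWithoutEvents` are assumptions chosen for illustration.

```kusto
// Sketch: find Task values (audit policy subcategories) with whole days missing
// between their first and last event in the lookback period.
let LookbackTime = 30d;
SecurityEvent
| where TimeGenerated > ago(LookbackTime)
| where Channel == "Security" and EventSourceName == "Microsoft-Windows-Security-Auditing"
| summarize DailyEventCount = count() by Task, Day = bin(TimeGenerated, 1d) // One row per Task and day that has events.
| summarize
    DaysWithEvents = count(),
    FirstDay = min(Day),
    LastDay = max(Day)
    by Task
| extend DaysInRange = datetime_diff('day', LastDay, FirstDay) + 1 // Days between first and last event, inclusive.
| extend DaysWithoutEvents = DaysInRange - DaysWithEvents
| where DaysWithoutEvents > 0
| sort by DaysWithoutEvents desc
```

Days without events are not necessarily a logging interruption. Subcategories that cover rare actions will naturally show gaps, so the result is best read alongside the Comment field of the AppliedAuditPolicy variable.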

For the volume, we calculate the event count and billed size per audit policy subcategory as well as the percentage of the total events and billed size per audit policy subcategory.

The volume related columns will provide some insight on how much noise is generated by each audit policy subcategory. By looking at the EventCount and BilledSize values, you can assess whether the benefits of having that audit policy subcategory enabled justify its costs. If the log count is high or the logs consume a considerable amount of storage, but the events do not critically contribute to your use cases, you may lean towards disabling that audit policy setting.
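When a subcategory turns out to be noisy, it can help to see which individual event IDs drive its volume before deciding to disable it. The query below is a hedged sketch of such a drill-down, not part of the original query; the `top 10` cut-off and the two-decimal formatting are arbitrary choices.

```kusto
// Sketch: the ten Task/EventID combinations that consume the most billed data.
let LookbackTime = 30d;
SecurityEvent
| where TimeGenerated > ago(LookbackTime)
| where Channel == "Security" and EventSourceName == "Microsoft-Windows-Security-Auditing"
| summarize
    EventCount = count(),
    BilledSizeBytes = sum(_BilledSize)
    by Task, EventID
| extend BilledSize = format_bytes(BilledSizeBytes, 2) // Human-readable units (KB, MB, GB, ...).
| top 10 by BilledSizeBytes desc
```

The Task values in the result can be mapped back to a subcategory name through the AppliedAuditPolicy datatable.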

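The last vector examined is the coverage. For the coverage, the query calculates the list of computers that generated events for each audit policy subcategory (ComputersWithAuditPolicy) and their count (ComputersWithAuditPolicyCount), the list of computers that did not (ComputersMissingAuditPolicy), the total number of computers seen in the Security event log (TotalComputerCount) and the resulting coverage percentage (PercentageCoverageInEnvironment).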

Coverage is very important and should be evaluated in conjunction with the status and volume results. What good would it be to have Process Creation events but only for one of the monitored computers of the environment? It is advised to aim for a high coverage percentage at least for the most critical audit policy subcategories. The ComputersMissingAuditPolicy column can provide some hints on where to start looking in case you identify discrepancies.
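To follow up on a single subcategory, a per-computer view can be easier to read than the sets returned by the main query. The sketch below assumes you are interested in Process Creation (Task 13312 in the AppliedAuditPolicy datatable); `TaskOfInterest` and the output column names are hypothetical and introduced only for this example.

```kusto
// Sketch: per-computer event count and last-seen time for one audit policy subcategory.
let LookbackTime = 30d;
let TaskOfInterest = 13312; // Detailed Tracking.Process Creation
let AuditEvents = SecurityEvent
    | where TimeGenerated > ago(LookbackTime)
    | where Channel == "Security" and EventSourceName == "Microsoft-Windows-Security-Auditing";
AuditEvents
| distinct Computer // Every computer that sent any Security auditing event.
| join kind=leftouter (
    AuditEvents
    | where Task == TaskOfInterest
    | summarize EventCount = count(), LastSeen = max(TimeGenerated) by Computer)
    on Computer
| project Computer, EventCount = coalesce(EventCount, 0), LastSeen
| sort by EventCount asc // Computers without events for the subcategory appear first.
```

Computers that sent no Security events at all during the lookback period do not appear in this result, which matches how the main query derives TotalComputers.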

The query can be customized for any environment with little effort.

To customize the query period simply edit the LookbackTime variable.

It is set to 30 days, but you may have to reduce it if the query takes too long to complete. We had no trouble running it in environments with 300+ servers; in environments with thousands of servers, however, you may have to experiment. In general, the greater the lookback time, the more representative the results will be.
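One way to experiment is to first gauge how much data the main query would have to scan. The following sketch summarizes the daily event count, number of reporting computers and billed size; it is only an illustration, and the column names are assumptions.

```kusto
// Sketch: daily volume of Security auditing events, to help choose a LookbackTime.
let LookbackTime = 30d;
SecurityEvent
| where TimeGenerated > ago(LookbackTime)
| where Channel == "Security" and EventSourceName == "Microsoft-Windows-Security-Auditing"
| summarize
    EventCount = count(),
    ComputerCount = dcount(Computer),
    BilledSizeBytes = sum(_BilledSize)
    by Day = bin(TimeGenerated, 1d)
| extend BilledSize = format_bytes(BilledSizeBytes, 2)
| sort by Day asc
```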

As of July 2024 [5][7], Microsoft enriched the SecurityEvent table with additional data columns for Windows machines with the Azure Monitor Agent installed. One of these additions is the Keywords column, which represents audit success or audit failure [6] for the logs in the Security event log. Depending on your version of the Azure Monitor Agent [7], the Keywords column will either be populated or empty. So there are environments where the Keywords column is populated for every monitored computer, environments where it is not populated at all, and environments where it is populated for some computers but not others, depending on whether the Azure Monitor Agent is up to date.

To identify what is the case in your environment run the following query.
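A minimal sketch of such a check is shown here. It is an illustration written for this purpose and not necessarily the authors' query; the `KeywordsStatus` labels are made up for readability.

```kusto
// Sketch: classify computers by whether their Security events carry the Keywords column.
let LookbackTime = 30d;
SecurityEvent
| where TimeGenerated > ago(LookbackTime)
| where Channel == "Security" and EventSourceName == "Microsoft-Windows-Security-Auditing"
| summarize
    EventsWithKeywords = countif(isnotempty(Keywords)),
    EventsWithoutKeywords = countif(isempty(Keywords))
    by Computer
| extend KeywordsStatus = case(
    EventsWithoutKeywords == 0, "Populated",
    EventsWithKeywords == 0, "Not populated",
    "Mixed")
| summarize Computers = make_set(Computer), ComputerCount = count() by KeywordsStatus
```

A computer reported as "Mixed" most likely had its agent upgraded during the lookback period, so shortening LookbackTime may move it into the "Populated" group.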

Set the value of KeywordsPopulated to true if all Azure Monitor Agents are at version 1.29.0 or above, or to false if none of them has reached version 1.29.0 [7]. If your environment is mixed (only some Windows machines run version 1.29.0 or above), you have to run the query twice, once with the KeywordsPopulated variable set to true and once with it set to false, and review the results separately.

Running the query in environments that do not have the Keywords column populated will still return actionable results for some use cases (e.g. you will be able to check whether you are getting logs as a result of enabling that policy) but it will not be clear whether the identified logs are a result of enabling that specific audit policy’s audit success or audit failure. As more environments are getting up to date with the latest agent version this inconsistency will be less common and there will be no reason to run the query above to identify if the Keywords column is populated.

The AppliedAuditPolicy variable is used to map the audit policy subcategory to its numeric representation in the logs [3][4] as well as to define the applied audit policy that is expected to be present in the environment.

Modify the AppliedAuditPolicy datatable variable to match the audit policy settings that have been configured in your environment. Set the AuditSuccess and AuditFailure fields to true or false depending on the settings of each audit policy subcategory.

Comments for each subcategory can be added to denote important things about the policy. The comments will be displayed in the results when the query is run.

let AppliedAuditPolicy = datatable (Task: int, Category: string, Subcategory: string, AuditSuccess: bool, AuditFailure: bool, Comment: string) [
    14336, "Account Logon", "Credential Validation", true, true, "",
    14339, "Account Logon", "Kerberos Authentication Service", true, true, "",
    14337, "Account Logon", "Kerberos Service Ticket Operations", true, true, "",
    14338, "Account Logon", "Other Account Logon Events", false, false, "Should not contain any events. Reserved for future usage.",
    13828, "Account Management", "Application Group Management", false, false, "",
    .
    .
    .
    12288, "System", "Security State Change", true, false, "This subcategory doesn’t have Failure events.",
    12289, "System", "Security System Extension", true, false, "This subcategory doesn’t have Failure events.",
    12290, "System", "System Integrity", true, true, ""
];

Kusto
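After editing the datatable by hand, a quick sanity check can catch copy-paste mistakes such as a repeated Task value, which would produce duplicate rows when the main query joins on Task. The sketch below uses a shortened datatable with a deliberately wrong, hypothetical third row; in practice you would paste your full AppliedAuditPolicy definition instead.

```kusto
// Sketch: detect Task values that appear more than once in the policy datatable.
let AppliedAuditPolicy = datatable (Task: int, Category: string, Subcategory: string, AuditSuccess: bool, AuditFailure: bool, Comment: string) [
    14336, "Account Logon", "Credential Validation", true, true, "",
    14339, "Account Logon", "Kerberos Authentication Service", true, true, "",
    14339, "Account Logon", "Kerberos Service Ticket Operations", true, true, "Deliberate mistake for this example: the Task value should be 14337."
];
AppliedAuditPolicy
| summarize RowCount = count(), Subcategories = make_set(Subcategory) by Task
| where RowCount > 1 // Any row returned here points to a duplicated Task value.
```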

The configuration of the Windows audit policy is a very important part of your organization’s security posture. Ensuring that the audit policy is applied consistently across your environment is just as important and quality controls should be in place. You are highly encouraged to develop your own methodologies for reviewing audit policy configurations in your organization.

The provided query serves as a quick starting point for investigating discrepancies in the application of the Windows audit policy in your environment, in order to establish a minimum level of quality and consistency. Complementary checks and improvements to the query may be provided in the future, either as appendices or as part of another blog post.

Article source: https://blog.nviso.eu/2024/09/05/validate-your-windows-audit-policy-configuration-with-kql/