A Security Operations Center (SOC) watches an organization’s IT systems for cyber threats 24/7. It quickly finds and fixes security problems, using Security Information and Event Management (SIEM) tools to collect and analyze alerts and logs. SIEMs depend on log Collector servers, which gather data from many sources and forward it to the SIEM. If the Collectors fail, the SIEM loses input, and the SOC can miss attacks or respond too slowly.

That means Collectors’ uptime, reliability, and log availability and integrity aren’t just “nice to have”. They are essential for detecting threats quickly, getting real-time alerts, supporting investigations, and ensuring compliance. Managing dozens of Collectors across different networks is complex and challenging. Each Collector misconfiguration or outage is not just configuration drift; it is a potential security and visibility gap.

In this blog series, I will show how to solve these challenges with DevOps and Infrastructure as Code (IaC) practices. Ansible and GitHub Actions power the solution.

The unique Collectors management challenges

Most DevOps or IaC use cases focus on efficiency, like avoiding configuration drift and making deployments scale easily. But when managing Collectors, priorities go beyond these basic advantages. The top priorities are nonstop availability, especially during chaos, and keeping investigation data reliable and correct. That makes Collectors much more sensitive and higher risk than typical infrastructure.

The biggest challenges we face are:

  • Evidence integrity
    Logs moving through Collectors can end up in investigations, audits, or reports. When Collector configurations aren’t consistent or tracked, it’s hard to know where the logs came from or whether the data has been changed, and trust in the evidence drops quickly.
  • Detection quality
    How Collectors process events (parsing, filtering, dropping, and so on) decides which logs reach the SIEM, how fast they arrive, and how they are structured. Even a small change in these configurations can stop a detection from working or slow down alerts. It can also cause false negatives that hide real attacks.
  • Attack surface
    Collectors are usually exposed to many log sources and often sit in the middle of sensitive traffic. Poor control over configurations, credentials, or setup methods is dangerous: it can let attackers create blind spots, or suppress and rewrite logs to hide their actions.

In a SOC, managing Collectors manually is not just slow and inefficient. It is also risky.

The Solution: Let the Code Do the Work

We needed a way to manage all Collector setups and track every change. We also needed an easy way to repeat setups and automate updates or new installs.

Here’s how we did it. Instead of logging into each server individually and performing manual actions, we have:

  • Everything in one trusted place
    All Collector configurations are stored in a central GitHub repository. Every update is versioned and documented in the commit history. So, you always know what changed and can easily roll back if something goes wrong.
  • Automated Collector Setup
    New Collector? Add it to the Ansible inventory and trigger a GitHub Actions workflow with a few clicks. Then, Ansible installs everything, deploys configurations, and validates that the server is working. What used to take hours now takes minutes.
  • Fast recovery
    If a Collector fails, it can be easily restored with a GitHub Actions workflow that runs Ansible playbooks. No manual guesswork and minimum downtime.
  • Consistency
    Every Collector gets the exact same setup, with the exact same settings. No surprises, which makes fixing problems faster and easier when you know everything is set up the same way.
  • Configuration as Code
    We write setup in simple, human-friendly YAML files. This is code that anyone on the team can understand and review.
  • Easy and secure connection
    The solution uses a secure VPN to reach Collectors behind firewalls or private networks. No manual access steps are needed each time. Updates become quicker and safer.

How It Actually Works – Architecture

Here’s a simplified technical breakdown of the architecture.

GitHub repositories. The single source of truth with:

  • One repository with the Collectors’ configurations: log collection rules (e.g., parsing and dropping rules) and settings such as High Availability and Load Balancing configurations.
  • One repository with Ansible code: Ansible playbooks, inventory, etc.
  • GitHub Actions workflows on each repository.
  • All configurations and code versioned and tracked.
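As a purely illustrative sketch (directory and file names are made up, not the actual repositories), the two repositories might be organized along these lines:

```text
collector-configs/            # repository 1: Collector configurations
├── configs/
│   ├── parsing/              # log parsing and dropping rules
│   └── ha/                   # High Availability / Load Balancing settings
└── .github/workflows/        # deployment workflows

collector-ansible/            # repository 2: Ansible code
├── inventory/
├── playbooks/
└── .github/workflows/        # setup and maintenance workflows
```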

Ansible control node. A Linux central server with:

  • SSH access to all managed Collectors. It establishes a secure VPN connection, so it can reach Collectors within other private networks. All connections are logged and monitored.
  • Access to a key vault (where credentials and SSH keys are stored securely).
  • Access to our GitHub private repositories (to pull configurations and push updates).
  • Ansible installed.
  • Ansible Inventory: A list of all Collectors with their IP addresses and SSH connection details. This file defines which servers Ansible targets.
  • Ansible Playbooks: YAML files that describe what to deploy.
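To make the inventory idea concrete, here is a minimal sketch of what such a file could look like. Hostnames, IP addresses, user names, and paths are all illustrative assumptions, not the real environment:

```yaml
# inventory/collectors.yml -- illustrative Ansible inventory sketch
all:
  children:
    collectors:
      hosts:
        collector-site-a:
          ansible_host: 10.10.1.20   # made-up address on site A's network
        collector-site-b:
          ansible_host: 10.20.1.20   # made-up address on site B's network
      vars:
        ansible_user: ansible        # dedicated automation account
        ansible_ssh_private_key_file: /etc/ansible/keys/collector_ed25519
```

Grouping hosts under `collectors` lets a playbook target all of them at once, or a single site with `--limit`.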

Collectors: Linux servers at different sites or networks.

Centralized Configuration Management

Whenever you change any Collector’s configuration in GitHub, the solution deploys the update automatically to the Collector servers. It restarts the right services and checks that each server works correctly after the change.

The workflow is simple:

  1. A team member updates the Collectors configuration files in GitHub, commits them and creates a Pull Request (PR).
  2. If the tests on GitHub Actions are successful and the changes are approved, the PR can be merged.
  3. Once the PR is merged, a GitHub Actions workflow starts automatically.
  4. The workflow reads the details from the PR. It figures out which Collectors to update, which configurations to change, and which services to restart after deployment.
  5. The workflow triggers Ansible playbooks on the Ansible control node using this information.
  6. The playbooks use the information to update the right Collector servers.
  7. After deployment, services are restarted and checks are performed to confirm they work correctly. If any problem is found, an automatic rollback can restore the previous state.
  8. All actions are logged and saved in GitHub Actions.
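The merge-triggered part of this workflow (steps 3 to 5) could be sketched roughly as follows. This is a simplified assumption of how such a workflow might look, not the actual pipeline; the file paths and the runner label are invented for illustration:

```yaml
# .github/workflows/deploy-collectors.yml -- simplified sketch
name: Deploy Collector configurations
on:
  push:
    branches: [main]          # fires after the PR is merged
    paths: ["configs/**"]     # only when configuration files changed

jobs:
  deploy:
    runs-on: self-hosted      # runner with access to the Ansible control node
    steps:
      - name: Check out the merged configurations
        uses: actions/checkout@v4

      - name: Run the deployment playbook for the affected Collectors
        run: |
          ansible-playbook playbooks/deploy_config.yml \
            --limit collectors
```

In practice, the workflow would also derive which hosts and services the PR touched and pass that to the playbook; that logic is omitted here for brevity.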

Automated Collector Setup:
From Zero to Production in Minutes

Now, here’s where it gets really powerful. Instead of manually setting up each new Collector, we automated the entire setup process.

Here’s what happens when a new Collector is needed:

  1. Create an Ansible user on the new Collector and set its SSH key so Ansible can connect.
  2. Add the new Collector to the Ansible inventory with its IP and SSH details.
  3. Start the setup workflow in GitHub Actions and fill in the required inputs.
  4. The workflow runs Ansible playbooks on the Ansible control node with these inputs.
  5. Ansible fetches the required secrets from the key vault and configuration files from GitHub.
  6. Ansible connects to the new Collector over SSH.
  7. Ansible creates SOC engineers’ user accounts on the Collector.
  8. Ansible checks hardware and network requirements, installs required software, and builds configurations based on the user’s inputs from step 3.
  9. Other configuration files from GitHub are deployed to the Collector.
  10. Services are restarted and basic checks confirm that the Collector works as expected.
  11. If the tests pass, the new Collector’s configurations are pushed to the GitHub repository.
  12. GitHub Actions logs what was deployed, when, and by whom.
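Steps 6 to 10 of the setup could be expressed as an Ansible playbook along these lines. This is only a hedged sketch: the play name, variables, package list, port number, and service name are assumptions for illustration, not the real playbook:

```yaml
# playbooks/setup_collector.yml -- illustrative sketch of the setup play
- name: Set up a new Collector
  hosts: "{{ target_collector }}"   # host passed in from the workflow inputs
  become: true
  tasks:
    - name: Install required software
      ansible.builtin.package:
        name: "{{ collector_packages }}"   # list defined in group_vars
        state: present

    - name: Deploy configuration files pulled from GitHub
      ansible.builtin.template:
        src: "{{ item }}"
        dest: "/etc/collector/{{ item | basename }}"
      loop: "{{ collector_config_templates }}"
      notify: Restart collector services

    - name: Basic check that the Collector is listening
      ansible.builtin.wait_for:
        port: 6514        # illustrative log-ingestion port
        timeout: 30

  handlers:
    - name: Restart collector services
      ansible.builtin.service:
        name: collector   # illustrative service name
        state: restarted
```

Because every step is a declarative task, running this against 1 server or 100 is only a matter of what the inventory and `--limit` select.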

All of this happens in less than 5 minutes. One team member, one workflow, one manual step. Ansible can run it on 1 server or 100 servers. The process is identical every time.

What This Means for Your SOC

If you run a SOC, you know the pressure. Everyone wants logs to show up faster. Everything needs to be more secure and reliable. Meanwhile, you’re supposed to do this with the same team size as last year. By automating the boring, repetitive work, my team focuses on what matters. We troubleshoot infrastructure. We improve security. We build better automations. We no longer waste time on repetitive CLI tasks.

Plus, we can update configurations without risking silent Collector failures. Validation and rollback protect the Collectors. Even if a Collector breaks unexpectedly, we can restore it with the same setup in minutes, maintaining log ingestion. Good logs help SOC teams spot threats faster and fix issues quicker.

In the next part, I’ll share more technical details about the solution. You’ll also see how all the parts work together. Stay tuned!

About the Author

Tryfon Skandamis

Tryfon Skandamis leads NVISO’s SOC Infrastructure Engineering team. He focuses on applying SOC engineering best practices and keeping tools and infrastructure running at their best. When he’s not troubleshooting or fine‑tuning them, he builds automations that make life easier for SOC engineers.
