From 10,000 Daily Alerts to a Simple 1–5 Score: A Machine Learning Approach to Enterprise Vulnerability Management
I built TAPS (ThreatSurface Analyzer with Predictive Scoring), a machine learning system that:
- Automates security risk assessment across thousands of assets
- Combines vulnerability data, configuration quality, and business context
- Generates simple 1–5 risk scores (like credit scores for cybersecurity)
- Achieved 89.9% accuracy using LARS regression
- Proves configuration hardening has higher ROI than just patching
If your security team drowns in vulnerability alerts and struggles with prioritization, this is for you.
Part 1: The Problem Nobody’s Solving
The 10,000-Alert-Per-Day Problem
Picture this: You’re a security analyst at a mid-sized enterprise. You arrive Monday morning, open your vulnerability scanner, and see 10,247 new alerts. Your SIEM has flagged 3,892 suspicious events. Your compliance dashboard shows 584 configuration deviations.
You have 8 hours. Where do you even start?
This isn’t hypothetical. According to Verizon’s 2024 Data Breach Investigations Report, the median enterprise generates over 10,000 security alerts daily. Even with a team of analysts, manual review is mathematically impossible.
The Broken Prioritization Model
Most teams rely on CVSS (Common Vulnerability Scoring System) scores. A CVSS 9.0 “critical” vulnerability gets immediate attention. A CVSS 6.5 “medium” goes to the backlog.
Here’s the problem: CVSS measures intrinsic vulnerability severity but ignores:
- Is the vulnerable system internet-facing or internal?
- Is it configured securely with compensating controls?
- Does it handle customer credit cards or test data?
- Is it production or development?
A CVSS 9.0 on a well-configured test server might be less risky than a CVSS 6.5 on a poorly-configured, internet-facing production system handling financial data.
The Business Translation Gap
Try explaining security to your CEO:
You: “We have 327 critical vulnerabilities, 1,203 high, and 4,891 medium.”
CEO: “Is that… good? Bad? Should I be worried? How much should we spend fixing this?”
You: “Well, it depends on…”
CEO: glazes over
Security metrics don’t translate to business impact. Executives need a number they can understand, track, and budget against.
The Consistency Problem
I ran an experiment: I gave three senior security analysts the same system profile and asked them to rate its risk on a 1–5 scale.
Results:
- Analyst A: 2.5 (low-medium risk)
- Analyst B: 3.8 (high risk)
- Analyst C: 4.2 (critical risk)
Same system. Three different assessments. How do you trend organizational risk when individual evaluations vary by 68%?
The industry needs:
- Scalability: Assess thousands of assets in minutes, not weeks
- Consistency: Same assessment regardless of who evaluates
- Context: Combine vulnerabilities, configuration, and business impact
- Simplicity: One number executives understand
- Actionability: Clear prioritization guidance
Enter TAPS.
Part 2: The Solution Architecture
The Core Concept: Security Credit Scores
Financial services solved a similar problem decades ago. How do you assess the creditworthiness of millions of people consistently? Credit scores.
Complex financial history → Single number (300–850) → Clear decision (approve/deny loan).
What if we applied this to security?
Complex security data → Single number (1–5) → Clear action (quarterly review / emergency patch).
The Three-Dimensional Risk Model
TAPS integrates three data dimensions that are typically kept separate:
Dimension 1: Vulnerability Intelligence (NIST NVD)
What we capture:
- Maximum CVSS score among all vulnerabilities
- Total vulnerability count
- Average vulnerability severity
- Presence of critical (CVSS ≥ 9.0) vulnerabilities
Why it matters:
This answers: “What’s broken on this system?”
Data source: NIST National Vulnerability Database (nvd.nist.gov)
Dimension 2: Configuration Quality (CIS Benchmarks)
What we capture:
- CIS Benchmark compliance percentage (0–100%)
- Automated security control implementation
- Manual security control implementation
- Web Application Firewall deployment
- Intrusion Detection System presence
- System age (patch currency proxy)
Why it matters:
This answers: “How well is it defended?”
Even high-severity vulnerabilities are less dangerous on well-configured systems with compensating controls.
Data source: Center for Internet Security Benchmarks (cisecurity.org)
Dimension 3: Business Context (FAIR Framework)
What we capture:
- Asset criticality rating (1–5)
- Financial exposure if compromised
- Threat frequency (attack likelihood)
- Environment (Production/Staging/Development)
- Business unit (Finance/E-Commerce/HR/etc.)
- Data classification (Public/Internal/Confidential/Restricted)
- Loss magnitude category
- Internet accessibility
- User base size
Why it matters:
This answers: “How much damage would a breach cause?”
A vulnerability on a development server isn’t the same threat as one on a production financial system.
Data source: FAIR (Factor Analysis of Information Risk) taxonomy
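To make the three dimensions concrete, here's how they might be bundled into a single per-asset record. This is an illustrative sketch, not the published TAPS code; the class and the field subset are drawn from the feature list in the appendix.

```python
from dataclasses import dataclass

@dataclass
class AssetProfile:
    # Dimension 1: vulnerability intelligence (NIST NVD)
    max_cvss: float             # highest CVSS v3.1 score, 0.0-10.0
    vuln_count: int             # total open vulnerabilities
    mean_cvss: float            # average CVSS score
    has_critical: bool          # any CVSS >= 9.0?
    # Dimension 2: configuration quality (CIS Benchmarks)
    cis_compliance_rate: float  # 0.0-1.0
    has_waf: bool
    has_ids: bool
    # Dimension 3: business context (FAIR)
    business_impact_score: int  # 1-5 criticality
    is_external_facing: bool
    environment: str            # "Production" / "Staging" / "Development"

profile = AssetProfile(
    max_cvss=9.1, vuln_count=7, mean_cvss=6.2, has_critical=True,
    cis_compliance_rate=0.55, has_waf=False, has_ids=True,
    business_impact_score=5, is_external_facing=True, environment="Production",
)
```

A record like this is what the scoring models consume downstream.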
The Output: TAPS Scores
Score Range: 1.0–5.0
1.0–2.5: LOW RISK 🟢
→ Action: Quarterly review cycle
→ Example: Development server, few vulnerabilities, good compliance
2.5–3.5: MEDIUM RISK 🟡
→ Action: Monthly monitoring, 30-day remediation window
→ Example: Staging environment, moderate vulnerabilities, acceptable configuration
3.5–4.5: HIGH RISK 🟠
→ Action: Weekly tracking, 7-day priority remediation
→ Example: Production system, high CVSS vulnerabilities, internet-facing
4.5–5.0: CRITICAL RISK 🔴
→ Action: Immediate response, executive escalation, possible isolation
→ Example: Production financial system, critical vulnerabilities, poor compliance, external-facing
Simple. Clear. Actionable.
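The band-to-action mapping is simple enough to express directly in code. A sketch; the helper and its thresholds just restate the table above:

```python
def taps_band(score: float) -> tuple[str, str]:
    """Map a 1.0-5.0 TAPS score to a risk band and recommended action."""
    if score < 1.0 or score > 5.0:
        raise ValueError("TAPS scores are defined on the 1.0-5.0 range")
    if score <= 2.5:
        return ("LOW", "Quarterly review cycle")
    if score <= 3.5:
        return ("MEDIUM", "Monthly monitoring, 30-day remediation window")
    if score <= 4.5:
        return ("HIGH", "Weekly tracking, 7-day priority remediation")
    return ("CRITICAL", "Immediate response, executive escalation")

print(taps_band(4.3))  # ('HIGH', 'Weekly tracking, 7-day priority remediation')
```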
Part 3: The Machine Learning Approach
Why Machine Learning?
Rule-based systems (if CVSS > 7 AND production THEN high_risk) are brittle and miss nuanced patterns. Machine learning discovers complex relationships automatically:
- Does vulnerability count matter more than max CVSS?
- At what compliance threshold does risk accelerate?
- How do vulnerabilities interact with business impact?
- Are there non-linear effects we’re missing?
The Fair Comparison Challenge
Most algorithm comparison studies are flawed. They create custom features for each algorithm:
- LARS gets linear interaction terms
- Neural networks get normalized features
- Decision trees get categorical bins
Then they claim Algorithm X “won.” But did the algorithm win, or did the feature engineering win?
My Approach: Identical Features for All
All three algorithms I tested received the exact same 19 base features:
- 4 vulnerability metrics
- 6 configuration quality metrics
- 9 business context metrics
No custom features. No algorithm-specific preprocessing. Level playing field.
This way, performance differences reflect genuine algorithmic capabilities, not feature engineering tricks.
The Three Algorithms
Algorithm 1: LARS (Least Angle Regression)
Type: Linear regression with automatic feature selection
How it works: Starts with every coefficient at zero, then incrementally brings in the features most correlated with the residual; the Lasso variant used here applies L1 regularization, which drives irrelevant feature coefficients to exactly zero.
Strengths:
- Extremely interpretable (clear coefficient values)
- Fast training and prediction (< 1 second)
- Automatic feature selection (eliminates noise)
Results:
- R² = 0.899 (explains 89.9% of variance)
- MAE = 0.163 (average error 0.16 points)
- RMSE = 0.209 (limited catastrophic errors)
Best use case: Executive reporting and resource allocation
Example insight from LARS:
cis_compliance_rate coefficient: -0.397
Interpretation: Each 10% compliance improvement reduces risk by 0.04 points.
Business value: Quantified ROI for security hardening investments.
Algorithm 2: M5 Decision Tree
Type: Rule-based hierarchical partitioning
How it works: Recursively splits data on feature thresholds to create if-then rules.
Strengths:
- Very high interpretability (human-readable rules)
- Captures feature interactions naturally
- No assumptions about functional form
Results:
- R² = 0.772 (explains 77.2% of variance)
- MAE = 0.241 (average error 0.24 points)
- RMSE = 0.314 (acceptable error distribution)
Best use case: SOC analyst playbooks and decision support
Example rule from M5:
IF vuln_count > 5
AND cis_compliance < 60%
AND environment = "Production"
THEN risk_score ≈ 4.3 (CRITICAL)
ACTION: Immediate patching + executive notification
Security analysts can follow this logic without ML expertise.
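Rules in that shape can be pulled straight out of a fitted tree with scikit-learn's export_text. A sketch on synthetic data; the feature names and target are illustrative stand-ins:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
vuln_count = rng.integers(0, 26, 300)      # 0-25, as in the appendix
cis_compliance = rng.random(300)           # 0.0-1.0
# Synthetic risk: more vulns and low compliance both push the score up
y = 2.0 + 0.08 * vuln_count + 1.5 * (cis_compliance < 0.6)

X = np.column_stack([vuln_count, cis_compliance])
tree = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, y)
print(export_text(tree, feature_names=["vuln_count", "cis_compliance"]))
```

The printed output is the nested if-then structure analysts can follow directly.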
Algorithm 3: LOESS (Local Regression)
Type: Non-parametric local regression, implemented here as distance-weighted k-Nearest Neighbors
How it works: Predicts based on the 50 most similar assets in the training data, weighted by distance.
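A minimal sketch of that scheme using scikit-learn's KNeighborsRegressor on synthetic data (the non-linear target is illustrative; the hyperparameters match the appendix):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(7)
X_train = rng.random((500, 2))              # stand-ins for scaled features
y_train = 1.0 + 3.0 * X_train[:, 0] ** 2    # deliberately non-linear target

# Each prediction is a distance-weighted average over the 50 nearest assets
knn = KNeighborsRegressor(n_neighbors=50, weights="distance")
knn.fit(X_train, y_train)

pred = knn.predict([[0.9, 0.5]])            # query: a new, unseen asset
print(round(float(pred[0]), 2))
```

Because the prediction is purely local, the model tracks curvature the linear LARS fit would flatten out.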
Strengths:
- Captures non-linear patterns and thresholds
- No functional form assumptions
- Flexible to complex relationships
Results:
- R² = 0.624 (explains 62.4% of variance)
- MAE = 0.330 (average error 0.33 points)
- RMSE = 0.404 (moderate error distribution)
Best use case: Threshold detection and specialized analysis
Key finding from LOESS: Risk doesn’t increase linearly with CVSS. It accelerates exponentially above CVSS 7.0. This non-linear pattern validates data-driven thresholds rather than arbitrary cutoffs.
The Ensemble Approach
Rather than pick a “winner,” I combined all three:
Ensemble Prediction:
TAPS_score = (LARS_pred + LOESS_pred + M5_pred) / 3
Benefits:
- Accuracy: Often matches or beats individual models
- Confidence Scoring: When all three agree (variance < 0.2), confidence is high. When they disagree (variance > 0.5), it’s an edge case needing human review.
- Robustness: If one model struggles with unusual data, others compensate.
Results:
- R² = 0.839
- MAE = 0.185
- RMSE = 0.209
Plus free confidence intervals.
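The averaging and variance-based confidence logic is only a few lines. A sketch; the 0.2 / 0.5 variance thresholds restate the bands above:

```python
import statistics

def ensemble_score(lars_pred, loess_pred, m5_pred):
    """Average three model outputs and flag disagreement via variance."""
    preds = [lars_pred, loess_pred, m5_pred]
    score = sum(preds) / 3
    spread = statistics.pvariance(preds)
    if spread < 0.2:
        confidence = "high"
    elif spread <= 0.5:
        confidence = "medium"
    else:
        confidence = "low: route to human review"
    return round(score, 2), confidence

print(ensemble_score(4.1, 3.9, 4.3))   # models agree -> high confidence
print(ensemble_score(2.0, 4.5, 3.0))   # models disagree -> human review
```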
Part 4: The Breakthrough Findings
Finding 1: Configuration Quality Beats Patching (Sort Of)
The data revealed something surprising:
CIS Compliance Coefficient: -0.397 (LARS)
Max CVSS Coefficient: +0.327 (LARS)
Translation: Improving configuration has higher marginal impact than reducing vulnerability severity.
Why this matters:
- You can’t eliminate vulnerabilities instantly (patching takes time, testing, deployment)
- You CAN improve configuration relatively quickly (enable WAF, harden settings, implement IDS)
Practical implication:
If you had $100K to spend on security, the data suggests investing in configuration management tools and compliance automation would reduce risk more than hiring more people to patch faster.
Important caveat: This doesn’t mean “don’t patch.” It means “hardening multiplies the value of patching.”
Finding 2: Algorithm Convergence Validates Risk Drivers
When I compared feature importance across LARS and M5 (two completely different algorithms), they agreed on the top 3 risk drivers:
Top 3 (Consensus):
- Configuration quality (cis_compliance_rate)
- Vulnerability severity (max_cvss)
- Business impact (business_impact_score)
When independent methods reach the same conclusion, confidence skyrockets. These aren’t statistical artifacts — they’re genuine causal factors.
Finding 3: Five Features Are Redundant
LARS drove 5 feature coefficients to exactly zero:
- uptime_days
- estimated_users
- manual_compliance_rate
- business_unit
- loss_magnitude
Interpretation: These provide no additional predictive value beyond the other 14 features.
Practical value: Simpler data collection. You can skip these features without losing accuracy, reducing operational burden.
Finding 4: No Universal “Best” Algorithm
LARS won on accuracy. M5 won on interpretability. LOESS won on threshold detection.
The lesson: Deploy multiple algorithms for different stakeholders:
- LARS for executives (clear coefficient insights)
- M5 for analysts (decision rules)
- LOESS for automation (pattern detection)
- Ensemble for comprehensive scoring
Different operational needs require different algorithmic properties.
Part 5: Real-World Deployment
Scenario 1: Automated Vulnerability Scanning
Integration Point: Post-processing after scanner completes
Workflow:
- Vulnerability scanner finishes weekly scan
- TAPS extracts 19 features for each asset
- LARS model scores all assets in < 5 seconds
- Dashboard updates with current risk posture
- Assets scoring > 4.0 auto-generate priority tickets
Algorithm Choice: LARS (speed + accuracy)
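The post-scan hook reduces to a loop over the inventory. A sketch; the asset records and the score_asset callable are hypothetical stand-ins for your scanner export and fitted LARS model:

```python
def triage(assets, score_asset):
    """Score every asset and return those needing priority tickets."""
    tickets = []
    for asset in assets:
        score = score_asset(asset)      # the fitted model's 1.0-5.0 prediction
        if score > 4.0:                 # the auto-ticket threshold above
            tickets.append({"host": asset["host"], "taps_score": score})
    return tickets

# Stand-in inventory and scores for illustration
inventory = [{"host": "web-01"}, {"host": "db-02"}, {"host": "dev-03"}]
fake_scores = {"web-01": 4.3, "db-02": 3.1, "dev-03": 1.8}
print(triage(inventory, lambda a: fake_scores[a["host"]]))
# [{'host': 'web-01', 'taps_score': 4.3}]
```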
Business Value:
- Manual assessment: 1,000 assets × 15 min = 250 hours
- TAPS assessment: 1,000 assets × 0.3 sec = 5 minutes
- ROI: 249.9 hours saved per scan cycle
Scenario 2: SOC Analyst Decision Support
Integration Point: SIEM alert enrichment
Workflow:
- SIEM generates security alert
- TAPS looks up asset’s current risk score
- M5 decision tree provides logic:
Asset: web-server-042
TAPS Score: 4.3
Rule Applied: "High vuln count + Low compliance + Production"
Recommendation: P1 response, notify senior analyst
Analyst follows playbook based on score
Algorithm Choice: M5 (interpretable rules)
Business Value:
- Consistent triage across all analysts
- Junior analysts make decisions like seniors
- Reduced mean time to respond (MTTR)
Scenario 3: Executive Dashboard
Integration Point: Daily batch reporting
Workflow:
- Nightly batch job scores entire asset inventory
- LARS feature importance shows top organizational risk drivers:
Q4 2025 Security Posture: 3.4 / 5.0
Top Risk Drivers:
• CIS Compliance: 67% average (target: 80%)
• Critical Vulnerabilities: 12% of assets
• Production Exposure: 45% lack WAF
Recommendation: Invest $500K in config automation
Expected Impact: Reduce org risk from 3.4 to 2.8
Board presentation shows trend line over time
Algorithm Choice: LARS (coefficient insights)
Business Value:
- Quantified, data-driven budget justification
- Clear ROI for security investments
- Risk communicated in business language
Scenario 4: Threshold-Based Alerting
Integration Point: Continuous monitoring
Workflow:
- LOESS identifies risk thresholds from training data
- System monitors assets for threshold crossings:
ALERT: web-server-128 crossed CVSS 7.0 threshold
Previous Score: 3.2 → Current Score: 4.1
Risk Acceleration: Detected by LOESS non-linear analysis
Action: Emergency patching authorized
Proactive tickets generated before incidents
Algorithm Choice: LOESS (threshold detection)
Business Value:
- Proactive vs. reactive posture
- Data-driven alert thresholds (not arbitrary)
- Reduced false positive alert fatigue
Part 6: Lessons Learned (The Hard Way)
Lesson 1: First Iteration Was Terrible
My initial models achieved R² around 0.45–0.55. Terrible. Below operational thresholds.
What went wrong:
- Target variable construction was oversimplified
- Hyperparameter search was too narrow
- Training data needed better balance
The fix:
- Consulted with domain experts on realistic risk scoring
- Expanded hyperparameter grids significantly
- Refined synthetic data generation to match real patterns
Final results: R² improved to 0.77–0.90 range.
Lesson: ML projects are iterative. First attempt teaches you what doesn’t work. Embrace the iteration.
Lesson 2: Interpretability Costs Accuracy (But It’s Worth It)
M5's R² was 12.7 points lower than LARS's (0.772 vs. 0.899). My first instinct: "M5 loses."
Wrong mindset.
M5 generates human-readable rules. Analysts can:
- Follow decision logic without ML training
- Explain assessments in audits
- Trust the system (transparency breeds trust)
That's worth the 12.7-point accuracy gap in security operations where humans remain in the loop.
Lesson: Optimize for operational value, not just accuracy metrics.
Lesson 3: Fair Comparison Is Harder Than It Sounds
Every instinct screamed “engineer better features for LOESS!” (Add polynomial terms! Transform features!)
I resisted. The whole point was identical features for fair comparison.
Discipline paid off. Results are scientifically valid because I controlled for confounding variables.
Lesson: Define your experimental goals early and don’t compromise methodology for marginal gains.
Lesson 4: Synthetic Data Is Both Blessing and Curse
Blessing:
- No confidentiality issues
- Perfect for academic/research
- Reproducible results
Curse:
- Doesn’t capture real-world messiness
- Edge cases are theoretical
- Needs validation on actual enterprise data
Lesson: Synthetic data proves feasibility. Real data proves value. Need both.
Lesson 5: Ensemble Is Usually The Answer
When in doubt, average multiple models.
- Errors cancel out
- Free confidence scoring via variance
- Robust to individual model failures
Unless you have a strong reason not to, ensemble.
Lesson: Don’t overthink algorithm selection. Deploy multiple and combine.
Part 7: What’s Next
Short-Term: Validation on Real Data
Synthetic data proves the concept works. But I need to validate on actual enterprise environments with:
- Real vulnerability scans
- Real configuration audits
- Real incident outcomes
Call to action: If your organization is interested in a pilot, let’s talk. I’m offering free implementation in exchange for feedback and anonymized validation data.
Medium-Term: Extend to Other Asset Types
TAPS currently focuses on Apache web servers. The methodology should generalize to:
- Databases (MySQL, PostgreSQL, Oracle)
- Network devices (routers, switches, firewalls)
- Cloud resources (EC2, S3, Lambda)
- Endpoints (laptops, desktops, mobile)
Challenge: Each asset type requires different feature engineering. Transfer learning might help.
Long-Term: Temporal Modeling
Current TAPS provides point-in-time assessment. The next evolution:
- Trend prediction: “This asset’s risk increased 0.8 points over 30 days — investigate degradation”
- Time series forecasting: “At current patching velocity, 23% of assets will enter critical range in Q2”
- Anomaly detection: “Risk spike detected — unusual vulnerability disclosure affecting 127 assets”
This requires recurrent neural networks (LSTM/GRU) and more complex architectures.
The Ultimate Vision: Closed-Loop Security
The dream:
- TAPS detects high risk →
- Triggers automated remediation (patch deployment, config hardening) →
- Re-scans and verifies risk reduction →
- Learns from successful remediations →
- Improves future predictions
Human oversight remains critical, but routine decisions become automated. Security teams focus on strategy, not spreadsheets.
Part 8: How You Can Use This
For Security Practitioners
If you’re drowning in vulnerability alerts:
- Start collecting the 19 features TAPS uses
- Build a simple baseline model (even basic linear regression helps)
- Use scores to triage your existing backlog
- Iterate and improve based on incident outcomes
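Step 2 really can be that small. A baseline sketch with plain linear regression on three stand-in features (synthetic data and made-up coefficients for illustration; swap in your own 19 features):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n = 200
max_cvss = rng.random(n) * 10           # 0-10
compliance = rng.random(n)              # 0-1
is_prod = rng.integers(0, 2, n)         # 1 = Production
# Synthetic risk target for illustration
risk = 1.5 + 0.25 * max_cvss - 1.0 * compliance + 0.8 * is_prod

X = np.column_stack([max_cvss, compliance, is_prod])
baseline = LinearRegression().fit(X, risk)

# Score a hypothetical asset: high CVSS, weak compliance, production
new_asset = np.array([[9.1, 0.55, 1]])
print(round(float(baseline.predict(new_asset)[0]), 2))
```

Even a model this crude gives you a consistent ordering for triaging the backlog, which is the point of step 3.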
Resources you need:
- Vulnerability scanner (Nessus, Qualys, OpenVAS)
- Configuration auditor (InSpec, OpenSCAP)
- Asset management database (ServiceNow, CMDB)
- Basic Python + scikit-learn knowledge
For Security Leaders
If you’re tired of unclear security posture:
- Calculate your current assessment cost (time × hourly rate)
- Pilot TAPS on 100–500 assets
- Compare manual vs. automated assessment accuracy
- Measure time savings and consistency improvements
- Scale to full deployment if successful
Expected ROI: 95%+ time reduction in assessment, 40–60% improved consistency based on my experiments.
Part 9: Open Questions and Collaboration
Questions I’m Still Exploring
1. Does TAPS work across different industries? Finance, healthcare, and retail have different threat models. Does the same feature importance hold? Or do industry-specific models perform better?
2. How often should models retrain? Threat landscape evolves. How frequently does TAPS need retraining? Monthly? Quarterly? Continuous learning?
3. Can we incorporate threat intelligence feeds? If a vulnerability is actively exploited in the wild, should its risk score dynamically increase? How do we integrate real-time threat data?
4. What about false positives in vulnerability scanners? Garbage in, garbage out. How robust is TAPS to noisy input data?
5. Can adversaries game the system? If attackers know the TAPS algorithm, can they manipulate features to appear low-risk? What’s the adversarial robustness?
How You Can Contribute
I’m actively seeking:
🤝 Collaboration Partners
- Security teams willing to pilot TAPS
- Data scientists interested in security applications
- Researchers for academic publication
📊 Real-World Data (Anonymized)
- Vulnerability scan outputs
- Configuration audit results
- Incident outcomes for validation
💡 Feedback and Corrections
- Methodological improvements
- Alternative algorithms to test
- Edge cases I’m missing
💼 Industry Validation
- Does this solve your actual problems?
- What features are missing?
- What deployment blockers exist?
Contact:
- LinkedIn: https://linkedin.com/in/cyb3rle0
- Email: [email protected]
Conclusion: Making Security Measurable
Security has operated too long on gut feelings, incomplete data, and reactive firefighting. We measure everything else in business — sales conversion, customer satisfaction, operational efficiency — but security remains opaque.
TAPS demonstrates that security risk CAN be:
- Measured (1–5 numerical score)
- Automated (thousands of assets in minutes)
- Consistent (same assessment regardless of analyst)
- Explained (clear risk drivers with coefficients)
- Actioned (clear prioritization guidance)
Is TAPS perfect? Absolutely not. It needs real-world validation, extension to more asset types, temporal modeling, and continuous refinement.
But it proves the concept: Machine learning can transform security from art to science.
The vulnerability management problem is solvable. The data exists. The algorithms work. What’s missing is adoption, iteration, and collaboration between security and data science communities.
This is my contribution to that collaboration. What’s yours?
Appendix: Technical Deep-Dive
Feature List (All 19)
Vulnerability Metrics:
1. max_cvss - Highest CVSS v3.1 score (0.0-10.0)
2. vuln_count - Total vulnerability count (0-25)
3. mean_cvss - Average CVSS score (0.0-10.0)
4. has_critical - Any CVSS ≥ 9.0? (binary)
Configuration Quality:
5. cis_compliance_rate - Overall CIS compliance (0.0-1.0)
6. auto_compliance_rate - Automated controls (0.0-1.0)
7. manual_compliance_rate - Manual controls (0.0-1.0)
8. has_waf - WAF deployed? (binary)
9. has_ids - IDS active? (binary)
10. uptime_days - System age (1-500 days)
Business Context:
11. business_impact_score - Criticality (1-5)
12. financial_exposure - Potential loss (normalized)
13. threat_frequency - Attack likelihood (0.0-1.0)
14. environment - Prod/Staging/Dev (categorical)
15. business_unit - Organizational function (categorical)
16. data_classification - Public/Internal/Conf/Restricted (categorical)
17. loss_magnitude - Impact category (categorical)
18. is_external_facing - Internet accessible? (binary)
19. estimated_users - User base size (10-10,000)
Hyperparameters Used
LARS:
LassoLars(
    alpha=0.01,            # Regularization strength
    max_iter=1000,         # Convergence iterations
    random_state=42
)
LOESS (k-NN):
KNeighborsRegressor(
    n_neighbors=50,        # Local neighborhood size
    weights='distance',    # Inverse distance weighting
    metric='euclidean'
)
M5:
DecisionTreeRegressor(
    max_depth=5,           # Tree depth limit
    min_samples_split=20,  # Minimum split size
    min_samples_leaf=10,   # Minimum leaf size
    random_state=42
)
Performance Metrics Explained
R² (Coefficient of Determination):
R² = 1 - (SS_residual / SS_total)
where:
SS_residual = Σ(y_actual - y_predicted)²
SS_total = Σ(y_actual - y_mean)²
Interpretation: Proportion of variance explained by the model
MAE (Mean Absolute Error):
MAE = (1/n) × Σ|y_actual - y_predicted|
Interpretation: Average prediction error in TAPS score points
RMSE (Root Mean Squared Error):
RMSE = √[(1/n) × Σ(y_actual - y_predicted)²]
Interpretation: Penalizes large errors more than MAE
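All three metrics are a few lines of NumPy, matching the definitions above (the sample scores are made-up numbers for illustration):

```python
import numpy as np

def r2_mae_rmse(y_true, y_pred):
    """Compute R², MAE, and RMSE exactly as defined above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    ss_res = np.sum(resid ** 2)                       # SS_residual
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # SS_total
    r2 = 1.0 - ss_res / ss_tot
    mae = np.mean(np.abs(resid))
    rmse = np.sqrt(np.mean(resid ** 2))
    return float(r2), float(mae), float(rmse)

# Four assets, actual vs. predicted TAPS scores
r2, mae, rmse = r2_mae_rmse([3.0, 4.2, 2.1, 4.8], [3.2, 4.0, 2.3, 4.6])
print(round(r2, 3), round(mae, 3), round(rmse, 3))
```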
Further Reading
Academic Papers:
- Allodi & Massacci (2014) — Comparing Vulnerability Severity and Exploits
- Efron et al. (2004) — Least Angle Regression
- Hastie et al. (2009) — Elements of Statistical Learning
Industry Frameworks:
- NIST National Vulnerability Database
- CIS Security Benchmarks
- FAIR Risk Taxonomy
If this article helped you think differently about security automation, please: 👏 Clap (up to 50 times!)
💬 Comment with your thoughts
🔗 Share with your security team
📧 Subscribe for future security data science posts
Let’s make vulnerability management less painful, together.
Tags: #Cybersecurity #MachineLearning #VulnerabilityManagement #DataScience #SecurityAutomation #AI #InfoSec #ThreatIntelligence #SOC #RiskManagement #Python #Scikitlearn #EnterpriseSecurity #DevSecOps #PredictiveAnalytics