SANS Stormcast Monday Mar 3rd: AI Training Data Leaks; MITRE Caldera Vuln; modsecurity bypass


Mar 02 2025, 7 mins


Common Crawl includes Common Leaks

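The "Common Crawl" dataset, a large dataset created by spidering websites, contains, as expected, many API keys and other secrets. This data is often used to train large language models. Truffle Security's research, linked below, reports roughly 12,000 live API keys and passwords in data used to train DeepSeek.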

https://trufflesecurity.com/blog/research-finds-12-000-live-api-keys-and-passwords-in-deepseek-s-training-data
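Secret scanners find leaks like these mostly by matching known credential formats in raw text. The sketch below shows the idea with two illustrative detectors: AWS access key IDs (`AKIA` followed by 16 uppercase letters or digits) and classic GitHub personal access tokens (`ghp_` followed by 36 alphanumerics). The function name and the pattern set are assumptions for illustration; Truffle Security's actual tooling uses far more detectors and also checks whether a candidate key is still live, which this sketch does not do.

```python
import re

# Illustrative detectors only (assumption): real scanners ship many more
# patterns and verify each candidate against the issuing service.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_classic_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def find_secret_candidates(text: str) -> list[tuple[str, str]]:
    """Return (detector name, matched string) pairs found in a crawled page."""
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits


# AKIAIOSFODNN7EXAMPLE is the placeholder key ID from AWS documentation.
page = '<script>const key = "AKIAIOSFODNN7EXAMPLE";</script>'
print(find_secret_candidates(page))
```

A pattern hit is only a candidate: many matches are documentation placeholders like the one above, or keys that were already revoked, which is why counting "live" secrets requires a verification step.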

GitHub Repositories Exposed by Copilot

As is well known, GitHub's Copilot uses data from public GitHub repositories to train its model. However, it appears that repositories that were briefly public and later made private have been included as well, allowing Copilot users to retrieve files from these repositories.

https://www.lasso.security/blog/lasso-major-vulnerability-in-microsoft-copilot

MITRE Caldera Framework Allows Unauthenticated Code Execution

The MITRE Caldera adversary emulation framework is vulnerable to unauthenticated remote code execution (CVE-2025-27364): attackers can specify the compiler options Caldera uses when it dynamically compiles its agents.

https://medium.com/@mitrecaldera/mitre-caldera-security-advisory-remote-code-execution-cve-2025-27364-5f679e2e2a0e
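The general lesson is that request data should never choose compiler or linker flags freely, because options such as an external linker flag can make the toolchain run attacker-chosen commands. The sketch below does not reproduce Caldera's code; it assumes a hypothetical endpoint that builds a Go agent and lets the request set variables through `-ldflags "-X importpath.name=value"`. The function name, the allowlist regex, and the output-name check are all illustrative assumptions. The sketch only constructs the command; it does not run `go`.

```python
import re

# Hypothetical allowlist: accept only "-X importpath.name=value" overrides,
# with no spaces or quotes that could smuggle in extra linker options.
SAFE_LDFLAG = re.compile(r"-X [A-Za-z0-9_./]+=[A-Za-z0-9_.:/-]*")


def build_agent_command(output: str, ldflags: list[str]) -> list[str]:
    """Return an argv list for `go build`, refusing unexpected linker flags."""
    if not re.fullmatch(r"[A-Za-z0-9_-]+", output):
        raise ValueError(f"rejected output name: {output!r}")
    for flag in ldflags:
        if not SAFE_LDFLAG.fullmatch(flag):
            raise ValueError(f"rejected linker flag: {flag!r}")
    # An argv list (used without a shell) also keeps request data away from
    # shell parsing when passed to subprocess.run.
    return ["go", "build", "-o", output, "-ldflags", " ".join(ldflags), "."]


cmd = build_agent_command("agent", ["-X main.server=http://10.0.0.5:8888"])
print(cmd)

try:
    build_agent_command("agent", ["-extldflags=-Wl,--some-flag"])
except ValueError as err:
    print(err)
```

An allowlist is preferred over a blocklist here because toolchains accept many option spellings, and missing a single one is enough to reopen the hole.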

modsecurity Rule Bypass

Attackers may bypass the modsecurity web application firewall by padding encoded characters with leading zeros.

https://github.com/owasp-modsecurity/ModSecurity/security/advisories/GHSA-42w7-rmv5-4x2j
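WAF rules usually decode input (for example with ModSecurity's `t:htmlEntityDecode` transformation) before matching signatures, so a decoder that mishandles an encoding lets the payload reach the application unrecognized. The toy model below is not ModSecurity's code: it assumes the affected encoding is a decimal numeric character reference such as `&#0060;` for `<`, and the decoder functions and the `<script` signature are illustrative assumptions. It shows a zero-padded payload slipping past a flawed decoder while a correct decoder exposes it.

```python
import re

# Decimal numeric character references such as "&#60;" or "&#0060;".
ENTITY = re.compile(r"&#(\d+);")


def _to_char(match: re.Match) -> str:
    code = int(match.group(1))  # int("0060") == 60: leading zeros are harmless
    return chr(code) if code <= 0x10FFFF else match.group(0)


def decode_buggy(text: str) -> str:
    """Hypothetical flawed decoder: leaves zero-padded references undecoded."""
    return ENTITY.sub(
        lambda m: m.group(0) if m.group(1).startswith("0") else _to_char(m), text
    )


def decode_fixed(text: str) -> str:
    """Decode every decimal reference, with or without leading zeros."""
    return ENTITY.sub(_to_char, text)


def rule_blocks(payload: str, decode) -> bool:
    """Toy signature rule: block if the decoded input contains '<script'."""
    return "<script" in decode(payload).lower()


payload = "&#0060;script&#62;alert(1)"
print(rule_blocks(payload, decode_buggy))  # False: the payload slips through
print(rule_blocks(payload, decode_fixed))  # True: the payload is blocked
```

The browser, not the WAF, has the final say on how the payload is decoded, so any gap between the two decoders is exploitable.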