Close Menu
Versa AI hub
  • AI Ethics
  • AI Legislation
  • Business
  • Cybersecurity
  • Media and Entertainment
  • Content Creation
  • Art Generation
  • Research
  • Tools
  • Resources

Subscribe to Updates

Subscribe to our newsletter and stay updated with the latest news and exclusive offers.

What's Hot

How C3 AI agents automate predictive maintenance for Shell

June 5, 2026

How E.ON modernizes the grid with AI using SAP S/4HANA

June 4, 2026

GitHub Copilot users experience token-based price increases

June 2, 2026
Facebook X (Twitter) Instagram
Versa AI hubVersa AI hub
Sunday, June 7
Facebook X (Twitter) Instagram
Login
  • AI Ethics
  • AI Legislation
  • Business
  • Cybersecurity
  • Media and Entertainment
  • Content Creation
  • Art Generation
  • Research
  • Tools
  • Resources
Versa AI hub
Home»Tools»Why we need a small, specialized, locally executable model for cyber defense
Tools

Why we need a small, specialized, locally executable model for cyber defense

versatileaiBy versatileaiMay 9, 2026No Comments9 Mins Read
Share Facebook Twitter Pinterest LinkedIn Tumblr Reddit Telegram Email
#image_title
Share
Facebook Twitter LinkedIn Pinterest Email

samuel avatar

Built for AMD Developer Hackathons · Training on a single AMD Instinct MI300X · Apache 2.0

why is this important

The Frontier model is superior in so many ways. They are also expensive to make calls, send all prompts to someone else’s data center, and are explicitly trained to deny true defenders logs of incidents, attacker-level payloads found in their own logs, and nasty edge cases present in vulnerability exposure drafts.

Defensive cybersecurity is no place for these trade-offs.

Confidential evidence remains internal. SOC analysts triage leaked credential dumps, malware reverse engineers analyze samples, and vulnerability researchers create CVEs. None of them should paste that content into a hosted API. The data itself can be compromised. Composite API cost per call. A medium-sized SOC processes thousands of unreliable alerts per day. The cost of a hosted API to “describe this CVE” or “what CWE applies here” turns defensive automation into a budget issue. Air-gapped and partially connected environments are the norm, not the exception, in critical infrastructure, healthcare, and government operations. If a tool can’t run on a laptop or a single on-premises GPU, it won’t ship there. Attackers are becoming more automated. Ransomware gangs use LLM to create phishing drafts in 30 languages. Bug bounty automation tools chain agent tools to fuzz, triage, and exploit faster than a human can review. To play defense at the same speed, you need a model that the defender can own and run with.

So it’s a local issue. But “local” alone is not enough.

Why not just a small model, but a small specialized model?

A 70B generalist running locally on four GPUs is “local” but not deployable. A 4B generalist running locally on a single consumer GPU is deployable, but nothing beats an 8B specialist for the actual work you need to do.

The bet behind CyberSecQwen-4B is that for narrow-scope, well-reviewed cyber threat intelligence tasks such as CWE classification, CVE-to-CWE mapping, and structured CTI Q&A, the 4B can be carefully tweaked to match or exceed 8B specialists while still fitting into 12GB consumer cards.

We tested this against the strongest public baseline we could find, Cisco’s Foundation-Sec-Instruct-8B. This was evaluated based on a proprietary published protocol on CTI-Bench.

Metric (CTI bench, n=5, temperature 0.3) CyberSecQwen-4B Foundation-Sec-Instruct-8B Δ CTI-MCQ (2,500 items) 0.5868 ± 0.0029 0.4996 +8.7 pp CTI-RCM (1,000 CVE→CWE items) 0.6664 ± 0.0023 0.6850 −1.9 pp Parameter 4 B 8 B Half size

CyberSecQwen-4B maintains 97.3% of Foundation-Sec-Instruct-8B’s CTI-RCM accuracy while outperforming CTI-MCQ scores by +8.7 points with half the number of parameters. This is the only number that matters when the defender chooses what to place.

5 minute walkthrough

The 5-minute video below explains the training methodology, AMD MI300X workflow, and benchmark results in a more visual format. If you want to read everything in detail, the rest of this article covers the same in its exact structure.

Why choose AMD MI300X?

The entire pipeline (training, adapter merging, and evaluation) runs end-to-end on a single AMD Instinct MI300X 192 GB instance via AMD Developer Cloud. With the combination of 192 GB HBM3 and ROCm 7’s vLLM stack, you no longer have to worry about quantization tricks, gradient checkpoints, or splitting models across devices. Full bf16, FlashAttendant-2 forward + backward, batch size 4, sequence length 4096 — all on one GPU.

Component Version Hardware AMD Instinct MI300X 192 GB · gfx942 ROCm 7.0 Docker vllm/vllm-openai-rocm:latest PyTorch 2.6.0 (ROCm) flash-attn 2.8.3 vLLM 0.10.1 Latest transformers / peft / trl when training

The train.sh recipe is hardware independent. To run on other 40GB+ datacenter GPUs, remove the AMD-specific environment variables (no action required elsewhere) and reinstall flash-attn from the appropriate wheel. We tested portability by training sister models on separate stacks. Details are explained below.

training data

Two corpora, both released with Apache-2.0-clean:

The 2021 CVE → CWE mapping is sourced from MITER/NVD public records. Importantly, all overlaps with CTI-Bench’s evaluation set are deduplicated before training, so the benchmark numbers above are honest holdouts outside the distribution and not contamination. Comprehensive Defense Analyst Q&A based on deduplicated CVE explanations. Produced using a more powerful teacher and licensed under the Apache-2.0 license for redistribution.

The base model is Qwen3-4B-Instruct-2507, which is 4B instruction-tuned with Apache-2.0 and had the best performance among the available 4B-class IT models during training. Intentionally tweaking IT checkpoints (not basic). This preserves the concise answer multiple choice format that was erased by the collapse of IT to SFT before the IT path was established.

There is a measurable impact here that is worth reporting.

Model CTI-RCM CTI-MCQ Qwen3-4B-Instruct-2507 (raw IT) 0.519 0.473 CyberSecQwen-4B (tweak this) 0.6664 0.5868

The IT base significantly reduces MCQ accuracy compared to the underlying pre-trained base. This is exactly the same pattern that Cisco reports for Foundation-Sec-Instruct and Foundation-Sec Base: MCQs break down due to instruction tuning. Our fine-tuning recovers and exceeds IT’s starting point in both benchmarks and restores the format binding that IT eroded in achieving domain lift.

recipe

LoRA r = 64 LoRA alpha = 64 # alpha/r = 1.0 LoRA dropout = 0.05 LR = 5e-5 # cosine, warmup ratio 0.03 epoch = 10 precision = bf16 attention = FlashAttendant-2 (forward + backward) maximum sequence length = 4096 batch = 4 (no accumulation) optimizer = paged_adamw_8bit

On Qwen, FlashAttendant-2 is turned on because the head dimensions (128) are well within the MI300X (gfx942) shared memory budget. With this configuration, the step time settles at approximately 7.85 seconds/step. This is approximately 1.6x faster than the same recipe for the companion Gemma-4-E2B base model, which cannot use FA2 in the global attention layer (head_dim=512 exceeds the LDS budget) and falls back to sdpa.

Companion model: same recipe, different board

To check whether the results are recipe-driven or substrate-specific, we trained a sister model, Gemma4 Defense-2B, using the exact same training corpus and hyperparameters, simply swapping the base model to Gemma-4-E2B-it.

Model CTI-RCM (5-trial average ± standard) CTI-MCQ CyberSecQwen-4B (Qwen-based) 0.6664 ± 0.0023 0.5868 ± 0.0029 Gemma4 Defense-2B (Gemma-based) 0.6754 ± 0.0035 0.6042 ± 0.0090

The two models converge to within 0.9 points on CTI-RCM. The recipe will be passed on. It’s not about which families you target, it’s about how you fine-tune your IT checkpoints. CyberSecQwen-4B is Apache 2.0 and is ideal if Gemma’s terms of service are an issue. If a 2B fits your deployment budget more comfortably than a 4B, the Gemma4 Defense-2B is a good choice.

Issues and fixes

No AMD ROCm project ships without a War Stories section. Below is the shortened version.

Issue fix FA2 fails on Gemma-4 with head_dim=512 Fallback to sdpa in global attention layer. The local attention layer still uses FA2. The same recipe is up to 1.6 times slower compared to Kwen. AITER kernel conflicts with CyberPal-2.0-20B service. Set VLLM_ROCM_USE_AITER=0 for that particular evaluation. AMD environment variables do nothing outside of ROCm, so they remain in the recipe. bitsandbytes is not officially supported in ROCm. I didn’t need the 4/8 bit anyway. 192 GB is plenty of headroom. Use paged_adamw_8bit (bnb’s optimizer-only path works). vLLM ROCm + Evaluation Chat Template Uses the TRITON_ATTN backend. Explicitly pass chat_template.jinja from the merged model directory to avoid overriding IT-based templates. HF-Spaces ZeroGPU Quota for Demo Anonymous visitors reached the limit of 2 minutes per day per IP. The demo space (cybersecqwen-chat) uses HF OAuth on the client side, so each visitor’s calls are billed based on their own quota (free 3.5 minutes/day, pro 25 minutes/day).

try it yourself

Live Demo (Sign in with HF for free quota): 👉 https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/cybersecqwen-chat

Model: 👉 https://huggingface.co/lablab-ai-amd-developer-hackathon/CyberSecQwen-4B

3 lines of inference (12 GB or higher GPU):

from transformer import AutoModelForCausalLM, AutoTokenizer
import Torch model ID = “lablab-ai-amd-developer-hackathon/CyberSecQwen-4B”
tok = AutoTokenizer.from_pretrained(model_id) model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map=“Auto”) message = ( {“role”: “system”, “content”: “You are a Defensive Cybersecurity Assistant. Please first provide your canonical CWE-ID, then provide a 1-3 sentence justification.”},{“role”: “user”, “content”: “Path traversal in Java web apps where user-controlled input is concatenated to the File() path. What is CWE?”}, ) prompt = tok.apply_chat_template(message, tokenize=erroradd_generation_prompt=truth) out = model.generate(**tok(prompt, return_tensors=“pt”).to(model.device), max_new_tokens=256temperature =0.3)
print(tok.decode(out(0), skip_special_tokens=truth))

To provide high-throughput services, vLLM works out-of-the-box on AMD MI300X via the official vllm/vllm-openai-rocm image. See the GitHub repository for exact service delivery commands and fixed configuration.

Purpose of use

CyberSecQwen-4B is built for security professionals working on:

CWE taxonomy — maps vulnerability descriptions (CVEs, advisories) to MITER CWE categories CTI Q&A — answers structured questions about cybersecurity concepts, attacks, and controls Defensive triage assistance — supports human analysts in triaging CVEs, prioritizing patches, and documenting threat actor behavior

It is not expressly intended for the generation of exploit code or weaponized proof of concept, automated execution of security decisions without review by a qualified human, legal/medical/regulatory advice context, or general chat/code generation outside of cybersecurity. This recipe was built with narrow practicality in mind, not breadth.

what’s next

Here are some directions we would like to expand in, roughly in order of priority.

1B variant for laptop-class deployments. Qwen2.5-1.5B or Llama-3.2-1B as base, same recipe, target ≥0.55 CTI-RCM (within 6 pp of 4B). GGUF release (Q4_K_M, Q5_K_M) where the model is quantized to run on the phone/edge box. ~2.5 GB on Q4_K_M is well within the range of ARM laptop memory. It is continually evaluated as new CVE to CWE mappings are published. The 2021 cohort had an intentional distribution cap. We plan to track NVD growth in future versions. Resilience in adversarial cases. Specialist models are only better in the worst case. We would like to expose a hardening path for a common prompt injection pattern in CVE-description-as-input attacks.

If any of these would like to unblock your team, please open an issue in our GitHub repository. This is the quickest way to queue your team.

At the end

The frontier model debate has been about scale for two years. Defense and cyber conversations need to talk about what actually fits where it’s needed. A 4B specialist that matches the 8B in half size, runs on a card that researchers can purchase, and does not send sensitive evidence off-premises. This is a useful corner of the design space, and the AMD MI300X + ROCm 7 + Hugging Face training stack allows us to occupy that corner in a single training run.

Try the demo, read the model cards, and file questions. If a recipe has been ported to something you haven’t tried yet, that’s the next most interesting data point.

— athena129 · AMD Developer Hackathon Submission

author avatar
versatileai
See Full Bio
Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
Previous ArticleAI helps reduce burden on UK NHS
Next Article RingCentral adds Shopify, Calendly, and WhatsApp to its AI receptionist
versatileai

Related Posts

Tools

How C3 AI agents automate predictive maintenance for Shell

June 5, 2026
Tools

How E.ON modernizes the grid with AI using SAP S/4HANA

June 4, 2026
Tools

GitHub Copilot users experience token-based price increases

June 2, 2026
Add A Comment

Comments are closed.

Top Posts

TCL launches A400 Pro QD-Mini LED Art TV with 4K 144Hz, AI art generation, and gallery-style design

November 30, 202595 Views

Switzerland releases its own completely open AI model

September 4, 202571 Views

The Colorado AI Act was delayed until June 2026

September 21, 202559 Views
Stay In Touch
  • YouTube
  • TikTok
  • Twitter
  • Instagram
  • Threads
Latest Reviews

Subscribe to Updates

Subscribe to our newsletter and stay updated with the latest news and exclusive offers.

Most Popular

TCL launches A400 Pro QD-Mini LED Art TV with 4K 144Hz, AI art generation, and gallery-style design

November 30, 202595 Views

Switzerland releases its own completely open AI model

September 4, 202571 Views

The Colorado AI Act was delayed until June 2026

September 21, 202559 Views
Don't Miss

How C3 AI agents automate predictive maintenance for Shell

June 5, 2026

How E.ON modernizes the grid with AI using SAP S/4HANA

June 4, 2026

GitHub Copilot users experience token-based price increases

June 2, 2026
Service Area
X (Twitter) Instagram YouTube TikTok Threads RSS
  • About Us
  • Contact Us
  • Privacy Policy
  • Terms and Conditions
  • Disclaimer
© 2026 Versa AI Hub. All Rights Reserved.

Type above and press Enter to search. Press Esc to cancel.

Sign In or Register

Welcome Back!

Login to your account below.

Lost password?