🇦🇺 AusCyberBench Evaluation Dashboard

Australia's First LLM Cybersecurity Benchmark • 13,449 Tasks • 25 Open Models

Evaluate proven open language models on Australian cybersecurity knowledge, including the Essential Eight, ISM controls, the Privacy Act, the SOCI Act, and ACSC threat intelligence.

Recommended models, already tested: Qwen2.5-3B (55.6%), DeepSeek (55%), TinyLlama (33%)

⚙️ Evaluation Settings

Three sliders with ranges 10 to 500, 0.1 to 1, and 8 to 256.

📋 Model Selection

💾 Persistent Results: Run 1-2 models at a time to avoid GPU timeouts. Results merge with the leaderboard automatically!

✅ Recommended (Tested)

🛡️ Cybersecurity-Focused

Small Models (1-4B)

Medium Models (7-12B)

Reasoning & Analysis

Diverse & Multilingual

⚡ GPU Limits

Free tier: 60-second limit

  • ✅ 1-2 models: Safe
  • ⚠️ 3-5 models: May time out
  • ❌ 6+ models: Will time out
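
The limits above suggest splitting a long model list into small batches before running. A minimal sketch of that idea follows, assuming a batch size of 2; the helper names `batches` and `timeout_risk` are hypothetical and not part of the dashboard, and the thresholds simply mirror the list above.

```python
from typing import Iterator, List


def timeout_risk(n_models: int) -> str:
    """Map a model count to the risk levels listed for the free tier."""
    if n_models <= 2:
        return "safe"
    if n_models <= 5:
        return "may time out"
    return "will time out"


def batches(models: List[str], size: int = 2) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` models (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(models), size):
        yield models[start:start + size]


# Model names taken from the recommended list; "model-d" and "model-e" are placeholders.
queue = ["Qwen2.5-3B", "DeepSeek", "TinyLlama", "model-d", "model-e"]
for group in batches(queue):
    print(group, "->", timeout_risk(len(group)))
```

Running five models as three separate batches keeps each run in the "safe" range, at the cost of launching the evaluation three times.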

📊 Persistent Leaderboard

💾 Results persist across sessions! Run models one at a time to build up a complete leaderboard.

  • New runs merge with existing results
  • Best score per model is kept
  • Perfect for avoiding GPU timeouts
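
The merge rule described above (new runs merge in, best score per model is kept) can be sketched as follows. This is an illustration under stated assumptions, not the dashboard's actual code: it assumes results are a plain mapping from model name to a percentage score, and `merge_leaderboard` and `ranked` are hypothetical names. The stored scores come from the recommended-models line; the lower TinyLlama score in the new run is made up for the example.

```python
from typing import Dict, List, Tuple


def merge_leaderboard(
    existing: Dict[str, float], new_run: Dict[str, float]
) -> Dict[str, float]:
    """Merge a new run into stored results, keeping the best score per model."""
    merged = dict(existing)  # copy so the stored results are not mutated
    for model, score in new_run.items():
        if model not in merged or score > merged[model]:
            merged[model] = score
    return merged


def ranked(leaderboard: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (model, score) pairs sorted from highest to lowest score."""
    return sorted(leaderboard.items(), key=lambda item: item[1], reverse=True)


stored = {"Qwen2.5-3B": 55.6, "TinyLlama": 33.0}
new_run = {"TinyLlama": 30.5, "DeepSeek": 55.0}  # 30.5 is an invented lower score

merged = merge_leaderboard(stored, new_run)
for model, score in ranked(merged):
    print(f"{model}: {score}%")
```

Because a lower score never overwrites a higher one, re-running a model cannot make its leaderboard entry worse, which is what makes one-model-at-a-time runs safe to accumulate.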

Leaderboard


Dataset: Zen0/AusCyberBench • 13,449 tasks | Models: 25 open LLMs (no gated models) | License: MIT