agentforge-benchmarks
cargo
Benchmark comparison: runs agents against GAIA, AgentBench, and WebArena tasks and reports percentile vs. published baselines (v2 F-05)
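The listing does not say how the percentile is computed. As a rough illustration only, one common definition is the percentile rank: the share of baseline scores at or below the agent's score. The function name `percentile_rank`, its signature, and the use of plain `f64` scores are assumptions made for this sketch and are not taken from the package.

```rust
/// Hypothetical helper, not part of agentforge-benchmarks.
/// Percentile rank of `score` among `baselines`: the share of baseline
/// scores that are less than or equal to it, scaled to 0..=100.
/// Returns `None` when there are no baselines to compare against.
fn percentile_rank(score: f64, baselines: &[f64]) -> Option<f64> {
    if baselines.is_empty() {
        return None;
    }
    // Count baselines at or below the agent's score (assumed tie rule).
    let at_or_below = baselines.iter().filter(|&&b| b <= score).count();
    Some(100.0 * at_or_below as f64 / baselines.len() as f64)
}
```

Under this definition, a score of 0.25 compared against baselines of 0.10, 0.20, 0.30, and 0.40 lands at the 50th percentile, since two of the four baselines are at or below it. Other conventions (strictly-below counts, or averaging ties) would give different values on tied scores.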
Audits
No audits for this package yet.