2024-10-04 · HuggingFace

Introducing the Open FinLLM Leaderboard

models · research · commentary

Source: HuggingFace · Date: 2024-10-04 · URL: https://huggingface.co/blog/leaderboard-finbench

Summary

Leaderboard release: the Open Financial LLM Leaderboard (OFLL) covers 40 tasks across 7 financial categories (information extraction, textual analysis, question answering, text generation, risk management, forecasting, and decision-making), plus 4 Spanish tasks. Evaluation is zero-shot only. Notable finding: smaller models (Llama-3.1-7B, internlm-7B) often outperform Llama-3.1-70B on forecasting tasks. GPT-4 and Llama 3.1 lead overall.

Implications

Thread: open-weights ecosystem health. Specialized financial benchmarks matter because general leaderboards say little about domain-specific performance. The finding that smaller models often outperform larger ones on forecasting tasks is practically important: it suggests that 7B-scale models fine-tuned on financial data may outperform off-the-shelf 70B models on specific financial tasks, which shifts the cost calculus for financial AI deployment toward smaller models that are cheaper to serve. The inclusion of Spanish tasks is a useful multilingual signal for a domain where Spanish-language financial markets are significant. This is domain-specific benchmark infrastructure that open-weight financial model development can build on.
