
FAMMA

A Benchmark for Financial Domain Multilingual Multimodal Question Answering
 
Siqiao Xue1, Xiaojing Li1, Fan Zhou1, Qingyang Dai1, Zhixuan Chu2, Tingting Chen1, Zhiyi Mou2, Yujie Wei2, Hongyuan Mei3
1Alipay, 2Zhejiang University, 3TTIC
 

🔔 News

Last updated: February 2024

🚀 [2025-02-01]: Initial release of the FAMMA-LivePro dataset, curated by invited experts! 🌟

🚀 [2025-01-01]: Full release of the FAMMA-Basic dataset, now with answers and explanations of improved quality! 🌟

🔥 [2024-06-01]: Initial public release of the FAMMA benchmark (based on the FAMMA-Basic dataset), along with our paper FAMMA: A Benchmark for Financial Domain Multilingual Multimodal Question Answering. We welcome all submissions and look forward to your participation! 😆

Introduction

FAMMA is a multilingual multimodal financial question-answering benchmark dataset with the following key features:

  • To minimize potential data contamination, new questions are released regularly and the full benchmark is refreshed every six months. The dataset currently comprises two releases totaling approximately 2,000 questions:
    • FAMMA-Basic: 1,935 questions sourced from online materials, with corresponding answers and explanations.
    • FAMMA-LivePro: 103 expert-curated questions, released without answers. At this stage, it serves as a live benchmark.
  • The dataset features three types of images (tables, charts, and text/math screenshots) and spans eight financial subfields covering the major asset classes. Each question is labeled by difficulty (easy, medium, or hard) and by format (multiple-choice or open-ended), and questions are available in three languages (English, Chinese, and French).
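The question attributes listed above can be summarized as a record type. The sketch below is a hypothetical schema for illustration only: the class `FammaQuestion`, the enum names, and all field names are assumptions rather than the released dataset's actual columns, and modeling `image_types` as a list assumes a question may contain more than one image. It also checks that the two release sizes stated above add up to roughly 2,000 questions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ImageType(Enum):
    TABLE = "table"
    CHART = "chart"
    TEXT_MATH_SCREENSHOT = "text/math screenshot"


class Difficulty(Enum):
    EASY = "easy"
    MEDIUM = "medium"
    HARD = "hard"


class Language(Enum):
    ENGLISH = "en"
    CHINESE = "zh"
    FRENCH = "fr"


class QuestionFormat(Enum):
    MULTIPLE_CHOICE = "multiple-choice"
    OPEN_ENDED = "open-ended"


@dataclass
class FammaQuestion:
    """Hypothetical record for one benchmark question (field names are assumed)."""
    question_id: str
    release: str                    # "FAMMA-Basic" or "FAMMA-LivePro"
    subfield: str                   # one of the eight financial subfields
    image_types: list[ImageType]    # assumption: a question may include several images
    difficulty: Difficulty
    language: Language
    question_format: QuestionFormat
    question: str
    answer: Optional[str] = None       # None for FAMMA-LivePro, which ships without answers
    explanation: Optional[str] = None  # FAMMA-Basic provides explanations


# Release sizes as stated in the introduction.
RELEASE_SIZES = {"FAMMA-Basic": 1935, "FAMMA-LivePro": 103}
TOTAL_QUESTIONS = sum(RELEASE_SIZES.values())  # 2038, i.e. roughly 2,000

# Illustrative instance; every value here is a placeholder, not real benchmark data.
example = FammaQuestion(
    question_id="example-001",
    release="FAMMA-LivePro",
    subfield="<financial subfield>",
    image_types=[ImageType.TABLE],
    difficulty=Difficulty.HARD,
    language=Language.ENGLISH,
    question_format=QuestionFormat.OPEN_ENDED,
    question="<question text>",
)
```

Leaving `answer` and `explanation` optional lets one type describe both releases: LivePro records simply keep the defaults.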

Leaderboard


FAMMA-LivePro

Released on 2025-01
Columns: Model | Arithmetic (Pass@1): Overall, Easy, Medium, Hard | Non-Arithmetic (Pass@1): Overall, Easy, Medium, Hard | Overall (Pass@1)

Overall results of different models on the FAMMA-LivePro leaderboard. The best-performing model in each category is shown in bold, and the second best is underlined.
*: image content is extracted with OCR and passed to the model as text.
GPT o1 version: 2024-12-17, o1-mini version: 2024-09-12, GPT-4o version: 2024-08-06
DeepSeek-R1 version: 2025-01-20, Qwen-QwQ-32B version: 2025-03-05, Qwen-VL-Max version: 2025-01-25
Claude 3.5 Sonnet version: 2024-10-22
Gemini 2.0 Flash Thinking version: exp-0120, Gemini 1.5 Pro version: 002
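Pass@1 is the fraction of questions a model answers correctly on its first attempt. The sketch below shows one way the leaderboard cells could be computed, assuming each question carries an arithmetic flag, a difficulty label, and correctness judgments for n sampled answers; how answers are graded is not specified here, and micro-averaging the "Overall" cells over questions is also an assumption. It uses the standard unbiased pass@k estimator, 1 - C(n-c, k)/C(n, k), which reduces to c/n for k = 1. The names `GradedQuestion`, `pass_at_k`, and `leaderboard_row` are illustrative, not the benchmark's evaluation code.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import comb


@dataclass
class GradedQuestion:
    """One question with correctness judgments for n sampled answers (hypothetical layout)."""
    is_arithmetic: bool
    difficulty: str        # "easy", "medium" or "hard"
    correct: list[bool]    # one entry per sampled model answer


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one question with n samples, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def leaderboard_row(questions: list[GradedQuestion], k: int = 1) -> dict[str, float]:
    """Mean pass@k per (category, difficulty) cell, per category, and overall."""
    cells: dict[str, list[float]] = defaultdict(list)
    for q in questions:
        score = pass_at_k(len(q.correct), sum(q.correct), k)
        category = "arithmetic" if q.is_arithmetic else "non-arithmetic"
        cells[f"{category}/{q.difficulty}"].append(score)
        cells[f"{category}/overall"].append(score)
        cells["overall"].append(score)
    return {name: sum(scores) / len(scores) for name, scores in cells.items()}


# Toy example with one sample per question.
qs = [
    GradedQuestion(True, "easy", [True]),
    GradedQuestion(True, "hard", [False]),
    GradedQuestion(False, "medium", [True]),
    GradedQuestion(False, "medium", [False]),
]
row = leaderboard_row(qs)
print(row["overall"])             # 0.5
print(row["arithmetic/easy"])     # 1.0
```

With a single sample per question, `pass_at_k` returns 1.0 or 0.0, so each cell is simply the share of correctly answered questions in that group.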

FAMMA-Basic

Released on 2024-06
Columns: Model | Arithmetic (Pass@1): Overall, Easy, Medium, Hard | Non-Arithmetic (Pass@1): Overall, Easy, Medium, Hard | Overall (Pass@1)

Overall results of different models on the FAMMA-Basic leaderboard. The best-performing model in each category is shown in bold, and the second best is underlined.
*: image content is extracted with OCR and passed to the model as text.
GPT o1 version: 2024-12-17, o1-mini version: 2024-09-12, GPT-4o version: 2024-08-06
DeepSeek-R1 version: 2025-01-20, Qwen-QwQ-32B version: 2025-03-05, Qwen-VL-Max version: 2025-01-25
Claude 3.5 Sonnet version: 2024-10-22
Gemini 2.0 Flash Thinking version: exp-0120, Gemini 2.0 Flash version: exp, Gemini 2.0 Pro version: exp-0205, Gemini 1.5 Pro version: 002
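Rows marked with * evaluate a text-only setting in which image content is converted to text with OCR before it reaches the model. A minimal sketch of building such a prompt follows. The OCR engine is passed in as a callable because the leaderboard does not state which OCR tool was used, and the function `build_ocr_prompt` and its prompt layout are hypothetical. In practice the callable could wrap an engine such as Tesseract, for example via `pytesseract.image_to_string`.

```python
from typing import Callable, Sequence


def build_ocr_prompt(
    question: str,
    image_paths: Sequence[str],
    ocr: Callable[[str], str],
) -> str:
    """Replace each image with its OCR transcript so a text-only model can answer.

    `ocr` maps an image path to extracted text; the choice of engine is an assumption.
    """
    parts = []
    for i, path in enumerate(image_paths, start=1):
        text = ocr(path).strip()
        # Keep a visible placeholder so the model knows an image was present.
        parts.append(f"[Image {i} (OCR text)]\n{text if text else '(no text detected)'}")
    parts.append(f"[Question]\n{question}")
    return "\n\n".join(parts)


# Usage with a stub OCR function standing in for a real engine (toy data).
def fake_ocr(path: str) -> str:
    return {"table.png": "Year | Revenue\n2023 | 120"}.get(path, "")


prompt = build_ocr_prompt("What was revenue in 2023?", ["table.png", "blank.png"], fake_ocr)
print(prompt)
```

Injecting the OCR step as a parameter keeps prompt construction testable and makes it easy to compare OCR engines without touching the rest of the pipeline.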

BibTeX


@article{xue2024famma,
        title={FAMMA: A Benchmark for Financial Domain Multilingual Multimodal Question Answering},
        author={Siqiao Xue and Tingting Chen and Fan Zhou and Qingyang Dai and Zhixuan Chu and Hongyuan Mei},
        journal={arXiv preprint arXiv:2410.04526},
        year={2024},
        url={https://arxiv.org/abs/2410.04526}
}