FAMMA

🔔 News

Last updated: June 2024

🚀 [2025-05-20]: Release of answers and explanations for FAMMA-LivePro dataset; these responses are masked via Base64 encoding to prevent data leakage. 🌟

🚀 [2025-04-22]: Release of FAMMA-Reasoning dataset, an extension of FAMMA featuring richly annotated, high-quality reasoning chains distilled from DeepSeek-R1. 🌟

🚀 [2025-03-01]: Release of the FAMMA-Basic-Txt and FAMMA-LivePro-Txt datasets, two purely textual datasets that use OCR to extract multimodal information and convert it into textual context for each question in FAMMA-Basic and FAMMA-LivePro, respectively! 🌟

🔥 [2025-02-01]: Initial release of FAMMA-LivePro dataset, collected from invited experts! 🌟

🔥 [2025-01-01]: Full release of FAMMA-Basic dataset, now including answers and explanations with enhanced quality! 🌟

🔥 [2024-10-06]: Initial public release of FAMMA benchmark (based on the FAMMA-Basic dataset), along with our paper FAMMA: A Benchmark for Financial Domain Multilingual Multimodal Question Answering. We welcome all submissions and look forward to your participation! 😆
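
The 2025-05-20 item above notes that the FAMMA-LivePro answers and explanations are masked with Base64. A minimal sketch of unmasking such a field is shown here, assuming each masked value is a standard Base64 string of UTF-8 text; the helper name `decode_masked` and the sample answer are hypothetical and are not taken from the dataset.

```python
import base64


def decode_masked(text: str) -> str:
    """Decode a Base64-masked string back to UTF-8 text."""
    return base64.b64decode(text).decode("utf-8")


# Hypothetical answer, masked here only to demonstrate the round trip.
# Real field names and contents should be checked against the dataset itself.
masked = base64.b64encode("The bond price is 98.5".encode("utf-8")).decode("ascii")

print(masked)                 # opaque ASCII text, unreadable at a glance
print(decode_masked(masked))  # the original answer text
```

Base64 is an encoding, not encryption: it keeps answers from appearing as plain text to crawlers, but anyone can reverse it.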

Introduction

FAMMA is a multilingual multimodal financial question-answering benchmark dataset with the following key features:

To minimize potential data contamination, new questions are released regularly, with the full benchmark refreshed every six months. Currently, the dataset includes four releases, totaling approximately 4,000 questions:
- FAMMA-Basic: contains 1,935 questions sourced from online materials, with corresponding answers and explanations.
- FAMMA-Basic-Txt: a textual version of FAMMA-Basic, in which OCR is used to extract multimodal information and convert it into contextual text for each question.
- FAMMA-LivePro: contains 103 expert-curated questions, provided without answers. This release serves as a live benchmark in its current stage.
- FAMMA-LivePro-Txt: a textual version of FAMMA-LivePro, in which OCR is used to extract the questions and represent them in text format.

The dataset features three image types (tables, charts, and text/math screenshots) and spans eight financial subfields, ensuring broad coverage across major asset classes. Each question is assigned a difficulty level (easy, medium, or hard) and is available in three languages (English, Chinese, and French). Questions come in two formats: multiple-choice and open-ended.

Leaderboard

FAMMA-LivePro

Released on 2025-01

Model	Arithmetic (Pass@1)				Non-Arithmetic (Pass@1)				Overall (Pass@1)
Model	Overall	Easy	Medium	Hard	Overall	Easy	Medium	Hard	Overall	Easy	Medium	Hard

Overall results of different models on the FAMMA-LivePro leaderboard. The best-performing model in each category is in bold, and the second best is underlined.
*: OCR is used to extract the image content, which is then passed to the model.
GPT o1 version: 2024-12-17, o1-mini version: 2024-09-12, 4o version: 2024-08-06
DeepSeek-R1 version: 2025-01-20, Qwen-QwQ-32B version: 2025-03-05, Qwen-VL-Max version: 2025-01-25
Claude 3.5 Sonnet version: 2024-10-22
Gemini 2.0 Flash Thinking version: exp-0120, Gemini 1.5 Pro version: 002

FAMMA-Basic

Released on 2024-06

Model	Arithmetic (Pass@1)				Non-Arithmetic (Pass@1)				Overall (Pass@1)
Model	Overall	Easy	Medium	Hard	Overall	Easy	Medium	Hard	Overall	Easy	Medium	Hard

Overall results of different models on the FAMMA-Basic leaderboard. The best-performing model in each category is in bold, and the second best is underlined.
*: OCR is used to extract the image content, which is then passed to the model.
GPT o1 version: 2024-12-17, o1-mini version: 2024-09-12, 4o version: 2024-08-06
DeepSeek-R1 version: 2025-01-20, Qwen-QwQ-32B version: 2025-03-05, Qwen-VL-Max version: 2025-01-25
Claude 3.5 Sonnet version: 2024-10-22
Gemini 2.0 Flash Thinking version: exp-0120, Gemini 2.0 Flash version: exp, Gemini 2.0 Pro version: exp-0205, Gemini 1.5 Pro version: 002
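
Both leaderboards report Pass@1 split by question type (arithmetic, non-arithmetic) and difficulty (easy, medium, hard), each with an overall column. The sketch below shows one way such a breakdown could be computed, assuming one attempt per question whose correctness has already been judged; the record fields `type`, `difficulty`, and `correct` are hypothetical, and this is not the benchmark's official scoring script.

```python
from collections import defaultdict


def pass_at_1(records):
    """Return Pass@1 (percent) per (question type, difficulty) bucket.

    Each record contributes to its own bucket and to the matching
    "overall" rollups, mirroring the leaderboard's column layout.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for r in records:
        buckets = [
            (r["type"], r["difficulty"]),
            (r["type"], "overall"),
            ("overall", r["difficulty"]),
            ("overall", "overall"),
        ]
        for key in buckets:
            totals[key] += 1
            correct[key] += int(r["correct"])
    return {key: 100.0 * correct[key] / totals[key] for key in totals}


# Hypothetical graded results, one attempt per question.
records = [
    {"type": "arithmetic", "difficulty": "easy", "correct": True},
    {"type": "arithmetic", "difficulty": "hard", "correct": False},
    {"type": "non-arithmetic", "difficulty": "easy", "correct": True},
    {"type": "non-arithmetic", "difficulty": "medium", "correct": False},
]

scores = pass_at_1(records)
for key in sorted(scores):
    print(key, f"{scores[key]:.1f}")
```

With a single attempt per question, Pass@1 reduces to plain accuracy within each bucket, which is why simple counting suffices here.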

BibTeX


@article{xue2024famma,
        title={FAMMA: A Benchmark for Financial Domain Multilingual Multimodal Question Answering},
        author={Siqiao Xue and Xiaojing Li and Fan Zhou and Qingyang Dai and Zhixuan Chu and Hongyuan Mei},
        journal={arXiv preprint arXiv:2410.04526},
        year={2024},
        url={https://arxiv.org/abs/2410.04526}
}