arXiv:2305.14251 (cs)
[Submitted on 23 May 2023 (v1), last revised 11 Oct 2023 (this version, v2)]
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Sewon Min, Kalpesh Krishna, Xinxi Lyu, Mike Lewis, Wen-tau Yih, Pang Wei Koh, Mohit Iyyer, Luke Zettlemoyer, Hannaneh Hajishirzi
Evaluating the factuality of long-form text generated by large language models (LMs) is non-trivial because (1) generations often contain a mixture of supported and unsupported pieces of information, making binary judgments of quality inadequate, and (2) human evaluation is time-consuming and costly. In this paper, we introduce FACTSCORE, a new evaluation that breaks a generation into a series of atomic facts and computes the percentage of atomic facts supported by a reliable knowledge source. We conduct an extensive human evaluation to obtain FACTSCOREs of people biographies generated by several state-of-the-art commercial LMs -- InstructGPT, ChatGPT, and the retrieval-augmented PerplexityAI -- and report new analysis demonstrating the need for such a fine-grained score (e.g., ChatGPT only achieves 58%). Since human evaluation is costly, we also introduce an automated model that estimates FACTSCORE using retrieval and a strong language model, with less than a 2% error rate. Finally, we use this automated metric to evaluate 6,500 generations from a new set of 13 recent LMs that would have cost $26K if evaluated by humans, with various findings: GPT-4 and ChatGPT are more factual than public models, and Vicuna and Alpaca are some of the best public models. FACTSCORE is available for public use via `pip install factscore`.
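At its core, the metric described in the abstract reduces to a supported-fact ratio: split a generation into atomic facts, judge each against the knowledge source, and report the supported fraction. A minimal sketch of that computation (this is not the authors' released `factscore` package; the example facts and support judgments below are hypothetical placeholders standing in for the retrieval-plus-LM verification step):

```python
def factscore(support_labels):
    """FActScore of one generation: the fraction of its atomic facts
    judged supported by a reliable knowledge source.

    `support_labels` is a list of booleans, one per atomic fact,
    as produced by some upstream verification step.
    """
    if not support_labels:
        return 0.0
    return sum(support_labels) / len(support_labels)

# Hypothetical example: a generated biography split into four
# atomic facts, three of which verification marked as supported.
labels = [True, True, False, True]
print(f"FActScore: {factscore(labels):.0%}")  # -> FActScore: 75%
```

In the paper's pipeline the boolean judgments come from retrieving passages from the knowledge source and asking a strong LM whether each atomic fact is supported; the final aggregation is this simple ratio, which is what makes the score fine-grained compared to a single binary factuality judgment.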
Comments: 25 pages; 7 figures. Published as a main conference paper at EMNLP 2023. Code available at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2305.14251 [cs.CL]
(or arXiv:2305.14251v2 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2305.14251
Submission history
From: Sewon Min
[v1] Tue, 23 May 2023 17:06:00 UTC (2,490 KB)
[v2] Wed, 11 Oct 2023 05:27:50 UTC (2,491 KB)