Is ChatGPT A Good Translator?

Is ChatGPT A Good Translator? A Preliminary Study

摘要

  • 提供了ChatGPT对机器翻译的初步评估,包括翻译提示性、多语言翻译和翻译的鲁棒性。我们采用ChatGPT建议的提示来触发其翻译能力,并发现候选提示通常效果很好,并表现出微小的性能差异。
  • ChatGPT在高资源的欧洲语言上与商业翻译产品(例如谷歌翻译)具有竞争能力,但在低资源或遥远的语言上明显落后。
  • For distant languages, we explore an interesting strategy
    named pivot prompting that asks ChatGPT to translate the source sentence into a high-resource pivot language before into the target language, which improves the translation performance significantly.
  • 至于翻译的健壮性,ChatGPT在生物医学摘要或Reddit注释上的性能不如商业系统,但它可能是一个很好的口语翻译器。

动机

We are particularly interested in how ChatGPT performs for machine translation tasks, especially the gap between ChatGPT and commercial translation products.

  1. Translation Prompt 翻译提示
  2. Multilingual Translation 多语种翻译
  3. Translation Robustness 翻译的鲁棒性

实验和结果

Translation Prompt


To design the prompts for triggering the machine translation ability of ChatGPT, we seek inspiration from ChatGPT by asking it for advice.
Thus, we summarize them into three candidate prompts,where
[SRC] and [TGT] represent the source and target languages of translation.

TP3 performs the best in terms of all the three metrics. Thus, we use TP3 throughout this report by default.

Multilingual Translation

  • Specifically, we ask ChatGPT to translate the source sentence into a high-resource pivot language (i.e., English by default) first and then into the target language.

Translation Robustness

总结与启发

可以在对chatGPT解决某一任务做实验的过程中发现chatGPT的缺点,并提出一定的解决方案,作出改进,比如本文中的pivot策略。

An Analysis of the Automatic Bug Fixing Performance of ChatGPT

An Analysis of the Automatic Bug Fixing Performance of ChatGPT(ChatGPT在Bug自动修复的性能分析)

摘要

  • evaluate ChatGPT on the standard bug fixing benchmark set, QuixBugs, and compare the performance with the results of several other approaches reported in the literature.
    在标准bug修复基准集Quixbug上评估ChatGPT,并将其性能与文献中报道的其他几种方法的结果进行比较。
  • ChatGPT’s bug fixing performance is competitive to the common deep learning approaches CoCoNut and Codex and notably better than the results reported for the standard program repair approaches.
    ChatGPT的bug修复性能与常见的深度学习方法CoCoNut和Codex相比具有竞争力,并且明显优于标准程序修复方法报告的结果。
  • ChatGPT offers a dialogue system through which further information, e.g., the expected output for a certain input or an observed error message, can be entered. By providing such hints to ChatGPT, its success rate can be further increased
    ChatGPT提供了一个对话系统,通过该系统可以输入进一步的信息,例如,某个输入的预期输出或观察到的错误信息。通过向ChatGPT提供这些提示,可以进一步提高其成功率。

动机

  • The bug fixing performance of ChatGPT is so far unclear.

方法

  1. first ask ChatGPT for bug fixes for the selected benchmarks and manually check whether the suggested solution is correct or not.
    首先询问ChatGPT对所选基准测试的bug修复,并手动检查建议的解决方案是否正确。
  2. study and categorize ChatGPT’s answers to gain a deeper understanding of its behavior.
    研究和分类ChatGPT的答案,以获得更深入地了解它的行为。
  3. provide a small hint to the model (e.g., a failing test input with an error it produces) to see if it improves ChatGPT’s fix rate.
    为模型提供了一个小提示(例如,一个失败的测试输入并产生一个错误),看看它是否提高了ChatGPT的修复率。

对于QuixBugs中的40个基准测试问题中的每一个,使用错误的Python代码,删除所有包含的注释,并询问ChatGPT代码是否包含bug以及如何修复它。对于每个基准测试问题,向ChatGPT发出几个独立的请求,并手动检查给定的答案是否正确。通过对每个查询使用相同的格式来标准化我们的过程。

Fig 1
expect from ChatGPT an answer that addresses the bug in line 7, where n ˆ= n - 1 should be replaced with n &= n - 1, either with a response containing the complete code snippet with the fixed bug (correctly addressed) or by
giving an exact and correct description how to change the affected code lines.

结果

对比结果

Fig 2
a checkmark (✓) indicates that a correct answer was given in at least one of the four runs for a benchmark problem. A cross (✗) indicates that no correct answer was given in any of the runs.

  • for some problems, ChatGPT suggests a complete re-implementation which is then bug-free.
  • these are probably no real bug fixes, since the introduced bug is not localized. We assume that ChatGPT simply reproduced what it has learned here.
  • Furthermore, we do not count a bug as fixed if additional changes suggested by ChatGPT introduce new errors that prevent the program from running properly.

对回答分类

Fig 3
Fig 4

与其对话

Fig 5
give ChatGPT an exact input example and the resulting error message from Python (lines 17–19)

  • human input can be of much help to an automated APR system, with ChatGPT providing means to do so.

启发

  1. 将chatGPT在某一基准数据集下的推荐性能与其他模型进行比较,分析推荐任务方面的性能
  2. 突出其对话的特性,使其在后续follow-up中性能有所提高,如本文与其他模型相比而言平平无奇,但加入和系统对话,为chatgpt提供更多信息或提示后,优越性立马体现出来。