endolith/.gitignore

Last active September 20, 2024 00:21

I tried to plot AGI on the Chatbot Arena Elo scale by comparing to "both bad" and "tie" votes

	*.json
	*checkpoint.ipynb

I tried to plot AGI on the same Elo scale by comparing to "both bad" and "tie" votes

(Or, rather, I had an LLM write it for me. (But another LLM checked it and said it was correct, so...))

The approach adds a hypothetical "ideal model" to the rankings. When a battle is voted a tie, the ideal model is considered to have tied with both models. When a battle is voted "both bad", the ideal model is considered to have beaten both. It therefore acts as an upper bound on Elo scores, and since the judgments come from humans, a model that scored that well all the time would be human-equivalent?
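The rule above can be sketched roughly as follows. This is not the notebook's code, just a minimal standard-library illustration. It assumes each battle is a `(model_a, model_b, outcome)` tuple with outcomes `"model_a"`, `"model_b"`, `"tie"`, or `"tie (bothbad)"`. The name `IDEAL`, both function names, the simple iterative Bradley-Terry fit, and the anchor at 1000 are all hypothetical choices made for the example.

```python
import math

IDEAL = "ideal"  # hypothetical name for the virtual "ideal model"


def add_ideal_battles(battles):
    """Return the battles plus virtual battles against the ideal model.

    Assumed input: (model_a, model_b, outcome) tuples, with outcome one of
    "model_a", "model_b", "tie", "tie (bothbad)".
    """
    out = list(battles)
    for a, b, outcome in battles:
        if outcome == "tie":
            # Ideal model ties with both participants.
            out.append((IDEAL, a, "tie"))
            out.append((IDEAL, b, "tie"))
        elif outcome == "tie (bothbad)":
            # Ideal model beats both participants.
            out.append((IDEAL, a, "model_a"))
            out.append((IDEAL, b, "model_a"))
    return out


def fit_bradley_terry(battles, iters=200):
    """Fit Bradley-Terry strengths by fixed-point iteration.

    Ties (of either kind) count as half a win for each side.
    Returns ratings on an Elo-like scale (400 * log10, anchored at 1000).
    """
    score = {}  # total score per model (win = 1, tie = 0.5)
    games = {}  # number of battles per unordered pair of models
    for a, b, outcome in battles:
        s_a = {"model_a": 1.0, "model_b": 0.0}.get(outcome, 0.5)
        score[a] = score.get(a, 0.0) + s_a
        score[b] = score.get(b, 0.0) + 1.0 - s_a
        key = tuple(sorted((a, b)))
        games[key] = games.get(key, 0) + 1

    models = sorted(score)
    p = {m: 1.0 for m in models}
    for _ in range(iters):
        new_p = {}
        for m in models:
            denom = sum(n / (p[i] + p[j])
                        for (i, j), n in games.items() if m in (i, j))
            # Floor avoids log(0) for a model that never scored.
            new_p[m] = max(score[m] / denom, 1e-12)
        # Rescale so the geometric mean of the strengths stays at 1.
        log_mean = sum(math.log(v) for v in new_p.values()) / len(new_p)
        p = {m: v / math.exp(log_mean) for m, v in new_p.items()}
    return {m: 1000 + 400 * math.log10(p[m]) for m in models}


if __name__ == "__main__":
    example = [
        ("A", "B", "model_a"),
        ("A", "B", "model_b"),
        ("A", "B", "tie"),
        ("A", "B", "tie (bothbad)"),
    ]
    print(fit_bradley_terry(add_ideal_battles(example)))
```

One thing to watch in a sketch like this: the ideal model never loses outright, so its rating only stays finite because tie votes count as half a loss for it. With no tie votes in the data, its fitted strength would grow without bound.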

https://gist.github.com/endolith/e001d8b7811699cf9be822a774e7cb67

https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard/discussions/67

chatbot-arena-leaderboard-calculation-bradley-terry-model.ipynb


Author

endolith commented Sep 20, 2024 (edited)