In the quest for better AI, survivorship bias hampers development

Terence Tao of the University of California, Los Angeles, one of the foremost mathematicians of our time, pointed out that "[...] so much knowledge is somehow trapped in the head of individual mathematicians. And only a tiny fraction is made explicit", and that "[a] lot of the intuition is not captured in the printed papers in journals, but in conversations with mathematicians, in lectures and in the way we advise students."

This made me think that current LLMs are largely trained in a way that doesn't take failed human efforts into account. As such, "survivorship bias" takes hold: the model can only draw on training data made up of curated information available on the internet, so it misses the many complicated and unsuccessful stories, papers, and conversations that were never published or recorded electronically.

Tao points out that:

"The data that are really precious are from when someone tries something, and it doesn’t quite work, but they know how to fix it."
A popular visual representation of survivorship bias. This demonstrative diagram shows where returning WW2-era planes were hit. Suggesting that the red clusters should be reinforced exemplifies selection bias, as the planes analyzed did not include any with severe enough damage to crash and not return, only planes with light enough damage to return home. Wikipedia.
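
To make the effect in the diagram concrete, here is a small simulation sketch. Everything in it is an assumption made up for illustration: the four sections, the loss probabilities, and the number of hits per plane are hypothetical, not historical data. Hits land evenly across all sections, but only planes that return get counted, much like a model that only sees what was published.

```python
import random
from collections import Counter

# Hypothetical numbers, chosen only for illustration: the chance that a
# single hit in a given section brings the plane down.
LOSS_PROBABILITY = {"engine": 0.6, "cockpit": 0.5, "fuselage": 0.1, "wings": 0.1}
SECTIONS = list(LOSS_PROBABILITY)


def simulate(num_planes=10_000, hits_per_plane=3, seed=0):
    """Return (hits on all planes, hits on returning planes) per section."""
    rng = random.Random(seed)
    all_hits, observed_hits = Counter(), Counter()
    for _ in range(num_planes):
        # Every section is equally likely to be hit.
        hits = [rng.choice(SECTIONS) for _ in range(hits_per_plane)]
        all_hits.update(hits)
        # The plane returns only if it survives every hit it took.
        returned = all(rng.random() >= LOSS_PROBABILITY[s] for s in hits)
        if returned:
            observed_hits.update(hits)
    return all_hits, observed_hits


all_hits, observed_hits = simulate()
for section in SECTIONS:
    share_all = all_hits[section] / sum(all_hits.values())
    share_seen = observed_hits[section] / sum(observed_hits.values())
    print(f"{section:9s} all planes: {share_all:.2f}   returning planes: {share_seen:.2f}")
```

Under these assumptions, engine and cockpit hits make up a smaller share of the damage seen on returning planes than of the damage actually taken, simply because the planes hit there rarely came back to be counted.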

Quotes from Scientific American:

AI Will Become Mathematicians’ ‘Co-Pilot’
Fields Medalist Terence Tao explains how proof checkers and AI programs are dramatically changing mathematics