In the quest for better AI, survivorship bias hampers development
Terence Tao of the University of California, Los Angeles, one of the foremost mathematicians of our time, pointed out that "[...] so much knowledge is somehow trapped in the head of individual mathematicians. And only a tiny fraction is made explicit", and that "[a] lot of the intuition is not captured in the printed papers in journals, but in conversations with mathematicians, in lectures and in the way we advise students."
This made me think that current LLMs are largely trained in a way that doesn't take failed human efforts into account. As such, "survivorship bias" takes hold: the model can only draw on training data made up of curated information available on the internet, i.e. it misses many of the complicated and unsuccessful stories, papers, and conversations that were never published or recorded electronically.
Tao points out that:
"The data that are really precious are from when someone tries something, and it doesn’t quite work, but they know how to fix it."
Quotes from Scientific American: