When a large language model first passed the United States Medical Licensing Examination in 2023, it was a big deal. But two years later, what was once a notable milestone in artificial intelligence progress is more of a bare minimum.
“It’s not enough for a large language model to simply answer medical test questions accurately,” said Nigam H. Shah, MBBS, PhD, chief data scientist at Stanford Health Care. “That type of evaluation doesn’t tell us anything about what matters.”
Read more: https://scopeblog.stanford.edu/2025/04/08/ai-artificial-intelligence-evaluation-algorithm/