Apple researchers build a context-understanding benchmark and find quantized models degrade unevenly
A new benchmark from Apple ML Research probes LLMs on four distinct context-understanding tasks across nine datasets, finding that pretrained dense models struggle with nuanced contextual features and that 3-bit quantization causes performance drops that vary by task.