Wednesday, February 4, 2026

Principled Coarse-Grained Acceptance for Speculative Decoding in Speech


Speculative decoding accelerates autoregressive speech generation by letting a fast draft model propose tokens that a larger target model verifies. However, for speech LLMs that generate acoustic tokens, exact token matching is overly restrictive: many discrete tokens are acoustically or semantically interchangeable, which reduces acceptance rates and limits speedups. We introduce Principled Coarse-Graining (PCG), which verifies proposals at the level of Acoustic Similarity Groups (ASGs) derived from the target model's embedding space. By splitting each token's probability mass across the overlapping groups that contain it, we define an overlap-aware coarse-grained distribution and perform rejection sampling on the resulting group variable. This yields an exactness guarantee at the group level while allowing the accepted draft token to stand in for any member of the group in practice. On LibriTTS, PCG increases acceptance and throughput relative to standard speculative decoding and prior speech-specific relaxations while maintaining intelligibility and speaker similarity. These results suggest acoustically aware, group-level acceptance as a simple and general way to accelerate speech token generation while maintaining speech quality.
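The group-level acceptance idea described above can be illustrated with a small sketch. This is not the authors' implementation: the function names (`coarse_distribution`, `group_accept_prob`, `verify_draft_token`) are hypothetical, and the abstract does not say how a token's mass is divided among its groups, so the equal split used here is an assumption. The sketch also assumes every token belongs to at least one group (singleton groups are allowed) and leaves out the resampling step that follows a rejection.

```python
import random


def coarse_distribution(token_probs, groups):
    """Map a token-level distribution to a distribution over overlapping groups.

    Assumption: a token that belongs to k groups gives 1/k of its mass to each.
    """
    membership_count = [0] * len(token_probs)
    for members in groups:
        for token in members:
            membership_count[token] += 1
    return [
        sum(token_probs[token] / membership_count[token] for token in members)
        for members in groups
    ]


def group_accept_prob(draft_probs, target_probs, groups, group_index):
    """Standard rejection-sampling ratio min(1, P/Q), applied to group masses."""
    q = coarse_distribution(draft_probs, groups)[group_index]
    p = coarse_distribution(target_probs, groups)[group_index]
    return 1.0 if q == 0 else min(1.0, p / q)


def verify_draft_token(token, draft_probs, target_probs, groups, rng=random):
    """Pick one of the token's groups uniformly, then accept or reject that group.

    Choosing the group uniformly matches the equal split above, so the sampled
    group follows the coarse-grained draft distribution.
    """
    containing = [i for i, members in enumerate(groups) if token in members]
    group_index = rng.choice(containing)
    prob = group_accept_prob(draft_probs, target_probs, groups, group_index)
    return group_index, rng.random() < prob
```

As a toy illustration, take four tokens with groups `[[0, 1], [1, 2], [3]]`, target probabilities `[0.1, 0.4, 0.3, 0.2]`, and draft probabilities `[0.4, 0.2, 0.2, 0.2]`. Exact token matching would accept draft token 0 with probability 0.1 / 0.4 = 0.25, while the group containing tokens 0 and 1 is accepted with probability of about 0.6, which is the kind of gain in acceptance the abstract describes.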
