Abstract
Quantization Index Modulation (QIM) steganography in the Linear Predictive Coding (LPC) domain has emerged as an effective approach to speech steganography. It offers high imperceptibility and statistical undetectability, which in turn leads to low detection accuracy for steganalysis, especially on short samples at low embedding rates. Global and local features, a well-established research focus, are fundamental to VoIP steganalysis because together they characterize the statistical perturbations that embedding induces across four codeword correlation dimensions. However, conventional pipeline architectures neglect cross-scale and in-scale feature interactions, even though the temporal dynamics of VoIP speech exhibit distinct codeword correlation patterns at different time scales. To address these limitations, we propose a multi-scale steganalysis framework that characterizes codeword correlation features at different scales. The framework incorporates novel Global–Local Interaction (GLI) modules that adaptively fuse cross-scale and in-scale features to achieve multi-scale blending. It also includes a Multi-Predictor Mixing module that leverages the complementary predictive capabilities of hierarchical feature representations for classification. Our experiments show that the proposed model outperforms existing methods in detection accuracy, particularly on short samples at low embedding rates, while also meeting real-time processing requirements.
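The abstract does not give the internals of the GLI or Multi-Predictor Mixing modules. To make the two ideas concrete, here is a minimal, hypothetical sketch under stated assumptions; it is not the authors' implementation. It assumes that global–local fusion can be modeled as a per-channel sigmoid gate blending a local (in-scale) feature with a global (cross-scale) context feature. It also assumes that multi-predictor mixing can be modeled as a softmax-weighted average of per-scale stego probabilities. The names `gated_fusion`, `mix_predictors`, `gate_logits`, and `mix_logits` are illustrative. In a trained model the logits would be learned parameters; here they are fixed inputs.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Logistic function mapping a real logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(local: Sequence[float], global_: Sequence[float],
                 gate_logits: Sequence[float]) -> List[float]:
    """Hypothetical global-local fusion: a per-channel gate g in (0, 1)
    blends a local (in-scale) feature with a global (cross-scale) one."""
    if not (len(local) == len(global_) == len(gate_logits)):
        raise ValueError("feature vectors must have equal length")
    fused = []
    for loc, glob, z in zip(local, global_, gate_logits):
        g = sigmoid(z)
        fused.append(g * loc + (1.0 - g) * glob)
    return fused


def mix_predictors(probs: Sequence[float], mix_logits: Sequence[float]) -> float:
    """Hypothetical multi-predictor mixing: each scale's predictor outputs
    P(stego), and the final score is their softmax-weighted average."""
    if len(probs) != len(mix_logits):
        raise ValueError("need one mixing logit per predictor")
    weights = softmax(mix_logits)
    return sum(w * p for w, p in zip(weights, probs))


if __name__ == "__main__":
    # A zero gate logit gives g = 0.5, i.e. an equal local/global blend.
    print(gated_fusion([1.0, 0.0], [0.0, 1.0], [0.0, 0.0]))
    # Equal mixing logits reduce to a plain average of the three predictors.
    print(round(mix_predictors([0.9, 0.6, 0.3], [0.0, 0.0, 0.0]), 3))
```

A gate rather than plain concatenation keeps the fused feature the same size as its inputs and lets each channel choose how much global context to admit. The softmax mixing likewise lets a predictor built on one feature scale dominate when it is the more reliable one.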
| Field | Value |
|---|---|
| Original language | English |
| Article number | 440 |
| Journal | Multimedia Systems |
| Volume | 31 |
| Issue number | 6 |
| DOIs | |
| State | Published - Dec 2025 |
Keywords
- Interactive feature extraction
- LPC parameters
- Multi-scale architecture
- VoIP steganalysis