Here's my guess: they recorded his dialogue, then dubbed it to another tape that was running at a lower speed (say 90%), so it would take up less tape. Played back on a machine set to 100%, it would come back to the correct sync speed but be higher pitched.
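For anyone who wants to put numbers on that guess, here is a rough sketch of the arithmetic. It assumes a plain tape speed change, where pitch and tempo shift together by the ratio of playback speed to recording speed. The function name `speed_change` is made up for this example, and the 90% figure is just the value guessed above.

```python
import math

def speed_change(record_speed, playback_speed=1.0):
    """Return (factor, semitones) for tape recorded at one speed and played at another.

    factor is the multiplier applied to both pitch and tempo;
    semitones is the same pitch change expressed on the musical scale.
    """
    factor = playback_speed / record_speed
    semitones = 12 * math.log2(factor)
    return factor, semitones

# Assumed example: dubbed at 90% speed, played back at 100%.
factor, semitones = speed_change(0.9)
print(f"pitch and tempo x{factor:.3f}, about {semitones:+.2f} semitones")
```

At 90% that works out to a bit under two semitones up, with the running time shrinking to 90% of the dubbed length. A full octave up would need the tape running at half speed during recording.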
The way Bagdasarian did it was to sing the songs normally, then listen to the tape slowed down (over headphones) and sing along with it to get the inflection right (if you deliberately try to talk slowly, you'll stretch some sounds unnaturally). When the resulting "slow" singing was sped up, it got all squeaky, but it sounded naturally spoken because he was basing it on a real-time recording.
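To see why speeding up the "slow" take makes it squeaky, here is a minimal simulation using a plain sine tone instead of a voice. Everything in it is an assumption for illustration: the 8000 Hz sample rate, the 110 Hz tone, the speed-up factor of 2, and the helper names `sine_tone`, `speed_up`, and `estimate_freq`. It is not a reconstruction of the actual studio setup.

```python
import math

SAMPLE_RATE = 8000  # samples per second, assumed for this sketch

def sine_tone(freq_hz, seconds, rate=SAMPLE_RATE):
    """Generate a sine tone as a list of float samples."""
    n = int(seconds * rate)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]

def speed_up(samples, factor):
    """Read the source `factor` times faster, like running the tape faster.

    Uses linear interpolation between neighbouring samples.
    """
    out = []
    pos = 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
        pos += factor
    return out

def estimate_freq(samples, rate=SAMPLE_RATE):
    """Rough frequency estimate: upward zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * rate / len(samples)

slow_take = sine_tone(110, 1.0)        # stands in for the "slow" singing
played_fast = speed_up(slow_take, 2.0)  # tape run at double speed

print(len(slow_take), len(played_fast))
print(estimate_freq(slow_take), estimate_freq(played_fast))
```

The sped-up version should come out about half as long and roughly an octave higher, which is the same trade the tape makes: the slow, low performance turns into a normal-tempo, high-pitched one.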
Making chipmunk voices is easy. I can do all three. Talk in a normal tone for Alvin, pitch your voice up a notch for Theodore, and go nasal for Simon. Record that and pitch it up, and it's virtually indistinguishable from the original (sample of a vintage Chipmunks recording