Here's my guess: They recorded his dialogue, then dubbed it to another tape that was running at a lower speed (say 90%), so it would cover less tape. When played back on a machine set to 100% it would fall back to the correct sync speed but be higher pitched.
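To put rough numbers on that guess, here is a small sketch of the arithmetic, assuming plain tape-style playback where pitch and duration both scale with the speed ratio. The function names are made up for illustration, and the 90% figure is just the example from the post, not a known studio setting.

```python
import math

def pitch_shift_semitones(record_speed, playback_speed):
    """Semitones of pitch change when tape made at one speed is played at another."""
    ratio = playback_speed / record_speed
    return 12 * math.log2(ratio)

def playback_duration(recorded_seconds, record_speed, playback_speed):
    """How long the same stretch of tape lasts at the playback speed."""
    return recorded_seconds * record_speed / playback_speed

# Dubbed at 90% speed, played back at 100% (hypothetical figures from the guess)
shift = pitch_shift_semitones(0.9, 1.0)
length = playback_duration(10.0, 0.9, 1.0)
print(f"pitch goes up about {shift:.2f} semitones")
print(f"10 s of dubbed material plays in about {length:.1f} s")
```

So a 90% dub would only raise the voice by a little under two semitones; a full octave needs a 2:1 speed ratio.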
The way Bagdasarian did it was to sing the songs normally, then listen to the tape slowed down (over headphones) and sing along with it to get the inflection right (if you deliberately try to talk slowly, you'll stretch some sounds unnaturally). When the resulting "slow" singing was sped up, it got all squeaky, but it still sounded natural because he was basing it on a real-time recording.
Making chipmunk voices is easy. I can do all three. Talk in a normal tone for Alvin, pitch your voice up a notch for Theodore, and go nasal for Simon. Record that and pitch it up, and it's virtually indistinguishable from the original (sample of a vintage Chipmunks recording).
You just need to use the standard settings on a typical reel-to-reel recorder (I forget what the two speeds are). I once tried to put together a parody bit, "The Chipmunks Sing Led Zeppelin", and the chipmunk voices are the easiest thing in the world. Just play back the song in question (in this case, "Stairway to Heaven") at the slower speed and record yourself singing along in your normal voice. Play the mix back at the higher speed and the music sounds normal, but suddenly it's Alvin singing along with Jimmy Page.
The project itself didn't go anywhere, but it was fun.
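For anyone who wants to try the two-speed trick without a tape deck, here is a rough digital sketch of it. It assumes the two speeds differ by a factor of two and stands in a plain 200 Hz sine wave for the voice; the sample rate, the helper names, and the crude "keep every other sample" speed-up are all assumptions for illustration, not how any real recorder or the original records worked.

```python
import math

SAMPLE_RATE = 8000  # samples per second, chosen arbitrarily for the sketch

def sine(freq_hz, seconds, rate=SAMPLE_RATE):
    """Generate a test tone standing in for a voice recorded at the slow speed."""
    n = int(seconds * rate)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]

def play_at_double_speed(samples):
    """Crude 2x 'tape speed-up': keep every other sample."""
    return samples[::2]

def estimate_freq(samples, rate=SAMPLE_RATE):
    """Estimate pitch by counting upward zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * rate / len(samples)

voice = sine(200, 1.0)              # "normal voice" on the slow tape
fast = play_at_double_speed(voice)  # same tape at the higher speed

print(f"slow tape: about {estimate_freq(voice):.0f} Hz, {len(voice) / SAMPLE_RATE:.2f} s")
print(f"fast tape: about {estimate_freq(fast):.0f} Hz, {len(fast) / SAMPLE_RATE:.2f} s")
```

The sped-up tone comes out at roughly twice the pitch in half the time, which is the chipmunk effect. The backing track gets the same treatment, but since it was slowed down by the same factor while you sang along, it lands back at its normal pitch and tempo.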