That would be great, if one picture = one concept. But different people look at the same picture and see different things or different focal points, not necessarily what the publisher thought was the only/obvious focal point. And the pictures are re-used throughout the life of the software, too, which complicates things further. It's uncomplicated when the picture is a red blob meaning "red", but when there's a picture of a man standing next to a red airplane, what is the sentence saying? Is it about the color, the position, the actor, the relationship between the two, or an action that is implied?
For example, if I remember correctly, the same picture stands for "one woman" and "she is wearing a bathing suit" on the Mohawk RS (or something like that). I was reasonably certain we were talking about "one woman" the first time, but the second time, it was tough to tell -- was it the glasses? The position she was in? I never even *thought* about the bathing suit -- mostly because I certainly didn't expect to learn the word "bathing suit" in Lesson 1 of a language program. (That's totally aside from the fact that Mohawk uses different incorporated forms with verb-noun combinations for wearing different pieces of clothing. Again, RS's intent was probably "Oh, 'wear + item-of-clothing' is pretty easy", which it is in some languages. But it's sloppy to assume that all languages are best taught in the same order.)
As I've probably said before, I've learned more from RS by stripping out the mp3 files, inserting the English (available on the RS web site for English programs) into the "lyrics" field on my iPod for each utterance, and playing them on shuffle. At least then you get the randomness and you definitely know what's being said. The usefulness of the language can be debated (IRL I rarely comment on the fact that a boy is sitting on an airplane, or that two children are jumping), but it's easier that way to get frequent aural repetition while still being able to check the precise meaning. But after using the product for a while, both ways, I'm still unable to have any kind of meaningful exchange, and I'm a pretty experienced language learner.
If you absolutely, positively cannot get anyone to teach you a language, RS is probably no better and no worse than any other self-study program, and at least it's visually interesting and has the "this is cool tech" advantage. But if you're living surrounded by native speakers, I'd really recommend teaching one of them to teach you the way you want to be taught. Even the mere fact that the content will be related to your actual life will help immeasurably. Otherwise you need to buy airplanes and yellow cars really fast.
