Using a qualitative case study method, this research adopted Vygotsky’s Sociocultural Theory as a lens to examine the multimodal literacy learning that occurred during a picture-based storytelling activity. I found that the visual (picture) mode allowed students to create diverse meanings from the picture, but these meanings remained confined to the individual student. The verbal (speech) mode, by contrast, served as a channel through which students’ meanings could flow to other students via a process of meaning negotiation. Meanings constructed through discussion first emerged on the interpsychological plane, between students, and only then on the intrapsychological plane, within the individual student. The research implies that verbal discussion is essential for understanding meaning, even though much evidence shows that pictures, images, or films can convey some information more clearly and effectively than verbal speech.