Testing with Children
Getting Reliable Information from Kids
Mary F. Schraidt
Children are a vital part of today's market; a part that, with the growing trend toward segmentation, is becoming even more important. Consequently, those who produce goods meant for children generally recognize the need for data that will facilitate intelligent decisions about this group. Children, even very young children, discriminate and show preferences. The challenge is to reliably predict these preferences. How, in practice, does one predict a child's behavior vis-a-vis a cereal, candy bar, soft drink, toy, or whatever?
The situation is rife with confusion. The history of evaluation with children has been hit-and-miss and largely based on intuition. Most approaches seem to have had at least some success, but the literature is scant and questions about the relative effectiveness of procedures abound.
ASTM Committee E-18 on Sensory Evaluation of Materials and Products and its Subcommittee E18.08 on Sensory Evaluation by Consumers have added their voices by establishing a task group that focuses on testing with children. The task group is in the process of reviewing status, objectives, research methods, findings, and needs in the area. In cooperation with E-18's Seminar Committee, which is dedicated to generating interest in Committee E-18 by providing exposure for its objectives and achievements, the task group organized a session on testing with children. The well-attended seminar took place at the Spring meeting in San Francisco on May 24, 1990. There were two presentations, both focusing on research with children, but contrasting in style. Together they demonstrated the diversity and complexity of the problem.
The Qualitative Approach
Dr. Valeria Lovelace, director of research at Children's Television Workshop, which develops programs for television's Sesame Street, gave a sampling of the Workshop's research activities in this area.
Supported by pictures and videotapes of the younger generation in action, the presentation was charmingly entertaining as well as informative. The underlying purpose of Children's Television Workshop is to gain analytical understanding of little kids – what motivates them, how they communicate, and how to interpret the data they provide. The researchers ask such questions as: "How do children learn?" "What combinations of stimuli will be effective in transmitting knowledge?" "How can one develop the right attitudes and feelings in children?" The Television Workshop test groups are composed of children of varying ages, including some as young as 3 to 4 years. The panels are relatively small and are conducted in informal settings. The kids play and interact under the scrutiny of trained observers whose task is not only to record what happens, but also to understand what it means.
One practical objective is to adjust the content of Sesame Street programs before they are aired. This is to ensure that they will be liked by the prospective juvenile audience, that they will communicate knowledge, and that they will generate socially acceptable attitudes. When the proposed content falls short, the research seeks ways to improve program effectiveness.
The Quantitative Approach
The second speaker was Beverley Kroll, president of Peryam & Kroll, who summarized the results of a research project. In a presentation supported by slides and charts, Kroll provided a comprehensive report of work conducted at the Peryam & Kroll facility. Citing the dearth of literature on the subject, she first summarized the conclusions of what little does exist:
- There is consensus that children can discriminate, particularly in regard to degree of liking.
- Children are able to show degree of preference.
- Children can provide useful information about products.
- Children require special handling, different from the procedures routinely employed with adults. One must pay attention to such things as gaining confidence, providing motivation, and communicating in language children understand.
The First Task – Variables
The first task of the study was to select the variables to be investigated. As a test product, the researchers settled on a sweetness difference in an orange drink, since one can reliably predict that children will like a sweeter drink, at least within the normal range. This proved to be true.
Difference in scale type was the main issue. Based upon preliminary work, three scales were selected:
- the standard, well-known hedonic scale, with the usual verbal categories;
- a scale devised by Peryam & Kroll (P & K) in which the labels of the categories are expressed in language judged more suitable to children; and
- a picture scale (face) in which the categories are represented by a series of "smiley" faces ranging from happy to sad.
Kroll noted the school of thought, which is bolstered by intuition, that longer scales tend to create confusion because there are lots of words to understand and choices to make. On the other hand, longer scales can be more discriminating. This factor was of enough importance to be included. Using the semi-traditional nine points as a reference, this variable was addressed in subdued fashion, using the same scale types but reduced to seven points by eliminating one "good" and one "bad" category from each.
Initial work had included children up to 13 years of age, but to get at the crucial problem, two age groups were defined, based upon assumptions about ability to handle verbal input: pre-literate, ages 5 to 7, where most can be expected to read little, if at all, and not understand big words; and the semi-literate, ages 8 to 10, where most can read at some level, but again, may or may not understand words such as "extremely" or "moderately."
Most of the experiments employed what Kroll called the straightforward approach, where the successive categories were read one after another, always starting at the good end. Another approach was named "bifurcated." The subject was first induced into placing the stimulus into either the good/like or the bad/dislike category, then presented with successive phrases to scale the degree of liking or disliking. Most of the testing was one-on-one, i.e., an interviewer dealt with each child individually. Another side issue was comparing this kind of interviewing with self-administration, where the subject received instructions and responded using a printed form. This experiment excluded the face scale because the concern was with ability to read with understanding, and involved only the 8 to 10-year-old group.
A subject tried the pair of samples (high versus low sweet) twice, using a different scale for each pair, and made a preference choice after each pair. Most experiments, except for those on the mode of presentation variables, included all three scale types. The design required the scales to be used equally often, and appear equally often as the first or second pair. In all cases the high sweet and low sweet samples were served first or second equally often.
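The balancing described above can be sketched in code. The article does not report the actual allocation scheme, so what follows is only one possible arrangement, assuming each child receives two of the three scales. The names `build_cells` and `assign_subjects` are hypothetical and do not come from the study.

```python
from collections import Counter
from itertools import permutations, product

# Scale types named in the article; the allocation logic itself is an assumption.
SCALES = ("hedonic", "P&K", "face")
# Possible serving orders of the two drinks within one pair.
SERVING_ORDERS = (("high", "low"), ("low", "high"))


def build_cells():
    """Enumerate every (scale order) x (serving order per pair) combination."""
    cells = []
    for first_scale, second_scale in permutations(SCALES, 2):
        for order_pair1, order_pair2 in product(SERVING_ORDERS, repeat=2):
            cells.append(((first_scale, order_pair1), (second_scale, order_pair2)))
    return cells


def assign_subjects(n_subjects):
    """Cycle subjects through the cells; balance is exact when n is a multiple of 24."""
    cells = build_cells()
    return [cells[i % len(cells)] for i in range(n_subjects)]


if __name__ == "__main__":
    plan = assign_subjects(48)
    # How often each scale is used with the first pair of samples.
    print(Counter(session[0][0] for session in plan))
    # How often each drink is served first within the first pair.
    print(Counter(session[0][1][0] for session in plan))
```

With a number of subjects that is a multiple of 24, every scale appears equally often in each position and each drink is served first equally often. Other group sizes would be only approximately balanced, much like the near-equal numbers of boys and girls noted in the study.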
The attempt was made to have equal numbers of girls and boys in each of the age groups. This was not achieved exactly, but was close. With regard to data analysis, Kroll pointed out that some conclusions might be qualified because they are based upon "soft" data, namely, the comparison of levels of the significance of the difference between the high and low sweet drinks among scales, between age groups, and across other variables.
However, the significance levels themselves are "hard" data. For the paired comparison, the significance of the proportion of choice was determined by the z-test. For the scalar measures the significance of the difference between the average rating for the high sweet and low sweet drinks was determined using the t-by-difference test, which was appropriate since each subject had tried both samples.
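As a rough illustration of the two tests named above, the sketch below computes a z statistic for a preference proportion and a t statistic on per-subject rating differences. It assumes the z-test compares the observed proportion with a 50% chance level and uses a two-sided p-value, which the article does not specify. The counts and ratings are invented for illustration and are not data from the study, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def preference_z(n_prefer_high, n_total):
    """z statistic for the proportion choosing the high-sweet sample vs. 50%."""
    p_hat = n_prefer_high / n_total
    standard_error = math.sqrt(0.25 / n_total)  # under the null hypothesis p = 0.5
    return (p_hat - 0.5) / standard_error


def two_sided_p_from_z(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def paired_t(ratings_high, ratings_low):
    """t statistic and degrees of freedom for per-subject rating differences."""
    diffs = [h - l for h, l in zip(ratings_high, ratings_low)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Hypothetical numbers only: 60 of 100 children prefer the sweeter drink.
    z = preference_z(60, 100)
    print(f"z = {z:.2f}, two-sided p = {two_sided_p_from_z(z):.3f}")

    # Hypothetical 9-point ratings from five children who tried both drinks.
    t, df = paired_t([8, 7, 9, 6, 8], [6, 7, 7, 5, 6])
    print(f"t = {t:.2f} with {df} degrees of freedom")
```

The paired form is used because, as noted above, each subject tried both samples, so the analysis works on each child's own difference between the two ratings rather than on two independent groups.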
The Real Question
After all the preliminaries came the real question. What, if anything, was uncovered? Are any of the conclusions definitive, settling certain points once and for all? At a somewhat less ambitious level, one could ask whether the results provide guidance, either to ourselves or to others. Has there been progress in this nebulous area?
First, the overall results of the paired comparisons, which were always made after the pair of drinks had been presented and rated, showed a highly significant difference, partially due to the large number of judgements (1032). Of course, the higher sweet sample was preferred, which validated the product variable. Test order, that is, whether the first or second pair of the session, made no difference; the 8 to 10-year-olds were definitely more discriminating than the younger kids; scale type may have made a difference, although the evidence is borderline. Following the hedonic and P & K scales, discrimination was about the same as overall. But after the face scale, it dropped to the level of non-significance. All scales significantly discriminated at better than the 10% level.
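The remark about the large number of judgements can be made concrete with a short calculation. The sketch below assumes a two-sided z-test of the preference proportion against a 50% chance level, which the article does not state. Only the figure of 1032 judgements comes from the text, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def min_significant_proportion(n_judgements, alpha):
    """Smallest preference proportion above 50% that reaches significance."""
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    return 0.5 + z_critical * math.sqrt(0.25 / n_judgements)


if __name__ == "__main__":
    for alpha in (0.05, 0.01):
        threshold = min_significant_proportion(1032, alpha)
        print(f"alpha = {alpha}: preference above {threshold:.3f} is significant")
```

Under these assumptions, a preference split of roughly 54% to 46% would already reach the 1% level with 1032 judgements. This is why a large sample can make even a modest preference highly significant.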
The scale-length results tend to lay to rest the belief that kids need simplicity. The 9-point scales were as good as, if not somewhat better than, the 7-point versions. Definitely, the 7-point scales were not better. Whether the 9-point scales were actually better for discrimination rests upon comparison of the 5% versus 1% levels of significance, but the 7-point scales offer no advantage.
Boys did slightly better than girls, although this was not consistent and probably not indicative of any meaningful trend. The older group (8 to 10-year-olds) showed good discrimination with both scale lengths, whereas the 5 to 7-year-olds showed significant discrimination only with the 9-point scales. On the basis of the supposition that the simpler scales should be easier for younger children, one might have expected the opposite result.
It is often noted in sequential monadic testing that there is better discrimination when only the second-served samples are considered. Here there was significant discrimination with the second-served samples for both scale lengths, but almost none with the first-served samples. Is this due to some kind of contrast? Is it a training effect, where the ratings of the second sample have the benefit of experience with the first?
One part of the study was designed to help answer the specific question of whether there is any advantage in using the two-stage, "bifurcated" approach. It was limited to the 9-point scales, and the Face scale was excluded. Overall, the bifurcated approach was inferior to the straightforward approach. Even for the 5 to 7-year-olds, the age group for whom the method was designed, the bifurcated scale was no better than the straightforward.
Kroll labeled the self-administration phase of the study as an afterthought. Limited in scope, it utilized only the hedonic and P & K scales, and excluded 5 to 7-year-olds for the obvious reason that they are pre-literate. The results were definite. The 8 to 10-year-olds showed that they can handle written questionnaires effectively. Overall, they showed significance at the 1% level, which is even a little better than the same age group had done when interviewed one-on-one. It was further noted that the effect of self-administration was more pronounced with the hedonic scale, whereas discrimination with the P & K scale was about the same with the two approaches.
This finding should bring cheer to sensory specialists. It makes things easier. If children of this age are sufficiently knowledgeable that "big words" do not defeat the purpose, why bother with tedious, one-on-one interviewing?
Summary
Here is a capsule summary of the study results reported by Kroll.
- The product variation was valid for the intended purpose.
- The P & K scale performed better than either the hedonic scale or face scale.
- Reducing scale length from nine points to seven points offers no advantage.
- The 5 to 7-year-olds do not perform any better with the face scale than with the verbal scales.
- The bifurcated approach does not discriminate as well as the straightforward method.
- Older children perform as well using written questionnaires as when interviewed one-on-one.
Kroll capped her presentation by citing factors which should be investigated, and urged the help of researchers both within Committee E-18 and elsewhere. These needs include:
(1) further exploration of scale length;
(2) the response to neutral or low-preference stimuli;
(3) effect of the many possible variations in pictorial scales; and
(4) dealing with very young children (under 5 years).


