Evaluating Rating Scales for Sensory Testing with Children

Sensory testing with children is becoming increasingly important to the food industry, but little research on appropriate methodology has been conducted.

Beverley J. Kroll
Testing with children is in an embryonic stage. Over the years, a few sensory researchers have considered the problems involved in applying their science to this special population, but for the most part the field has been static. The need for serious investigation is underscored by how little research has been done in this area. As a way of focusing on the specific needs for this kind of research, a thumbnail sketch of certain key questions the literature considers is presented in the box on p. 80.

One thing is very noticeable, not only in the literature but also in word-of-mouth, unpublished material about children's testing: the methods used have been intuitive, even granting that the investigator may have had a rationale. Once a method has been selected, there has been no serious investigation of possible alternatives. It is as if the researchers said, "We planned this, we tried it, it seemed to work, and there was no time to bother with what might have worked better."

We therefore undertook a basic research project designed to help establish a solid foundation for future investigations. This article describes the procedures, analysis, and conclusions of research intended to evaluate the relative merit of rating scales that might be used when testing with children.

In this study, we used two methods of questioning, one-on-one interviewing (Fig. 1) and self-administered questionnaire (Fig. 2), and three types of rating scale (Fig. 3).
Fig. 1 -- Children ages 5-7 and 8-10 were tested using one-on-one interviews.
Fig. 2 -- Children ages 8-10 were also tested using self-administered questionnaires in standard sensory testing booths.

Fig. 3 -- Three types of rating scale were used: the traditional hedonic scale, the P&K scale developed for this study, and the typical face scale. After testing, scale values of 1 to 9 were assigned (starting with 1 at the top) for the purposes of analysis.
Variables Selected
A great many variables could be considered. Hence, it was necessary to be selective and try to choose the more important ones.

It was imperative that the study include a picture scale. Testing with children is overrun with picture scales, the rationale being that younger people may not understand words and phrases but can deal more accurately with facial expressions. Besides, pictures are entertaining and should inspire closer attention to the task. There are many such caricature scales around, but all have the same general characteristics, representing degrees of pleasantness ranging from high to low. The question is how well successive pictures communicate the basic idea.

Some preliminary work was done with a scale from an earlier published study, which used the Snoopy cartoon character, but the results were disappointing. Scales using children's faces with variations in degree of detail were also tried. Eventually a series of simplified human faces was selected as probably best and certainly representative.

Scale length was certainly important enough to be included in the study. Starting with the frequently used 9 points, how far down should one go? To 7 points? 5? 3? Or even to just 2 points, which would be a paired comparison? The study addressed this variable in a limited fashion by trying 7 points, using the same three scale types as before but eliminating one good category and one bad category from each scale.

Another approach sometimes used by investigators is what may be called "bifurcated": the interviewer first asks the subject to place the stimulus into either the good/like or the bad/dislike category, then tries to get the child to scale the degree of like or dislike by presenting the successive categories. The categories were presented starting in the middle and proceeding to the ends. This seemed logical, though it is open to debate. If the subject failed to make a choice in response to the initial question, the result was recorded as "maybe good/maybe bad" or "neither like nor dislike" (this category was not read to the subject). This phase of testing included only the hedonic and P&K scales because the face scale is inappropriate to this approach. The question of which was the better procedure, the bifurcated or the straightforward, was addressed in a side experiment.

Another side issue that seemed worth testing was one-on-one interviewing vs. a self-administered questionnaire. This experiment used the 9-point hedonic and P&K scales and involved only children 8-10 years old, i.e., the semiliterate group. Again, the face scale was excluded because the concern was mainly with ability to read with understanding.
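
The bifurcated questioning order described above can be sketched in a few lines. This is only an illustration: the category labels are the standard 9-point hedonic wording, assumed here rather than taken from Fig. 3, and the function names `bifurcated_prompts` and `neutral_category` are hypothetical.

```python
# Standard 9-point hedonic labels, listed from most liked to most disliked.
# ASSUMPTION: the study's exact wording and layout are not reproduced here.
HEDONIC_9 = [
    "like extremely", "like very much", "like moderately", "like slightly",
    "neither like nor dislike",
    "dislike slightly", "dislike moderately", "dislike very much",
    "dislike extremely",
]


def bifurcated_prompts(scale, first_answer):
    """Return the categories to read aloud after the initial like/dislike question.

    Categories are ordered from the middle of the scale outward to the end.
    first_answer is "like", "dislike", or None when the child made no choice.
    """
    mid = len(scale) // 2
    if first_answer == "like":
        return list(reversed(scale[:mid]))   # "like slightly" first, "like extremely" last
    if first_answer == "dislike":
        return list(scale[mid + 1:])         # "dislike slightly" first, "dislike extremely" last
    if first_answer is None:
        return []                            # nothing further is read to the child
    raise ValueError("first_answer must be 'like', 'dislike', or None")


def neutral_category(scale):
    """Category recorded (but not read aloud) when no initial choice is made."""
    return scale[len(scale) // 2]
```

For a child who answers "like" to the first question, `bifurcated_prompts(HEDONIC_9, "like")` yields the four "like" categories from weakest to strongest; with no initial answer, the interviewer would simply record `neutral_category(HEDONIC_9)`.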
Testing Procedure

The test subjects were prerecruited from families on our extensive roster of consumer panelists. Usually, the computer knows which families have children and their ages. All had to like orange drinks, which was no problem. Otherwise the only concerns were age, sex, and availability to fit into the schedule. An important proviso was that no child should be invited to participate in more than one test, which would raise questions about a training effect.

In all cases, a subject tried the pair of samples, high sweet vs. low sweet, twice, using a different scale for each pair, then made a paired-comparison choice after each pair. Except for those on the mode of presentation, the experiments included all three scale types: hedonic, P&K, and face. The design required that the scales be used equally often and appear equally often as the first or second pair. Furthermore, for each scale type the high-sweet and low-sweet samples were served first or second equally often.

Sex differences did not seem important in the context of this investigation, but our recruiters attempted to have equal numbers of girls and boys in each of the age groups. This was not achieved exactly, but it was close. They also tried to get an even distribution of ages within each age group. Again, this was not exact but was very close.

The drinks were prepared in quantity ahead of time, chilled to refrigerator temperature, and held at that temperature throughout testing. They were poured just before serving. A sample as served was about 1½ oz of drink in a small plastic glass. The samples were identified by code number, but only for the convenience of the operators and to avoid errors. If a subject even saw the codes, it was accidental.

All interviewing was conducted one-on-one, except for the sessions using the regular written questionnaires. The interviewers were carefully briefed on the protocol to be followed for each variation.

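
The balancing requirements above (each scale used equally often, equally often in the first or second pair, and with each serving order equally often) can be met by cycling subjects through every combination of scale order and serving order. The sketch below is one way to enumerate such a design; it is not the allocation scheme actually used in the study, and the names `design_cells` and `assign` are hypothetical.

```python
from itertools import permutations, product

SCALES = ("hedonic", "P&K", "face")
# Serving order of the two samples within one pair.
SERVING_ORDERS = (("high", "low"), ("low", "high"))


def design_cells():
    """Enumerate every (scale order) x (serving order per pair) combination.

    Each subject uses two different scales, one per pair, so there are
    3 * 2 = 6 ordered scale pairs and 2 * 2 = 4 serving-order combinations,
    giving 24 cells in total.
    """
    cells = []
    for first_scale, second_scale in permutations(SCALES, 2):
        for order1, order2 in product(SERVING_ORDERS, repeat=2):
            cells.append({
                "pair1": {"scale": first_scale, "served": order1},
                "pair2": {"scale": second_scale, "served": order2},
            })
    return cells


def assign(n_subjects):
    """Assign subjects to cells in rotation.

    The design is exactly balanced when n_subjects is a multiple of 24.
    """
    cells = design_cells()
    return [cells[i % len(cells)] for i in range(n_subjects)]
```

In practice the order in which children are matched to cells would be randomized within each age group; the rotation here only illustrates the balancing.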
The interviewer met the subject and parent in a reception area. Leaving the parent there, the interviewer took the child to the testing area while chatting in a friendly manner to establish rapport and relieve possible tension. The test itself was not discussed except in a very general way.

In the test room, the child was seated at a table across from the interviewer (Fig. 1) and told that he or she would get some samples of orange drink and would be asked questions about them. The first sample was brought and the child invited to try it. When the child was finished, the interviewer began the questioning procedure according to the set protocol. After a rating was made, the child was told to drink some water while the interviewer got the next sample. The waiting period was about 2 minutes. The second sample of the pair was then tried and rated. This was followed by the question, "Which did you like better, the first sample you tried or the second one?"

Then the child was told there were more drinks to be tried and had a drink of water while waiting another 2 minutes. The second pair was handled like the first, and the child was escorted back to his or her parent. The whole sequence took about 10 minutes.

Analyses

There is a qualification to note here. Some findings, in the sense of the objectives of the research, rely on what may be called soft data; however, they were derived from hard data.

Soft Data. The tables of results show significance levels ranging from 1% to 15%. These figures were compared among scales, between age groups, between test orders, between orders of serving, and so on. How legitimate, or how useful, is this approach? There is no routine, accepted statistical procedure for determining whether one level of significance is or is not significantly different from another. Perhaps a method for this purpose could be devised, but its possible utilization has not been explored. An example of the questions to be resolved: how much more important is the 1% level than the 2% level? Probably not very important, since both are near certainty. But one is easily convinced that the 1% level shows more discrimination than the 10% level. These are the kinds of decisions that served as the basis for most of the conclusions in this study.

Results

What, if anything, was discovered in this study? Are any conclusions definitive, settling certain points once and for all? Not likely! But there are results that can direct future research on the subject.

In the paired comparisons, there was overall a highly significant difference (well below the 0.1% level), which was due in part to the large number of subjects (N). As expected, the high-sweet sample was preferred, which validated the product variable.

Other conclusions come from comparing different subgroups. Test order, whether the first or second pair of the session, made no difference. There was no difference in discrimination between boys and girls. Children 8-10 years old were definitely more discriminating than the younger children, who failed to establish a significant difference. Their failure might have been due to interference by the scaling task. The difference between ages might have been expected.

Scale type may also have made a difference, although the evidence is borderline. When the comparison was made after the hedonic and P&K scales, discrimination was about the same as overall; but when it was made after the face scale, it dropped to the level of nonsignificance. This might be a chance effect, or there may be something about the face scale that later interfered with the paired comparison.

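
A significance level for a paired-comparison preference is commonly obtained by testing the observed split against a 50:50 null. The article does not state which test was used, so the following is only a sketch that assumes an exact two-sided binomial test; the function name `binomial_two_sided_p` is hypothetical, and any counts passed to it are made up for illustration.

```python
from math import comb


def binomial_two_sided_p(k, n):
    """Exact two-sided p-value for k of n subjects choosing one sample,
    under the null hypothesis that both samples are equally likely (p = 0.5).
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("require n > 0 and 0 <= k <= n")
    # With p = 0.5 the distribution is symmetric, so the two-sided p-value
    # is twice the probability of the smaller tail, capped at 1.
    tail = min(k, n - k)
    lower = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * lower)
```

For example, calling `binomial_two_sided_p(9, 10)` tests a hypothetical 9-to-1 split among ten children. Because the p-value shrinks as N grows for the same proportion, a very large panel can reach extreme significance levels on a modest preference, which is consistent with the remark above about the large number of subjects.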
With the 9-point scales, all subgroups showed significant discrimination, granted that at one point it dropped to a questionable 15% level; whereas with the 7-point scales, three subgroups showed nonsignificance.

The boys did slightly better than the girls, although this was not consistent. The difference is probably trivial and not indicative of any meaningful trend.

The result for age is definite and hardly unexpected. The children 8-10 years old showed good discrimination with both scale lengths, whereas the children 5-7 years old showed significant discrimination only with the 9-point scales, completely failing the task with the shorter version. On the supposition that the simpler scales should be easier for younger children, one might have expected this to be the other way around.

It is often noted in sequential monadic testing that there is better discrimination when only the second-served samples are considered. In this study, there was significant discrimination with the second-served samples for both scale lengths, but almost none with the first-served samples. Is this due to some kind of contrast? Is it a training effect, where the ratings of the second sample have the benefit of experience with the first? This research could not address such questions in all of their complexity. Besides, such effects pertain to all testing, not just when children are concerned.

In a way, Table 3 is repetitive, exhibiting effects shown in the other tables, but now separately for each scale type. However, it may add further emphasis to the following conclusions: The P&K scale gave better overall discrimination; older children showed better discrimination with all scales; and no scale discriminated when just the first-served samples were considered, but the P&K and face scales did with the second-served samples. The second pair of drinks tested was consistently better for discrimination than the first pair, no matter the scale type. Does this mean that there is a learning effect, even from the brief first exposure to the task? If so, it is both bad news and good news. The bad news is that one does not have a pure measure. But who believes that is possible anyway? The good news is that kids quickly learn to do a good job, and that the testing of multiple pairs is acceptable.
Overall, the bifurcated approach seems to offer no advantage over the straightforward. Even for the children 5-7 years old, the age group for whom the method was designed, the bifurcated scale was little better than the straightforward approach.

The self-administration phase of the study was an embellishment done as an afterthought. It was limited in scope, utilizing only the hedonic and P&K scales, and excluding children 5-7 years old for the obvious reason that they are preliterate. The results (Table 5) showed that children 8-10 years old can handle written questionnaires effectively. Overall, the results were significant at the 1% level.

Although not shown in the table, the effect of self-administration was more pronounced with the hedonic scale, whereas discrimination with the P&K scale was about the same with both approaches (one-on-one interviewing and self-administration). This finding should cheer sensory specialists. It makes things easier. If children of this age are sufficiently knowledgeable that big words do not defeat the purpose, why bother with expensive one-on-one interviewing?

Further Studies Needed

The results of this study can be summarized as follows: The P&K scale performs better than the hedonic or face scale. Reducing scale length from 9 points to 7 offers no advantage. Children 5-7 years old do not perform any better with the face scale than with the other two scales. The bifurcated approach does not discriminate as well as the straightforward method. And older children perform as well using written questionnaires as when interviewed one-on-one.

The study, as noted earlier, was not intended to be the be-all and end-all. Rather, it was intended as a foundation for further studies. A review of variables will show that many need further attention. While there are problems involved, there is a great deal to be gained.

References

Birch, L.L. 1979. Dimensions of preschool children's food preferences. J. Nutr. Educ. 2(2): 77.

Based on a paper presented at the Spring Meeting of ASTM, San Francisco, Calif., May 24, 1990.

Edited by Neil H. Mermelstein, Senior Associate Editor

Reprinted from Food Technology 44(11): 78-80, 82, 84, & 86





