Fundamentals of doing experiments
Human-Computer Interaction (HCI) has its roots in psychology, and experiments have always played an important role in it. Designs benefit from data, and that data can often come from experiments. Since the field's early days, experiments have allowed us to collect data, analyze our designs quantitatively, and see whether they perform as we hope. Running well-designed experiments requires considerable skill and scientific knowledge, and a good knowledge of statistics is also important.
As in traditional experiments, measurement is important. Unlike traditional behavioural research, however, controlling all the variables except those of direct interest is generally not possible. If you have a background in pure science you will need to get used to the more applied approaches that dominate HCI.
Why do we do experiments?
We do experiments to understand differences in performance. In HCI and interaction design (IxD) those differences come in several forms. How do humans interact with your design? How fast are they? How many errors do they make? All of these can be measured, as can users' preferences. Great user experiences come from a deep and nuanced understanding of the people who will be using the design. When experiments are used in HCI they tend to have a narrow scope and usually address specific aspects of human-computer interface design.
Fundamentals of doing experiments
When you plan an experiment you need to think about three aspects:
The purpose of the experiment - what is being changed, what is being kept constant, and what is being measured,
A hypothesis, which needs to be stated in a way that can be tested,
What statistical tests you will apply to the data that you collect and why.
For example, an evaluation of relative user efficiency in using function keys or menus in an industrial process control system would have to be stated in terms of the elements to be compared (that is, function keys versus menus), the constant features of the testing situation (such as the experience of the process operators and their control tasks) and the measures of user performance being studied (such as command execution speed or error rates). In this situation, one possible hypothesis is that function keys are more efficient, that is, they produce faster command execution. Groups of users would be given the two interfaces, and the statistical significance of the difference in command execution times and error rates would be determined.
A significant result tells us that there is a statistically significant difference between the things we are comparing. A non-significant result does not tell us that there is no difference; it tells us only that we could not detect a difference with the data we had. Whether a real difference exists does not depend on our data, but whether we can detect it does, so a non-significant result may simply mean that we did not have enough data.
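To make the idea of testing a difference concrete, here is a minimal sketch of one possible test for the function keys versus menus example. It uses a permutation test on the difference in mean execution time, which is only one of several tests that could be chosen. The function name permutation_test, the number of permutations, and the execution times are all hypothetical and invented for illustration.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns an estimated p-value: the share of random relabellings of the
    pooled data whose mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1

    # The add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical command execution times in seconds (not real data).
function_keys = [2.1, 1.9, 2.4, 2.0, 2.2, 1.8, 2.3, 2.0]
menus = [2.9, 3.1, 2.7, 3.3, 2.8, 3.0, 3.2, 2.6]

p_value = permutation_test(function_keys, menus)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value here would count as a significant result. A large one would mean only that no difference was detected with these eight observations per group, not that the two interfaces perform the same.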
As we mentioned above, variables are an important part of an experiment. A variable is anything that can change or be changed. In other words, it is any factor that can be manipulated, controlled for, or measured in an experiment. The independent variable (IV) is the variable that the experimenter manipulates. The dependent variable (DV) is the variable that is measured and is expected to depend on the IV. In the function keys example, the type of interface is the IV and command execution time is a DV. The independent variable should always remain uninfluenced by the dependent variable, whereas the dependent variable is expected to be influenced by the independent variable.
You will need to select your subjects carefully so that you avoid biases. For example, if you were doing an experiment to determine which of four sets of icon designs was recognized fastest by children between 7 and 10, you would need to select children who covered the whole age range so that there was no age bias, equal numbers of boys and girls to avoid gender bias, and children with the same level of experience of using computers and similar academic records.
The way that the experiment is designed is important, too, and there are three well-known experimental designs: independent, matched subject, and repeated-measures designs.
Independent subject design: In this design, a group of subjects is obtained for the experiment as a whole and then subjects are allocated randomly to one of, say, two experimental conditions.
Matched subject design: In this design, subjects are matched in pairs (often a male and a female so that any bias resulting from gender can be eliminated), and then the pairs are allocated randomly to the two conditions.
Repeated measures design: In this design, all the subjects appear in both experimental conditions, which halves the number of subjects needed in comparison with the other two designs. Although there are no problems of subject allocation with this design, there are problems to do with the order in which subjects do the tasks. For example, might learning on the first task influence performance on the second?
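The random allocation used in the first two designs can be sketched as follows. This is an illustrative sketch only: the function names, subject IDs and condition labels are hypothetical, and the matched version assumes that the two members of each pair are split at random between the two conditions, which is one reading of the description above.

```python
import random


def allocate_independent(subjects, conditions, seed=None):
    """Independent subject design: shuffle, then deal subjects round-robin."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, subject in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(subject)
    return groups


def allocate_matched_pairs(pairs, conditions, seed=None):
    """Matched subject design: split each pair at random across two conditions."""
    rng = random.Random(seed)
    first, second = conditions  # exactly two conditions are assumed
    groups = {first: [], second: []}
    for member_a, member_b in pairs:
        if rng.random() < 0.5:
            member_a, member_b = member_b, member_a
        groups[first].append(member_a)
        groups[second].append(member_b)
    return groups


# Hypothetical subject IDs and condition labels.
subject_ids = [f"P{i}" for i in range(1, 9)]
condition_labels = ["function keys", "menus"]

independent_groups = allocate_independent(subject_ids, condition_labels, seed=1)

matched_pairs = [("P1", "P2"), ("P3", "P4"), ("P5", "P6"), ("P7", "P8")]
matched_groups = allocate_matched_pairs(matched_pairs, condition_labels, seed=1)
```

A repeated measures design needs no allocation of this kind, because every subject takes part in both conditions.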
A fourth type of design is single-subject design, in which in-depth experiments are performed with just one subject. While this may be essential for reasons of scale in some studies, there can be problems. It may not be possible to validate the results statistically and any learning that occurs may bias the results.
Any change in the dependent variable that is caused by a change in the independent variable is called an experimental effect. However, there may also be changes in the dependent variable that are brought about by variables other than the independent variable. As we have already said, in a repeated measures design it is possible that order will have an effect on subjects' performance. When such things happen the experimental effect is said to be confounded.
The effect of order can be reduced by dividing the subjects in the repeated measures design into two groups. One group does task A followed by task B, and the other group does task B followed by task A.
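Dividing the subjects of a repeated measures design into two groups with opposite task orders could be sketched like this. The function name counterbalance and the subject IDs are hypothetical, and the sketch assumes exactly two tasks, labelled A and B.

```python
import random


def counterbalance(subjects, seed=None):
    """Randomly give half the subjects the order A then B, the rest B then A."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2

    orders = {}
    for subject in shuffled[:half]:
        orders[subject] = ("A", "B")
    for subject in shuffled[half:]:
        orders[subject] = ("B", "A")
    return orders


# Hypothetical subject IDs.
task_orders = counterbalance([f"P{i}" for i in range(1, 9)], seed=42)
for subject, order in sorted(task_orders.items()):
    print(subject, "->", " then ".join(order))
```

With an odd number of subjects the second group is one larger than the first, so an even number keeps the two orders balanced.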