Image by SIMON LEE
  • antoniodellomo

Why do we do experiments?

In today's article, I would like to talk about experiment design. Experiments have their roots in psychology and have always played an important role in Human-Computer Interaction (HCI). Designs benefit from data, and that data can often come from experiments. Since the very early days, experiments have allowed us to collect data, analyze our designs quantitatively, and see whether they perform as we hope they do. We also run experiments to understand differences in performance. In HCI and IxD, those differences come in a couple of forms. How do humans interact with your design? How fast are they? How many errors do they make? All of these, along with users' preferences, can be captured in an experiment.


Great user experiences come from a deep and nuanced understanding of the people who will be using the design. When experiments are used in HCI, they tend to have a narrow scope and usually address specific aspects of human-computer interface design.


Three major considerations

In every experiment there are at least three major considerations:

  • Participants

  • Apparatus

  • Experiment procedure

Let’s start with participants. How many do we need for our experiment? There is no specific number, but the more participants you have, the better, because a larger sample gives you more statistical power, which is the ability to detect differences that really exist. Also, always ask yourself: who are they? Where do they come from? How do they become part of our study? All these questions are related to sampling.
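
To make the link between sample size and power more concrete, here is a rough sketch in Python. It is not a full power analysis: it assumes two groups of equal size, a standardized effect size (the difference between the groups divided by their standard deviation), and a normal approximation. The function name and the example numbers are my own illustration.

```python
from statistics import NormalDist


def approximate_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided comparison of two group means.

    Assumes equal group sizes and a normal approximation, and ignores the
    tiny chance of "detecting" an effect in the wrong direction.
    """
    standard_normal = NormalDist()
    # Threshold the test statistic must exceed to count as significant.
    z_critical = standard_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the effect is real.
    expected_z = effect_size * (n_per_group / 2) ** 0.5
    return standard_normal.cdf(expected_z - z_critical)


# Assumed medium-sized effect (0.5), increasing numbers of participants.
for n in (10, 20, 40, 80):
    print(n, round(approximate_power(0.5, n), 2))
```

Running the loop shows the estimated power rising as the number of participants per group grows, which is the point made above: more participants make real differences easier to detect.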


Sampling and sampling theory is a deep topic that relates to how we select subjects for our study from a larger population about which we want to draw inferences. There are many kinds of sampling, but they fall generally into two categories:

  • Probability sampling: Technique in which the researcher chooses samples from a larger population using a random selection.

  • Non-probability sampling: Does not depend on randomness as much, but uses other approaches.

It’s common in design studies and HCI to use non-probability sampling techniques because they are a more practical method for researchers deploying surveys in the real world. Some types of non-probability sampling are convenience sampling, snowball sampling, and quota sampling.
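
The difference between the two categories can be sketched in a few lines of Python. The population list and the function names below are hypothetical, and "easiest to reach" is simply stood in for by the first entries of the list.

```python
import random

# Hypothetical sampling frame: IDs for everyone in the population of interest.
population = [f"user_{i:03d}" for i in range(200)]


def simple_random_sample(frame, k, seed=None):
    """Probability sampling: every member has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(frame, k)


def convenience_sample(frame, k):
    """Non-probability sampling: take whoever is easiest to reach.

    Here that is modeled, as an assumption, by taking the first k entries.
    """
    return list(frame[:k])


random_group = simple_random_sample(population, 12, seed=42)
convenient_group = convenience_sample(population, 12)
```

The random sample gives every person in the frame an equal chance of being picked, while the convenience sample always returns the same easy-to-reach people, which is practical but makes it harder to generalize to the whole population.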


The second consideration is apparatus, that is, the technical equipment, machinery, space, and other resources needed for the study. Are we running the experiment in a lab? In a studio? Do we need to build something, or have something built, for people to test and try? And how will data be captured? Often, if we are studying computational artifacts, the computer or device can write log files directly based on what the user is doing, and we analyze those files later. We might also rely on direct human observation, where we make notes or record things based on what we see. With COVID-19, however, research sessions have become fully remote, and we can no longer meet participants in person. I will talk about this another day; for now, let’s stick to our major considerations for experiment design.
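
As a small illustration of capturing data in log files, here is a minimal sketch of an event logger in Python. The class name, the event names, and the one-JSON-object-per-line format are assumptions made for this example, not a standard; a real prototype would log whatever events matter for its own tasks.

```python
import json
import os
import tempfile
import time


class EventLogger:
    """Appends one JSON object per line to a log file."""

    def __init__(self, path, participant_id):
        self.path = path
        self.participant_id = participant_id

    def log(self, event, **details):
        record = {
            "timestamp": time.time(),  # seconds since the Unix epoch
            "participant": self.participant_id,
            "event": event,
            "details": details,
        }
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")


def read_log(path):
    """Load all logged events back for later analysis."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Example session with hypothetical events, written to a temporary folder.
log_path = os.path.join(tempfile.mkdtemp(), "session_p01.jsonl")
logger = EventLogger(log_path, "p01")
logger.log("task_start", task="checkout")
logger.log("click", target="buy_button")
logger.log("task_end", task="checkout")
events = read_log(log_path)
```

Because every line is a complete record with a timestamp, the file can be analyzed after the session without relying on the observer's notes.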


We have talked about the people involved in the study and the equipment and other resources needed to run it, so what is next? Procedures. What do participants actually go through as they come into the study? How do they perform tasks? How many? And most importantly, for how long? Remember, it’s hard to keep people in a study for more than about an hour. If they get tired, you will introduce fatigue effects into your study. Also, remember that informed consent is very important. At the start of the study, you must tell participants what they will go through, what the purpose of the experiment is, and what they can expect during their time with you. Make sure the space you run the study in is accessible, particularly to people who, for example, use a wheelchair, are blind or deaf, or have other impairments.

At the end of the study, make sure to debrief your participants. Tell them what the study was about, give them more insight into how participants have performed, and acknowledge their time and effort.


What to keep in mind when running an experiment

When running an experiment, you need to create a hypothesis, assign your participants, and measure user behavior. Well-designed experimental studies usually start from a clear hypothesis, which predicts the expected performance effects, and conclude with a statistical analysis of the data collected.


Create your hypothesis

Experimental design begins with a hypothesis, a guess about causation. This is what we’re going to test in our experiment. To be considered testable, it must be possible to show that the hypothesis is true or false, and the results must be reproducible. Without these criteria, the hypothesis and the results will be vague. As a result, the experiment will not prove or disprove anything significant.


Assign your participants

Next, we must assign our test participants to the test conditions. For example, if you want to compare several user interfaces in a single study, there are two ways of assigning your test participants to these multiple conditions:

  • Between-subjects (or between-groups) study design: different people test each condition so that each person is only exposed to a single user interface.

  • Within-subjects (or repeated-measures) study design: the same person tests all the conditions (i.e., all the user interfaces).

Any type of user research that involves more than a single test condition has to determine whether to be between-subjects or within-subjects. However, the distinction is particularly important for quantitative studies. In a within-subjects design, the participants act as their own controls. This makes it a more efficient test. The consequence is that you are more likely to detect small effects by using a within-subjects design, and you will likely need fewer test participants than you would if you used a between-subjects design.
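
The two assignment strategies can be sketched in Python as follows. The function names are hypothetical, and the within-subjects version also varies the order in which people see the conditions (counterbalancing), which is a common practice that I am adding here as an assumption rather than something covered above.

```python
import random
from itertools import permutations


def assign_between_subjects(participants, conditions, seed=None):
    """Randomly split participants so each person sees exactly one condition."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    # Deal participants out like cards to keep the group sizes balanced.
    return {p: conditions[i % len(conditions)] for i, p in enumerate(shuffled)}


def assign_within_subjects(participants, conditions, seed=None):
    """Give every participant all conditions, varying the order between people."""
    rng = random.Random(seed)
    orders = list(permutations(conditions))
    rng.shuffle(orders)
    # Cycle through the possible orders so each is used about equally often.
    return {p: list(orders[i % len(orders)]) for i, p in enumerate(participants)}


people = [f"p{i:02d}" for i in range(1, 13)]
between = assign_between_subjects(people, ["Design A", "Design B"], seed=1)
within = assign_within_subjects(people, ["Design A", "Design B"], seed=1)
```

In the between-subjects result each participant maps to a single design, while in the within-subjects result each participant maps to an ordered list of all designs.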


Measure user behavior

To analyze user behavior, you need to set up user metrics that measure usability and how intuitive the design is. There are countless user metrics you can monitor and analyze, but they generally fall into four main categories: latency, frequency, duration, and intensity.
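
Here is a minimal sketch of how three of these categories might be computed from timestamped events for a single task. The event names and the exact definitions are my own assumptions for illustration; intensity is left out because it depends on what is being measured (for example, how hard someone presses).

```python
def summarize_task(events):
    """Compute latency, frequency, and duration for one task attempt.

    `events` is a list of (seconds_since_session_start, event_name) tuples,
    sorted by time, using hypothetical event names.
    """
    start = next(t for t, name in events if name == "task_start")
    end = next(t for t, name in events if name == "task_end")
    actions = [t for t, name in events if name not in ("task_start", "task_end")]
    return {
        # Latency: how long before the participant first acted.
        "latency_s": (actions[0] - start) if actions else None,
        # Frequency: how often a behavior of interest occurred (here, errors).
        "error_count": sum(1 for _, name in events if name == "error"),
        # Duration: how long the whole task took.
        "duration_s": end - start,
    }


sample_events = [
    (0.0, "task_start"),
    (2.5, "click"),
    (4.0, "error"),
    (6.0, "click"),
    (9.5, "task_end"),
]
summary = summarize_task(sample_events)
```

For the sample events, the participant waited 2.5 seconds before the first action, made one error, and finished the task in 9.5 seconds.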


Wherever possible, collect quantitative data in the form of ratio or interval data, as this allows you to carry out powerful statistical analyses. If an experiment is found to be lacking in validity after it has been conducted, the data cannot be relied on.
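
As one example of what ratio data makes possible, the sketch below computes a paired t statistic for task completion times from a within-subjects comparison. The numbers are made up for illustration, and the function name is hypothetical. Turning the statistic into a p-value needs the t distribution, which you would normally get from a statistics package or a table.

```python
from statistics import mean, stdev

# Made-up task times in seconds for the same six participants on two designs.
times_a = [12.1, 10.4, 15.3, 11.8, 13.0, 14.2]
times_b = [10.9, 9.8, 13.1, 11.5, 11.2, 12.6]


def paired_t_statistic(first, second):
    """Return the paired t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    # Standard error of the mean difference.
    standard_error = stdev(diffs) / n ** 0.5
    return mean(diffs) / standard_error, n - 1


t_value, degrees_of_freedom = paired_t_statistic(times_a, times_b)
```

A calculation like this only makes sense because time in seconds is ratio data: differences and averages of the values are meaningful, which is not the case for, say, a ranking.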