experimentdesign | UI/UX Design 2/3

TRANSITION INTO UX

💡🗞️The blog has over 150 resources, and its goal is to support anyone transitioning into a UX Design career 👨🏻‍💻 I want that my blog is accessible to anyone in the world, regardless of who you are and where you came from.

I am constantly updating my blog, doing all the legwork for you!

Nowadays we all need a hand! 💪🏽

Scroll Down

antoniodellomo
- Dec 18, 2020
- 3 min read

Critical review of experimental procedure and results

When critically reviewing an experiment it is important to consider whether the experiment achieved what it set out to do and the applicability of these findings to the design. This can be assessed by examining the experimental procedure and the results obtained. To gain an understanding of an experimental procedure it is useful to view it from the subject's perspective. This can be a valuable exercise, both in planning experiments and for reviewing reports of other people's experiments.

There are four key issues to consider:

User preparation whether the instructions given to the users were adequate and whether the amount of practice allowed before starting the experiments was sufficient.
Impact of variables what the changes in the independent variables mean to the users as they undertake the experimental tasks.
Structure of the tasks whether the tasks were complex enough to allow the use of the interface facilities (or at least the ones of interest) and whether the users understood the aims of the tasks.
Time taken whether the length of the task sequence produced any fatigue or boredom in the users.

While an experiment can be very well designed at the technical level, these practical issues can have marked effects on the results. For example, an experiment that involves complex tasks and has few practice tasks may be predisposed to large error scores and poor user performance; an experiment that uses very long sequences of tasks may produce fatigue in the users. These kinds of pitfall can be avoided by carrying out small pilot studies before running the experiment on a larger scale. Although this practice might require more preparation time, it can help to avoid cumbersome or unnecessary data collection and analysis, so saving time and money and avoiding frustration in the longer term.

Experimental results need to be critically reviewed in order to establish exactly what has been found out, how useful this is, and whether it is of practical as well as theoretical significance.

There are four main points to consider:

Size of effect the absolute size of the differences found in the dependent variables is important in assessing the results. For example, performance differences of perhaps a few seconds may be statistically significant, but from a practical point of view, they may have little impact when the interface is used in a normal working environment where there are all kinds of distractions and interruptions.
Alternative interpretations experimental results are interpreted as arising from the manipulation of the independent variables. It is useful to consider whether there are any alternative interpretations of the results, perhaps based on other variables which may not have been controlled in the experiment. For example, this could be done by examining the effects of insufficient practice on complex task performance.
Consistency between dependent variables when several dependant variables are used in one experiment the relationship between them should be studied. In some cases, there may be inconsistency across variables. For example, task completion rates and error scores may indicate that one interface is better than another but user preferences and learning scores may show the reverse. Such inconsistencies indicate that the situation is complex and further experiments may be needed.
Generalization of results depending on the nature of an experiments its results may not generalize to other tasks, users, or working environments. For example, the results obtained from an experiment using one multimedia learning system may not be applicable to another. It is dangerous to over-generalize experimental results, particularly when the results are given the status of "guidelines".

Understanding and being able to apply statistical tests to validate experimental findings is important.

Photo by Tobias van Schneider on Unsplash

antoniodellomo
- Nov 4, 2020
- 4 min read

Designing Experiments

Today, I would like to share an assignment from the Interaction Design specialization’s second course “Design Principles” on Coursera. But before diving into the project, I want to emphasize the importance of MOOCs (Massive Open Online Courses). With the rapid pace of technological changes and the current job market, constant learning is the most pressing need of the day. And MOOCs are adequately equipped to address and serve it.

Said so, let's have the project overview: In this assignment, you will be giving advice to a team that is creating a new website. The design team has created the first prototype of a website. Before fully implemented, they wanted to be able to get usability feedback on this early prototype. They decide to bring several participants into the lab.

1. The team asks you whether they should videotape these sessions. What is your recommendation and why? “Yes, I think is really helpful to capture all the feedback during the session without getting distractions from taking notes. Videotaping is a great way to go back and review the session as many times as needed. Good start.”

2. The first participant arrives. The facilitator briefs them by explaining, “I’d like to show you a new design that I’ve created. I’d like to see how well you perform with this design.” What are two problems with this introduction? “On the contrary that doesn’t sound like a great introduction because by saying that the facilitator exposes the participant to pressure. In my opinion, this is the main issue.”

Since they are discussing a newly created product, the participant can feel obliged to say positive things about it, people generally don’t want to say bad things to a person's face especially regards a new design.
If a new product doesn’t perform well participants will feel like it’s their fault and not due to some technical issue.

3. Rewrite this introduction to avoid those two problems. “Thank you for participating! Today I am pleased to show you this design. We really hope we can make our product better for people like you. There are no right or wrong answers to any of the questions I'm asking in this study - we're simply interested in understanding the way people interact with the product. I will ask you to show me how you do things using the product and ask you questions to better understand what you do. We will record a little video of you so that I can go back and review things later and make sure we get everything right. We won't use your name in connection with the recordings or the results. The videotapes will only be used internally and never shared anywhere with anyone. We look forward to hearing your stories and experiences. Thanks for your time.”

4. The experimenter continues, “I’d like to get a sense of what you’re thinking about as you go through this site. As you go through the following tasks, please think aloud. Whatever’s on your mind, share it vocally.” If the experimenter uses this think-aloud protocol, what should they not do? “Thinking aloud may seem unnatural and distracting to some participants since it may be very unusual for them because under normal circumstances they rarely comment on what they do.”

5. Usability feedback in hand, the development team creates two alternative home pages for the course. They want to see which one encourages more users to sign up. If they compare these two alternatives in a controlled experiment, what is the null hypothesis? “When there is no difference in sign up rate/numbers between the two alternative home pages.”

6. One team member suggests that all participants first see one design, and then all participants see the other design. What is the problem with this approach? “In my opinion fatigue is a potential drawback of using a within-subject design. Participants may become exhausted, bored, or simply uninterested after taking part in multiple treatments or tests.”

7. The team agrees with you. The developers propose a between-subjects design. Participants who sign up in the AM will be assigned the first condition, those that sign up in the PM will be assigned to the second. What is the problem with this approach? “There can be disadvantages also between-subjects designs because they can be complex and often require a large number of participants to generate any useful and analyzable data.”

8. What would you propose that the development team do instead? “The issue is that the above examples depend on participant habits/actions, therefore a good way to avoid that can be a random assignment. Randomized controlled trials are often the most effective and successful.”

9. One hundred participants are exposed to each condition. In Design A 36 participants sign up. In Design B 24 participants sign up. To help you get started, the expected sign-up rate is 30% ((36+24)/200). You can also refer to critical values for the chi-squared variable here. “Df = 1 as there are only 2 outcomes, sign up or don’t sign up The expected sign up rate is 30%, so the expected none sign up rate is 70% The x-value = (36-30)2 /30 + (24-30)2/30 + (64-70)2/70 + (76-70)2/70 = 3.42857 So the p-value = 0.06 approx based on the chi-square table Since the probability of this result happening is 6%, which is more than a 5% significance level, we can accept (cannot reject) the null hypothesis that there is no difference in sign-up rate between the two home pages.”

10. Imagine that instead, 50 participants were exposed to each condition. In Design A, 18 sign up; in Design B, 12 sign up. The sign-up ratios are the same as in the previous question. Would the p-value increase, decrease, or stay the same, and why? “The p-value would increase as the sample size decreases.”

Photo by David Travis on Unsplash

antoniodellomo
- Oct 30, 2020
- 4 min read

Why do we do experiments?

With today's article, I would like to talk about experiment design. With its roots in Psychology, experiments have always played an important role in Human-Computer Interaction (HCI). Designs benefit from data and that data often can come from experiments. From the very early days, experiments allow us to collect data to quantitatively analyze our designs and see if they perform as we hope they do. Also, we do experiments to understand the differences in performance. Those differences in HCI and IxD come in a couple of forms. How do humans interact with your design? How fast are they? How many errors do they make? All these can be captured in preferences.

Great user-experiences come from a deep and nuanced understanding of the people who will be using them. When experiments are used in HCI they tend to have a narrow scope and usually address specific aspects of human-computer interface design.

Three major consideration

In every experiment there are at least three major considerations:

Participant
Apparatus
Experiments Procedure

Let’s start with participants, how many do we need for our experiments? There is not a specific number but the more participants you have, the better because a wider number of participants give you more power, which is the ability to detect differences. Also, always ask yourself who are they? Where do they coming from? How do they become part of our study? All these questions are related to sampling.

Sampling and sampling theory is a deep topic that relates to how we select subjects for our study from a larger population about which we want to draw inferences. There are many kinds of sampling, but they fall generally into two categories:

Probability sampling: Technique in which the researcher chooses samples from a larger population using a random selection.
Non-probability sampling: Does not depend on randomness as much, but uses other approaches.

It’s common in design studies and HCI to use non-probability sampling techniques because are a more valuable and practical method for researchers deploying surveys in the real world. Some types of non-probability sampling methods are convenience sampling, snowball sampling, and quota sampling.

The second consideration is apparatus, that is the technical equipment, machinery, space and, other resources needed for the study. Are we running the experiment in a lab? a studio? Do I need to build anything or have built something for people to test and try? And how data will be captured? Often, if we are doing computational artifacts we can write log files. The computer or device can write log files directly based on what the user is doing with the device, and we analyze those files later. We might have just direct human-observation where we make notes or record things based on what we see. But with COVID-19, research sessions have become fully remote. We can no longer meet participants in person. But I will talk about this another day, for now, let’s just stick to our major considerations for experiments design.

We talked about the people involved in the studies, the equipment, and other resources to run the study so, so what is next? Procedures. What do participants actually go through as they come into the study? How do they perform tasks? How many? And most importantly, for how long? Remember, it’s hard to keep people in a study for more than about an hour. If they get tired, you will introduce fatigue-effects into your study. Also, remember that informed consent is very important. AT the start of the study you must tell them what are they going through, what is the purpose of the experiment and what they can expect during their time with you. Make sure the space you run a study in is accessible particularly to people for example in a wheelchair, or who may be blind, or deaf, or have other impairments.

At the end of the study, you want to make sure to debrief your participants. Tell them what this all was about, give them more insight into how participants have performed, and compliment them for their time and effort.

What to keep in mind when running an experiment

When running an experiment, you need to create a hypothesis, assign your participants and measure user behavior. Well designed experimental studies usually have a clear hypothesis, which predicts the expected performance effects associated with the experiments and concludes with a statistical analysis of the data collected.

Create your hypothesis

Experimental design begins with a hypothesis — a guess about causation. This is what we’re going to test in our experiment. To be considered testable, there must be a possibility to prove that the hypothesis is true, false and, reproducible. Without these criteria, the hypothesis and the results will be vague. As a result, the experiment will not prove or disprove anything significant.

Assign your participants

Next, we must assign our test participants to the test itself. For example, if you want to compare several user interfaces in a single study there are two ways of assigning your test participants to these multiple conditions:

Between-subjects (or between-groups) study design: different people test each condition so that each person is only exposed to a single user interface.
Within-subjects (or repeated-measures) study design: the same person tests all the conditions (i.e., all the user interfaces).

Any type of user research that involves more than a single test condition has to determine whether to be between-subjects or within-subjects. However, the distinction is particularly important for quantitative studies. In a within-subjects design, the participants act as their own controls. This makes it a more efficient test. The consequence is that you are more likely to detect small effects by using a within-subjects design, and you will likely need fewer test participants than you would if you used a between-subjects design.

Measure user behavior

To analyze user behavior, you need to set up various user metrics to measure usability and intuitive design. There are countless user metrics you can monitor and analyze but generally fall into four main categories: Latency, frequency, duration, and intensity.

Wherever possible collect quantitative data in the form of ratio or interval data as this will allow carrying out powerful statistical analyses. If an experiment is found to be lacking in validity after it has been conducted then there is a problem. The data cannot be relied on.