Mel Slater's Presence Blog

Thoughts about research and radical new applications of virtual reality - a place to write freely without the constraints of academic publishing, and have some fun.

26 April, 2009

The Ethics of Virtual Milgram at the Royal Society

Emotions in Man and Machine
On 21st April in London I spoke at a Royal Society Discussion Meeting The Computation of Emotions in Man and Machines. The overall meeting was excellent and if you look at that web page you will see that there will be a webcast archive of all the talks, and also proceedings in a future publication of Philosophical Transactions B. About 300 people were in attendance and the meeting was oversubscribed. I will put a version of my paper online and talk about its contents later. For now I want to concentrate on another issue.
During the talk, in order to illustrate one point, I briefly went through the ‘virtual reprise of the Stanley Milgram obedience experiment’ that I talked about in this blog on 28th December 2006 in ‘Obedience in Plaça Espanya’.

In the discussion after the talk there were two people in the audience who raised the issue of ethics. I know that mentioning Stanley Milgram’s obedience experiment is like raising a red rag to a bull for some psychologists, and I agree that the original experiments in the 1960s were problematic. However, I also urge people interested in this issue to read Stanley Milgram’s own forceful discussion of this issue in his book: Obedience to Authority: An Experimental View.

Also the argument rages today about the ethics of the original experiment. See for example, ‘Milgram, Method and Morality’ by Charles R. Pigden and Grant R. Gillet in the Journal of Applied Philosophy 13 (3) 233 – 250, which is a response to a recent partial replication of the obedience experiment by Jerry M. Burger.

Readers should check the British Psychological Society’s Code of Ethics and Conduct, March 2006, ‘3.3 Standard of Protection of Research Participants’. The information sheet and consent form given to our participants are also available, as are the answers by some participants to a letter that was sent 6 months after the actual experiment.

The Power of Virtual Reality
First, why did we do the experiment? The answer actually was not to probe Milgram’s original question of obedience to authority. In the original 1960s experiment the experimenter deliberately used the authority of a professor at a prestigious institution to attempt to persuade participants to carry out actions that would normally have been against their own moral principles (causing harm to a stranger, and continuing to do so in spite of that stranger’s strong protestations to the contrary). In our virtual reprise the experimenter did not at any time attempt to persuade the participants to continue against their own inclinations. In fact they were told in writing and verbally several times before the study began that they could withdraw at any time without giving reasons. What we were interested in was whether causing ‘harm’ to an entirely virtual character who protested about the shocks ‘she’ was getting would cause people anxiety so that they would want to stop. In other words, in spite of knowing for sure that nothing ‘real’ was happening, would people still find the experience unpleasant, and would they still want to stop even knowing that it was virtual? Moreover, if they found it unpleasant and yet did not stop, why would they continue?

Why are these questions interesting? The fundamental answer is that we want to explore the power of virtual reality to simulate situations in reality and cause responses in people that are similar to those of real life. By ‘responses’ here we mean mainly those automatic responses that occur in spite of the person’s full knowledge that the situation is not real. For example, in an earlier study we had put people in front of audiences of virtual characters, who behaved either very negatively towards them or very positively – and the interesting thing is that although everyone knew that there was no audience there, they still responded with anxiety to the negative audience, and with a kind of joy to the positive audience.

Now Milgram’s original question remains: why is it that people can be persuaded to carry out atrocious acts at the behest of authority, acts that are against their own moral principles? We see examples of this every day in the news. This issue is something that is really worth studying, something that is as urgent today as it was in the 1960s, in the 1930s and 1940s, and probably any previous time in history. However, it is very difficult to study – the ethical concerns raised by Milgram’s original experiment stand: we cannot allow people to believe that they really are causing harm to another person in order to see how they react. On the other hand having people watch videos and asking how they would react, or even having them imagine the situation and their likely responses simply isn’t sufficient for scientific study – no one knows how they would react in such situations.

Realistic Responses
I believe that our research has shown many times that in virtual reality, under the right conditions, people do tend to respond realistically to what they experience (actually this was the main subject of my talk at the Royal Society). However, their knowledge that what is happening is not real tends to dampen down their responses. So their base level responses (physiological responses, feelings, emotions, automatic thoughts) are genuine, but ultimately they can use their knowledge of the situation to control their overt behaviour. For example, in the virtual reprise, when people were asked why they continued in spite of wanting to withdraw, they would invariably say something like ‘… because I kept reminding myself that it wasn’t real’. In any event I believe that what we have shown in this work is that virtual reality can be used effectively to study how people respond in extreme situations, how, for example, the terrible events with which we are only too familiar can be caused by people who in ordinary circumstances would be horrified about such things.

I do not accept the argument that causing some stress to participants in an experiment is not ethical. These are adults, who freely agree to participate in the study, and who are told that they are free to withdraw at any time, and even warned that they may experience stress. If they decide to continue in spite of experiencing stress that is their choice, they are under no obligation to continue. People voluntarily choose to engage in activities that are far more stressful than anything we have ever subjected them to in virtual reality – watching horror movies, doing dangerous sports, even simply attending a football match might be a highly stressful activity. Don’t forget – these are adults who are responsible for their own actions, and provided that they are not tricked or deceived into entering a situation that might cause them difficulties without forewarning, it is up to them to participate or not. Of course there are limits, and a major ethical consideration is to weigh up the benefits of the research in terms of knowledge gained balanced against any negative aspects of the experiment.

The other issue is ‘desensitisation’ – by participating in this experiment could it make participants more likely to actually carry out cruel acts in real life? This was suggested by one of the audience members at the Royal Society talk. Actually the question is an empirical one – does involvement in violent virtual scenarios result in greater aggressive behaviour in real life? This is an issue much studied with respect to violent video games, and the jury is still out. See, for example, the paper by Christopher John Ferguson, The Good, the Bad and the Ugly: A Meta-analytic Review of Positive and Negative Effects of Violent Video Games. Of course, the virtual Milgram study presented nothing like the kind of violence one can inflict in video games. On the other hand one could argue equally well that, having experienced a virtual reality scenario where you found yourself carrying out an act that causes stress and unpleasant feelings to yourself, you might not want to do that again in the future, and especially would be forewarned about somehow getting trapped into doing this sort of thing in reality. Hence such an experience might open the door to self reflection and minimise the chance for later aggressive behaviour. But, as I said, this is an empirical question, not one that can be settled by argument or simple introspection.

Therefore, the following clause from the British Psychological Society’s code of ethics comes into force:

‘Obtain the considered and non-subjective approval of independent advisors whenever concluding that harm, unusual discomfort, or other negative consequences may follow from research, and obtain supplemental informed consent from research participants specific to such issues.’

Our complete experimental design was submitted to our University’s Research Ethics Committee, and was discussed as a full application (i.e., discussed by the Committee and not subject to Chair’s action). It was deemed an appropriate experiment. I would suggest that the person in the Royal Society audience who claimed in public that the ethics committee was wrong, who had the temerity to claim this after listening to my five minute discussion of the experiment, without knowing anything whatsoever about its details, nor about the deliberations of the committee, and presumably without ever having read the paper – that this was an example of ‘indignation’ that has no place in a scientific meeting – least of all in a place like the Royal Society.

05 April, 2009

Transcending Reality (with a diversion on Statistics)

My Senior ERC grant called TRAVERSE started 1st April 2009. TRAVERSE stands for 'Transcending Reality – Activating Virtual Environment Responses through Sensory Enrichment'.

What is it about?

I use the term ‘transcending reality’ (TR) in two ways: as a noun phrase and as a verb phrase. A ‘transcending reality’ is one that replaces physical reality by a virtual reality, such that you respond to the virtual reality as if it were real. However, to ‘transcend reality’ is to go beyond the boundaries of physical constraints, when the virtual reality gives you the strong illusion that you've gone beyond the boundaries of physical reality. In these ‘non realistic’ applications of virtual reality I nevertheless expect people to respond to it as a TR. The overriding background objective of this research is: to maximise the probability that participants will act as if the immersive virtual reality were real (TR).

The technical research includes the main components of virtual reality - computer graphics and haptics mainly. Haptics hasn't been a strong field for me in the past, but I'm realising its profound importance - and learning more about it (see 'Haptic Rendering' edited by Ming Lin and Miguel Otaduy). However, the research is mainly interdisciplinary, including computer science and neuroscience.

There's another important aspect - there will be several experimental studies, and of course statistical analyses of the results. For a long time I've known that the classical approach to statistics - significance levels, type I and II errors, power, Neyman-Pearson Lemma, etc - is fine, but ... it doesn't make sense. What is a 'significance level'? It is the probability of 'rejecting your null hypothesis conditional on the null hypothesis being true' P(reject H0 | H0). Who cares?
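To make the definition concrete, here is a minimal simulation sketch of what P(reject H0 | H0) means. It assumes a two-sided z-test on normally distributed data with known variance; the sample size, number of trials and function name are illustrative choices, not anything taken from the experiments discussed here.

```python
import math
import random
from statistics import NormalDist


def false_rejection_rate(alpha=0.05, n=20, trials=10_000, seed=1):
    """Estimate P(reject H0 | H0 true) for a two-sided z-test of mean = 0."""
    rng = random.Random(seed)
    # Reject when |z| exceeds this critical value (about 1.96 for alpha = 0.05).
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Data are generated with H0 true: mean 0, standard deviation 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * math.sqrt(n)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


# The estimate should land near alpha, up to simulation noise.
print(false_rejection_rate())
```

The number that comes out describes the long-run behaviour of the test procedure when the null hypothesis holds; it says nothing directly about how probable the hypothesis is given the data.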

What we're really interested in is the probability of the hypothesis given the observed data: P(H | O), where O stands for the observations. This isn't allowed in conventional statistics since making probability statements about a hypothesis doesn't make sense there: the probability of an event has the interpretation that it is the ratio of occurrences of the event to the number of times that the event could have occurred in a long run series of independent and identical trials. Clearly the truth of a hypothesis cannot be an outcome of an experimental trial - from this point of view P(H) = 1 (it is true) or P(H) = 0 (it is false), but we don't know which one of these holds.

This way of thinking leads to Bayesian statistics, where probability is interpreted as subjective degree of belief - so a statement such as P(H) = 0.75 is valid; it means your degree of belief that H is true. From Bayes' Theorem we get P(H | O) ~ P(O | H)P(H) (I'm using ~ for 'is proportional to'). P(O | H) is often something that can be computed from probability theory, and P(H) is your 'prior probability' for H ('prior' because it is before you get the data). Then Bayes' Theorem allows you to update your probability for H as more and more data is accumulated. In the end two different people who might have started with quite different priors will end up with the same final probabilities (P(H | O)) given sufficient data (O). So I preferred Bayesian statistics, although it does require choosing prior probability distributions.
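A minimal sketch of this updating, assuming just two competing simple hypotheses about a coin (H: heads probability 0.7; the alternative: 0.5) so that the proportionality can be normalised. The hypotheses, the data counts and the function name are illustrative assumptions, not taken from any experiment described here.

```python
def posterior(prior_h, heads, tails, p_h=0.7, p_alt=0.5):
    """Return P(H | O) for H: P(heads) = p_h versus the alternative P(heads) = p_alt."""
    # P(O | H) and P(O | alternative) for the observed sequence of tosses.
    like_h = p_h ** heads * (1 - p_h) ** tails
    like_alt = p_alt ** heads * (1 - p_alt) ** tails
    # Bayes' Theorem: posterior is proportional to likelihood times prior.
    unnorm_h = like_h * prior_h
    unnorm_alt = like_alt * (1 - prior_h)
    return unnorm_h / (unnorm_h + unnorm_alt)


# Two people with very different priors see the same accumulating data.
for heads, tails in [(0, 0), (7, 3), (70, 30)]:
    sceptic = posterior(0.1, heads, tails)
    believer = posterior(0.9, heads, tails)
    print(f"{heads + tails:3d} tosses: sceptic {sceptic:.4f}, believer {believer:.4f}")
```

With no data each person simply keeps their prior; after 70 heads in 100 tosses both posteriors are above 0.99, which is the convergence described above.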

It is interesting that while the field of statistics has undergone a sort of revolution in the past two decades where Bayesian statistics has become completely acceptable, and considered now part of mainstream statistics, the fields in which statistics are probably used most (in psychology and the social sciences) stick rigidly and ideologically to the sacredness of the 5% significance test.

Let's consider an example of how problematic this is. Suppose this week I do an experiment, and I report results at the 5% significance level. OK. Then next week I do another experiment and I report results at the 5% significance level. And so on for the next 100 weeks. Each of these different experiments I write up in a different paper. They are all accepted (well, of course, this is virtual reality!). So no problem with that. Now in a parallel universe, one also where psychology is dominated by classical statistics, I am very energetic, and I do all 100 experiments in a single week, and I write all the results in one paper and I get exactly the same results (i.e., the same things are 'significant') as in this universe. I then submit the paper for publication, and it is rejected for being statistically unsound! Why? Because ... if you do n tests all at the 5% significance level, then 'by chance alone' on the average 0.05*n of them are going to be 'significant' (think back to the meaning of 'significance level'). Note the only difference in the two universes is that in one I spread the results out over 100 weeks, and put them in 100 different papers, but in this other universe I did them all in a short time period and submitted them in one paper.
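The 'by chance alone' arithmetic can be sketched as follows, under the assumptions that every null hypothesis is true and that the n tests are independent. The function names are illustrative.

```python
def expected_false_positives(n, alpha=0.05):
    """Average number of 'significant' results when all n null hypotheses are true."""
    return alpha * n


def prob_at_least_one(n, alpha=0.05):
    """Chance of one or more false positives among n independent tests."""
    return 1 - (1 - alpha) ** n


for n in (1, 10, 100):
    print(n, expected_false_positives(n), round(prob_at_least_one(n), 4))
```

For n = 100 this gives an expected 5 false positives, and a probability of about 0.994 that at least one test comes out 'significant' purely by chance.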

How, in the second universe, can we get out of this problem? Well the reviewers of the fictional paper say that I should have applied something called the 'Bonferroni Correction'. What this means, at the simplest level, is that if you do n tests, then you should use a significance level of 0.05/n.

But this is unfair, no? If I spread the tests out over many weeks and put each in a different paper, then - no problem. But if I'm especially energetic and do all the tests at once, and then write them all in the same paper, my significance level has to be 0.05/100 = 0.0005. Unfortunately now nothing is significant!
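A minimal sketch of the correction in its simplest form. The p-values are made up purely for illustration (three that pass the usual 5% test and 97 that do not), and the function names are hypothetical.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance level after the simplest Bonferroni Correction."""
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag each p-value that still passes once the level is divided by n."""
    threshold = bonferroni_threshold(len(p_values), alpha)
    return [p < threshold for p in p_values]


# Hypothetical results from 100 tests reported in a single paper.
p_values = [0.04, 0.01, 0.001] + [0.5] * 97

uncorrected = sum(p < 0.05 for p in p_values)
corrected = sum(significant_after_bonferroni(p_values))
print("per-test level:", bonferroni_threshold(len(p_values)))
print("significant at 5%:", uncorrected)
print("significant after correction:", corrected)
```

In this made-up case three results pass at the 5% level, but none of them is below 0.05/100, so none survives the correction.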

Let's take this argument a bit further. Why pick on me? Why not throw your tests into the pot, and in fact all the tests in this universe? n is infinite, nothing is ever significant, all those fantastic results "it was significant at the 5% level" that we've ever seen are all ... are not supported, statistically invalid, since according to the 'Bonferroni Correction' the significance level is 0, and we can't get smaller than that (at least in this universe).

Now more recently there is another 'new wave' in statistics, based on information theory. I've been reading and learning this recently, and it is ... cool. You don't need prior distributions. You consider the question: what 'information' does this data contain about the possible models under consideration? So, I really like the information theory approach to statistical inference, since it gets to the heart of what the real problem is about, without any mumbo jumbo, weird concepts, strange tricks, and sleights of hand.

If you're interested have a look at Unfortunately this approach has not reached the mass of practitioners yet, maybe because there are a lot of new things to learn, with some not so trivial mathematics in the way. However, there is also a really nice practical book that is within this approach: Burnham, K. P., and D. R. Anderson. 2002. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. 2nd Ed. Springer. Although very practical it also explains the underlying concepts well. For the first time I felt I was doing something really appropriate in statistical analysis when using these ideas to analyse a recent experiment. However, probably the psychologists will not agree.
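The post does not name a specific criterion. The Burnham and Anderson book is built around Akaike's Information Criterion (AIC), so the following is a minimal sketch of AIC-based model comparison under that assumption. The model names, log-likelihoods and parameter counts are invented for illustration and do not come from any real analysis.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aic_values):
    """Turn AIC values into relative weights of evidence that sum to 1."""
    best = min(aic_values.values())
    # Relative likelihood of each model given the data, compared to the best one.
    rel = {name: math.exp(-0.5 * (value - best)) for name, value in aic_values.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}


# Hypothetical fits: (maximised log-likelihood, number of estimated parameters).
fits = {"constant": (-120.0, 1), "linear": (-112.5, 2), "quadratic": (-112.1, 3)}

aics = {name: aic(ll, k) for name, (ll, k) in fits.items()}
weights = akaike_weights(aics)
for name in fits:
    print(f"{name}: AIC = {aics[name]:.1f}, weight = {weights[name]:.3f}")
```

No prior distribution and no significance level appears anywhere here: the candidate models are simply ranked by how much information the data carry about each, with a penalty for extra parameters.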

Now to get back to the point - for TRAVERSE I'm looking for researchers to fill a number of new research posts at both the post-doc and PhD student level. I expect that applicants will be from the fields of computer science, or cognitive neuroscience with computer science. Knowledge of computer graphics / virtual reality would probably be essential -

- Except for one position - I really want to have a statistician in my group. I would really like to have a statistician who is not orthodox (but who knows the orthodoxy) and is interested in furthering, as a research topic, the information approach to statistics, as well as analysing the data of our experiments. Also, I have a strong intuition that the information approach to statistics may also turn out to be an interesting model for the underlying fundamental research questions that we will tackle.