Mel Slater's Presence Blog

Thoughts about research and radical new applications of virtual reality - a place to write freely without the constraints of academic publishing, and have some fun.

17 December, 2022

In the Presence of Bayesian Statistics for VR


My original degree and Masters were in Statistics. Then I also studied sociology and psychology. Then via statistics, through my involvement in the statistical language GLIM, I moved to computer science and computer graphics, while at Queen Mary (University of London). We had our first head-mounted display from Division Inc in 1991, and together with the postdoc Dr Martin Usoh, we started investigating presence in virtual reality. At this point my own background suddenly made sense - through statistics I knew how to design experiments and analyse data. Through the social sciences I could read and understand papers in psychology and related sciences. Computer science, computer graphics, programming etc. meant that I could understand what was feasible given the hardware and software resources. I embarked on a research program in virtual reality mainly rooted in experimental studies, concerned with various aspects of how people respond to virtual environments. I moved to UCL in 1995 and then to Barcelona, and continued with this series of studies.

Over many years, for the analyses of the results of these experiments I used classical statistics - hypothesis testing, significance levels, confidence intervals, and so on. Think about the meaning of "significance level". A lot of people automatically interpret it as the probability of the null hypothesis. So if the significance level is low (< 0.05) then the probability of the null hypothesis being true is low and it should be "rejected". However, it doesn't mean that at all. Significance level is the conditional probability of rejecting the null hypothesis if it is true. This means that in the long run, with a significance level of 5%, the statistical test will reject the null hypothesis in 5% of the cases where it is true. This is far from the meaning of the probability of the null hypothesis. In fact in classical statistics, the probability of the null hypothesis being true is 0 or 1 (only we don't know which). This is because in classical statistics probability is associated with frequency of occurrence of an event. Only if there were parallel universes, in some of which the null hypothesis were true, and others not, and we could access data from these universes, could we think of "the probability of the null hypothesis" in classical statistics.
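To make the long-run meaning concrete, here is a minimal simulation sketch. It assumes a deliberately simple setup that is not taken from any of my studies: a two-sided z-test of "mean = 0" on normally distributed data with known standard deviation 1, where the null hypothesis is true in every simulated experiment. The function name and all parameter values are illustrative choices.

```python
import random
from statistics import NormalDist


def false_positive_rate(n_experiments=20000, n=20, alpha=0.05, seed=1):
    """Share of simulated experiments that reject a null hypothesis that is true."""
    rng = random.Random(seed)
    # Two-sided critical value of the standard normal for the chosen alpha.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_experiments):
        # Every sample is drawn with true mean 0 and known sd 1, so H0 holds.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * n ** 0.5  # z statistic for H0: mean = 0
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_experiments


if __name__ == "__main__":
    print(f"rejected a true null in {false_positive_rate():.3f} of experiments")
```

The printed fraction should sit close to the chosen significance level. Note what it describes: the behaviour of the test over many experiments, not the probability that the null hypothesis is true in any one of them.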

When people study statistics, even if they are mathematically sophisticated, many find the concepts - type 1 error, type 2 error, significance levels, power, confidence intervals - very mysterious. They are! Think about a confidence interval. You're told - the 95% confidence interval for the true mean is between 20 and 30. Automatically this is interpreted as "the probability of the true mean being between 20 and 30 is 0.95." Again, this is a wrong interpretation. It means that if we had multiple repetitions of the same experiment, each producing its own interval, then 95% of those intervals would contain the true mean. This says nothing about the particular interval we have found - is this one of the 95% or one of the 5%? The mathematical theory of classical statistics, like the Neyman-Pearson Lemma, is very elegant. But interpretation is somewhat convoluted.
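The repeated-experiment reading can be checked with a short simulation. This is only a sketch under assumed conditions: normal data with a known standard deviation, so that a z-interval applies, and a true mean of 25 that we fix ourselves (something we never know in a real study). All names and numbers are illustrative.

```python
import random
from statistics import NormalDist


def coverage(n_experiments=20000, n=16, true_mean=25.0, sd=10.0,
             level=0.95, seed=2):
    """Share of repeated experiments whose interval contains the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sd / n ** 0.5  # same width every time because sd is known
    hits = 0
    for _ in range(n_experiments):
        sample_mean = sum(rng.gauss(true_mean, sd) for _ in range(n)) / n
        # Each experiment yields a different interval around its own mean.
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / n_experiments


if __name__ == "__main__":
    print(f"intervals containing the true mean: {coverage():.3f}")
```

The 95% is a property of the procedure across repetitions. Any single interval either contains the true mean or it does not, and the simulation can only tell which because the true mean was chosen in advance.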

Classical statistics is easy to do. To apply it you don't need to understand it. You run the data through a statistical package and out comes a number t=2.2, on 18 degrees of freedom, and P < 0.05. (But is it one-sided or two-sided?). P < 0.05, so we have "significance": write the paper! For every different type of problem there is another type of test. There are underlying assumptions that must be obeyed - typically that the random errors in the variable under consideration with respect to the hypothesis, must be normally distributed. If not - try to find some transformation of the data (like taking the log) to make it so. Otherwise use "non-parametric" statistics, and learn another whole set of tests (also available in the statistical package). It is in principle easy.

But what about the "power"? This is the probability of rejecting the null hypothesis when it is not true. You're supposed to compute the power before you do the experiment, because this will help in determining the required sample size. But how can you compute the power? To do so you need to know at least the variances of the response variables. To know those you must have already collected sufficient data. But what was the power involved in that data collection exercise?
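A small sketch shows how strongly a power figure depends on the variance you plug in. It assumes the simplest textbook case, a two-sided one-sample z-test with known standard deviation, and the effect size, sample size and standard deviations below are invented purely for illustration.

```python
from statistics import NormalDist


def power(effect, sd, n, alpha=0.05):
    """Power of a two-sided one-sample z-test for a given true effect."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    # How far the true effect shifts the z statistic away from zero.
    shift = effect * n ** 0.5 / sd
    # Probability that the statistic lands in either rejection region.
    return (1 - std.cdf(z_crit - shift)) + std.cdf(-z_crit - shift)


if __name__ == "__main__":
    # Same effect and sample size, three different guesses at the sd.
    for guessed_sd in (1.0, 1.5, 2.0):
        print(guessed_sd, round(power(effect=1.0, sd=guessed_sd, n=10), 3))
```

With the effect and sample size held fixed, changing only the guessed standard deviation moves the computed power a long way, which is why a power figure is only as good as the guess behind it.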

Bayesian statistics has a completely different philosophy. You start off with your prior probabilities of the hypothesis in question (e.g., that the true mean is in a certain range). You collect the data, and then use Bayes' Theorem to update your prior probabilities to posterior probabilities. So you end up with a revised probability conditional on the data. Of course this is also done through a statistical package, since apart from very simple problems, the integrals involved in finding the posterior probabilities must be evaluated using numerical simulation.
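For one of the very simple problems the update can be done by brute force on a grid, which makes the mechanics visible. The sketch below assumes a single unknown proportion, a flat prior over 101 candidate values, and made-up data of 7 successes in 10 trials. It is an illustration of Bayes' Theorem, not how Stan or BUGS compute posteriors.

```python
def grid_posterior(thetas, prior, successes, trials):
    """Posterior over a grid of candidate values via Bayes' Theorem."""
    # Likelihood of the data for each candidate theta (binomial kernel; the
    # constant binomial coefficient cancels when we normalise).
    likelihood = [t ** successes * (1 - t) ** (trials - successes)
                  for t in thetas]
    unnormalised = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(unnormalised)  # the normalising sum in the denominator
    return [u / evidence for u in unnormalised]


def prob_between(thetas, posterior, low, high):
    """Posterior probability that the parameter lies in [low, high]."""
    return sum(p for t, p in zip(thetas, posterior) if low <= t <= high)


thetas = [i / 100 for i in range(101)]
flat_prior = [1 / len(thetas)] * len(thetas)
posterior = grid_posterior(thetas, flat_prior, successes=7, trials=10)

if __name__ == "__main__":
    print(round(prob_between(thetas, posterior, 0.5, 0.9), 3))
```

The result is a direct statement of the kind people want to make: the probability, given the data and the prior, that the parameter lies in a stated range.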

People argue that there is a subjective element in this. Yes, it is true, for Bayesian statistics probability is not based on frequency but is subjective (though of course you can use informative frequency data). It is subjective, but as you add more and more data, results starting from different subjective priors will converge to the same posteriors. And ... think back to power calculations in classical statistics - these are based on guesswork, "estimations" of variance that typically have no basis in any actual data. Power calculations make everyone relax ("Oh great, it has a power of 80%!") but in reality I think that such power calculations are meaningless.
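The convergence of different priors is easy to see in a case where the posterior has a closed form. The sketch assumes binomial data with a Beta prior, for which the posterior is again a Beta distribution. The two priors (a "sceptic" and an "optimist") and the observed success rate of 0.3 are hypothetical.

```python
def posterior_mean(prior_a, prior_b, successes, trials):
    """Mean of the Beta posterior after binomial data (conjugate update)."""
    return (prior_a + successes) / (prior_a + prior_b + trials)


def prior_gap(trials, rate=0.3):
    """Distance between two analysts' posterior means after the same data."""
    successes = round(rate * trials)
    sceptic = posterior_mean(2, 8, successes, trials)   # prior mean 0.2
    optimist = posterior_mean(8, 2, successes, trials)  # prior mean 0.8
    return abs(optimist - sceptic)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, round(prior_gap(n), 4))
```

The gap between the two posterior means shrinks steadily as the number of trials grows, even though the two analysts started from very different beliefs.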

In Bayesian statistics there are not lots of different tests, no different tests for different situations. There is only one principle - based on Bayes' Theorem. You need good statistical software though, that allows you to properly express your prior distributions, the likelihood (the probability distribution of the response variables conditional on the parameters under investigation), and to compute the posterior distributions.

A few years ago I came across the language called BUGS. It appealed to the computer science side of me because it is a declarative programming language where you can elegantly express prior distributions and likelihoods. So I started analysing the results of our experimental studies using Bayesian statistics - I think that this is the first paper where I dared to do this. I thought that reviewers would come down heavily against this, but to my pleasant surprise they were quite favourable, and have never raised questions since!

Later I came across the Stan probabilistic programming language, which at first I used in conjunction with MATLAB, and then using R and the R interface for Stan.

Bayesian analysis appeals to me because of its simplicity in concept and interpretation, but also because it overcomes a major problem in classical statistics. When we carry out an experiment there are typically multiple response variables of interest - e.g., the results of a questionnaire, physiological measures, behavioural responses, and so on. Bearing in mind the meaning of "significance", when you carry out more than one statistical test you lose control of significance. E.g., if the significance level is 0.05 for each test, then the probability that at least one will be "significant" by chance is not 0.05 (but greater). There are ad hoc ways around this, like the extreme Bonferroni Correction, or multiple comparison tests like Scheffe, but these are not principled, even if clever.
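How much greater is a matter of simple arithmetic if we assume, for illustration only, that the k tests are independent (real response variables in an experiment are usually correlated, so this is a simplification). The sketch also shows the Bonferroni adjustment of dividing the level by k.

```python
def familywise_error(alpha, k):
    """P(at least one false positive) across k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k


def bonferroni_alpha(alpha, k):
    """Per-test level under the Bonferroni correction."""
    return alpha / k


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        raw = familywise_error(0.05, k)
        corrected = familywise_error(bonferroni_alpha(0.05, k), k)
        print(k, round(raw, 3), round(corrected, 3))
```

The uncorrected column grows quickly with k, while the corrected column stays at or below 0.05, at the price of making each individual test much harder to pass.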

In Bayesian statistics if you have k parameters of interest, then what happens is that the joint probability distribution of all these parameters together is computed, and then you can read off as many probability statements as you like from this, without losing anything. It is as if you have a page of text with lots of facts. In classical statistics the more facts that you read off the page, the less the reliability of your conclusions. But in Bayesian statistics you can read as many "facts" off the page as you like, and nothing is affected. If you compute a particular set of probabilities from the joint distribution, then they are all valid - those are the probabilities, and that's it.
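In practice the joint distribution arrives as a table of simulated draws, one row per draw and one column per parameter, and "reading off" a probability is just counting rows. The sketch below uses synthetic correlated draws for two hypothetical parameters, b1 and b2, as a stand-in. They are not output from Stan or from any real model.

```python
import random


def synthetic_joint_draws(n_draws=20000, seed=3):
    """Stand-in for joint posterior draws of two parameters (not Stan output)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        shared = rng.gauss(0.0, 1.0)  # common component makes b1, b2 correlated
        b1 = 0.5 + 0.3 * shared + 0.3 * rng.gauss(0.0, 1.0)
        b2 = 0.2 + 0.3 * shared + 0.3 * rng.gauss(0.0, 1.0)
        draws.append((b1, b2))
    return draws


def prob(draws, event):
    """Posterior probability of an event = share of joint draws where it holds."""
    return sum(1 for d in draws if event(d)) / len(draws)


draws = synthetic_joint_draws()
p_b1 = prob(draws, lambda d: d[0] > 0)
p_b2 = prob(draws, lambda d: d[1] > 0)
p_both = prob(draws, lambda d: d[0] > 0 and d[1] > 0)
p_b1_larger = prob(draws, lambda d: d[0] > d[1])

if __name__ == "__main__":
    print(p_b1, p_b2, p_both, p_b1_larger)
```

All four statements come from the same set of draws, and asking a fifth or a tenth question changes none of the earlier answers.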

After many experimental studies using Bayesian statistics and Stan I decided to write a textbook summarising what I had learned over the years. The book starts from an introduction to probability and probability distributions, and after introducing Bayes' Theorem goes through a series of different experimental setups, with the corresponding Bayesian model, and how to program it in Stan. There is a set of slides (with commentary) available, and also every program in the book is available on my Kaggle page where they can be executed online (look for the files called "Slater-Bayesian-Statistics-*").

Finally, you might enjoy reading about some statistics of virtual reality.

Classical Statistics

Bayesian Statistics

13 April, 2020

A Shocking Outcome in Virtual Reality

This blog post is on Behavioural and Social Sciences Nature Research.

29 March, 2020

Virtual reality for training


Mel Slater

Simulation has been used for training for decades, most notably for flight simulation. Flight simulators go back to the 1920s, the first being the Link Trainer. A flight simulator is essentially a complete aircraft cockpit, but with all of the visual, auditory and haptic information that signifies the state of the flight being produced by digital means. So the window views are generated by computer graphics, the sound is digitally controlled, and the simulator is in a huge box that is mounted on a platform that delivers the forces to which the simulated aircraft is subject. This is a kind of mixed reality system – the cockpit and all the controls are real, but the displays and forces acting on the physical structure are digitally produced.

Virtual Reality (VR) offers the possibility of training through mainly digital feedback, in the form of graphics, sound and haptic feedback where appropriate. Also it can apply to a very wide variety of circumstances without the need for highly expensive and non-portable platforms. For example, people in a company who have to present in front of clients can learn how to improve their performance by rehearsing in front of an entirely virtual audience, with the help of a trainer. A quite different application might be preparing for emergencies such as a fire in an office. All of these different types of application require essentially the same, and these days quite portable, hardware. 

Training is about acquiring new skills, procedures, and knowledge through action, eventually leading to expertise. It should be generalisable, so that when new and slightly different situations arise the practitioner can deal with these based on acquired knowledge of similar situations.
Training involves skills transfer, where existing skills are utilised to learn new ones – for example, a mechanic who has learned very well how to forge a particular component may then more rapidly learn what mistakes to avoid when making a different but similar enough component. It is important that training is in line with organisational needs, and follows the organisation’s ethos.

Why is VR any use for this? Previously I have referred to three perceptual illusions which arise from the use of VR: Place Illusion (the illusion of being in the place depicted by the VR displays), Plausibility Illusion (that events happening there are really happening) and a Body Ownership Illusion (that your co-located virtual body is your body). Together, these mean that you will tend to act in VR much as you would in similar circumstances in reality. So when you stand over a precipice in VR, even though you 100% know that it is not real, and that there is nothing there, you can’t help but have feelings of anxiety. You respond as if it is real. However, for VR to be useful in training we have to have evidence that people do respond realistically in VR. An example of the type of study that is necessary was given by Bhagavathula et al (2018), who compared pedestrian behaviour in reality and VR. It was found that pedestrians made similar decisions about crossing the road in both VR and reality, and there was no difference between VR and reality regarding the perceived risk, and estimations of distance of vehicles. However, there was a difference in estimation of speed of an approaching vehicle. On the one hand it is remarkable that the differences between reality and VR were so small, because an essential feature of VR is that everyone knows that even if an approaching car would hit them, nothing would happen. However, remember that VR operates out of perceptual illusions. If your sensory system is showing you an approaching car, then no matter what you might be thinking about it, the safe thing to do is to get out of the way! This is how Place Illusion and Plausibility operate.

So VR is excellent for training because (i) it can make the abstract something tangible. Instead of learning about a complex maneuver by reading about it, or practicing it in an artificial way, one can actually do it in situ, in a virtual environment. Or for understanding the implications of some complex mathematics, one can use a visualization that involves mobilizing body movements in order to literally grasp it in a concrete way.

(ii) VR enables doing, not simply observing. Doing engages the whole body in a multisensory way, and the more that the body is engaged the greater the chance for learning and retention. For example, in maintenance training the learner actually carries out the maintenance rather than only watching someone else do it, or watching a video. Operators can train in facilities before they are ever built, or learn the process of making a delivery while maintaining safety for themselves and their customers.

(iii) VR is highly suited for training in complex circumstances, where for practical, ethical or safety reasons training in the real site is not possible. Our previous example illustrated this – in that case people were able to train on an installation that did not yet exist. In these complex or dangerous circumstances people can train over and over again, without additional cost of providing materials, for example. Accidents or problems that occur during the virtual training are without physical consequences, but are, of course, ideal for learning. We earlier saw one example of training for fire hazards. But there are complex problems only involving interaction with other humans without complex machinery or installations involved. For example, how do medical doctors learn to deal with intransigent patients? One study showed that doctors faced with unreasonable demands from patients for antibiotics reacted much as they would in reality, and that therefore such environments could be used for ethical and social relationship training. 

(iv) VR can offer multiple perspectives over the same scenario, which also offers a greater chance of understanding. In VR it is even possible to have a different perspective with respect to yourself, as well as experiencing a scenario from different points of view. The training can also be collaborative, involving several remote participants or bringing experts together from all over the world.

(v) VR is excellent for measurement – since everything that the trainees do can, in principle, be recorded and measured: their overall behaviour, their bodily movements, their physiological and brain responses. The trainee could also re-enter the environment and observe their own recorded data being played out.

Having considered where VR might be good for training we also need to pay attention to the pitfalls. For example, consider the learning of a sport like table tennis in VR. You could become an excellent VR table tennis player. But as anyone who seriously plays table tennis in reality knows, it is a highly complex skill involving multiple factors associated with every strike of the ball, probably mostly below the conscious awareness of the player. Unless the simulation of table tennis were perfect, it is likely that the skill from virtual table tennis would not translate to real table tennis. Even worse, an already skilled table tennis player might find their skills weakened after playing virtual table tennis because of negative transfer of training. This means that the skilled player may pick up habits that work well in VR but which do not work well in reality. The biggest danger of VR for training is such negative transfer, where people apparently learn something in VR, but where the real world is different enough, perhaps in very subtle ways, that the learning simply does not transfer or makes things worse.

Let’s look at some evidence. Basically if you think that VR is good for training then you will find evidence to support that. On the other hand if you think that it is not good, you will find evidence to support that too.

Winther et al (2020) evaluated VR training for pump maintenance. A comparison was made between VR, and traditional methods of video training and pairwise training where one person helps another to learn. The VR did not have any haptic feedback, whereas with the video and pairwise training trainees could work directly on the real equipment. The sample size was n=36 in a between-groups study with the 3 groups. On almost all measures of training outcome the VR method did not perform as well as the other two methods. This may not be too surprising since the VR method is the only one where trainees could not work directly on the actual machinery. 

Leder (2019) compared VR and PowerPoint instruction for safety training. It was found that VR had no advantage. However, problems with this study were that the VR (in a CAVE) was non-interactive, where trainees seemed to just essentially watch something, rather than be engaged. Also the sample size was small.

Sankaranarayanan et al (2018) compared VR training for a fire in an operating theatre with a control group. VR was found to be superior in learning to successfully put out the fire – 20% of the control group compared to 70% of the VR group were able to complete all the steps correctly one week after the exposure. Note that the sample sizes were quite small (10 in each group) and also it is not clear what the control group actually did. 
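To get a feeling for what groups of 10 can and cannot tell us, here is a rough Bayesian sketch of my own. It is not the analysis reported in that paper. It assumes that the percentages above correspond to 2 of 10 and 7 of 10 successes, uses flat Beta(1, 1) priors on each success rate, and draws from the resulting Beta posteriors. The function name and settings are illustrative.

```python
import random


def compare_groups(success_a, n_a, success_b, n_b, n_draws=20000, seed=4):
    """Posterior of (rate_b - rate_a) under independent flat Beta(1, 1) priors."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_draws):
        # A flat prior plus binomial data gives a Beta posterior for each rate.
        p_a = rng.betavariate(1 + success_a, 1 + n_a - success_a)
        p_b = rng.betavariate(1 + success_b, 1 + n_b - success_b)
        diffs.append(p_b - p_a)
    diffs.sort()
    prob_b_better = sum(1 for d in diffs if d > 0) / n_draws
    # Central 95% credible interval for the difference in success rates.
    lower = diffs[int(0.025 * n_draws)]
    upper = diffs[int(0.975 * n_draws)]
    return prob_b_better, (lower, upper)


if __name__ == "__main__":
    prob_vr_better, interval = compare_groups(2, 10, 7, 10)
    print(round(prob_vr_better, 3), [round(x, 2) for x in interval])
```

Under these assumptions the probability that the VR success rate is the higher one comes out large, but the credible interval for the size of the difference is very wide, which is what a sample of 10 per group should lead us to expect.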

Murcia-López and Steed (2018) compared virtual and physical training for a bimanual assembly task. The VR training seemed not to be different in outcomes compared to the best performing physical condition. The sample size involved (n = 60, over three conditions) was reasonable. Retention of learned skills over time was not particularly good for any of the conditions.

Gavish et al (2015) compared VR and AR for training of industrial assembly tasks. The comparisons were VR (with haptics) against a video-only control, and Augmented Reality working with the real device against a control. There were 10 engineers randomly assigned to each group, and the test was to do the real assembly. No difference was found between the VR and the VR-control condition, but AR resulted in fewer errors overall. However, as usual, look for the cautionary aspects – it is not entirely clear, but it seems that the VR was non-interactive.

Borsci et al’s 2015 paper reported a meta study of mixed reality (MR) and VR studies for car service maintenance. They only found 8 papers with sufficient rigour to be included in the meta study, and mostly these were on mixed reality – even though hundreds of papers had been published. They concluded:

  • MR systems seemed to be more useful than VR
  • MR/VR resulted in fewer errors, and required less time for training than previous methods
  • Trainees found the VR/MR more interesting than other methods, with better generalizability
  • VR/MR methods were adaptable to individual expertise, but too much reliance on MR/VR is less effective than on-the-job training.

A number of limitations of the studies were identified:

  • The evaluation studies tend to ignore the organisational setting and organisational needs.
  • They use a limited set of evaluation criteria - too much focussed on time and errors, ignoring: cybersickness, skill recall and decay, motivation, acceptance, trust, prior attitudes and cognitive skills.

The authors suggested a wider set of evaluation criteria:

  • The effect of and taking account of cognitive skills - including visuospatial abilities
  • Levels of trust/acceptance of VR/MR tools
  • Motivation in use by the trainees and trainers
  • Trainee attitudes towards the systems
  • Their previous experience
  • The impact of cybersickness
  • Physiological reactions – e.g., attention shift, cognitive load, stress
  • Level of presence and engagement
  • Technical aspects and tools features – e.g., effect of designed features, expected and experienced system functioning.

Overall we can conclude that immersive technologies can provide an excellent method for training. It is concrete, it is multisensory, involving and engaging the whole body, it can break out of the constraints of reality and give people perspectives that they cannot ever attain in reality, it can lead to high quality measurement, it is infinitely repeatable, and because of this it is ultimately low cost.

The cautionary message is that these advantages alone do not guarantee good results. In some circumstances a seemingly unimportant detail that is wrong can lead to negative transfer of training. Before advocating a particular solution for training it must be studied extensively.

Also consider whether the application can be equally well done with other methods (direct traditional teaching, role play, video). Ask yourself why, in this particular application, is VR necessary? Take into account cost, the time scale, the logistics and the feasibility of producing a good enough training scenario without negative transfer. 

Additional Resources

Here are some aspects of training that I have not considered, but which are discussed in this paper to which the section numbers refer:

      VR in surgical training (Section 2.4)
      Navigation rehearsal (Section 2.1.3) (the body is critical)
      Military (rehearsal of scenarios)
      VR in sports (negative transfer considerations critical).
      Model based VR vs 360 video (section 7.2).

Readers might also find the following useful:

On negative transfer:

On comparative evaluation:

      Bhagavathula et al (2018). The Reality of Virtual Reality: A Comparison of Pedestrian Behavior in Real and Virtual Environments. Proceedings of the Human Factors and Ergonomics Society Annual Meeting (SAGE Publications, Los Angeles, CA), 2056-2060.
      Sankaranarayanan et al (2018). Immersive virtual reality-based training improves response in a simulated operating room fire scenario. Surgical Endoscopy, 1-11.
      Murcia-Lopez, M., and Steed, A. (2018). A Comparison of Virtual and Physical Training Transfer of Bimanual Assembly Tasks. IEEE Transactions on Visualization and Computer Graphics 24, 1574-1583.
      Gavish, N., et al (2015). Evaluating virtual reality and augmented reality training for industrial maintenance and assembly tasks. Interactive Learning Environments 23, 778-798.
