Good science


Photo credit: Understanding Animal Research


  • Animals are used as models for humans in research
  • It is a requirement of good scientific research that such models are robust and valid
  • There are a number of factors which can reduce the quality of data obtained from animal models


Animals in science


Animals have been used as experimental models in human medicine since prehistory [1]. Globally, dogs are used for a variety of purposes, whilst in the UK their principal use is to fulfil the legal requirements for the safety testing of new medicines prior to human exposure. There are two crucial reasons to ensure the most humane use of dogs in laboratory settings: our ethical obligation to prevent suffering in a species which experiences pain, discomfort or distress; and our scientific need to ensure that they are fit for use, by which we mean they are valid, reliable and predictive models for safety and efficacy testing of chemicals prior to human use. Legislative (e.g. European Directive 2010/63) and ethical (e.g. the 3Rs) guidelines provide frameworks within which dogs can be used in laboratories. However, there remains a paucity of quantitative data on best practice in the dog. Research into the natural history of the dog and its welfare is critical to the development and implementation of effective Refinements: methods of minimising the negative impact of the laboratory environment and promoting positive welfare.


The scientific importance of dog welfare in scientific research


Quality of science is inherently valuable, and obtaining the best possible quality from any scientific endeavour should always be a goal of those conducting research. The term “quality of science” may be interpreted in several ways and the following sections describe how it is interpreted in relation to the scientific use of animals. We can consider that there are two aspects to good science: quality of scientific process and quality of scientific output. In this instance, “quality of scientific process” refers to the manner in which work is designed, conducted, analysed and presented, while “quality of scientific output” refers to the data output obtained. Quality of science can be thought of as the product of both of these factors. Furthermore, Poole [2] argues that “good science” meets three central criteria, namely:

  1. “There is an important question for which an answer is sought” (validity);
  2. “The experiment should yield unambiguous results” (robustness);
  3. “Variables which are not under investigation are strictly controlled” (robustness/reliability).

It should be easy to see how each of these factors can be influenced by a desire to ensure high welfare. It is explicit in the legislation governing animal use [3] that to justify the use of animals, a study must have undergone a cost-benefit analysis and that the potential benefit from the information obtained (to science or society) must sufficiently justify the associated animal suffering. The study design must also be capable of obtaining the desired results. This means that the result should not be biased by poor study design or interpretation of the data. Control of extraneous variables is intrinsically linked to producing unambiguous results. Extraneous variables which are not anticipated may still influence results, biasing data output and resulting in incorrect interpretation of data.
Poole also stated that:

Normal physiology, in this case meaning normal biological functioning rather than normally-distributed data, may not be present or properly understood [2]. Assuming this in a situation where it is not true leads to poor quality of data, and indeed poor quality of science by designing a poor study with little chance of providing unambiguous data. The following sections discuss the existing evidence for a link between welfare and quality of science.


Linking welfare and quality of science


Animals are used as models for humans in research. Although it is never possible to say that a model responds to a treatment in exactly the same way as a human would, it is important to choose models which, as far as possible, predict the response of humans. The first step in choosing a model is ensuring it is relevant to the target species, and the second is ensuring that the experiment is capable of detecting responses in the model (that is, it is sensitive to treatment effects [4]). The validity of a model depends on how closely the model resembles humans for the specific characteristic being tested [5]. High fidelity models are those which closely resemble the target (in this case humans). Nonhuman primates are a prime example of a high fidelity animal model owing to their close relatedness to humans [4]. However, other organisms or models such as cell cultures may model a specific human system closely despite their lack of fidelity for the human as a whole; they are high fidelity for that particular system.

Gad [1], in a manual on animal models in toxicology, agrees that the effects of stress and other biological responses are amongst the “least unaccounted for variables” in laboratory animal science (p. 852). While physiological responses have been identified as influencing responses in toxicology testing, they are rarely factored into experimental design or analysis. For example, Tasker [6] found that restraint had considerable impact on many key toxicology measures in macaques. Everds et al. [7] provide detailed information on the systems and measurements which are likely to be influenced by stressors, reproduced in the table below. They present evidence of stressors affecting many of the organ systems key to safety assessment; unaccounted for, these effects of stress could bias the interpretation of results considerably.

Scientific progress is driven by developing and testing novel hypotheses, and appropriate, robustly designed experiments are fundamental to this process [8]. Ensuring that studies are well designed is important not only for ethical reasons, but also to make the best use of time and money and to further our scientific knowledge [2,4,5,8].

Although it may seem obvious, excessively large studies waste resources (and most importantly animals), while those which lack power or have an element of bias may give the wrong answer. Adequate time should therefore be dedicated to developing a suitable research strategy a priori, which may involve several individual experiments, in order to ensure animals and resources are not wasted [4]. While our ethical obligation to Reduce the number of animals used is often cited as a reason to ensure good experimental design, it is equally unacceptable to design studies which waste money, time and researchers' effort.

Gad [1] provides a list of the potential causes of animal studies not predicting the results of human trials, for reasons relating to experimental design and welfare. Although Gad does not state explicitly that any of these reasons relate to welfare in the animal, there are clearly a number of reasons that animals can fail as experimental models for humans, excluding welfare. This only serves to highlight the importance of designing animal studies to achieve the best possible results, given that there are so many potential factors which can limit the ability of a study to detect the desired effect.


Improving quality of science


Building on Poole’s principles of good science, Festing [4] describes five fundamental characteristics of well-designed studies:

  1. It should be unbiased with all subjects having the same environment unless the environment is the subject of the study. This can be achieved by randomisation of factors throughout the study, or use of factorial designs to determine the influence of environmental factors (robustness);
  2. All experiments should have adequate power so that if there is an effect, there will be a high chance of detecting it. This can be achieved by controlling variation. Animals should be of the same sex, age, weight, health status and housed in the same environment as far as possible. Pathogens and disease increase variability and interfere with results. Stressed animals are also more variable physically and behaviourally. Randomisation should be used where it is not possible to control all factors. It may also be useful to take individual measures before beginning a study so that final measurements can be corrected for individual variation. Once variation has been controlled as far as possible, sample size can be determined with a power analysis (confounding factors);
  3. If it is important to know the effect of strain, sex, diet or other factors on the outcome, a factorial design should be used. This can result in greater information from the investigation of several variables and their interactions without the need for greater numbers of animals (robustness, validity);
  4. Experiments should be simple so that chances of making a mistake are minimised. This means studies should be well planned in advance, with no additional components added at a later point as randomisation will no longer be possible (robustness);
  5. The experiment should be amenable to suitable statistical analysis. The most important criterion in this case is independent replication of results. There should be a clear understanding of how the results will be analysed before beginning the study, with researchers consulting a statistician where necessary.

Clearly, if the experiment is not designed with sufficient power, a treatment effect may not be detected, resulting in a false negative [8]. The scientific method assumes the lack of confounding factors or uncontrolled variables [2] and so reducing variation is an important component in increasing power. This can be done by controlling genetic variation by using inbred strains. Although it is sometimes argued that inbred strains reduce external validity (e.g., [9]), Festing [4] states that it is false logic to use outbred strains because nothing is known about the genetics of the subject and this results in an increase in phenotypic variability, reduced power and reduced repeatability. Factorial or crossover designs are particularly powerful, utilising within-subject and between-subject factors [4,10], and result in the need for fewer animals. These designs respectively control for genetic and environmental variation and illustrate the effects of genetic and environmental factors and their interactions.
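The relationship between variation, power and group size can be illustrated with a rough calculation. The sketch below is not taken from the cited authors; it assumes a simple two-group comparison of means and uses the standard normal approximation, and the function name `animals_per_group` and all numbers are hypothetical. A formal power analysis, ideally carried out with a statistician, should be used for a real study.

```python
import math
from statistics import NormalDist


def animals_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate group size for a two-sided, two-group comparison of means.

    Normal approximation: n = 2 * ((z_alpha + z_beta) * sd / effect) ** 2.
    This slightly underestimates the n a t-test needs when groups are small.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired power
    n = 2 * ((z_alpha + z_beta) * sd / effect) ** 2
    return math.ceil(n)


# Hypothetical numbers: the same treatment effect under two levels of variation.
n_variable = animals_per_group(effect=10.0, sd=10.0)   # uncontrolled variation
n_controlled = animals_per_group(effect=10.0, sd=5.0)  # variation halved
print(n_variable, n_controlled)
```

Under these assumptions, halving the standard deviation cuts the required group size to roughly a quarter, which is why controlling variation (including stress-related variation) matters for Reduction as well as for power.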

Changes in variance can be caused by infection, genetics, environment, age, sex, weight, welfare state and other unknown factors (e.g., [2,5,9]), and increased variance reduces the power of an experiment to detect treatment effects. Techniques such as the use of inbred strains of mice reduce genetic variation, thereby increasing the probability of detecting a treatment effect on a specific genotype [5], and when a treatment needs to be investigated in several phenotypes, factorial design can substantially reduce the number of animals needed [10]. The use of genetically modified animals is uncommon in primates, and unknown in dogs, and so this is primarily a technique confined to early pre-clinical testing using mice or rats.

Factorial design: consists of two or more factors, each consisting of discrete levels. Crossover designs allow the analysis of the effect of each factor, and the interactions between them, on the outcome.
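As a minimal sketch of what a factorial layout looks like in practice, the following enumerates the cells of a 2 x 2 design and derives a main effect and an interaction from cell means. The factor names, levels and mean values are invented for illustration only and are not data from any of the cited studies.

```python
from itertools import product

# Hypothetical factors and levels, chosen only for illustration.
factors = {"treatment": ["control", "drug"], "sex": ["female", "male"]}

# Every combination of levels is one cell of the factorial design.
cells = list(product(*factors.values()))

# Hypothetical mean outcome per cell (not real data).
cell_means = {
    ("control", "female"): 10.0,
    ("control", "male"): 12.0,
    ("drug", "female"): 14.0,
    ("drug", "male"): 20.0,
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Main effect of treatment: drug minus control, averaged over both sexes.
treatment_effect = (
    mean(cell_means[("drug", s)] for s in factors["sex"])
    - mean(cell_means[("control", s)] for s in factors["sex"])
)

# Interaction: how much the drug effect differs between the sexes.
effect_in_females = cell_means[("drug", "female")] - cell_means[("control", "female")]
effect_in_males = cell_means[("drug", "male")] - cell_means[("control", "male")]
interaction = effect_in_males - effect_in_females

print(cells)
print(treatment_effect, interaction)
```

Because every animal contributes to the estimate of both factors, one experiment of this kind can answer questions that would otherwise need separate studies for each sex.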

Festing and Altman [5] also support the use of historical data, which when carefully used can reduce the need for larger sample sizes in current studies. Meta-analysis and use of contemporary controls may be necessary to ensure that historical data are valid for use in a current study but may prevent the need to repeat previously conducted research. Caution should be taken, however, when comparing data from populations which differ in welfare states [11]. Data from animals housed under different conditions or experiencing differences in handling are unlikely to be comparable, unless such variables are factored into analysis [12].

Festing [4] states that the randomised controlled, double-blinded clinical trial is the gold standard for nearly all experiments, so where possible and appropriate, these factors should be included in experimental design. One of the most important factors in experimental design is therefore to ensure that the data produced are affected only by the variables under investigation or, where other variables may influence output, that they are accounted for in experimental design and analysis. However, many of these factors appear to be neglected in contemporary research [13], a shortcoming which the ARRIVE guidelines [8] seek to address.


Quality of data output


While there is clearly a link between the ability of an animal to cope with its environment and its physical and emotional health, it is all the more important to understand this link where the animal is a model for human subjects. Although many therapeutic drugs target specific areas of ill health, the desired animal model in toxicology is a healthy animal, rather than one with unknown, stress-induced physiological health issues. Without the ability to understand the specific variation introduced by poor welfare, it is not possible to have an adequately-designed experimental protocol, nor obtain valid results. Quantifying the effects of welfare on quality of data output is one of the overarching aims of this project, allowing the proper design of data collection to achieve the aims of studies. The following section describes how issues in quality of data output can be identified and improved.

Once again, we must return to Poole’s ‘happy animals’: to ensure good science in research using animals, the animal subjects should have biologically normal physiology and behaviour; animals whose ‘wellbeing’ (or welfare) is compromised are often physiologically abnormal and the results of experiments using them may not be reliable [2]. The emotional, subjective experience of animals is central to their welfare and as such should be considered central to their use as experimental models.



Several authors [4,5,8] have stated that recent research using animals illustrates that many of the principles of good science outlined in the preceding section are not adhered to, which can result in the publishing of research with poor validity. Festing and Altman [5] stated that there are papers published in which the conclusions reached are not supported by the data.

In research which led to the development of the ARRIVE guidelines, Kilkenny et al. [13] assessed the quality of current research using animals by analysing experimental design, statistical analysis and reporting of results in journal papers in a survey commissioned by the NC3Rs. The survey assessed 271 papers published between 1999 and 2005 reporting publicly-funded original research on live rats, mice and nonhuman primates, as these constituted the greatest part of the literature on research on live animals. Fewer than half the papers reported the age (43%) or weight (46%) of the animals used, while 24% reported neither. A small percentage (4%) did not report the numbers of animals used and no paper reported how the number of animals needed was decided. The characteristics of the animals used influence the results obtained and are required to replicate experiments. The number of subjects is important for statistical analysis and significance, and the decisions which lead to the number of animals used should be scrutinised to ensure that the 3Rs have been adhered to.

Further analysis found that 35% of papers reported different numbers of animals in the methods and results sections without clear explanation of the difference. In all, only 59% of papers reported a clear hypothesis plus three of: animal sex, strain, weight and age, and the number of animals used.

The authors also assessed the quality of experimental design. Random allocation of animals is a process used to ensure that, as far as possible, differences in outcome measures cannot be attributed to systematic differences between groups, and is consistent with Festing’s “gold standard”. Only 12% of papers reported the use of random allocation. Blinding is a method of minimising bias by ensuring that the experimenter does not know to which condition a subject is allocated. This is important when subjective measures are used. Of the papers using qualitative measures, only 14% used blinding.
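Random allocation and blinding are straightforward to implement. The sketch below is one possible approach rather than a method prescribed by the survey authors: it shuffles animals into equally sized groups and issues neutral codes so that whoever scores the outcomes need not see group membership. The function name `allocate`, the animal identifiers and the seed are all hypothetical.

```python
import random


def allocate(animal_ids, groups, seed):
    """Randomly allocate animals to equally sized groups and issue blinded codes."""
    if len(animal_ids) % len(groups) != 0:
        raise ValueError("animals must divide evenly between groups")
    rng = random.Random(seed)  # a recorded seed makes the allocation auditable
    shuffled = list(animal_ids)
    rng.shuffle(shuffled)
    per_group = len(shuffled) // len(groups)
    # Consecutive blocks of the shuffled list form the groups.
    allocation = {
        animal: groups[i // per_group] for i, animal in enumerate(shuffled)
    }
    # Neutral codes, shuffled so that code order reveals nothing about group.
    codes = [f"S{n:02d}" for n in range(1, len(shuffled) + 1)]
    rng.shuffle(codes)
    blind_key = dict(zip(sorted(allocation), codes))
    return allocation, blind_key


# Hypothetical study of eight animals in two groups.
animals = [f"dog{n}" for n in range(1, 9)]
allocation, blind_key = allocate(animals, ["control", "treated"], seed=2024)
for animal in animals:
    print(blind_key[animal], animal, allocation[animal])
```

In use, the allocation table would be held by someone who is not scoring outcomes, and observers would work only from the codes until the analysis is complete.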

In addition to other factors which can prevent an animal model from accurately predicting human responses in trials, without proper experimental design or correct reporting of animal characteristics and analysis methods, it is difficult to determine if experimental results are valid. In the scientific use of animals for the pursuit of new medicines, when we use animals in studies with the capacity to cause pain, suffering, distress or lasting harm [3], and when the end-users of the test items under investigation are the public and health care providers, it is critical that the best possible quality of scientific investigation is adhered to.

Issues with the reporting of animal research are still present, as reported by Macleod et al. [14] of the CAMARADES project and highlighted in Nature. The ARRIVE guidelines are broadly implemented by scientific journals, and increasingly by funders, to ensure well-designed research. The NC3Rs has also produced an Experimental Design Assistant which provides feedback on study design. More recently, a guide to the implementation of key principles of Good Statistical Practice (GSP) was published by Peers et al. [15]. This included standards in statistical practices, identification of responsibility for adhering to GSP, improvements to report writing and ensuring that decisions are data-driven.


Harmonising welfare and quality of data output through Refinement

The concept of “harmonising” welfare in animals is one of providing all animals with the necessary tools to cope with the environment, in order for them to exhibit the same “harmonised” level of positive welfare [16]. Individual differences may mean that animals have varying needs, so providing a variety of Refinements increases the ability of all animals to cope. The previous sections report a number of studies which have found that rather than increasing variation, increasing enrichments and other Refinements decreased the level of variation in the population. The reason for this is that coping styles and abilities to cope vary between individuals, and providing the greatest possible variety of coping strategies increases the ability of each animal to manage stressors in its environment.

There are many aspects of the laboratory environment with the potential to decrease the welfare of laboratory-housed dogs. The following sections describe features of the environment which can be modified to have a positive impact on welfare.

While we have discussed how quality of science can be increased through improvements in experimental design, analysis and publication, there is also an obvious role for welfare. It is widely accepted that applying the 3Rs to experiments using animals is consonant with good scientific practice [8]. While laboratory animals do not lack essential needs like food or water, potential causes of distress include social problems including aggression resulting from overcrowding, social isolation, loud sudden noises and poor handling [2]. The needs of the dog in conspecific and human contact may not be met in the laboratory environment.

Festing [4] has stated that stressed animals are more ‘variable’ than unstressed animals, and that disease and pathogens can interfere with experimental outcomes. However, positive changes in stress responses can change the outcomes of diseases, often beneficially. In a review, Van Praag, Kempermann and Gage [17] cite the examples of slower neural degeneration, faster recovery from brain damage and improvements in HPA responses to stress, in experimental studies of invertebrate, rodent and human models of brain damage. Minimising stress during experiments can reduce variation (and therefore the number of animals required), although a thorough understanding of the animal and its biology is needed in addition to experiments which are well designed and statistically valid and appropriate [2].

Animals have evolved a range of coping mechanisms to natural stressors, including changes in behaviour, hormones or immune function. However, in captivity, where the environment does not allow an appropriate coping response or overloads it, the animal's ability to maintain homeostasis breaks down and leads to a state of distress [18]. Poole [2] states that doubting the role of behaviour in understanding physical wellbeing is a result of failing to understand that brain and body are linked. The brain, behaviour, hormones and the immune system are linked and interdependent. Several studies have found that stress has negative effects on physical health and the immune system. Examples of these findings are summarised below.

When considering the effects of stress, environmental enrichment has a clear role in influencing welfare and therefore experimental outcomes. Enriched environments provide more opportunities for animals to make choices, increasing their ability to maintain homeostasis or to control social interactions [18]. A number of authors have reported concerns from scientists conducting research with animals that increasing the variability in the environment through increasing enrichment results in a loss of standardisation [19,20,21,22]. Other concerns include the cost of implementing enrichments, bias of experiments and risk to the animal [18]. The effects of changes in the environment seem to be most pronounced, or at least most readily detected, in the development of the brain in mammalian species. The table below lists some examples of the influence of environmental enrichment on the brain.

1. Gad, S. C. (2006). Animal models in toxicology. Boca Raton: CRC Press Taylor and Francis Group.

2. Poole, T. (1997). Happy animals make good science. Laboratory Animals, 31 (2), 116-124.

3. Home Office. (1986). Animals (Scientific Procedures) Act 1986. Her Majesty's Stationery Office.

4. Festing, M. (2010). The design of animal experiments. In R. Hubrecht & J. Kirkwood (Eds.), The UFAW handbook on the care and management of laboratory and other research animals: eighth edition. Chichester: Wiley-Blackwell (p. 23-36).

5. Festing, M. & Altman, D. (2002). Guidelines for the design and statistical analysis of experiments using laboratory animals. ILAR Journal, 43 (4), 244-258.

6. Tasker, L. (2012). Linking welfare and quality of scientific output in cynomolgus macaques (Macaca fascicularis) used for regulatory toxicology. (Unpublished doctoral dissertation). University of Stirling, Stirling, UK.

7. Everds, N. E., Snyder, P. W., Bailey, K. L., Bolon, B., Creasy, D. M., Foley, G. L., Rosol, T. J. & Sellers, T. (2013). Interpreting stress responses during routine toxicity studies: A review of the biology, impact, and assessment. Toxicologic Pathology, 41 (4), 560-614.

8. Kilkenny, C., Browne, W. J., Cuthill, I. C., Emerson, M. & Altman, D. G. (2010). Improving bioscience research reporting: the ARRIVE guidelines for reporting animal research. PLoS Biology, 8 (6), e1000412.

9. Wurbel, H. (2002). Behavioral phenotyping enhanced - beyond (environmental) standardization. Genes, Brain and Behavior, 1 (1), 3-8.

10. Shaw, R., Festing, M., Peers, I. & Furlong, L. (2002). Use of factorial designs to optimize animal experiments and reduce animal use. ILAR Journal, 43 (4), 223-232.

11. Hall, R. & Everds, N. (2008). Principles of clinical pathology for toxicology studies. In Hayes, A.W. & Kruger, C.L. (Eds.), Hayes' Principles and methods of toxicology. Florida: CRC Press Taylor and Francis Group. (p. 1317-1358).

12. Richter, S. H., Garner, J. P., Auer, C., Kunert, J. & Wurbel, H. (2010). Systematic variation improves reproducibility of animal experiments. Nature Methods, 7 (3), 167-168.

13. Kilkenny, C., Parsons, N., Kadyszewski, E., Festing, M. F., Cuthill, I. C., Fry, D., Hutton, J. & Altman, D. G. (2009). Survey of the quality of experimental design, statistical analysis and reporting of research using animals. PLoS One, 4 (11), e7824.

14. Macleod, M. R., McLean, A. L., Kyriakopoulou, A., Serghiou, S., Wilde, A., Sherratt, N., Hirst, T., Hemblade, R., Bahor, Z. et al. (2015). Risk of bias in reports of in vivo research: A focus for improvement. PLoS Biology, 1-12.

15. Peers, I. S., South, M. C., Ceuppens, P. R., Bright, J. D. & Pilling, E. (2014). Can you trust your animal study data? Nature Reviews Drug Discovery, 13, 560.

16. Buchanan-Smith, H. M. (2006). Primates in laboratories: Standardisation, harmonisation, variation and science. ALTEX: Alternatives to Animal Experiments, 23, 115-119.

17. Van Praag, H., Kempermann, G. & Gage, F. H. (2000). Neural consequences of environmental enrichment. Nature Reviews Neuroscience, 1 (3), 191-198.

18. Hubrecht, R. (2010). Enrichment: Animal welfare and experimental outcomes. In R. Hubrecht & J. Kirkwood (Eds.), The UFAW handbook on the care and management of laboratory and other research animals: eighth edition. Chichester: Wiley-Blackwell.

19. Wurbel, H. (2001). Ideal homes? Housing effects on rodent brain and behaviour. Trends in Neurosciences, 24 (4), 207-211.

20. Wolfer, D. P., Litvin, O., Morf, S., Nitsch, R. M., Lipp, H.-P. & Wurbel, H. (2004). Laboratory animal welfare: Cage enrichment and mouse behaviour. Nature, 432 (7019), 821-822.

21. Benefiel, A. C., Dong, W. K. & Greenough, W. T. (2005). Mandatory "enriched" housing of laboratory animals: The need for evidence-based evaluation. ILAR Journal, 46 (2), 95-105.

22. Hubrecht, R. & Kirkwood, J. (2010). Introduction. In R. Hubrecht & J. Kirkwood (Eds.), The UFAW handbook on the care and management of laboratory and other research animals: eighth edition. Chichester: Wiley-Blackwell (p. 1-2).