by James K. M. Brown
Cereals Research Department, John Innes Centre,
Colney, Norwich, NR4 7UH, England
News (16/11/04): I am currently re-writing EDGAR. It will produce spreadsheets in Microsoft Excel format. My intention is to make EDGAR faster, more reliable and easier to use and to make the output more convenient. I also hope, in due course, to include incomplete block designs. Eventually, this will become EDGAR version 2. I would greatly appreciate feedback on the new programmes.
More news (2/3/05): The Fortran routines which the old EDGAR programmes use are about to be taken off our server. Please use the new programmes where available. I am rewriting the other programmes rather slowly so if you need one desperately, please let me know and I will try to speed it up.
News (13/6/05): The server which ran the old versions of the EDGAR programmes has now been switched off.
News (23/12/05): EDGAR has moved. The new address is www.edgarweb.org.uk. Please bookmark it for future reference.
News (20/08/10): I've fixed a bug in the way that the Latin Squares and Complete Randomisation with Unequal Reps programmes randomised the assignment of treatments to experimental units. Thanks to Michael Hinchcliffe for pointing it out.
Note that, in the documentation below, the specifics of how EDGAR operates are out-of-date. I'll get round to updating them sometime soon. The main thing you should know at the moment is that EDGAR produces spreadsheets in MS-Excel 2003.
What is EDGAR?
For those from outside the JIC
Principles of experimental design
What designs can EDGAR handle?
How large a design can EDGAR generate?
Are there any restrictions on the format of input to EDGAR?
How can I save output from EDGAR?
Future developments
Acknowledgments
EDGAR is written so that you can access it via the Internet. You will need a world-wide-web (WWW) browser capable of interpreting forms, such as version 2.0 or higher of Netscape.
This introduction is not intended to be a substitute for proper advice.
It is much better to spend a little time seeking advice from a statistician
about the design and analysis of your experiment beforehand than to end up with
a set of data, acquired at the cost of considerable time and money, from which
it is very difficult or even impossible to draw any conclusions. To quote Sir
R. A. Fisher, "to call in a statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of".
A unit is the part of the experiment to which each treatment is applied. For instance, units may be pots to which different fertilisers are applied or petri dishes in which you put different media or growth rooms in which you have different environmental conditions. In an experiment involving different genotypes, you obviously can't choose the genotype of any particular plant; here, the experimental unit could be considered to be the space in the growth cabinet, glasshouse or field plot in which you decide to put a particular genotype. Other experiments may involve doing something to a set of different samples in order - extracting RNA or scoring microscope slides, for example. The experimental unit here is the slot in the time sequence in which you work on each sample.
Three principles of good experimental design are replication, randomisation and blocking.
For instance, observing that one plant of a particular genotype is more resistant to a disease than one plant of a different genotype tells you nothing about the difference between the mean disease resistance of the two genotypes; the difference you observed could have been caused by the environment or the inoculation procedure affecting the two plants differently. To make any inference about the mean difference between the genotypes, you need to test several plants of each.
The more replicates you use, the smaller the differences you should be able to identify between the means for the various treatments. In some cases, you may wish to know if some treatments are more variable than others; you may be able to reach such a conclusion if you have used sufficient replicates of each treatment.
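As a rough illustration of why more replicates let you detect smaller differences, here is a minimal sketch in Python. It assumes equal replication, independent units and a common standard deviation `sigma` between units given the same treatment; under those assumptions the standard error of the difference between two treatment means is sigma * sqrt(2 / r) for r replicates. The function name `sed` and the numbers used are purely illustrative.

```python
import math


def sed(sigma, reps):
    """Standard error of the difference between two treatment means.

    Assumes equal replication, independent units and a common
    standard deviation `sigma` (an assumption of this sketch).
    """
    return sigma * math.sqrt(2.0 / reps)


# How the standard error shrinks as replication increases
for r in (2, 4, 8, 16):
    print(r, round(sed(1.0, r), 3))
```

Under these assumptions, quadrupling the number of replicates halves the standard error of a difference, so the gain from each extra replicate gets smaller as the experiment grows.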
Randomising the treatments in time or space is an insurance policy, to take account of variation that you may or may not know to exist under the conditions of your experiment. For instance, the levels of light in growth cabinets vary considerably, so randomising the layout of the plants of different genotypes is essential to make sure that no one genotype is consistently exposed to light levels which are particularly high or low. In the pathology example, randomising the layout of plants of the various varieties means that no one variety is consistently exposed to particularly high or low levels of inoculum from nearby plants.
B B D D C C C B A D A A C A B A B D D C
Notice that the plants of genotype A have tended (purely by chance) to be clustered towards the later times. Another example: this is a randomised layout of plants of four genotypes on a glasshouse bench:
D A B D B
A A C A D
C D A B D
C C C B B
Here, all of the C genotypes have been placed (again, just by chance) at the bottom left-hand corner. Such layouts, where each treatment has an equal chance of being applied to each unit of the experiment, are known as completely randomised designs.
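A completely randomised layout like the ones above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code EDGAR itself uses; the function names `completely_randomised` and `as_grid` and the `seed` argument are my own assumptions.

```python
import random


def completely_randomised(treatments, reps, seed=None):
    """Return a shuffled list holding `reps` copies of each treatment."""
    rng = random.Random(seed)  # a seed makes the layout reproducible
    layout = [t for t in treatments for _ in range(reps)]
    rng.shuffle(layout)  # every ordering of the units is equally likely
    return layout


def as_grid(layout, n_cols):
    """Cut a flat layout into rows of `n_cols` units, e.g. for a bench."""
    return [layout[i:i + n_cols] for i in range(0, len(layout), n_cols)]


# Four genotypes, five plants of each, laid out on a 4 x 5 bench
layout = completely_randomised("ABCD", reps=5, seed=1)
for row in as_grid(layout, 5):
    print(" ".join(row))
```

Because the shuffle is unconstrained, several plants of one genotype can still end up next to each other purely by chance, which is the clumping seen in the examples above.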
However, this clumping effect can be avoided to a large extent if you group the units into blocks. Typically, in each block, one unit is given each treatment, and the treatments are randomised among units within blocks. The following are similar to the two examples above, but randomised blocks have been used rather than completely randomised designs:
Block:     1         2         3         4         5
        D C B A : D B C A : D B A C : A C D B : B D C A
and
Block: 1 2 3 4 5
       D C D A C
       C B B B B
       A A C C D
       B D A D A
Blocking should be used to control systematic factors which might affect your experiment. Such factors might include, for example, light levels and temperature in glasshouses and growth cabinets or the fertility of soil in a field trial. Time could also be a block factor, since your concentration or expertise could alter as you carry out a task, such as measuring disease levels, scoring microscope slides or making quantitative extracts of nucleic acids. Blocks should be arranged so that systematic differences are expected to vary more between blocks than within them.
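The randomised block layouts shown above can be sketched in Python as follows. Again, this is a minimal illustration under my own assumptions (the function name `randomised_complete_blocks` and the `seed` argument are hypothetical), not a description of how EDGAR is implemented.

```python
import random


def randomised_complete_blocks(treatments, n_blocks, seed=None):
    """Return one independently shuffled copy of the treatments per block."""
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_blocks):
        block = list(treatments)  # each block gets every treatment once
        rng.shuffle(block)        # randomise within the block only
        blocks.append(block)
    return blocks


# Four genotypes in five blocks, printed as a time sequence
blocks = randomised_complete_blocks("ABCD", n_blocks=5, seed=1)
print(" : ".join(" ".join(b) for b in blocks))
```

The only difference from a complete randomisation is that the shuffle is restricted to one block at a time, so no treatment can appear more than once in any block.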
Complete randomisation allows differences between the mean effects of treatments to be estimated with higher precision than other designs do, if you are sure that there is very little extraneous, systematic variation. However, it doesn't allow for the possibility that there may be some unknown extraneous factor, so if in doubt, use a blocked design. In principle, you can use different numbers of replicates of each treatment in a completely randomised design, but this is not an option in EDGAR 1.0.
The randomised complete block design is by far the most widely used design for a single factor, and is the easiest way of controlling extraneous, systematic variation. Randomised complete blocks are also widely used for two factors, but split plot designs (see below) may offer an advantage in some experiments.
If you have many treatments, it may be better to use an incomplete block design, in which the experiment is arranged in smaller blocks, each of which has some of the treatments. EDGAR 1.0 does not generate incomplete block designs.
EDGAR 1.0 can generate one common type of split-plot design, in which the experiment is arranged in blocks, each of which has one main plot of each treatment of the first factor, while each main plot has one sub-plot of each treatment of the second factor. Main plot treatments are randomised within blocks and sub-plot treatments are randomised within main plots. For instance, this is a design for an experiment in which four varieties of a plant (A-D) are infected with three isolates of a fungus (P-R); the experiment is in two blocks:
Block:                    1         2
Main plot:             1  2  3   1  2  3
Isolate:               R  P  Q   Q  R  P
Variety: Sub-plot 1    D  C  B   C  B  B
                  2    C  D  C   A  C  D
                  3    A  B  D   B  A  A
                  4    B  A  A   D  D  C
Here, the main plots are arranged in randomised blocks, while the sub-plots are completely randomised within main plots.
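The two levels of randomisation in this kind of split-plot design can be sketched in Python. This is an illustration only, with hypothetical names (`split_plot`, `seed`); it is not taken from EDGAR's own code.

```python
import random


def split_plot(main_treatments, sub_treatments, n_blocks, seed=None):
    """Return a list of blocks; each block is a list of
    (main-plot treatment, shuffled sub-plot treatments) pairs."""
    rng = random.Random(seed)
    design = []
    for _ in range(n_blocks):
        mains = list(main_treatments)
        rng.shuffle(mains)  # main-plot treatments randomised within the block
        block = []
        for main in mains:
            subs = list(sub_treatments)
            rng.shuffle(subs)  # sub-plot treatments randomised within the main plot
            block.append((main, subs))
        design.append(block)
    return design


# Three isolates (main plots) and four varieties (sub-plots) in two blocks
design = split_plot("PQR", "ABCD", n_blocks=2, seed=1)
for b, block in enumerate(design, start=1):
    for m, (isolate, varieties) in enumerate(block, start=1):
        print(f"block {b}  main plot {m}  isolate {isolate}  varieties {' '.join(varieties)}")
```

Each main plot gets its own fresh shuffle of the sub-plot treatments, so the order of varieties under one isolate tells you nothing about the order under another.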
Many other split-plot designs can be constructed, although EDGAR 1.0 does not provide other options. In principle, any arrangement of sub-plots (randomised blocks, a complete randomisation, Latin squares, etc.) can be nested within any arrangement of main plots.
In a Latin square design, the units are arranged in a square (real or imaginary), and each treatment appears once in each row and each column. For instance, with four varieties (A-D):
B A C D
D C B A
A B D C
C D A B
Since, in a Latin square, one has to estimate the effects of three sources of variation - the row factor, the column factor and the treatment that you're actually interested in - this design is less efficient than the simpler block design. It is best used, therefore, when there is good reason to suppose that there are indeed two extraneous sources of systematic variation. Latin squares are rarely used when there are more than a few treatments.
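One simple way to produce a randomised Latin square is to start from a cyclic square and then shuffle its rows, its columns and the assignment of treatments to symbols. The Python sketch below does this; it is an assumption of mine rather than EDGAR's actual method, the names `latin_square` and `seed` are hypothetical, and this approach does not necessarily choose among all possible Latin squares with equal probability.

```python
import random


def latin_square(treatments, seed=None):
    """Return an n x n list of lists in which each treatment appears
    once in every row and once in every column."""
    rng = random.Random(seed)
    n = len(treatments)
    # Cyclic starting square: row r is 0..n-1 shifted by r places
    square = [[(r + c) % n for c in range(n)] for r in range(n)]
    rng.shuffle(square)  # permute the rows
    col_order = list(range(n))
    rng.shuffle(col_order)  # permute the columns
    square = [[row[c] for c in col_order] for row in square]
    labels = list(treatments)
    rng.shuffle(labels)  # randomise which treatment each symbol stands for
    return [[labels[i] for i in row] for row in square]


square = latin_square("ABCD", seed=1)
for row in square:
    print(" ".join(row))
```

Permuting rows, columns or labels never breaks the Latin property, so the result is always a valid square whatever the seed.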
EDGAR is written by James Brown, John Innes Centre, Norwich, England.
Comments welcome. I would particularly like to know what kind of research you do, if you found this programme useful and what changes or improvements would be helpful.
Last updated on 13th June 2005