DNA Analysis
Fingerprinting
Human fingerprints are a near-unique marker of identity. If a fingerprint is found at a crime scene, there may be reason to ask the person to explain their actions. Enabling questioning requires two things: a unique name for the person who has a given set of characteristics (sex, height, age, etc.), and an association of that name with the fingerprint.
A similar situation arises with trees, except… crimes aren’t committed by trees. Trees and their fruit have been described by a wide range of characteristics (morphology): the size and form of the tree, the type of fruit, its colour, shape, flavour, fruiting season, etc. Together these characteristics enable us to name the species and variety; that’s how we identify the apple (link to Identification page). As a result of genomic research, the importance and uniqueness of DNA in all organisms has become progressively better understood and utilised. Over the last twenty years, research has shown that each apple variety (though not yet each sport) has a set of relatively easily determined DNA sequences that is pretty well unique. It remains, then, to link these sequences to a set of physical characteristics, and this is what has been happening in many parts of the world.
In the UK, we have the National Fruit Collection at Brogdale, which contains about 4000 different accessions of species and varieties (link to Orchards and Sister Organisations). Similar work has been carried out in other countries. In Europe a consistent methodology has been used, and it is effectively a “common language” (link to DNA Project). It allows the consistency of the method to be checked more thoroughly, and it allows a much greater number of varieties to be characterised.
What it can and can’t do
The variety’s name can’t be deduced directly from the fingerprint; that requires a link to have already been made between fingerprint and morphology. However, determining the DNA fingerprint does let us answer one question: has this variety already been described and fingerprinted? If so, we can find out what others have called the variety and what they considered its morphology to be. If not, then the variety investigated is almost certainly one not found before. It may be one not yet described, a seedling, or perhaps a lost variety. For MAN and other Orchard Groups seeking to find lost varieties, the latter is the outcome we hope for! Either outcome provides useful information. The former connects us with prior work and so limits or avoids identification effort; the latter is a necessary condition for a “new” find.
Now that DNA fingerprinting has become so well automated, costs have fallen dramatically, to a few tens of pounds per sample. This means that large collections can be fingerprinted at modest cost, and even small groups or private individuals can take part with their collections. This brings huge benefits by:
- increasing the number of varieties involved,
- increasing the security of these varieties with more proven examples more widely distributed,
- increasing the probability that varieties thought “re-found” are indeed so,
- consolidating information, reducing rework and differences of opinion, and,
- encouraging groups to work together.
Fingerprints of apples in the National Fruit Collection have confirmed most of the known duplicate accessions. See for instance: Fingerprinting the National Apple & Pear Collections, East Malling Research, GC0140, 31Mar10, http://sciencesearch.defra.gov.uk/Default.aspx?Menu=Menu&Module=More&Location=None&Completed=0&ProjectID=15150 . That was heartening. But there were quite a few surprises: some varieties thought to be unique were found to be the same as others. The reasons for this may range from handling and documentation errors to misidentification. DNA fingerprinting is far more robust than the human processes of documentation and record keeping, planting, collecting specimens and identification.
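Finding duplicate accessions amounts to grouping a collection by fingerprint. The Python sketch below is a minimal illustration under stated assumptions, not the method East Malling Research used. The accession names and allele sizes are invented, only two of the twelve markers are shown, and two fingerprints count as duplicates only when every allele size agrees exactly. A real comparison would probably need to tolerate small sizing differences between runs.

```python
from collections import defaultdict

# Hypothetical record layout: marker name -> allele sizes (in bases).
Fingerprint = dict[str, tuple[int, ...]]

def canonical(fp: Fingerprint) -> tuple:
    """Order-independent key: markers sorted by name, allele sizes sorted."""
    return tuple(sorted((marker, tuple(sorted(sizes))) for marker, sizes in fp.items()))

def find_duplicates(collection: dict[str, Fingerprint]) -> list[list[str]]:
    """Group accession names whose fingerprints are identical."""
    groups = defaultdict(list)
    for name, fp in collection.items():
        groups[canonical(fp)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]

# Invented data: A and B list the same sizes in a different order.
collection = {
    "Accession A": {"CH01h10": (96, 114), "GD12": (155,)},
    "Accession B": {"CH01h10": (114, 96), "GD12": (155,)},
    "Accession C": {"CH01h10": (96, 98), "GD12": (155, 159)},
}
print(find_duplicates(collection))  # [['Accession A', 'Accession B']]
```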
To date, over 5000 samples from Orchard Groups and members of the public have been analysed. More than 1000 of them remain unmatched, though some of these may be mutations or may reflect minor experimental glitches. Here’s an example of matching. The fingerprint is shown as two blocks of 6 markers to make it more legible. The row against Adams’s Pearmain is the fingerprint from the NFC, and the row against A1155 is that of a sample submitted by MAN.
All the numbers, or alleles, are the same. This confirms that the sample from MAN is the same as the variety in the NFC called Adams’s Pearmain. Three of the markers (the first three in the second block) have only one value because the alleles inherited from the two parents are the same; such markers are referred to as homozygous. The other nine have two different alleles (they are heterozygous), and none has three or more, so the variety is most likely diploid.
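The reasoning above can be written as a small Python sketch: one value means homozygous, two or more means heterozygous, and the largest number of values seen at any marker gives a hint about ploidy. The three markers and their allele sizes are hypothetical (they are not the Adams’s Pearmain values), and the function names are made up for illustration. The count is only a lower bound. A triploid can show two or fewer distinct alleles at every marker tested.

```python
# Hypothetical allele sizes for three markers (invented values).
fingerprint = {
    "CH04c07": (100, 135),
    "CH02c11": (222,),
    "GD147": (133, 143),
}

def zygosity(alleles: tuple[int, ...]) -> str:
    """One distinct allele size: homozygous; two or more: heterozygous."""
    return "homozygous" if len(set(alleles)) == 1 else "heterozygous"

def ploidy_hint(fp: dict[str, tuple[int, ...]]) -> int:
    """Most distinct allele sizes seen at any single marker.

    A lower bound on ploidy: 2 is consistent with a diploid,
    3 suggests a triploid, 4 a tetraploid.
    """
    return max(len(set(sizes)) for sizes in fp.values())

for marker, alleles in fingerprint.items():
    print(marker, alleles, zygosity(alleles))
print("most distinct alleles at one marker:", ploidy_hint(fingerprint))
```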
The 2016 Fingerprinting Campaign
About 25 orchard groups took part in a programme of DNA fingerprinting apples, pears, plums and cherries during 2016. The programme was co-ordinated and part-sponsored by FruitID http://www.fruitid.com/#main. The analysis was carried out by East Malling Research, EMR, (http://www.emr.ac.uk/commercial-services/dna-testing/ ), and the results were interpreted by the University of Reading (https://www.reading.ac.uk/horticulture/Research_activities/chl-research.aspx) as curator of the National Fruit Collection http://www.nationalfruitcollection.org.uk/ . In total 930 samples were submitted, of which 607 were apples.
Each group was required to collect leaves during May or June and send them in sealed bags to EMR. Over the next 3-4 months, DNA was extracted from the leaves and analysed, with the equipment carefully calibrated and the results checked (for the basis of the methodology, see The DNA Molecule and The basis of DNA SSR analysis below). The fingerprints were then compared with those of existing examples from the National Fruit Collection and from other collections, and also with other samples in this campaign. The comparison used a rather sophisticated computer program called “Diversity Arrays Technology” http://www.diversityarrays.com/ . Wherever a match was found, the name of the matching variety was suggested, and where there was no match this was also reported. The preliminary report was eagerly awaited, and many of us waited with bated breath. It came on schedule in November.
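In outline, the comparison step looks up a submitted fingerprint among reference fingerprints and reports either the matching names or “no match”. This Python sketch assumes exact agreement of allele sizes at every marker. It makes no claim about how the program used in the campaign works internally. The reference names and allele values are invented, and `fp` and `report` are hypothetical helpers. The sketch returns every hit, not just the first, because a reference collection can itself contain synonyms.

```python
Fingerprint = dict[str, frozenset[int]]

def fp(**markers: list[int]) -> Fingerprint:
    """Build a fingerprint from marker=[allele sizes] keyword arguments."""
    return {marker: frozenset(sizes) for marker, sizes in markers.items()}

# Hypothetical reference collection: names and allele sizes are invented,
# and only two of the twelve markers are shown.
references = {
    "Reference variety X": fp(CH01h10=[96, 114], GD12=[155]),
    "Reference variety Y": fp(CH01h10=[96, 98], GD12=[155, 159]),
}

def report(sample: Fingerprint, refs: dict[str, Fingerprint]) -> str:
    """Name every reference with identical allele sizes at every marker."""
    hits = [name for name, ref in refs.items() if ref == sample]
    return "match: " + ", ".join(hits) if hits else "no match"

print(report(fp(CH01h10=[114, 96], GD12=[155]), references))  # match: Reference variety X
print(report(fp(CH01h10=[100, 102], GD12=[155]), references))  # no match
```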
“In the Newsletter of 2014, pp. 8-9, we listed 31 varieties that had been re-found. What does the DNA fingerprinting add to this? Twenty-five were variously found to be unique or new finds. Two varieties we’ll be renaming, and another four required further discussion and research.”
Here’s our list of the varieties we have authenticated, with their unique fingerprints for the twelve primer pairs:
This is so important to our work that we venture to repeat it. Success depends on expert pomologists producing an agreed reference description of each fingerprinted variety. Human interpretation of the written word will remain essential for many years. But it’s this which makes the subject such fun.
The DNA Molecule
Deoxyribonucleic acid (DNA) is a large molecule whose structure, a double helix, is now well known. The discovery was made by Watson and Crick with help from Franklin and others. DNA provides an organism with the information needed for its growth and reproduction. The helix is made of two strands linked together. Each strand is formed from a sequence of molecules called nucleotides, and each nucleotide is made of three building blocks: a) one of four “bases”, bound to b) a sugar called deoxyribose, and c) a phosphate group that links neighbouring deoxyribose molecules together. The four bases are adenine (A), thymine (T), guanine (G) and cytosine (C). They bond together in specific pairs, A with T and G with C, through somewhat weak links known as hydrogen bonds. These bonds are also present in water, and they are why water is a liquid at typical terrestrial temperatures; without hydrogen bonding, life would likely not have been possible. There is a detailed description in Wikipedia https://en.wikipedia.org/wiki/Nucleic_acid_double_helix
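Because A always pairs with T and G with C, either strand fully determines the other. A minimal Python sketch of this pairing rule follows. The function names are our own. The reverse complement reflects the fact that the two strands run in opposite directions. The example sequence is the GAC repeat used later on this page.

```python
# Base-pairing rule: A pairs with T, G pairs with C.
PAIR = str.maketrans("ATGC", "TACG")

def complement(strand: str) -> str:
    """Base-by-base partner of a strand, read in the same direction."""
    return strand.upper().translate(PAIR)

def reverse_complement(strand: str) -> str:
    """The partner strand read in its own direction (the strands are antiparallel)."""
    return complement(strand)[::-1]

print(complement("GACGACGAC"))          # CTGCTGCTG
print(reverse_complement("GACGACGAC"))  # GTCGTCGTC
```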
Many organisms have DNA comprising hundreds of millions of nucleotides, which would measure hundreds of millimetres or more if stretched out. Yet it is contained within the organism’s cells, which are typically only about 0.01 mm in diameter. To achieve this, the DNA double helix is wrapped and folded into a compact structure that sits within the nucleus, wound more tightly and intricately than any ball of wool! When an organism needs to grow or do something, it has to access the information contained within this molecule. In apples this information is carried on two sets of 17 chromosomes (three sets of 17 in triploids). How the organism responds to (expresses) this information determines what it will become. For apples, this means whether they are cookers, eaters or cider apples, whether they mature early or late, and whether they are red, russetted or green.
In 2010, Riccardo Velasco and a team of over 80 researchers published the first draft sequence of the apple genome, ‘The genome of the domesticated apple (Malus × domestica Borkh.)’ http://www.nature.com/ng/journal/v42/n10/full/ng.654.html#t2 . They had selected Golden Delicious as arguably the (then) most important commercial variety. They found that the genome had about 742 million base pairs and contained about 57,000 genes. Comparison with other Malus species showed that the domestic apple is most probably derived from Malus sieversii, not from the crab apple Malus sylvestris. Malus sieversii still grows widely in the wild in the Tian Shan mountains across Kazakhstan and China.
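The “hundreds of millimetres” mentioned earlier can be checked with a little arithmetic. This Python sketch assumes a rise of about 0.34 nm per base pair, a standard textbook figure for the double helix. It also treats the 742 million base pairs as one set of chromosomes, so a diploid cell holds twice as much.

```python
# Assumption: about 0.34 nm of helix length per base pair.
RISE_NM_PER_BP = 0.34
GENOME_BP = 742_000_000  # apple genome size quoted above (one set of chromosomes)

def stretched_length_mm(base_pairs: int, copies: int = 1) -> float:
    """Length in millimetres if the DNA were stretched out end to end."""
    return base_pairs * copies * RISE_NM_PER_BP / 1e6  # 1 mm = 1e6 nm

print(round(stretched_length_mm(GENOME_BP)))            # one set: 252
print(round(stretched_length_mm(GENOME_BP, copies=2)))  # diploid cell: 505
```

That is roughly 25 cm for one set and 50 cm for a diploid cell. All of it is packed into a nucleus that is smaller than the 0.01 mm cell.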
The basis of DNA SSR analysis
Analysing the whole genome to determine the variety of a given apple would be slow and expensive, even with current sophisticated technology. It would also produce a huge amount of unnecessary information (well, for identification purposes). Fortunately, one aspect of DNA has been cleverly exploited. Typically only 10-20% of the molecule comprises genes; the remainder does not code for proteins, i.e. it is non-coding. What this material does is not fully understood; in apples it amounts to perhaps 600 million base pairs. It may confer structural stability, modify the expression of genes, or do things we don’t yet understand. Because grafting propagates the same plant, the sequence in any given variety remains (almost) identical from one grafting to the next, and it is inherited from parent to progeny. There are also many quite long stretches where the sequence of bases is repetitive, such as AT AT AT AT or AT AT AT AT AT AT AT, or GAC GAC GAC GAC or GAC GAC GAC GAC GAC GAC. These are known as Simple Sequence Repeats (SSRs). The length of these repeat sections can be measured quite simply with a clever technology that many of us became aware of during Covid testing: PCR, or the polymerase chain reaction.
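To see why measuring length is enough, consider two versions (alleles) of the same site. They share their flanking sequences and differ only in the number of GAC repeats. The flanking sequences in this Python sketch are invented, and `repeat_count` is a hypothetical helper. Each extra repeat adds three bases, so a difference in fragment length translates directly into a difference in the number of repeats.

```python
import re

def repeat_count(sequence: str, motif: str) -> int:
    """Length, in motif units, of the longest uninterrupted run of `motif`."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence.upper())
    return max((len(run) // len(motif) for run in runs), default=0)

# Hypothetical alleles: identical flanks, different numbers of GAC repeats.
allele_1 = "TTCAG" + "GAC" * 4 + "CCTGA"
allele_2 = "TTCAG" + "GAC" * 6 + "CCTGA"
for seq in (allele_1, allele_2):
    print(len(seq), repeat_count(seq, "GAC"))
# 22 4
# 28 6
```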
At either end of the SSR there are different, more varied sequences of bases, and these are conserved among (all) apple varieties. These two end members can be located, and a copy can be made of the stretch of DNA comprising the end members and the SSR between them. PCR can multiply this DNA fragment a billion-fold. That gives enough material to analyse by gel or capillary electrophoresis, with fluorescent tags attached to the markers. The process begins with a short synthesised fragment of about 15 nucleotides, called a primer, that matches one or other of the end members (markers) of the SSR sequence. The primer is mixed with the DNA, the four nucleotides and a polymerase enzyme. It then keeps growing in length, potentially until it reaches the end of the DNA or the process is stopped, and so makes a complementary copy of a single strand of DNA. With two primers, one for each end of the SSR, two DNA fragments grow from the two end members in opposite directions. The next step is to separate the original DNA strands from the fragments, followed by a further growth phase. This time a newly formed fragment only grows from one end marker back to the start point of the other primer. Each further amplification cycle doubles the number of these shorter fragments. Meanwhile the original strands stay the same in number, and the first-generation fragments increase only in proportion to the number of cycles.
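The counting in the last few sentences can be checked with a short Python sketch of idealised PCR. It assumes that every strand is copied in every cycle (real reactions are less efficient) and that the process starts from a single double-stranded molecule. The category names are our own.

```python
def pcr_strands(cycles: int) -> dict[str, int]:
    """Count single strands after `cycles` rounds of idealised PCR.

    original:  the two template strands (never increase)
    first_gen: copies with only one end fixed by a primer
    target:    copies bounded by both primers (the SSR fragment)
    """
    original, first_gen, target = 2, 0, 0
    for _ in range(cycles):
        # Every strand acts as a template once per cycle: originals give
        # first-generation copies; first-generation and target strands
        # both give target-length copies.
        target = 2 * target + first_gen
        first_gen = first_gen + original
    return {"original": original, "first_gen": first_gen, "target": target}

for n in (1, 2, 3, 10, 30):
    print(n, pcr_strands(n))
```

After 30 cycles this gives 60 first-generation strands but 2,147,483,586 target strands. That is about a billion double-stranded copies of the SSR fragment, which is consistent with the billion-fold figure above.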
The elegance of this method known as the polymerase chain reaction is explained concisely in https://www.genome.gov/about-genomics/fact-sheets/Polymerase-Chain-Reaction-Fact-Sheet
The East Malling Research methodology uses twelve of the 17 primer pairs developed for Malus × domestica SSR analysis. They are:
CH04c07 CH01h10 CH01h01 Hi02c07 CH01f02 CH01f03b
GD12 GD147 CH04e05 CH02d08 CH02c11 CH02c09
These primer pairs are an agreed set from the European Cooperative Programme for Plant Genetic Resources (ECPGR) http://archive-ecpgr.cgiar.org/fileadmin/www.ecpgr.cgiar.org/NW_and_WG_UPLOADS/MalusPyrus2012/MalusPyrus_SSR_Markers.pdf. Analyses carried out with these markers can therefore be compared with studies elsewhere. For example, a UK variety could be identified if it were analysed in, say, Hungary. The base sequences of these markers can be found at https://www.rosaceae.org/search/markers
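Comparisons between laboratories only work if every fingerprint covers the same agreed markers. The Python sketch below encodes the twelve markers listed above and checks a fingerprint record against them. The function name and the record layout (marker name mapped to a tuple of allele sizes) are assumptions for illustration. The limit of one to four sizes per marker follows the description further down this page.

```python
# The twelve ECPGR SSR markers listed above, in the order given.
ECPGR_PANEL = (
    "CH04c07", "CH01h10", "CH01h01", "Hi02c07", "CH01f02", "CH01f03b",
    "GD12", "GD147", "CH04e05", "CH02d08", "CH02c11", "CH02c09",
)

def check_fingerprint(fp: dict[str, tuple[int, ...]]) -> list[str]:
    """List problems with a fingerprint record; an empty list means it is usable."""
    problems = []
    missing = [m for m in ECPGR_PANEL if m not in fp]
    extra = [m for m in fp if m not in ECPGR_PANEL]
    if missing:
        problems.append("missing markers: " + ", ".join(missing))
    if extra:
        problems.append("markers outside the panel: " + ", ".join(extra))
    for marker, sizes in fp.items():
        distinct = len(set(sizes))
        if not 1 <= distinct <= 4:
            problems.append(f"{marker}: expected 1-4 allele sizes, got {distinct}")
    return problems

example = {m: (150, 160) for m in ECPGR_PANEL}  # hypothetical allele sizes
del example["GD12"]
print(check_fingerprint(example))  # ['missing markers: GD12']
```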
The chemistry of these processes is explained in, for instance, S. Doonan, “Nucleic Acids”, Royal Society of Chemistry, Tutorial Chemistry Text 20, 2004, ISBN 0-85404-481-7.
Each of the twelve primer pairs defines two end members, and measuring the length of the DNA fragments formed between them reveals a characteristic of that apple variety’s DNA. For each primer pair there may be one, two, three or occasionally four fragments of different lengths, typically of 100-250 bases (nucleotides). Any one primer pair is insufficient to characterise a variety, because too many varieties share identical fragment lengths. However, if multiple primer pairs are used, the chance of this diminishes dramatically. With twelve primer pairs, each variety has been found to be pretty well uniquely characterised by its set of fragment lengths. This collection of between 12 and 48 numbers is the DNA SSR fingerprint for that variety. The (small) mutations that give rise to sports can’t yet be distinguished.
Usually two numbers are found for each primer pair, one derived from each parent. If there is only one, it is because the two fragments are the same length and cannot be told apart. If there are three, the variety may be triploid (i.e. have three sets of chromosomes), and if there are four, it may be tetraploid (i.e. have four sets of chromosomes). Again, though, two or more of the lengths may be the same (each distinct length, measured in nucleotides, is often called an allele). For instance, a triploid might show only one allele length for a given primer pair (or marker) because all three are the same. Several of the markers are likely to show as many different allele lengths as the ploidy number, and this is another piece of information that comes with SSR analysis. https://everything.explained.today/Microsatellite/
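A simple multiplication shows why twelve markers are so much better than one. This Python sketch makes two assumptions. First, two unrelated varieties share the same allele sizes at any single marker one time in ten (a hypothetical figure). Second, the markers are inherited independently of one another, so the probabilities multiply. Real per-marker figures differ from marker to marker, and closely related varieties (parent and offspring, or siblings) share alleles much more often, so treat the numbers as illustrative only.

```python
from math import prod

def chance_of_coincidental_match(per_marker: list[float]) -> float:
    """Probability that two unrelated varieties agree at every marker.

    Assumes the markers are inherited independently, so the
    per-marker probabilities simply multiply.
    """
    return prod(per_marker)

# Hypothetical figure: unrelated varieties agree at any one marker 1 time in 10.
for n in (1, 3, 6, 12):
    p = chance_of_coincidental_match([0.1] * n)
    print(f"{n:2d} marker(s): about 1 in {1 / p:,.0f}")
```

Under these assumptions, the chance of a coincidental match falls from 1 in 10 with one marker to about 1 in a million million with twelve.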
An early example of the results of this work came from Dr Matthew Ordidge and Penny Hales. They found that two varieties accessioned as different, Betty Geeson and Broad-eyed Pippin, were actually the same. This prompted a closer inspection of both, which showed that both were consistent with Robert Hogg’s description of the former. So the NFC does not hold an accession of the latter. Does anyone still have it in a collection? Or perhaps they have always been one and the same, just synonyms?