Earlier today, I tweeted:
Does anyone know how to decipher a diBase GFF3 file? They don’t identify the “most abundant” nucleotide uniquely. Seems useless to me.
Apparently, there is a solution, albeit undocumented:
The “genotype” attribute contains an IUB code that is limited to a single-base or two-base annotation (that is, it should not contain H, B, V, D or N, but may contain R, Y, W, S, M or K). That allows you to subtract the “reference” attribute (which must be a canonical base) from the bases encoded by the “genotype” IUB code to obtain the new SNP, but only when the “genotype” attribute is not itself a canonical base.
If only that were documented somewhere…
UPDATE: Actually, this turns out not to be the case at all — there are still positions for which the “genotype” attribute is an IUB code, and the reference is not one of the called bases. DOH!
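For what it's worth, here is a minimal Python sketch of the subtraction rule described above, including the case from the update where it breaks down. It only works on the two attribute values as plain strings; it does not parse GFF3, and the names `variant_bases`, `IUB_TWO_BASE` and `CANONICAL` are my own, not anything from the diBase tools. The ambiguity table is the standard IUB/IUPAC one.

```python
# Standard IUB/IUPAC two-base ambiguity codes.
IUB_TWO_BASE = {
    "R": {"A", "G"},
    "Y": {"C", "T"},
    "W": {"A", "T"},
    "S": {"C", "G"},
    "M": {"A", "C"},
    "K": {"G", "T"},
}
CANONICAL = {"A", "C", "G", "T"}


def variant_bases(reference, genotype):
    """Return the sorted list of called bases that differ from the reference.

    Hypothetical helper: `reference` and `genotype` are the values of the
    GFF3 attributes of the same names, passed in as single-character strings.
    """
    reference = reference.upper()
    genotype = genotype.upper()
    if reference not in CANONICAL:
        raise ValueError(f"reference must be A, C, G or T, got {reference!r}")

    if genotype in CANONICAL:
        called = {genotype}              # single-base call
    elif genotype in IUB_TWO_BASE:
        called = IUB_TWO_BASE[genotype]  # two-base call
    else:
        raise ValueError(f"unsupported genotype code {genotype!r}")

    # "Subtract" the reference from the called bases.
    return sorted(called - {reference})


print(variant_bases("A", "R"))  # reference is one of the two called bases
print(variant_bases("A", "Y"))  # reference is NOT one of the called bases
```

With a reference of A, a genotype of R leaves a single base (G), which is the case the rule was meant for. A genotype of Y leaves two bases (C and T), which is exactly the situation from the update: the subtraction no longer identifies a unique SNP, so a result with more than one base has to be treated as unresolved.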