GFF3 undocumented feature…

Earlier today, I tweeted:

Does anyone know how to decipher a diBase GFF3 file? They don’t identify the “most abundant” nucleotide uniquely. Seems useless to me.

Apparently, there is a solution, albeit undocumented:

The “genotype” attribute contains an IUB code that is limited to either a single base or a two-base ambiguity code (e.g., it may contain R, Y, W, S, M or K, but should not contain H, B, V, D or N). When the “genotype” attribute is not a canonical base, you can subtract the “reference” attribute (which must be canonical) from the bases encoded by the “genotype” IUB code to obtain the new SNP.

If only that were documented somewhere…

UPDATE: Actually, this turns out not to be the case at all — there are still positions for which the “genotype” attribute is an IUB code, and the reference is not one of the called bases. DOH!
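For what it's worth, here is a rough Python sketch of the subtraction described above, including the failure case from the update. The function name `variant_bases` is my own invention for illustration, and the handling of a canonical “genotype” (treating it as a single called base) is an assumption on my part, not something the file format documents. The two-base code table is the standard IUB/IUPAC one.

```python
# Standard IUB/IUPAC two-base ambiguity codes and the bases they encode.
TWO_BASE_CODES = {
    "R": {"A", "G"},
    "Y": {"C", "T"},
    "W": {"A", "T"},
    "S": {"C", "G"},
    "M": {"A", "C"},
    "K": {"G", "T"},
}
CANONICAL = {"A", "C", "G", "T"}


def variant_bases(reference, genotype):
    """Return the called bases that differ from the reference.

    Hypothetical helper: `reference` and `genotype` are the values of the
    GFF3 attributes of the same names, passed in as one-letter strings.
    """
    reference = reference.upper()
    genotype = genotype.upper()
    if reference not in CANONICAL:
        raise ValueError("reference must be a canonical base: %r" % reference)
    if genotype in CANONICAL:
        # Assumption: a canonical genotype means a single called base.
        called = {genotype}
    elif genotype in TWO_BASE_CODES:
        called = TWO_BASE_CODES[genotype]
    else:
        # H, B, V, D, N and anything else are not expected here.
        raise ValueError("unexpected genotype code: %r" % genotype)
    # "Subtract" the reference from the called bases.
    return called - {reference}


# Reference A, genotype R (A or G): the new SNP is G.
print(variant_bases("A", "R"))
# Reference C, genotype R: the reference is not one of the called bases,
# so two candidates remain and the subtraction does not give a unique SNP.
print(variant_bases("C", "R"))
```

A result with exactly one base is the well-behaved case; a result with two bases is the situation described in the update, where the subtraction trick cannot pick a single SNP.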
