Nanopore Sequencing-Based De Novo Nucleotide Modification Analysis
Modified nucleotides play critical roles in a diverse array of biological processes. However, the majority of nucleotide modifications remain undiscovered, forming the “dark matter” of the epigenomic and epitranscriptomic code. Nanopore sequencing opens up possibilities for bringing this “dark matter” to light by translating the chemical structures of biomolecules into ionic current signals. Recovering the chemical structure of a modification from nanopore sequencing signals nevertheless remains challenging. To address this problem, we recently developed a deep learning framework for the de novo inference of novel modifications. The framework builds on the fact that nucleotides are usually composed of shared chemical modules. For example, 5-methylcytosine (5mC) can be seen as a cytosine (C) plus the methyl group of thymine (T). Similarly, N2-methylguanine (2mG) can be seen as a guanine (G) plus the methyl group of N6-methyladenosine (6mA). We used a graph convolutional network to encode nucleotide chemical structures, and represented novel modifications by re-assembling existing chemical modules. By this means, we correctly predicted, in a de novo manner, the 5mC DNA and 2mG RNA modifications.
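The idea of encoding a nucleotide as a graph and describing a modification as an added chemical module can be illustrated with a minimal sketch. It is not the framework described above: it assumes a heavy-atom graph with bond orders ignored, one-hot element features, random untrained weights, a standard symmetric-normalized graph convolution, and mean pooling. All function and variable names are hypothetical.

```python
import numpy as np

ELEMENTS = ["C", "N", "O"]  # assumed feature vocabulary (heavy atoms only)

def one_hot(atoms):
    """One-hot element features, one row per atom."""
    X = np.zeros((len(atoms), len(ELEMENTS)))
    for i, a in enumerate(atoms):
        X[i, ELEMENTS.index(a)] = 1.0
    return X

def adjacency(n, bonds):
    """Symmetric adjacency matrix; bond orders are ignored in this sketch."""
    A = np.zeros((n, n))
    for i, j in bonds:
        A[i, j] = A[j, i] = 1.0
    return A

def gcn_layer(A, H, W):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    d = A_hat.sum(axis=1)                    # node degrees (always >= 1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

def embed(atoms, bonds, weights):
    """Stack graph convolutions, then mean-pool atoms into one vector."""
    H = one_hot(atoms)
    A = adjacency(len(atoms), bonds)
    for W in weights:
        H = gcn_layer(A, H, W)
    return H.mean(axis=0)

# Cytosine heavy atoms in the order N1, C2, N3, C4, C5, C6, O2, N4.
CYT_ATOMS = ["N", "C", "N", "C", "C", "C", "O", "N"]
CYT_BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (1, 6), (3, 7)]

# 5mC re-assembled as cytosine plus a methyl carbon attached to C5 (index 4),
# the same module that thymine carries at its C5 position.
M5C_ATOMS = CYT_ATOMS + ["C"]
M5C_BONDS = CYT_BONDS + [(4, 8)]

# Random, untrained weights: the vectors below only show the data flow.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 8)), rng.normal(size=(8, 8))]

cyt_vec = embed(CYT_ATOMS, CYT_BONDS, weights)
m5c_vec = embed(M5C_ATOMS, M5C_BONDS, weights)
```

In this toy setting, 5mC differs from cytosine by a single node and a single edge, so a candidate modification can be written as a known scaffold plus a known module. A trained model would learn the weights from data and would need richer atom and bond features than the ones assumed here.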