Localization of modified nucleotides via deep sequencing

Ralf Hauenschild & Lyudmil Tserovski

In addition to LC-MS, next-generation sequencing (NGS) techniques offer properties that can be exploited to characterize chemical nucleotide modifications. Here, the polymerase acts as a detector for modified positions in nucleic acid strands, because synthesis tends to arrest at these sites. In addition, non-complementary bases are incorporated at these sites, and their frequencies follow a modification-specific probability distribution. This can be verified in tRNAs, e.g. tRNA^Tyr, where the synthesis behavior at known candidate sites can be observed. After the reads have been mapped to a reference sequence, a coverage profile is created and examined by automated screening procedures for significant drops and characteristic mismatch patterns. Machine learning approaches, such as Support Vector Machines, help in the transcriptome-wide prediction of unknown modified positions.
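
The screening of a coverage profile for drops and mismatch patterns can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the input layout (per-position coverage and base counts) and both thresholds are assumptions chosen for illustration, and the position at which a drop appears relative to the modified nucleotide depends on read orientation and library design.

```python
from typing import Dict, List


def arrest_rates(coverage: List[int]) -> List[float]:
    """Relative coverage loss at position i compared with its 3' neighbour i + 1."""
    rates = [0.0] * len(coverage)
    for i in range(len(coverage) - 1):
        downstream = coverage[i + 1]
        if downstream > 0:
            # Only losses count; a coverage gain is clipped to zero.
            rates[i] = max(0.0, (downstream - coverage[i]) / downstream)
    return rates


def mismatch_rates(reference: str, base_counts: List[Dict[str, int]]) -> List[float]:
    """Fraction of mapped bases at each position that differ from the reference base."""
    rates = []
    for ref_base, counts in zip(reference, base_counts):
        total = sum(counts.values())
        matches = counts.get(ref_base, 0)
        rates.append(0.0 if total == 0 else 1.0 - matches / total)
    return rates


def candidate_sites(
    reference: str,
    coverage: List[int],
    base_counts: List[Dict[str, int]],
    min_arrest: float = 0.5,    # hypothetical threshold
    min_mismatch: float = 0.3,  # hypothetical threshold
) -> List[int]:
    """Return 0-based positions showing a strong arrest or a high mismatch rate."""
    arrests = arrest_rates(coverage)
    mismatches = mismatch_rates(reference, base_counts)
    return [
        i
        for i in range(len(reference))
        if arrests[i] >= min_arrest or mismatches[i] >= min_mismatch
    ]


# Toy data: coverage falls from 100 to 40 at position 1, where half the reads show T instead of C.
reference = "ACGT"
coverage = [40, 40, 100, 100]
base_counts = [{"A": 40}, {"C": 20, "T": 20}, {"G": 100}, {"T": 100}]
print(candidate_sites(reference, coverage, base_counts))  # prints [1]
```

In practice, the per-position arrest and mismatch rates would more likely serve as input features for a classifier such as a Support Vector Machine than be used with fixed cut-offs.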

To distinguish modification-specific coverage drops from other polymerase arrests, e.g. those caused by secondary structures, chemical treatments are applied. Compounds such as PBC react with 4-thiouridine and thereby label the corresponding positions, which become visible when treated and non-treated samples are compared. In the same way, different non-canonical nucleotides can be distinguished in the mapping by their differing reactions to the treatment. To exploit the power of NGS, the RNA molecule must first be converted to dsDNA and amplified into multiple copies. Therefore, after the chemical treatment, an RNA adapter is ligated and the reverse transcription (RT) step is performed. A DNA adapter is then ligated to the newly synthesized cDNA strand. Finally, PCR yields the amplified DNA molecules that are ready for sequencing.
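
The comparison of treated and non-treated samples can be sketched in the same spirit. The following Python example assumes that both samples are mapped to the same reference and that a per-position coverage list is available for each. The function names and the `min_delta` threshold are hypothetical, and the sketch ignores normalization for sequencing depth beyond the use of relative drops.

```python
from typing import List


def arrest_rates(coverage: List[int]) -> List[float]:
    """Relative coverage loss at position i compared with its 3' neighbour i + 1."""
    rates = [0.0] * len(coverage)
    for i in range(len(coverage) - 1):
        downstream = coverage[i + 1]
        if downstream > 0:
            rates[i] = max(0.0, (downstream - coverage[i]) / downstream)
    return rates


def treatment_specific_stops(
    treated: List[int],
    untreated: List[int],
    min_delta: float = 0.3,  # hypothetical threshold on the arrest-rate difference
) -> List[int]:
    """Return 0-based positions whose arrest rate rises by at least min_delta on treatment."""
    if len(treated) != len(untreated):
        raise ValueError("both coverage profiles must span the same reference")
    treated_rates = arrest_rates(treated)
    untreated_rates = arrest_rates(untreated)
    return [
        i
        for i, (t, u) in enumerate(zip(treated_rates, untreated_rates))
        if t - u >= min_delta
    ]


# Toy data: the stop at position 1 occurs in both samples (e.g. secondary structure),
# whereas the stop at position 3 appears only after treatment.
untreated = [50, 50, 100, 100, 100, 100]
treated = [20, 20, 40, 40, 100, 100]
print(treatment_specific_stops(treated, untreated))  # prints [3]
```

Stops that appear in both samples cancel out in the difference, so only positions whose arrest depends on the chemical label remain as candidates.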