The deposition of tau protein aggregates in the brains of affected individuals plays a critical role in Alzheimer's disease (AD). In this project, we use Drosophila as a model organism to identify the molecular mechanisms that cause AD and related tauopathies. We hypothesize that 1) deregulation of RNA expression, 2) specific transposable elements (TEs), and 3) their associated piRNAs (small RNAs that silence TEs) mediate tau-induced neurotoxicity. We validate our RNA-sequencing findings by RT-PCR and NanoString, and perform comparative analyses in postmortem human brain.


  1. Whole RNA-seq on 3 control and 3 tau transgenic samples. Each sample consists of the heads of 6 male and 6 female flies, all 10 days old.
  2. Small RNA sequencing of the same samples.
  3. Human RNA-seq data from the Mayo Clinic, used to test hypotheses that Bess formulated from the Drosophila data. Specifically, the raw fastq files from the MayoRNAseq dataset for the NCI, PSP, AD, and PA groups (Synapse ID: syn5550404; cerebellum and temporal cortex).
  4. Human RNA-seq data from iPSC-derived neurons from 3 patients with tauopathy, along with 3 CRISPR-corrected isogenic control samples. On 24 April 2018, Celeste Karch provided us with bam files of the data from their JAMA Neurology 2018 paper (readme.txt).


Dr. Bess Frost from the Department of Structural and Cellular Biology at UT Health Science Center in San Antonio.


  1. Adrian's project: Analyze RNA-seq data to test the hypothesis that in AD, a) terminal differentiation genes are underexpressed (~150 candidates), and b) genes associated with development and stem cell self-renewal are overexpressed (~100 candidates). This requires a careful analysis that does NOT discard RNAs with low expression levels.
  2. Wenyan's project: There are about 22,336 piRNAs in Drosophila, which silence the 179 currently known transposable elements (TEs). Wenyan has genetically manipulated the expression of the 10 proteins known to regulate piRNAs and found that they mediate tau neurotoxicity, suggesting that dysregulation of piRNA biogenesis plays a role in tau-induced neurodegeneration. We measure the expression of piRNAs and TEs in the normal and AD samples using small RNA-seq and RNA-seq, respectively, and compare these expression levels to identify the piRNAs that target a specific TE. Sequence similarity between piRNAs and TEs can also be used to enhance or validate this analysis. We submitted this manuscript on 2017/09/16.
  3. Using small RNA-Seq data, we identify the microRNAs and other types of small RNAs that are differentially expressed in AD.
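
Wenyan's project combines expression with sequence similarity. As a rough illustration of the sequence side only, the sketch below reports positions where a piRNA could base-pair with a TE transcript. It assumes targeting works through antisense complementarity (the piRNA's reverse complement appears in the TE sense sequence), counts mismatches without gaps or G:U wobble, and uses a hypothetical `max_mismatches` threshold; the function names are illustrative, and dedicated tools such as piPipes (listed under related software) do real target prediction.

```python
# Naive piRNA -> TE target-site scan by antisense complementarity (illustrative sketch).

# Complement table for DNA and RNA letters; U pairs with A.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq: str) -> str:
    """Return the uppercase DNA reverse complement of a DNA or RNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1].upper()


def target_sites(pirna: str, te_seq: str, max_mismatches: int = 2) -> list[tuple[int, int]]:
    """Return (0-based start, mismatch count) for each TE window the piRNA can pair with.

    Assumption: the piRNA silences the TE by pairing with its sense transcript,
    so we search the TE for the piRNA's reverse complement (Hamming distance only).
    """
    probe = reverse_complement(pirna)
    te = te_seq.upper().replace("U", "T")
    k = len(probe)
    hits = []
    for start in range(len(te) - k + 1):
        mismatches = sum(a != b for a, b in zip(probe, te[start:start + k]))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    te = "AAAACCGGTTAGCTAGGGG"   # toy TE fragment, not real data
    pirna = "AGCUAACCGG"         # toy piRNA, antisense to te[4:14]
    print(target_sites(pirna, te, max_mismatches=0))  # [(4, 0)]
```

Because this ignores wobble pairs and bulges, it is better suited to sanity-checking expression-based piRNA/TE pairings than to predicting targets from scratch.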

mRNA Analysis

  1. Uploading RNA-seq data to the Lonestar cluster and cleaning the fastq files: Habil's lab notebook, 2016/9/14-2016/9/19.
  2. Mapping reads to transcriptome and transposons: Habil's lab notebook, 2016/9/22.
  3. Differential gene expression analysis based on count data, and heatmaps of TPM values: Hanie's lab notebook, 2016/09/25.
  4. Bess created a UCSC Genome Browser session for the protein-coding hits. For several of the protein-coding hits, one isoform increases while another decreases, e.g., prage, CG42265, Rps21, CG31955, CG1147, Ndae1, Pect, sano, Treh, Glut1, Spc105R, CG14459, CG8379, and CG31381 (heatmap). See the details in the emails with the subjects "UCSC Session - share" and "opposite isoforms - greater understanding". Habil could not identify any significant motifs in the corresponding 3' (MEME) or 5' UTRs. When all UTRs were combined (fasta), a poly-A motif of width 15 was identified (E-value: 0.003, MEME), which is similar to MA0481.1 (~FOXP1).
    Update: This motif is very common in the UTRs of the DE transcripts; "There were 3200 motif occurences with a p-value less than 0.0001" (html, txt, gff). See 2017-04-27 in Habil's lano.
  5. Habil used Ensembl BioMart to identify the chromosome, start, and end coordinates of the transcripts, which allow better presentation in the Genome Browser (2017-04-27). He added this information to the DE table.
  6. Hanie used HOMER to identify a motif in the UTRs of the oppositely regulated isoforms (see 2017-05-05 in her lano).
  7. Dr. Frost found that, among the DE transcripts, some genes have one isoform that is upregulated in tau samples while another is downregulated. Most of these isoforms differ in their 3' or 5' UTRs. She hypothesizes that a common sequence is present in the upregulated transcripts and that its absence is why the other transcripts are downregulated. A summary of how we found the common motif is in Hanie's lab notebook (2017-05-31, 2017-06-01).
  8. Hanie created pheatmaps for differentially expressed transposons using heatmaps_for_paper.R, as reported in her lab notebook (2017-07-31).
  9. RNA-Seq analysis methods section for the TE manuscript.
  10. Habil copied the human cerebellum data from Ranch to Lonestar, then cleaned and mapped the data (counts; see the 2017/08/18 note). Hanie did the same for the temporal cortex data (counts; 2017-09-01).
  11. The results of the DE analysis of human TEs are reported in green in the Google doc.
  12. Adrian asked whether a specific twintron is DE; it is not, as explained in Habil's email on 2017-09-20.
  13. Habil compared RNA-Seq vs. NanoString (See under 2018/04/25 in his lano).
  14. Habil's analysis of the human iPSC dataset did not identify any DE TEs (see Habil's lab notebook, 2018/05/17).
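
Item 3 plots TPM values. For reference, a minimal TPM computation for one sample is sketched below: counts are divided by transcript length in kilobases, then rescaled so the sample sums to one million. It uses raw transcript lengths as an assumption; many quantifiers use effective lengths instead, so their TPMs will differ somewhat. The function name and toy numbers are illustrative.

```python
# Transcripts per million (TPM) for one sample (illustrative sketch).

def tpm(counts: list[float], lengths_bp: list[float]) -> list[float]:
    """Convert per-transcript read counts to TPM using lengths in base pairs."""
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths must have the same length")
    # Reads per kilobase: removes the bias toward long transcripts.
    rates = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    # Rescale so every sample sums to 1e6, making samples comparable.
    return [r / total * 1e6 for r in rates]


if __name__ == "__main__":
    values = tpm([10, 20, 30], [1000, 2000, 3000])  # toy numbers
    print([round(v, 1) for v in values])  # [333333.3, 333333.3, 333333.3]
```

For heatmaps, TPMs are commonly shown as log2(TPM + 1), which keeps lowly expressed transcripts visible instead of letting a few highly expressed ones dominate the color scale.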
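
Items 4 and 7 describe genes in which one isoform goes up in tau samples while another goes down. The sketch below shows one way such genes could be pulled out of a transcript-level DE table. It assumes each row provides a gene ID, transcript ID, log2 fold change, and adjusted p-value; the 0.05 cutoff and the toy gene names are hypothetical and not taken from the actual DE table.

```python
# Find genes with oppositely regulated isoforms in a transcript-level DE table (sketch).
from collections import defaultdict
from typing import Iterable, Optional


def opposite_isoform_genes(
    rows: Iterable[tuple[str, str, float, Optional[float]]],
    padj_cutoff: float = 0.05,
) -> dict[str, tuple[list[str], list[str]]]:
    """Map gene -> (significantly up isoforms, significantly down isoforms).

    rows: (gene_id, transcript_id, log2_fold_change, padj). padj may be None
    (e.g. an NA from DESeq2), in which case the row is skipped. Only genes with
    at least one significant isoform in each direction are returned.
    """
    up: dict[str, set[str]] = defaultdict(set)
    down: dict[str, set[str]] = defaultdict(set)
    for gene, transcript, lfc, padj in rows:
        if padj is None or padj >= padj_cutoff:
            continue  # not significant
        if lfc > 0:
            up[gene].add(transcript)
        elif lfc < 0:
            down[gene].add(transcript)
    # Membership tests on a defaultdict do not create new keys.
    return {g: (sorted(up[g]), sorted(down[g])) for g in sorted(up) if g in down}


if __name__ == "__main__":
    toy = [  # hypothetical rows, not real results
        ("geneA", "geneA-RA", 1.2, 0.01),
        ("geneA", "geneA-RB", -0.8, 0.03),
        ("geneB", "geneB-RA", 2.0, 0.001),
        ("geneC", "geneC-RA", 1.0, 0.20),
        ("geneC", "geneC-RB", -1.0, 0.01),
    ]
    print(opposite_isoform_genes(toy))  # {'geneA': (['geneA-RA'], ['geneA-RB'])}
```

The UTR sequences of the two groups returned per gene are what a motif finder such as MEME or HOMER would then compare.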

Small RNA Analysis

  1. Habil copied the fastq files from the Dropbox folder in Zhao's email to the work directory on the Lonestar5 cluster. He created a symbolic link to them from his home directory. See ~/proj/alzheimer/data/Drosophila_2017-03-03/readme.txt.
  2. Quality check using FastQC. Habil and Hanie cleaned the fastq files (QC); about 70% of reads survived. "The problems about duplication and GC content are generally common in small RNA libraries."
  3. We concatenated drosophila_pir.fasta and dmel-all-miRNA-r6.14.fasta into dmel-all-miRNA-r6.14_pir.fasta and used it as the reference for mapping reads. Because no drosophila_pir.fasta file was available, we created it from drosophila_pir.txt in several steps; a brief summary is in Hanie's lab notebook (2017-03-11).
  4. Habil mapped the reads using an approach similar to that of 2016/9/22 in his lano. The mapping rate was about 50%.
  5. The DE analysis is reported in Hanie's lab notebook on 2017-03-11, 2017-03-15, and 2017-03-22.
  6. Network analysis using Pigengene, as reported in Habil's lab notebook on 2017/3/27 (csv).
  7. Habil identified the transposons that are piRNA targets. He also updated the DESeq2analysis.R script on 2017-04-04 to add the targets of each piRNA to the DE table. 848 transposons are targets of at least one DE piRNA (padj < 0.05). Of these, 77 (9%) have padj < 0.01 themselves. This is more than expected, because only 110 (2%) of the 5,392 annotated transposons have padj < 0.01.
  8. There are 44 DE piRNAs with unique sequences (p-value < 0.01). They map to 989 genomic locations, of which 479 are on the sense (+) strand and 510 on the antisense (-) strand (piRnap.xls).
  9. Hanie and Habil performed a DE analysis of the esiRNAs on 2017/07/22, as reported in Habil's lano.
  10. The results of the NanoString validation (2017/07/27).
  11. Hanie created pheatmaps for differentially expressed small RNAs (piRNAs and esiRNAs) using heatmaps_for_paper.R, as reported in her lab notebook (2017-07-31).
  12. Habil identified 74 clusters of DE piRNAs based on genomic location. He grouped piRNAs into the same cluster if they were closer than expected by chance (i.e., genome length / number of DE piRNAs = 142,585 bp). Each resulting cluster spans a region a few thousand bp long.
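
Item 12 groups DE piRNA loci that lie closer together than expected by chance. A minimal sketch of that rule follows, assuming loci are given as (chromosome, position) pairs and that consecutive loci on the same chromosome join a cluster when their gap is below the expected spacing (genome length divided by the number of loci). The function names are hypothetical, this is not Habil's actual script, and it keeps singleton clusters, which may or may not have been counted among the 74.

```python
# Gap-based clustering of piRNA loci along each chromosome (illustrative sketch).

def expected_spacing(genome_length: int, n_loci: int) -> float:
    """Average distance between loci if they were spread evenly over the genome."""
    return genome_length / n_loci


def cluster_loci(loci, max_gap: float) -> list[list[tuple[str, int]]]:
    """Group (chrom, pos) loci; neighbors on one chromosome closer than max_gap share a cluster."""
    clusters: list[list[tuple[str, int]]] = []
    current: list[tuple[str, int]] = []
    for chrom, pos in sorted(loci):  # sorts by chromosome, then position
        if current:
            prev_chrom, prev_pos = current[-1]
            if chrom != prev_chrom or pos - prev_pos >= max_gap:
                clusters.append(current)  # gap too large: close the current cluster
                current = []
        current.append((chrom, pos))
    if current:
        clusters.append(current)
    return clusters


if __name__ == "__main__":
    toy_loci = [("2L", 5_000), ("3R", 200), ("2L", 100), ("2L", 500_000)]  # not real data
    for cluster in cluster_loci(toy_loci, max_gap=142_585):
        print(cluster)
    # [('2L', 100), ('2L', 5000)]
    # [('2L', 500000)]
    # [('3R', 200)]
```

This is single-linkage chaining, so a long run of closely spaced loci can produce a cluster whose total span exceeds the threshold; only adjacent gaps are compared.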
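
Item 7 states that 77 of the 848 piRNA-target transposons having padj < 0.01 is "more than expected". One way to quantify this is a one-sided hypergeometric test, sketched below using only the counts stated above (5,392 annotated transposons, 110 with padj < 0.01, 848 targets, 77 of them with padj < 0.01). Under the null, about 17.3 targets would be expected. The sketch treats the 848 targets as a random draw from the 5,392, which ignores dependence between related transposon copies, so the p-value is only indicative.

```python
# One-sided hypergeometric enrichment test with exact integer arithmetic (sketch).
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(n, K)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    # Python divides the big integers exactly and rounds once to a float.
    return numerator / comb(N, n)


if __name__ == "__main__":
    N, K = 5392, 110   # annotated transposons; those with padj < 0.01
    n, k = 848, 77     # targets of DE piRNAs; those with padj < 0.01
    expected = n * K / N
    p = hypergeom_upper_tail(k, n, K, N)
    print(f"expected {expected:.1f}, observed {k}, fold {k / expected:.1f}, p = {p:.1e}")
```

The same tail probability is available as `scipy.stats.hypergeom.sf(k - 1, N, K, n)` (SciPy orders the parameters as population, successes, draws), or via a one-sided Fisher's exact test on the 2x2 table.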

Related work

  1. Frost, Bess, et al. "Tau promotes neurodegeneration through global chromatin relaxation." Nature Neuroscience 17.3 (2014): 357-366.
    Bess's previous ChIP-seq study, which suggested that transposable element RNAs may be increased in tau transgenic flies.
  2. Russo, Joseph, Andrew W. Harrington, and Mindy Steiniger. "Antisense transcription of retrotransposons in Drosophila: an origin of endogenous small interfering RNA precursors." Genetics 202.1 (2016): 107-121.
    Analyzed esiRNAs.

Related software and resources
  1. The annotated Drosophila melanogaster (Dmel) genome, RNA-seq data, and transposable element files are available by expanding the appropriate tabs on this FlyBase page. The genome sizes are available from the UCSC Genome Browser.
  2. piRNABank: a database of piRNA sequences (FASTA format) for Drosophila and other model organisms. A piRNA can map to more than one position in the genome.
  3. piPipes: a set of pipelines for piRNA and transposon analysis (code). It predicts piRNA transposon targets. RepEnrich is an alternative tool for this purpose.
  4. TEtranscripts: a package for including transposable elements in differential expression analysis of RNA-seq datasets (code).
  5. Jiang et al. (IEEE BIBE 2016) compared 5 piRNA databases (see Table 2). They also integrated 3 miRNA target prediction tools (TargetScan, miRanda, and RNAhybrid) to predict piRNA cDNA targets in human.
  6. Sequences of human transposable elements, including ancestral (shared) TEs, can be downloaded from Repbase. Habil used these data to create a table that lists the class of each human transposon (csv; see his note on 2017-09-11).
  7. Sequences of esiRNAs (a.k.a. TE-siRNAs) of Drosophila (dm3 assembly) in fasta and bed formats, which were provided by Jiayu Wen in an email in July 2017.
  8. The raw human data generated at the Mayo Clinic are in bam files, which can be converted to fastq using SAMtools, Picard, bedtools, bamtools, etc.
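
In practice, BAM-to-FASTQ conversion is done with one of the tools above (for example `samtools fastq`). To show what such a conversion has to handle, here is a pure-Python sketch that works on SAM text (e.g., the output of `samtools view`). It assumes single-end reads, skips secondary and supplementary alignments so each read is written once, and restores the original read orientation for reverse-strand alignments. Paired-end data would additionally need the 0x40/0x80 flags to split mates into two files; the function and constant names are illustrative.

```python
# Convert SAM text records to FASTQ (illustrative, single-end sketch).
from typing import Iterable, Iterator

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

FLAG_REVERSE = 0x10          # read aligned to the reverse strand
FLAG_SECONDARY = 0x100       # secondary alignment
FLAG_SUPPLEMENTARY = 0x800   # supplementary (chimeric) alignment


def sam_to_fastq(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield one FASTQ record per primary record (mapped or unmapped)."""
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # skip header lines
        fields = line.split("\t")
        qname, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue  # extra alignment of a read that is written once elsewhere
        if seq == "*" or qual == "*":
            continue  # no stored sequence or quality to recover
        if flag & FLAG_REVERSE:
            # Reverse-strand reads are stored reverse-complemented; undo that.
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        yield f"@{qname}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    toy_sam = [  # toy records, not real data
        "@HD\tVN:1.6",
        "r1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t16\tchr1\t1\t60\t4M\t*\t0\t0\tAACG\tABCD",
        "r3\t256\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",
    ]
    print("".join(sam_to_fastq(toy_sam)), end="")
```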

Drafts, Next steps