Reflections on a scientific project 10 years on: the winding path to cryptic transcription in yeast
Aug 21, 2013
I can't really believe it has been 10 years since we published our findings of cryptic intragenic transcription in yeast. In other ways, I can't believe it was only 10 years ago.
One aspect of working in molecular biology is that all discoveries are probably, in the end, inevitable. Someone will figure it out eventually; there are too many smart or lucky people thinking in lots of different ways for this not to be the case. I think it is always worthwhile to understand exactly how circumstances might favor discovery, and I marvel at how slight changes in circumstances might have closed off the path to discovery. In the case of our discovery of transcription-dependent chromatin disruption, and of the widespread intragenic transcription from cryptic promoters that this disruption allows, serendipity was on our side many times. I'm going to be a little bit long-winded here, but I hope you will recognize the twists and turns along the path to this discovery, and see that, even though I am detailing the history of a short, four-figure paper, many, many tiny observations accumulated to let us make this discovery (before other people would have, eventually).
As is now known, genomes can be extensively transcribed, or at least widely transcribed at low levels. This transcription can be inhibited or masked in a number of ways: surveillance pathways can degrade transcripts, or cryptic promoters can be inhibited by chromatin structure. So there are lots of possible reasons not to see transcription that may be happening, or that may be possible under some circumstances. The highly conserved Spt6 protein was first identified genetically in budding yeast. Alex Bortvin in Fred Winston's lab had shown that Spt6 could interact with histones and promote nucleosome assembly in an in vitro assay. When I started my thesis work with Fred Winston, the goal was to understand the essential function(s) of Spt6 in chromatin and transcription. Thinking back on it now, if I had been more biochemically inclined, the experiments would have focused on mapping histone interactions very closely, determining requirements for in vitro activities, identifying additional physical interactors, et cetera. We probably intended to do all of these things eventually, but one of the first approaches was to start making in-frame deletions in SPT6 to get a handle on different parts of the protein.
As this was cooking along, there was a short detour, which ended up involving cloning the Drosophila melanogaster Spt5 and Spt6 homologs, either by identifying full-length cDNAs or by stitching together cDNAs to construct what would likely be the full-length coding sequences based on homology. This is a quaint notion now, given how readily genes, gene models, and massive genomic data are available for so many model and non-model systems. I made fusion proteins, and we generated antibodies, tested them, and affinity purified them, with the goal of immunolocalizing Spt5 and Spt6 on salivary gland polytene chromosomes. Salivary gland chromosomes are massively over-replicated, and sister chromatids and homologs align perfectly, so the localization of proteins to different genetic loci can be easily visualized. Essentially we were doing a version of a genome-wide ChIP-chip experiment without either of the chips. These experiments were incredibly valuable in telling us that both Spt5 and Spt6 seemed to be widely recruited to transcribed regions, with a good correlation to a particular phosphorylated form of Pol II, but again they seem so quaint now next to ChIP-seq, which gives massively higher resolution and dynamic range. Interestingly, back then we barely knew what the differences in Pol II CTD phosphorylation states might relate to (a seminal paper on this from Philip Komarnitsky, Eun-Jung Cho and Steve Buratowski, working across the quad from us at Harvard Medical School, was published just one issue before our Drosophila paper in Genes and Development). The form of Pol II that we saw closely correlating with Spt5 and Spt6 was the form reactive with the H5 monoclonal antibody, which localizes on genes differently from Pol II reactive with the H14 monoclonal antibody. H5-reactive Pol II is a subset of Pol II that is widely localized over transcribed regions, gradually increasing over genes in a 5' to 3' manner.
A short note about this Drosophila project: it always amused me that I just kind of did this project while a member of a lab well known for sophisticated yeast genetics. I was always grateful to Fred for supporting it, because it was a little bit out of left field for the lab. However, the project was not so far out in left field that we were alone in thinking about doing it. Soon after our paper was accepted, we found out that a very nice study from Erik Andrulis, John Lis and colleagues would be appearing in the very same issue of Genes and Development as ours! I want to brag that our staining looked just ever so slightly nicer, but the reality was that John Lis was the one who provided us with the detailed protocol that allowed even a novice like me to do the salivary gland squashes and staining. I will add that my collaborator Ting Wu would tell me about the perfect chromosome spread, with all chromosome arms nicely separated and no overlap, just like in Ashburner (the Drosophila bible). I think my thumb still hurts now, 13 years after the fact, from the chromosome squashes, and I never saw anything close to a "perfect" one. To keep with the theme of this piece, there was lots of luck here, starting with the luck that the antibodies we made worked. Our backup was that we made two different antigens for each protein and immunized a couple of animals for each. Terry Orr-Weaver had suggested to Fred that we try guinea pigs for our antibodies, and it worked out wonderfully. We were also lucky that we didn't get scooped. The Lis lab's antibody backup was that they tried multiple species for multiple epitopes, so we were on thinner ice than they were, even though we had also hedged our bets.
After our Drosophila excursion, with my structure-function analysis of yeast Spt6 slowly simmering on the back burner, we could approach the project in a new light: we now had the idea that Spt6 could be working at a lot of genes, conceivably at all Pol II-transcribed genes in yeast, if the analogy to Drosophila held. What was Spt6 doing? One idea was to characterize how Spt6 functioned at a particular target, and we settled on GAL10 as a system for that approach. Another idea was to identify genes strongly dependent on Spt6 by finding those most strongly affected in an spt6 mutant, though I remember that Fred wanted us to find a nice, strong target genetically. One of the mutants we settled on was what is now called spt6-1004, encoding a protein with an internal deletion of one of the first recognized domains in Spt6, the helix-hairpin-helix domain. This deletion confers a strong Ts- phenotype and also renders Spt6 unstable, so that it is degraded at the restrictive temperature. As it turns out, any number of spt6 alleles might have worked for the analysis in the end, but we chose this one because we surmised that it was closest to a null at the restrictive temperature. Anyhow, I still wanted to do a microarray, because I just couldn't conceive of finding a strong target genetically without it taking a long time. What seems so simple now was not simple back then. We were only 2-3 years on from the very first yeast microarrays. Universities were building their own arrayers and amplifying all the probes by PCR. Affymetrix chips were available, I think, but very expensive, and tiling arrays were not quite there yet, although oligo arrays were becoming available. Somebody was building a spotting arrayer in a room upstairs from us, but everything was just a mess. As it happens, my father is a scientist at the University of Utah, and his lab had access to a few yeast microarrays.
I brought some spt6-1004 and WT strains home with me for the holidays, grew up the yeast in my dad's lab, and worked at my mom's bench (she works with my dad in his lab). My mom sent me the array scans and the spreadsheets once the data came in and she had processed them. I'm not even joking: my mom and dad helped me do some homework that ended up being the critical first experiment for identifying cryptic transcripts.
So what did we get on our arrays? A bunch of genes that went up and a bunch that went down. I tried to identify patterns among the genes that went up or down. The yeast genome database had some information on genes, but this was before there were lots of searchable experimental databases or repositories of massive amounts of expression data. There were a few datasets out there to compare to, but not many. We also had no "spike-in" controls. In a competitive hybridization experiment, the standard practice was to normalize total fluorescence between the two channels, which of course removes global effects; so if all genes had been down 4-fold on average in spt6-1004, that change would have been normalized out. So I went through the up and down genes by hand, and what I noticed was that a lot of the genes that went up appeared to be stress-inducible, but a bunch were not obviously stress-inducible, and this set included a few RAD genes. Was this a coincidence? I wondered whether some sort of DNA damage in the spt6 mutant was causing RAD genes to be induced. I obtained a bunch of ORF PCR fragments to use as probes to verify expression changes by Northern blotting. These ORF probes were the same DNAs that had been spotted on the microarrays we used, and the fact that they covered entire ORFs was likely absolutely critical to our discovery of cryptic transcripts. I labeled probes for a number of candidate genes (I can't quite remember how many now, maybe 4-8) and prepared a bunch of Northern blots in parallel. To get a quick, quantifiable answer, I decided to probe with the loading control, SPT15, at the same time, picking candidate genes whose transcripts shouldn't run near the loading control's. When I developed the film, however, I had no clue what I was looking at. I saw shorter, relatively distinct bands for a number of the candidate genes, but only in the spt6-1004 39°C lane, and the blots were hard to interpret because I also had the SPT15 signal smack in the middle.
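To see why a uniform change would have been invisible, here is a minimal sketch, assuming simple total-intensity scaling of the two channels (real normalization pipelines of the time varied). The helper names and intensities are hypothetical, made up for illustration, not taken from our arrays. Scaling the mutant channel so that both channels have equal total fluorescence erases a global 4-fold decrease entirely, while a gene that changes relative to the rest still stands out.

```python
import math


def normalize_total_fluorescence(ref, test):
    """Scale the test channel so its total signal equals the reference total.

    Hypothetical helper: a simplified stand-in for global normalization of a
    two-color (competitive hybridization) array.
    """
    scale = sum(ref) / sum(test)
    return [x * scale for x in test]


def log2_ratios(ref, test):
    """Per-gene log2(test / ref) after total-fluorescence normalization."""
    normalized = normalize_total_fluorescence(ref, test)
    return [math.log2(t / r) for r, t in zip(ref, normalized)]


# Made-up raw intensities for five genes in the WT (reference) channel.
wt = [1000.0, 250.0, 4000.0, 80.0, 600.0]

# Case 1: every transcript is truly 4-fold lower in the mutant channel.
uniform_drop = [x / 4 for x in wt]
print([round(r, 3) for r in log2_ratios(wt, uniform_drop)])
# prints [0.0, 0.0, 0.0, 0.0, 0.0]: the global decrease is normalized away

# Case 2: same global drop, but gene 1 is also induced 2-fold relative to the rest.
partial = list(uniform_drop)
partial[1] *= 2
print([round(r, 2) for r in log2_ratios(wt, partial)])
```

In the second case, gene 1 comes out close to +1 (about 2-fold up) while every other gene sits just below zero, so only relative changes survive normalization; an absolute, genome-wide shift would need spike-in controls to detect.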
However, for some of the genes the full-length product appeared to go down, contrary to expectations based on the array. Because the whole-ORF probes on the array would also hybridize to the shorter products, those products could have made up for the loss of full-length RNA in the array signal. Were the transcripts getting degraded or processed in some odd way? My first model was cryptic 3'-end formation and polyadenylation, because we already had evidence that spt6-1004 seemed to promote 3'-end formation at GAL10. I also thought that, since the transcripts were of relatively discrete sizes, this wasn't some weird degradation, though the blots were a mess (see the image of these exact blots from my thesis committee report).
Original Northerns from Craig Kaplan's thesis committee meeting report May 10, 2001.
The thing is, these blots had replicate RNA samples on them, so I already knew the craziness was reproducible. The first thing to do was redo all the Northerns on new RNA, without the simultaneous probing for SPT15, to rule out sample handling or other weirdness. Well, the new RNA and the old RNA probed side by side (I think this is what we did next) looked identical, so degradation, unless it was spt6-1004-dependent, seemed less likely. Remember how I said ORF probes were probably critical? If we had used a fancy new microarray with a couple of oligos spotted for every yeast gene, we might well have missed all of this. If I had designed PCR amplicons of some uniform size as Northern probes, not covering entire ORFs, we might have missed all of this too.
Our microarray and Northern probes were all double-stranded DNA, which meant that all of our short RNA products could have been antisense; we really had no idea. I decided to use single-stranded DNA oligo probes against the 5' and 3' ends of a couple of genes, RAD18 and FLO8. As far as I knew, nobody in our lab had used oligos as Northern probes before, so I had no idea what hybridization temperature to use; I just winged it and hoped I wouldn't light up the rRNA through non-specific hybridization. From what I remember, in the first experiment I DID light up the rRNA with at least one of the oligos, but the signals were still good enough to tell me two things: first, that the shorter transcripts came from the 3' ends of both genes, and second, that they were sense transcripts. Now we were in a little bit of crazyland: these transcripts were all over the place for a number of genes, but not all genes. It had somehow been easier to think of the transcripts as deriving from premature 3'-end formation rather than from internal initiation (or some unexplained processing). At this point, we had to start entertaining the possibility that the transcripts might not even be Pol II-derived. We did guess they were Pol II-derived because we had used poly(A)+ RNA as the starting material for the microarrays. I decided we just had to clone the transcripts and get an idea of what was going on. I turned to 5'-RACE, and the protocol we used should have required both a 5' cap and a poly(A) tail, so the fact that we were able to clone FLO8 transcripts with internal 5' ends by this method strongly suggested that they were Pol II transcripts. We confirmed putative 5' ends in these regions by primer extension, and I remember having to expose the gels for several days to get signal; these were not highly expressed transcripts at all.
We then wanted to know how special this phenomenon was. Spt6 is genetically related to a number of factors, and suppression of cryptic initiation events was reminiscent of two long-studied phenotypes in the lab, the Spt phenotype and the Bur phenotype, both of which involved mutants that allowed aberrant transcription. I rounded up a panel of spt and bur mutants, and lo and behold, a number of them also showed cryptic transcription at FLO8 and RAD18. One of the mutants we tested was a histone mutant, and together with Spt6's known connections to histones (SPT6 was also originally identified as SSN20, for Suppressor of Snf2, because it could suppress loss of a chromatin remodeling complex for SUC2 expression, so another chromatin link), we were strongly pushed toward a chromatin explanation for what was going on. As an aside, when I saw that an spt16 mutant also showed cryptic transcripts, some spt16 data presented by Paul Mason from Kevin Struhl's lab suddenly had an obvious explanation. Paul had been really pushing forward an analysis of spt16 in Kevin's lab, also across the quad from ours, and had observed an accumulation of Pol II density at the 3' ends of a few genes in his spt16 mutants. His favored model at that point was that the spt16 mutation might be delaying the release of Pol II from genes, leading to a pileup of Pol II at gene 3' ends. Our data immediately suggested an alternative explanation: cryptic initiation. Here is a situation where Paul was unlucky. You see, Kevin's lab traditionally measured gene expression by S1 nuclease protection, in which transcripts protect a complementary oligo probe from S1 digestion, and they had also been moving to RT-PCR. Because both methods interrogate a short, defined region of each transcript, neither would have immediately revealed that short, unexpected transcripts were being generated from these loci. As soon as Paul was aware of our spt6 data, he immediately made the connection, and it was a little bit off to the races.
How could we determine whether there was a chromatin-level effect on a cryptic transcript? I thought we would have to show a nucleosome changing its position over the cryptic promoter. We had found a TATA element in the middle of FLO8 that seemed to be required for the cryptic transcript, so I set about trying to see whether a nucleosome or nucleosomes in the region moved in the spt6 mutant. To do this, I turned to micrococcal nuclease (MNase) digestion and indirect end-labeling. Here, you partially digest the chromatin with MNase, purify the DNA, cut it with a restriction enzyme, and then do a Southern blot with a probe right next to the restriction site. This lets you visualize DNA fragments that have one end at the restriction site and the other end generated by MNase cleavage. Particular patterns can suggest protection from cleavage due to positioned nucleosomes in the population of chromatin fragments. This was a pain in the butt to do, partly because the spt6 strain required extra attention to get spheroplasting (digestion of the cell wall prior to permeabilization and MNase digestion) to work. The experiment did work, except that the spt6 mutant looked similar to WT at FLO8 on the Southern blot. I thought about this result quite a bit, and I realized this experiment would only show a difference if a nucleosome changed position, or if the nucleosomes moved consistently across the whole population. Nucleosomes give strong patterns in this experiment, but naked DNA gives a very weak pattern, because MNase can cut it at many positions and the signal is spread thinly. If nucleosomes were randomly removed in the population, the result would look like a strong, normal pattern superimposed on a weak, almost invisible new pattern. I also started thinking about my MNase digestions of bulk chromatin, which I ran on agarose gels to check whether the digestions were working. The spt6 MNase digestions always looked bad. They were clearly always different from WT, and the difference was temperature dependent.
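The population argument can be made concrete with a toy calculation. This is a minimal sketch under idealized, hypothetical assumptions: each DNA molecule receives a single MNase cut; in intact chromatin that cut falls only at a few linker positions; and in nucleosome-free DNA it can fall anywhere with equal probability. The fragment length, linker positions, and 20% disruption level are made up for illustration and are not values from our experiments.

```python
def expected_cut_profile(length, linker_sites, frac_disrupted):
    """Expected fraction of molecules cut at each position (single-cut model).

    Toy assumptions, not measured values:
    - intact chromatin: MNase cuts only at linker sites, each equally likely;
    - nucleosome-depleted (naked) DNA: any position is equally likely.
    """
    intact = [0.0] * length
    for site in linker_sites:
        intact[site] += 1.0 / len(linker_sites)
    naked = [1.0 / length] * length
    f = frac_disrupted
    # The blot reports the population average of the two kinds of molecules.
    return [(1 - f) * i + f * n for i, n in zip(intact, naked)]


linkers = [150, 310, 470, 630, 790]  # hypothetical linker positions (bp)
wt_like = expected_cut_profile(1000, linkers, frac_disrupted=0.0)
mutant_like = expected_cut_profile(1000, linkers, frac_disrupted=0.2)

print(f"WT linker band: {wt_like[150]:.4f}")  # prints 0.2000
print(f"mutant linker band: {mutant_like[150]:.4f}")  # prints 0.1602
print(f"mutant signal at a non-linker position: {mutant_like[400]:.4f}")  # prints 0.0002
```

In this toy model, disrupting 20% of the molecules lowers each linker band by only about 20%, and the new signal at any single position is roughly 800-fold weaker than a linker band. The blot still looks "normal", even though a fifth of the population has lost its nucleosomes.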
You couldn't see as pronounced a ladder of DNA fragments, and the bands were fuzzier than in WT. I had originally thought this related to the spt6 mutant's spheroplasting defect, just an artifact of the difficulty in digesting its cell wall: if you overdigest with the enzyme preparation that digests the cell wall, it can affect the quality of the chromatin. I decided that the apparent MNase differences might truly reflect a chromatin defect, one that was global enough to see in a bulk analysis of chromatin. Our earlier Drosophila experiments helped me think about this: we already had a global connection of Spt6 to every gene, or most genes, but that connection appeared to relate to transcription. In my other experiments, I had already seen transcription-dependent recruitment of Spt6 to GAL genes in yeast (or had I? ZappyLab's Lenny might have something to say along these lines in the future). Then it struck me: if screwing up Spt6 screws up chromatin, could the process doing the chromatin screwing up be transcription itself? In our spt6 mutant, Spt6 was degraded, so it wasn't there to act; some other process had to be going awry.
How could we shut off transcription to test whether it was required for screwing up chromatin? Remember, we could only ever link transcription-mediated chromatin defects to cryptic transcription indirectly, because shutting off transcription would also shut off cryptic transcription. Could we use the well-known Pol II inhibitor alpha-amanitin? Nope: it doesn't get into yeast, and we would have had to use a ton of it even if it did. We decided on a genetic approach. There is a temperature-sensitive allele of RPB1, the gene encoding the largest subunit of Pol II, called rpb1-1, and we crossed it into our spt6 strain. Here is another massively lucky event. spt6 mutants are synthetically lethal with a lot of mutants that relate to transcription, and rpb1-1 is pretty sick on its own as well; you need to grow it below 27°C (roughly room temperature). If spt6-1004 had been synthetically lethal with rpb1-1, we would have had a much longer, and less elegant, road to testing our model. What we saw was that when we shifted the spt6-1004 rpb1-1 double mutant to the restrictive temperature, the chromatin looked normal: rpb1-1 (normal chromatin) was epistatic to spt6-1004 (messed-up chromatin). Again, we were quite lucky here, because the temperature shift was a double shutoff, inactivating both alleles at the same time. We could only have gotten this result if rpb1-1 shut off transcription (and it was known to be fast) faster than spt6-1004 lost Spt6 function. We did a few other experiments to really nail it down. Lisa Laprade in the lab did a critical chromatin immunoprecipitation (ChIP) for H3, showing a loss of H3 over FLO8, where we saw cryptic transcripts, but not at a transcriptionally uninduced GAL gene; she also ran some key Southerns on MNase digests that I did in my post-doc lab (having graduated before we finished the paper).
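The "race" between the two shutoffs can be illustrated with a toy kinetic model. This is a hypothetical sketch, not anything we measured. Assume that after the temperature shift, transcription decays as exp(-kT·t) and Spt6 function decays as exp(-kS·t), and that chromatin damage accrues at a rate equal to ongoing transcription times missing Spt6 function. Integrating T(t)·(1 - S(t)) over time then gives 1/kT - 1/(kT + kS), so the faster transcription shuts off relative to Spt6 loss, the less damage accumulates. The rate constants below are made up.

```python
def accumulated_damage(k_transcription_off, k_spt6_off):
    """Total chromatin damage in a toy model (hypothetical, for illustration).

    Assumes transcription T(t) = exp(-kT * t), Spt6 function S(t) = exp(-kS * t),
    and a damage rate of T(t) * (1 - S(t)). The integral from 0 to infinity
    is 1/kT - 1/(kT + kS).
    """
    k_t, k_s = k_transcription_off, k_spt6_off
    return 1.0 / k_t - 1.0 / (k_t + k_s)


# Made-up rate constants (per unit time) for the two scenarios.
fast_pol2_shutoff = accumulated_damage(k_transcription_off=2.0, k_spt6_off=0.2)
slow_pol2_shutoff = accumulated_damage(k_transcription_off=0.2, k_spt6_off=2.0)

print(f"transcription off first: {fast_pol2_shutoff:.3f}")
print(f"Spt6 lost first: {slow_pol2_shutoff:.3f}")
print(f"ratio: {slow_pol2_shutoff / fast_pol2_shutoff:.0f}")
```

With these made-up numbers, swapping which process shuts off first changes the accumulated damage 100-fold (1/22 versus 50/11). If Spt6 is never lost (kS = 0), the damage is zero, as expected for a wild-type SPT6 strain.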
It's funny, because nowadays I can't see any reviewer, let alone a Science reviewer, accepting two loci as good support for a correlative argument, but I think the MNase experiments really were compelling.
Here is what is oddest about this project. There were a lot of little lucky things that just happened to put us in a good position to observe this phenomenon. For each type of experiment that we needed to do, there was a technique that was doable and that we got to work to answer the questions we were asking. In fact, the largest obstacles we faced came from our routine experimental techniques being temperamental. Our Northern blots started giving really high background right at the end, just when we needed to probe for lowly expressed genes, likely because of something related to exactly how wet the blot was and exactly how it was introduced to pre-hybridization. That was a 6-month nightmare at a point when we basically knew the framework of the story. As I mentioned above, I started my post-doc before we sent out the paper, so I finished up some work in my post-doc lab. I did a few more MNase gels, and we ran into another obstacle. I made new MNase digestion buffer, and MNase did not digest in it. Almost all the components were the same brands I had used previously to make the buffer, and it just didn't work; the MNase we had wouldn't even digest naked DNA in this buffer! We went into full paranoia mode. Fred's lab FedExed me some buffer made with components from the lab there to do the last of the experiments, and everything worked like a charm after that. Sure, we had to endure some reviewer comments about how modest TBP association with the cryptic promoter was (totally true: it was barely there! There was nothing we could do about it, since the cryptic promoters are barely transcribed). The fact is that we only saw these intragenic transcripts because they initiated in the middle of open reading frames and used the normal 3' ends of the regular genes, which likely stabilized them and allowed them to appear as somewhat discrete bands.
These transcripts are likely to be all over the place, sense and antisense, as work from many, many labs has now made apparent. I set out to talk about one paper and ended up kind of describing my entire thesis, but I wanted to get across the context that allowed us to do this work, and to give an idea of just how fine a line it was between success and failure, between scooping, tying, and being scooped.
Copyright: © 2013 Craig Kaplan. The above content is licensed under the Creative Commons Attribution License (CC-BY), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.