Gene Regulation in progress
Translation: RNA -> Protein in progress
R affy package in progress
Gene Chip etc.
(2.37 x 10^-9 m diameter,
3.32 x 10^-9 m pitch,
10 bases per turn.
0.8152 x 10^-9 m per base along spiral.)
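The per-base figure can be checked by treating one turn of the spiral as the hypotenuse of a triangle whose sides are the circumference (pi times diameter) and the pitch. A minimal Python sketch, assuming the diameter, pitch and bases-per-turn values quoted above; the function name is only illustrative.

```python
import math

def helix_length_per_base(diameter_nm, pitch_nm, bases_per_turn):
    """Path length along the spiral per base, in nanometres."""
    circumference = math.pi * diameter_nm
    # One full turn, unrolled, is the hypotenuse of circumference and pitch.
    turn_length = math.hypot(circumference, pitch_nm)
    return turn_length / bases_per_turn

per_base = helix_length_per_base(2.37, 3.32, 10)
print(round(per_base, 4))  # close to the 0.8152 nm quoted above
```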
C-C 154 x 10^-12 m
C-O 143 x 10^-12 m
P-O 158 x 10^-12 m
Total 0.91 x 10^-9 m
Unwound (flattened) DNA spiral.
Note negative charge per phosphate group.
4 DNA bases
Purines - two rings. Adenine, Guanine.
Pyrimidines - one ring. Thymine, Cytosine.
Deoxyribose attached via Nitrogen to 5-ring in Purines (A and G).
4 RNA bases
The big differences are that RNA is not double stranded, RNA has the
nitrogenous base uracil in place of thymine, and the five-carbon sugar in
the nucleotides is ribose, not deoxyribose.
Uracil is like thymine - but without the methyl group (CH3-) at the top.
Differences between Ribose-Uracil (RNA) and Deoxyribose-Thymine (DNA)
Two hydrogen bonding patterns in DNA
Both C-G and A-T are Purines...Pyrimidines,
ie one big--one small pairings.
In both cases Hydrogen bonds are either N...N at the centre of the 6-ring groups
or N...O between amine (NH2) and Oxygen
side elements. Never to 5-ring.
Note methyl group of thymine not directly part of hydrogen bonds.
C-G Stronger 3 hydrogen Guanine---Cytosine
A-T weaker 2 hydrogen Adenine=Thymine
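The pairing rules above (three hydrogen bonds for G-C, two for A-T) give a quick count of the hydrogen bonds holding a duplex together. A small Python sketch; the function name and the example sequence are made up for illustration, and it assumes every base is paired with its complement.

```python
# Hydrogen bonds per Watson-Crick pair, as listed above.
H_BONDS = {"G": 3, "C": 3, "A": 2, "T": 2}

def duplex_hydrogen_bonds(strand):
    """Total hydrogen bonds when `strand` is fully paired with its complement."""
    return sum(H_BONDS[base] for base in strand.upper())

print(duplex_hydrogen_bonds("GATTACA"))  # 3+2+2+2+2+3+2 = 16
```

GC-rich sequences therefore bind more tightly than AT-rich ones of the same length.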
Deoxyribose is derived from the pentose sugar ribose by the replacement of the
hydroxyl group at the 2 position with hydrogen.
Deoxyribose attached at Carbon adjacent to Oxygen in 5-ring to one of
Deoxyribose attached at OH (minus H) opposite base
via phosphate to next Deoxyribose
and via OH (minus H) on non-ring Carbon to previous Deoxyribose.
DNA 3' and 5' ends
3' beginning of synthesis.
Oligonucleotides are short sequences of nucleotides (RNA or DNA).
Single strand of RNA produced by RNA polymerase
working along one strand of DNA.
Ribonucleic acid (RNA)
It appears the RNA backbone is also negatively charged,
There are also numerous modified bases found in RNA that serve many
There are nearly 100 other naturally occurring modified bases, many of which are not fully understood.
RNA is less stable than DNA because it is more prone to hydrolysis.
Hybridisation of RNA to DNA
Despite both molecules being negatively charged,
the hydrogen bonds between complementary bases are sufficient to bind
RNA to DNA in a spiral structure like the DNA double helix.
NB: DNA fragments contain 25 bases, not 6.
Typically dyes attached to one in four of Uracil and one in four Cytosine,
not to end of RNA.
25 bases 20.38 x 10^-9 m (unwound)
plus (linker 1.54 x 10^-9 m)
total 22 x 10^-9 m.
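The probe length arithmetic can be reproduced as follows. This is a sketch only: it assumes 0.8152 nm per base along the unwound spiral and a linker modelled as ten C-C bonds of 0.154 nm each, and the constant and function names are hypothetical.

```python
PER_BASE_NM = 0.8152  # length per base along the unwound spiral
CC_BOND_NM = 0.154    # C-C bond length (154 x 10^-12 m)

def probe_length_nm(n_bases=25, linker_carbons=10):
    """Return (DNA length, linker length, total) in nanometres."""
    dna = n_bases * PER_BASE_NM
    linker = linker_carbons * CC_BOND_NM
    return dna, linker, dna + linker

dna, linker, total = probe_length_nm()
print(f"{dna:.2f} + {linker:.2f} = {total:.2f} nm")  # roughly 22 nm in total
```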
For Aminosilanised glass,
suggests area 16.8 x 10^-18 m^2.
Lemeshko says width 10^-9 m,
length 16.8 x 10^-9 m
(allows for bonds not being at 180 degrees.)
(Held, 2006) suggests average distance between probes is
I.e. about same as diameter of double strand.
(Other references suggest 4 x 10^-9 m.)
NB on chip strands will not form double helix since they are
identical, rather than complementary.
Also (Held, 2006) suggests (because there is no room?)
that probes do not form hair pins or other folds.
Non gene chip hybridisation
Flexible polymer attaching active DNA strand to surface of chip (substrate).
May be ethylene glycol oligomers (OEG)
(eg triethyleneglycol phosphate)
or many thymine bases (eg T10,
etc. (Lots of possible ways of doing this?) (Pirrung, p1278).
Typically about 10 carbons long.
Ten times 154 x 10^-12 m = 1.54 nm.
For hybridisation assays, where the linker is attached does not matter.
For other cases, the linker must be attached to the 5' end (or must be internal??)
Affymetrix normally(?, cf p1286) 3' end attached to chip.
Oligomer a chemical made of a finite number of monomers.
NB a few. In contrast a polymer consists of many (infinite)
Steps of the Expression Array
- the total mature RNA is isolated from the cell being studied. This RNA has already been
processed (removal of the noncoding introns and splicing together of the coding exons) and
has had a poly-A tail added
- the RNA is turned into a double stranded DNA copy known as a cDNA. This is done
through reverse transcription. This is done because RNA itself is not a very stable molecule and
the cDNA is a way to store the RNA for a much longer period of time
- when it comes time to run the array, the cDNA is allowed to go through in vitro
transcription back to RNA (now known as cRNA), but this RNA is labelled with Biotin. This is
done by having all the uracil bases tagged with the Biotin. So, anytime a Uracil is added to the
RNA chain during the transcription, a biotin molecule is also added.
- This labelled cRNA is then randomly fragmented into pieces anywhere from 30 to 400 bases
in length (there is enough Biotin to make sure each RNA fragment carries some biotin).
- The fragmented, Biotin-labelled cRNA is then added to the array
- Anywhere on the array where an RNA fragment and a probe are complementary, the RNA sticks
to the probes in the feature (remember there are millions of identical probes in each feature).
- The array is then washed to remove any RNA that is not stuck to an array (i.e., no match was
made) and then stained with the fluorescent molecule that sticks to Biotin
- the entire array is scanned with a laser and the information is kept in a computer for
quantitative analysis of what genes were expressed and at what approximate level.
Calibrating Gene Chips HGU95 HGU133
Apparently used by Affymetrix in spike-in
fractional factorial experiments
with 15 concentration levels for 14 genes (16 probes per gene?).
Apparently slang for measuring against a known concentration.
What is ADI
The Affymetrix Genechip algorithm computes the hybridization signal,
termed average difference intensity (ADI), for each probe set.
ADI = PM - MM.
For each DNA sequence there is a PM and MM probe.
The mismatch probe is identical to the perfect match probe except for
the center DNA base (ie 13) is the complementary base to that on the
MM is intended to measure a background noise level,
whilst PM is supposed to give the true signal.
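The PM/MM relationship and the difference signal can be sketched in Python. Assumptions: probes are 25 bases long so the centre base is base 13; the "average" in ADI is taken here as a plain mean of PM - MM over the probe pairs of one probe set (the production algorithm may treat outliers differently); the probe sequence and intensities are made up.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def mismatch_probe(pm_probe):
    """Copy of the PM probe with its centre base complemented."""
    centre = len(pm_probe) // 2  # index 12, i.e. base 13 of a 25-mer
    swapped = COMPLEMENT[pm_probe[centre]]
    return pm_probe[:centre] + swapped + pm_probe[centre + 1:]

def average_difference(pm_values, mm_values):
    """Plain mean of PM - MM over the probe pairs of one probe set."""
    diffs = [pm - mm for pm, mm in zip(pm_values, mm_values)]
    return sum(diffs) / len(diffs)

pm = "ACGTACGTACGTGCATGCATGCATG"  # made-up 25-mer
print(mismatch_probe(pm))
print(average_difference([100.0, 200.0, 300.0], [50.0, 100.0, 150.0]))
```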
Gene Chip spots
Probe site (also called features).
Distance between probes known as spacing or pitch.
What are problem probes?
Spurious probes are regions (spots)
on DNA chips which
consistently give incorrect fluorescence signals.
Since their signals have systematic noise
(rather than random noise),
it is difficult or impossible to infer true gene expression values
What is systematic noise?
Non random noise.
That is we cannot mitigate the noise by taking many independent
Since even averaging lots of readings will only reduce the random
What is Lag Phase?
- A state of apparent inactivity preceding a response; called also a latent phase.
- The initial growth phase, during which cell number remains relatively constant, prior to rapid growth.
- The first of five growth phases of most batch-propagated cell suspension cultures, being the phase in which inoculated cells in fresh medium adapt to the new environment and prepare to divide. See growth phases.
What is Fold Change?
FC does not appear to be anything more than a ratio between two
As in, if A = 10 and B = 20 then there is a two-fold change between A and B.
Fold change is often expressed in powers of 2.
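A short Python sketch of the ratio and its power-of-2 form, using the A = 10, B = 20 example above. The function names are illustrative, not from any package.

```python
import math

def fold_change(a, b):
    """Ratio of b to a."""
    return b / a

def log2_fold_change(a, b):
    """Fold change as a power of 2: +1 is a doubling, -1 a halving."""
    return math.log2(b / a)

print(fold_change(10, 20))       # 2.0
print(log2_fold_change(10, 20))  # 1.0
print(log2_fold_change(20, 10))  # -1.0
```

The log form makes increases and decreases symmetric about zero, which a plain ratio does not.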
Biotin, also known as vitamin H or B7 and
NB: there are no fluorescent labels during the hybridisation step in
Instead fluorescent dye added afterwards and binds to biotin.
What is streptadivin?
A spelling mistake for streptavidin.
Streptavidin binds to biotin.
What is biotin-pseudouridine?
in RNA Uridine is combination of nucleobase Uracil
(see fig Uracil)
pseudouridine is from Uridine.
pseudouridine does not occur in messenger RNA.
What is IVT?
IVT = In vitro transcription.
In vitro transcription is used to convert from cDNA to
biotin labeled RNA (see top right of
IVT Invitro-Transcription Biotin Labeling kit replaces ENZO kit.
Photolithography steps used to add one base to genechip
A. A gene sequence is represented by 20 subsequences of the gene, each of length 25 base pairs (oligonucleotides).
These are subsequences that are a perfect match (PM) to the subsequences of the gene. Another 20 subsequences with the
same bases as the PMs, except for one mismatch (MM) at the central
base (arrow), are used.
B. Depicted is the light-directed
process of synthesising the oligonucleotides on the chip (array).
C. The schematics of the light, mask, and array in the
oligonucleotide synthesis process. Adapted from Lipshutz et al. (1999).
- Derivatized glass substrate.
- Coat underlying, photoresist.
- Expose photoresist.
- Develop photoresist, transfer image, detritylate.
- Strip photoresist and underlay, couple next DMT-nucleotide
Photolithographic DNA synthesis with a bilayer resist system
Copyright (1996) National
Academy of Sciences, USA).
Pirrung (p1285) suggests yield per step in the region of 85-95%.
But this suggests yield of full length probes after 25 steps of only 2-28%.
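The 2-28% range follows from compounding the per-step yield over 25 coupling steps. A minimal Python sketch, assuming each step succeeds independently with the same yield; the function name is illustrative.

```python
def full_length_yield(step_yield, n_steps=25):
    """Fraction of probes that reach full length after n_steps couplings."""
    return step_yield ** n_steps

for step_yield in (0.85, 0.95):
    overall = full_length_yield(step_yield)
    print(f"{step_yield:.0%} per step -> {overall:.1%} full length")
```

Even a small change in per-step yield makes a large difference to the fraction of full-length probes.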
Square grid pattern
on substrate (typically glass) with different surface tension so
that drops of active ingredients on the chip (glass) do not mix.
Gene Regulation Networks
Protein-protein interactions conserved by evolution between species.
Conserved regulatory networks.
Homologous sequences. Orthologs and Paralogs are two types of
homologous sequences. Orthology describes genes in different species
that derive from a common ancestor. Orthologous genes may or may not
have the same function. Paralogy describes homologous genes within a
single species that diverged by gene duplication.
Free Energy G
ΔG = ΔH - TΔS
H = enthalpy
Stringency: reaction conditions - notably temperature, salt
concentration(s) and pH - that dictate the annealing of
single-stranded DNA/DNA, DNA/RNA and RNA/RNA hybrids. At high
stringency, duplexes form only between strands with perfect one-to-one
complementarity; lower stringency allows annealing between strands
with some degree of mismatch between bases.
S = entropy
T = temperature (Kelvin).
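The free energy relation ΔG = ΔH - TΔS can be evaluated directly. This Python sketch uses made-up values for ΔH and ΔS purely to show the arithmetic; they are not measurements for any real duplex, and the function name is illustrative.

```python
def gibbs_free_energy(delta_h, temperature_k, delta_s):
    """Delta G = Delta H - T * Delta S (all in consistent units)."""
    return delta_h - temperature_k * delta_s

# Hypothetical duplex: delta H in kcal/mol, delta S in kcal/(mol K).
delta_h = -100.0
delta_s = -0.28
for temperature_k in (298.0, 310.0, 330.0):
    dg = gibbs_free_energy(delta_h, temperature_k, delta_s)
    print(f"T = {temperature_k:.0f} K, delta G = {dg:.1f} kcal/mol")
```

With a negative ΔS, ΔG becomes less negative as temperature rises, which is one reason higher temperature makes binding more selective.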
High stringency prevents:
- Binding of non-complementary strands
- Self hybridisation -- hairpin formation
- Disassociation of strands
Stringency is affected by
Store all reagents in a -20C freezer that does not automatically
Stringency In Microarray Hybridisation
High stringency is obtained by:
- Low salt or buffer concentration
- High temperature
Low stringency is obtained by:
- Lowering the temperature of hybridisation
- Increasing salt concentration [to a point]
This is different from PCR,
because increasing salt concentration increases stringency.
This is because of the enzyme activity of taq polymerase and
The Staining Chemistry for Affymetrix Genechip
Stain is applied after RNA is stuck to cDNA on the chip.
Large Fluorescent molecules (Phycoerythrin) attach to biotin.
Causes chip to glow yellow (575nm) when lit by blue laser (488nm).
Affymetrix protocol will be slightly different
The amount of biotin incorporated into
cRNA may differ depending on the tissue
Typically about 4-8 per 100 bases.
GeneChip Eukaryotic Labelling Assays for Expression Analysis
Scanner 16 bit resolution
Actual length of Probes
Glazer et al. (2006) says
The average stepwise synthesis yield is 92-94%.
The density of surface sites for the initiation of photolithographic
probe synthesis is
42-54 x 10^16 molecules m^-2.
(corresponding to a spacing of 1.54 to 1.36 nanometers).
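The spacing figures follow from the surface density if the sites are assumed to sit on a square grid, so that each site occupies an area of spacing squared. A Python sketch under that assumption; the function name is illustrative.

```python
import math

def mean_spacing_nm(density_per_m2):
    """Site spacing in nm for a square grid with the given density (sites per m^2)."""
    return 1e9 / math.sqrt(density_per_m2)

for density in (42e16, 54e16):
    print(f"{density:.0e} per m^2 -> {mean_spacing_nm(density):.2f} nm")
```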
Measuring GeneChip Intensity
Light is converted into electric current using a detector (PMT).
focused emission source (solid red lines)
out-of-focus (over) emission source (dotted red lines)
out-of-focus (under) emission source (dashed red lines).
In both out-of-focus sources, most of the light
is deflected and does not enter the pinhole and thus does not reach
What is DMSO?
Dimethyl sulfoxide (DMSO) (CH3)2SO.
What is ROC?
Receiver Operating Characteristics
What is HPLC
High Performance Liquid Chromatography
What are calls?
A call is slang for saying whether a gene is present or not.
Used by Affy in addition to the gene's intensity (continuous)
to suggest their chip has reliably detected the expression of the
I.e. difference between perfect match and mismatch probes and their
values are such that Affy software is reasonably confident that the
gene is being detected.
The user can specify thresholds (alpha1=0.05, alpha2=0.065)
for what the s/w means by "reasonably confident".
if p-value < alpha1 call = "P"
else if alpha1 <= p-value < alpha2 call = "M"
else if alpha2 <= p-value call = "A"
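The threshold rule above translates directly into Python. This sketch only mirrors the three-way comparison; it does not compute the p-value itself, and the function name is illustrative. The letters are usually read as Present, Marginal and Absent.

```python
def detection_call(p_value, alpha1=0.05, alpha2=0.065):
    """Map a detection p-value to a call using the two thresholds."""
    if p_value < alpha1:
        return "P"
    if p_value < alpha2:
        return "M"
    return "A"

for p in (0.01, 0.06, 0.2):
    print(p, detection_call(p))
```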
What is NLP?
Negative log10 p-value.
I.e. base10 logarithm of 1 divided by a probability value.
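In Python the definition is a one-liner; the function name `nlp` is just illustrative.

```python
import math

def nlp(p_value):
    """Negative log10 of a p-value; larger means more significant."""
    return -math.log10(p_value)

print(nlp(0.05))
print(nlp(0.001))  # about 3, since 0.001 = 10^-3
```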
Reverse transcription is an experimental procedure to synthesise a DNA
strand complementary to a mRNA template, namely cDNA.
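The sequence of the first cDNA strand can be worked out from the mRNA by complementing each base and reversing the order, so that both are written 5' to 3'. A Python sketch; the function name and the example mRNA are made up.

```python
# RNA base -> complementary DNA base.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna):
    """DNA strand complementary to an mRNA, written 5' to 3'."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna.upper()))

print(first_strand_cdna("AUGGCCAAAA"))  # TTTTGGCCAT
```

Note how the poly-A tail of the mRNA becomes a run of T at the 5' end of the cDNA.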
Deoxyribonucleoside triphosphate; denotes any of dCTP, dTTP, dATP,
or dGTP; molecular building blocks for making DNAs in RT, PCR, or
in vitro replication; dNTPs in solution, not incorporated into the
nucleic acid strand yet, as molecules with three phosphates provide the
necessary energy for cDNA synthesis.
Used for synthesis of cRNA; see above description for dNTP
A short, single strand RNA or DNA that can initiate chain growth
from a template
Primer with sequence TTTT . . . used to initiate cDNA synthesis in
reverse transcription (RT).
A dendrimer is a regularly branched molecule
What is an EST?
Expressed sequence tag
A list of bases.
What are endogenous genes?
Genes from the organism's own genome.
Endogenous: undergoing development within;
This is the opposite of exogenous.
Exogenous: developing outside.
genetically modified, GM.
Production of messenger RNA transcript (precursor to Protein) where
two (or more?) parts of the transcript which come from different
transcripts are spliced together end to end.
exons are joined together and intermediate introns removed from a
single pre-mRNA transcript.
The e-value is the probability (p-value) times the sample size (N),
giving the expected number of observations in N trials.
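As a worked example in Python (the numbers are hypothetical): with a p-value of 0.001 and 20000 trials, about 20 observations are expected by chance.

```python
def e_value(p_value, n_trials):
    """Expected number of observations in n_trials at this p-value."""
    return p_value * n_trials

print(e_value(0.001, 20000))  # about 20
```

Unlike a p-value, an e-value can exceed 1.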
for room temperature water.
About 0.3 x 10^-9 m for molar NaCl solution
What is R?
What is a heatmap?
Slang for a picture of data
(eg correlations between genes)
in which the data value is represented by a
Stock prices, and the colour of the cells,
Green means the stock price is up.
Red means it's down.
The deeper the colour, the bigger the move.
What is overexpression?
Really does not seem to mean any more than "more expressed".
Eg in liver cells gene X has more expression products
than in skin tissue.
So X is overexpressed in the liver.
What is Nascent RNA?
Nascent RNA is newly formed RNA.
Ie fresh from transcription.
Nascent RNA is neither double stranded nor bound to proteins
and as such it is highly reactive.
Cover Nature insight 11 July 2002
Electron micrograph shows nascent pre-rRNA attached to its DNA template from a lysed yeast cell.
(Image courtesy of Y. Osheim, K. Wehner, A. Beyer and S. Baserga)
Nascent RNA (wiggly black line),
about to be freed from
DNA (pair of straight lines)
PTRF Binds the Polymerase I Transcription Complex/Nascent Pre rRNA Complex paused at the TTF-I:Sal Box
Dissociation of paused ternary complexes requires the Polymerase I-transcript release factor (PTRF) a leucine zipper protein. PTRF is capable of dissociating ternary Pol I transcription complexes, interacting with both TTF-I and Pol I to mediate the release of both Pol I and nascent transcripts from the template.
What is RNAP II?
RNA polymerase II.
Also known as Pol II.
CTD stands for the C-terminal domain.
Proteins are formed linearly from the N-terminal end to their
RNA polymerase II's C-terminal region
varies between different species.
It is composed of many (50 odd) repeats of the seven
amino acid pattern,
YSPTSPS (in that sequence).
Various of the amino acid residues in the CTD
have phosphate groups dynamically attached and removed
as RNAP II moves along the DNA
This is important in controlling how fast RNA polymerase moves.
together with, at the same time, to accompany.
is a small molecule which reversibly binds with a larger one
and thereby changes the larger one in some way.
What is an R-Loop
"An R loop is a structure in which an RNA molecule is
partially or completely hybridised with one strand of a
double-stranded DNA, leaving the other strand unpaired."
Single RNA strand in red.
DNA in green.
DNA-RNA base pair links (hybridised, blue vertical bars)
are enzymes which act on the topology of the DNA double helix.
Specifically they cut one (Topoisomerase I, Topo I)
of the strands allowing the helix to unwind.
Once the tension is removed Topoisomerase relinks the DNA strand.
(Topoisomerase II does the same, but cuts and reforms both DNA strands).
A genetic change which means gene XXX is no longer active.
Proteins are polypeptides.
There does not seem to be a rigorous distinction between
a polypeptide and a protein.
A ribozyme is a ribonucleic acid enzyme.
Ie a bit of RNA that acts as an enzyme
by catalysing a reaction.
An RNA structure with three hairpin loops.
Pictures of its secondary structure vaguely resemble
the head of a hammer.
What is RNP?
I.e. proteins sticking to newly formed RNA as it is
transcribed from DNA.
What is mRNP?
In Eukaryotes mRNA is bound to proteins in the nucleus to form
messenger ribonucleoprotein complex.
Small nuclear RNP.
What is snRNA
Small nuclear RNA
are usually about 150 bases long.
DNA palindromes with stuff in the middle.
Inverted repeats define the boundaries in transposons.
Give rise to RNA hairpins?
pH at which a molecule carries no charge.
What is MIAME?
MIAME is short for the
"Minimum Information About a Microarray Experiment".
18 May 2008