
Gene Chip etc.

DNA Helix (Watson-Crick B)

Double helix: 2.37 × 10^-9 m diameter, 3.32 × 10^-9 m pitch, 10 bases per turn (i.e. 0.8152 × 10^-9 m per base along the spiral).
Figures: DNA double helix; DNA base pairings AT and GC; looking along the DNA double helix.
Bond lengths: C-C 154 × 10^-12 m, C-O 143 × 10^-12 m, P-O 158 × 10^-12 m. Total 0.91 × 10^-9 m per repeat of the phosphate-deoxyribose backbone.
Unwound (flattened) DNA spiral. Note the negative charge on each phosphate group.
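The per-base figure above can be checked by unrolling one turn of the helix into a right triangle (circumference against pitch). A minimal Python sketch of that arithmetic follows. The function name is illustrative, and the backbone sum assumes the total counts two each of the P-O, C-O and C-C bonds laid end to end, which happens to reproduce the 0.91 figure.

```python
import math

def helix_length_per_base(diameter_nm, pitch_nm, bases_per_turn):
    """Path length along one helical strand per base, in nm."""
    circumference = math.pi * diameter_nm
    # Unrolling one turn gives a right triangle with sides
    # equal to the circumference and the pitch.
    turn_length = math.hypot(circumference, pitch_nm)
    return turn_length / bases_per_turn

per_base_nm = helix_length_per_base(2.37, 3.32, 10)

# Assumed backbone repeat: two P-O, two C-O and two C-C bonds (pm),
# ignoring bond angles.
backbone_pm = 2 * 158 + 2 * 143 + 2 * 154

print(round(per_base_nm, 4), backbone_pm / 1000)
```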

4 DNA bases

Purines - two rings. Adenine, Guanine. Pyrimidines - one ring. Thymine, Cytosine.

Deoxyribose attached via Nitrogen to 5-ring in Purines (A and G).

4 RNA bases

The big differences are that RNA is usually not double stranded, RNA has the nitrogenous base uracil in place of thymine, and the five-carbon sugar in its nucleotides is ribose, not deoxyribose. Uracil is like thymine but without the methyl group (CH3-).
Figure: differences between ribose-uracil (RNA) and deoxyribose-thymine (DNA).

Two hydrogen bonding patterns in DNA

Both C-G and A-T are purine...pyrimidine pairings, i.e. one big base paired with one small one. In both cases the hydrogen bonds are either N...N, at the centre of the 6-ring groups, or N...O, between amino and oxygen side groups. Never to the 5-ring. Note that the methyl group of thymine is not directly part of the hydrogen bonds.

C-G: stronger, 3 hydrogen bonds. Guanine---Cytosine

A-T: weaker, 2 hydrogen bonds. Adenine=Thymine



Deoxyribose is derived from the pentose sugar ribose by the replacement of the hydroxyl group at the 2 position with hydrogen (Wikipedia, Ribose).

DNA Backbone

Deoxyribose is attached to one of the bases at the carbon adjacent to the oxygen in its 5-ring (the 1' carbon). It is attached via a phosphate to the next deoxyribose at the ring OH (minus H) opposite the base (the 3' position), and to the previous deoxyribose via the OH (minus H) on the non-ring carbon (the 5' position).

DNA 3' and 5' ends

3' beginning of synthesis. See


Oligonucleotides are short sequences of nucleotides (RNA or DNA) (Wikipedia).


Single strand of RNA produced by RNA polymerase working along one strand of DNA.

Ribonucleic acid (RNA)

The RNA backbone is also negatively charged, like that of DNA. Numerous modified bases, serving many different roles, are also found in RNA: there are nearly 100 naturally occurring modified bases, many of which are not fully understood.
RNA is less stable than DNA because it is more prone to hydrolysis.

Hybridisation of RNA to DNA

Despite both molecules being negatively charged, the hydrogen bonds between complementary bases are sufficient to bind RNA to DNA in a spiral structure like the DNA double helix.
Figures: RNA fragment, with fluorescent tags, from the sample to be tested; RNA fragment hybridises with DNA on GeneChip.
NB: the DNA fragments contain 25 bases, not 6. Typically dyes are attached to one in four uracils and one in four cytosines, not to the end of the RNA.

Probe length: 25 bases, 20.38 × 10^-9 m (unwound), plus linker (1.54 × 10^-9 m), total 22 × 10^-9 m.
For aminosilanised glass, (Lemeshko, 2001) suggests an area of 16.8 × 10^-18 m^2. Lemeshko says width 10^-9 m, length 16.8 × 10^-9 m (which allows for bonds not being at 180 degrees).

(Held, 2006) suggests the average distance between probes is 2.5 × 10^-9 m, i.e. about the same as the diameter of a double strand. (Other references suggest 4 × 10^-9 m.) NB: on the chip, strands will not form a double helix with each other since they are identical, rather than complementary. (Held, 2006) also suggests (because there is no room?) that probes do not form hairpins or other folds.
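A quick check of the probe length and footprint arithmetic, as a Python sketch. The constants are the figures quoted in these notes; the function and variable names are made up for illustration.

```python
PER_BASE_NM = 0.8152   # length per base along the strand (helix figures)
LINKER_NM = 1.54       # ten C-C bonds of 154 pm each

def probe_length_nm(n_bases, per_base_nm=PER_BASE_NM, linker_nm=LINKER_NM):
    """Length of an unwound probe plus its linker, in nm."""
    return n_bases * per_base_nm + linker_nm

total_nm = probe_length_nm(25)

# Footprint on aminosilanised glass: width times length, in nm^2
# (1 nm^2 = 1e-18 m^2).
footprint_nm2 = 1.0 * 16.8

print(round(total_nm, 2), footprint_nm2)
```

The total comes out just under 22 nm, which matches the rounded figure given for the probe.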

Non gene chip hybridisation


Flexible polymer attaching the active DNA strand to the surface of the chip (substrate). May be ethylene glycol oligomers (OEG) (e.g. triethyleneglycol phosphate) or many thymine bases (e.g. T10), etc. (Lots of possible ways of doing this?) (Pirrung, p1278). Typically about 10 carbons long. Ten times 154 × 10^-12 m = 1.54 × 10^-9 m.
For hybridisation assays, where the linker is attached does not matter. For other cases, the linker must be attached to the 5' end (or must be internal??) (p1280, p1284). Affymetrix normally(?, cf. p1286) attach the 3' end to the chip.


Oligomer: a chemical made of a finite number of monomers (NB: a few). In contrast, a polymer consists of many (in principle, an unlimited number of) monomers.

Steps of the Expression Array

Calibrating Gene Chips HGU95 HGU133

Latin Squares

Apparently used by Affymetrix in spike-in fractional factorial experiments with 15 concentration levels for 14 genes (16 probes per gene?).

Spike in

Apparently slang for adding a known concentration of a transcript to the sample, so that measurements can be made against it.

What is ADI

The Affymetrix GeneChip algorithm computes the hybridization signal, termed average difference intensity (ADI), for each probe set. ADI = PM - MM, averaged over the probe pairs in the probe set.
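A minimal Python sketch of ADI, taking "average difference" literally as the mean of PM - MM over the probe pairs of one probe set. The function name and intensities are hypothetical, and the real Affymetrix software may treat outlying pairs differently.

```python
def average_difference(pm, mm):
    """Mean of PM - MM over the probe pairs of one probe set."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("need equal, non-empty PM and MM lists")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)

# Hypothetical intensities for a four-pair probe set.
pm = [1200.0, 950.0, 1100.0, 1020.0]
mm = [300.0, 280.0, 350.0, 270.0]
adi = average_difference(pm, mm)
print(adi)
```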


Perfect match


Mismatch probe. For each DNA sequence there is a PM and an MM probe. The mismatch probe is identical to the perfect match probe except that the centre DNA base (i.e. base 13) is the complementary base to that on the PM. MM is intended to measure a background noise level, whilst PM is supposed to give the true signal.
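The PM to MM relationship can be sketched in a few lines of Python. The probe sequence here is made up and the function name is illustrative only.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def mismatch_probe(pm_probe):
    """Return the MM probe: the PM probe with its centre base complemented."""
    centre = len(pm_probe) // 2   # index 12, i.e. base 13 of a 25-mer
    swapped = COMPLEMENT[pm_probe[centre]]
    return pm_probe[:centre] + swapped + pm_probe[centre + 1:]

# Hypothetical 25-base perfect match probe.
pm = "ACGTACGTACGTACGTACGTACGTA"
mm = mismatch_probe(pm)
print(pm)
print(mm)
```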

Gene Chip spots

Probe site (also called features).
Distance between probes known as spacing or pitch.

What are problem probes?

Spurious probes are regions (spots) on DNA chips which consistently give incorrect fluorescence signals. Since their signals have systematic noise (rather than random noise), it is difficult or impossible to infer true gene expression values from them.

What is systematic noise?

Non-random noise. That is, we cannot mitigate the noise by taking many independent measurements, since even averaging lots of readings will only reduce the random noise component.
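A small simulation makes the point. It is only a sketch with made-up numbers, modelling systematic noise as a fixed offset and random noise as Gaussian scatter: averaging many readings removes the scatter but leaves the offset.

```python
import random
import statistics

def simulate_readings(true_value, bias, sigma, n, seed=0):
    """Draw n readings with a fixed offset (systematic) plus Gaussian (random) noise."""
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0.0, sigma) for _ in range(n)]

true_value, bias, sigma = 100.0, 15.0, 10.0   # illustrative numbers only
readings = simulate_readings(true_value, bias, sigma, n=10_000)
mean_reading = statistics.fmean(readings)

# The random part shrinks roughly as sigma / sqrt(n); the bias does not.
residual_error = mean_reading - true_value
print(round(residual_error, 2))
```

The residual error stays close to the bias however many readings are averaged.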

What is Lag Phase?


What is Fold Change?

FC does not appear to be anything more than a ratio between two measurements. E.g. if A = 10 and B = 20 then there is a two-fold change between A and B. Fold change is often expressed as a power of 2 (i.e. as its base 2 logarithm).
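As a Python sketch (function names are illustrative), using the A = 10, B = 20 example:

```python
import math

def fold_change(a, b):
    """Ratio of measurement b to measurement a."""
    return b / a

def log2_fold_change(a, b):
    """Fold change as a power of 2: +1 is a doubling, -1 a halving."""
    return math.log2(b / a)

fc = fold_change(10, 20)
lfc = log2_fold_change(10, 20)
print(fc, lfc)
```

Working in powers of 2 makes increases and decreases symmetric: a doubling and a halving differ only in sign.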


Biotin, also known as vitamin H or B7, is C10H16N2O3S. (Figure: biotin stick model.) NB: there are no fluorescent labels during the hybridisation step in Affymetrix experiments. Instead, fluorescent dye is added afterwards and binds to the biotin.

What is streptadivin?

A spelling mistake for Streptavidin.

The protein streptavidin binds to biotin. (Figure: streptavidin with 4 biotin complexes.)

What is biotin-pseudouridine?

In RNA, uridine is the combination of the nucleobase uracil (see fig. Uracil) and ribose. Pseudouridine is derived from uridine (figure: uridine to pseudouridine). Pseudouridine does not occur in messenger RNA.

What is IVT?

IVT = In vitro transcription.
In vitro transcription is used to convert from cDNA to biotin labeled RNA (see top right of fig steps)

Affy note: IVT Invitro-Transcription Biotin Labeling kit replaces ENZO kit.

Photolithography steps used to add one base to genechip

Choosing which parts of the expressed RNA transcript the probes should bind to:
A. A gene sequence is represented by 20 subsequences of the gene, each 25 bases long (oligonucleotides). These are subsequences that are a perfect match (PM) to subsequences of the gene. Another 20 subsequences, with the same bases as the PMs except for one mismatch (MM) at the central base (arrow), are also used.
B. Depicted is the light-directed process of synthesising the oligonucleotides on the chip (array).
C. The schematics of the light, mask, and array in the oligonucleotide synthesis process. Adapted from Lipshutz et al. (1999).
  • Derivatized glass substrate.
  • Coat underlying, photoresist.
  • Expose photoresist.
  • Develop photoresist, transfer image, detritylate.
  • Strip photoresist and underlay, couple next DMT-nucleotide

Photolithographic DNA synthesis with a bilayer resist system (Copyright (1996) National Academy of Sciences, USA).

Pirrung (p1285) suggests a yield per step in the region of 85-95%. But this implies a yield after 25 steps of only 2-28%.
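The 2-28% range is just the per-step yield raised to the 25th power. A short Python check, assuming each step succeeds independently with the same yield (function name is illustrative):

```python
def full_length_fraction(step_yield, n_steps=25):
    """Fraction of probes that survive every coupling step."""
    return step_yield ** n_steps

low = full_length_fraction(0.85)
high = full_length_fraction(0.95)
print(f"{low:.1%} to {high:.1%}")
```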


Square grid pattern on substrate (typically glass) with different surface tension so that drops of active ingredients on the chip (glass) do not mix.

Gene Regulation Networks


Protein-protein interactions conserved by evolution between species.


Conserved regulatory networks.


Homologous sequences. Orthologs and paralogs are two types of homologous sequences. Orthology describes genes in different species that derive from a common ancestor. Orthologous genes may or may not have the same function. Paralogy describes homologous genes within a single species that diverged by gene duplication. (Figure: homologs, orthologs and paralogs.)

Free Energy G


G = H - TS, where
H = enthalpy
S = entropy
T = temperature (Kelvin).
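Assuming the usual Gibbs relation G = H - TS for the three quantities listed above, the change in free energy on forming a duplex is delta G = delta H - T delta S. A Python sketch with made-up, purely illustrative values (the function name is also hypothetical):

```python
def gibbs_free_energy(delta_h, delta_s, temp_k):
    """Delta G = Delta H - T * Delta S (kJ/mol, with entropy in kJ/(mol K))."""
    return delta_h - temp_k * delta_s

# Illustrative, made-up values for a duplex forming at 318.15 K.
dg = gibbs_free_energy(delta_h=-150.0, delta_s=-0.4, temp_k=318.15)
print(round(dg, 2))
```

A negative delta G means duplex formation is favourable. Raising T makes the entropy term larger, so the duplex becomes less stable.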

Stacking Energies


Stringency: reaction conditions (notably temperature, salt concentration(s) and pH) that dictate the annealing of single-stranded DNA/DNA, DNA/RNA and RNA/RNA hybrids. At high stringency, duplexes form only between strands with perfect one-to-one complementarity; lower stringency allows annealing between strands with some degree of mismatch between bases.

High stringency prevents:

Stringency is affected by


Store all reagents in a -20C freezer that does not automatically defrost.

Stringency In Microarray Hybridisation

High stringency is obtained by: Low stringency is obtained by:

The Staining Chemistry for Affymetrix Genechip

Stain is applied after the RNA has hybridised to the DNA probes on the chip. Large fluorescent molecules (phycoerythrin) attach to the biotin. This causes the chip to glow yellow (575nm) when lit by a blue laser (488nm). (Figure: Biotin, SAPE, BAP, laser 488->575nm.)
The Affymetrix protocol will be slightly different.

The amount of biotin incorporated into cRNA may differ depending on the tissue type. Typically about 4-8 per 100 bases.

Affy expression kit

GeneChip Eukaryotic one-cycle target Labeling
GeneChip Eukaryotic two cycle Labeling

Figure 2.1 GeneChip Eukaryotic Labelling Assays for Expression Analysis

Scanner 16 bit resolution [affy].

Actual length of Probes

Glazer et al. (2006) say the average stepwise synthesis yield is 92-94%. The density of surface sites for the initiation of photolithographic probe synthesis is 42-54 × 10^16 molecules m^-2 (corresponding to a spacing of 1.54 to 1.36 nanometers).
Figure (glazer.gnu): Affymetrix probe length and separation, assuming constant yield per photolithographic step.
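The spacing figures follow from the site density if the sites are assumed to sit on a square lattice (spacing = 1 / sqrt(density)). That assumption is mine, but it reproduces the quoted numbers. The same Python sketch also applies the quoted step yields over 25 steps, assuming a constant yield per step.

```python
import math

def mean_spacing_nm(sites_per_m2):
    """Mean distance between sites on a square lattice of the given density."""
    return 1e9 / math.sqrt(sites_per_m2)

spacing_low_density = mean_spacing_nm(42e16)
spacing_high_density = mean_spacing_nm(54e16)

# Fraction of probes reaching the full 25 bases at the quoted step yields.
full_length = [y ** 25 for y in (0.92, 0.94)]

print(round(spacing_low_density, 2), round(spacing_high_density, 2))
print([round(f, 3) for f in full_length])
```

At these yields only a minority of the probes on a spot reach the full 25 bases.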

Measuring GeneChip Intensity

Figure 5
Light is converted into electric current using a detector (PMT). Shown are a focused emission source (solid red lines), an out-of-focus (over) emission source (dotted red lines) and an out-of-focus (under) emission source (dashed red lines). For both out-of-focus sources, most of the light is deflected, does not enter the pinhole and thus does not reach the detector.



What is DMSO?

Dimethyl sulfoxide (DMSO) (CH3)2SO.

What is ROC?

Receiver Operating Characteristics

What is HPLC

High Performance Liquid Chromatography.

What are calls?

A call is slang for saying whether a gene is present or not. Used by Affy in addition to the gene's (continuous) intensity to suggest their chip has reliably detected the expression of the gene. I.e. the difference between the perfect match and mismatch probes, and their values, are such that the Affy software is reasonably confident that the gene is being detected. The user can specify thresholds (alpha1=0.05, alpha2=0.065) for what the s/w means by "reasonably confident".
if     p-value <  alpha1           call = "P" 
else if alpha1 <= p-value < alpha2 call = "M" 
else if alpha2 <= p-value          call = "A" 
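The rule above, written out as a runnable Python sketch. The function name is made up, the default thresholds are the ones quoted above, and P, M and A are read as present, marginal and absent.

```python
def detection_call(p_value, alpha1=0.05, alpha2=0.065):
    """Map a detection p-value to P (present), M (marginal) or A (absent)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    if p_value < alpha1:
        return "P"
    if p_value < alpha2:
        return "M"
    return "A"

calls = [detection_call(p) for p in (0.01, 0.06, 0.5)]
print(calls)
```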

What is NLP?

Negative log10 p-value. I.e. base10 logarithm of 1 divided by a probability value.
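In Python this is a one-liner. The function name is illustrative, and the guard against p = 0 is my addition, since the logarithm is undefined there.

```python
import math

def nlp(p_value):
    """Negative log10 of a p-value, i.e. log10(1 / p)."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must lie in (0, 1]")
    return -math.log10(p_value)

score = nlp(0.001)
print(score)
```

Smaller p-values give larger NLP scores, so the most significant results stand out at the top of a plot.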


Reverse transcription is an experimental procedure to synthesise a DNA strand complementary to a mRNA template, namely cDNA.


Deoxyribonucleoside triphosphate (dNTP); denotes any of dATP, dCTP, dGTP, or dTTP; molecular building blocks for making DNA in RT, PCR, or in vitro replication. dNTPs in solution, not yet incorporated into the nucleic acid strand, are molecules with three phosphates, which provide the necessary energy for cDNA synthesis.


Used for synthesis of cRNA; see above description for dNTP


A primer is a short, single-stranded RNA or DNA that can initiate chain growth from a template.


Primer with sequence TTTT . . . used to initiate cDNA synthesis in reverse transcription (RT).


A dendrimer is a regularly branched molecule

design of experiments DOE

What is an EST?

Expressed sequence tag: a short sub-sequence of a cDNA, i.e. a list of bases.

What are endogenous genes?

Genes from the organism's own genome.
Endogenous: undergoing development within; living inside. This is the opposite of exogenous.
Exogenous: developing outside. Cf. transgenic, genetically modified, GM.

What is Trans-splicing?

Production of a messenger RNA transcript (precursor to protein) in which two (or more?) parts, which come from different transcripts, are spliced together end to end.
Normally (i.e. in (cis-)splicing), exons are joined together, and the intermediate introns removed, from a single pre-mRNA transcript.

What is an Alu?

What is a MAST E-value?

Expectation value. The e-value is the probability (p-value) times the sample size (N), giving the expected number of observations in N trials.
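As a Python sketch, with a hypothetical p-value and sample size (the function name is also made up):

```python
def e_value(p_value, n_trials):
    """Expected number of observations at least this good in n_trials trials."""
    return p_value * n_trials

expected = e_value(1e-4, 5000)   # hypothetical p-value and sample size
print(expected)
```

Unlike a p-value, an E-value can exceed 1: it is an expected count, not a probability.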

Bjerrum length

0.7 × 10^-9 m for room temperature water.

Debye Length

About 0.3 × 10^-9 m for molar NaCl solution (Held, 2006).
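Both lengths can be reproduced from the textbook formulas. This is a Python sketch assuming water at 25 C with relative permittivity about 78.5 and a simple 1:1 electrolyte. Those values and the function names are my assumptions, not taken from Held.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
K_B = 1.380649e-23           # Boltzmann constant, J/K
EPS_0 = 8.8541878128e-12     # vacuum permittivity, F/m
N_A = 6.02214076e23          # Avogadro constant, 1/mol

def bjerrum_length(temp_k=298.15, eps_r=78.5):
    """Separation at which two unit charges interact with energy k_B T (metres)."""
    return E_CHARGE**2 / (4 * math.pi * EPS_0 * eps_r * K_B * temp_k)

def debye_length(molar, temp_k=298.15, eps_r=78.5):
    """Screening length in a 1:1 electrolyte of the given molarity (metres)."""
    ions_per_m3 = molar * 1000 * N_A   # mol/L -> ions of each sign per m^3
    return math.sqrt(EPS_0 * eps_r * K_B * temp_k
                     / (2 * ions_per_m3 * E_CHARGE**2))

l_b = bjerrum_length()
l_d = debye_length(1.0)
print(f"{l_b * 1e9:.2f} nm, {l_d * 1e9:.2f} nm")
```

The Debye length scales as one over the square root of the salt concentration, so diluting the buffer lengthens the range of the electrostatic repulsion between strands.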

What is R?

See http://www.r-project.org

What is a heatmap?

Slang for a picture of data (e.g. correlations between genes) in which each data value is represented by a colour.

For example, in a heatmap of stock prices the colour of each cell shows the price movement:
Green means the stock price is up.
Red means it's down.
The deeper the colour, the bigger the move.

What is overexpression?

Really does not seem to mean any more than "more expressed". E.g. in liver cells gene X has more expression products than in skin tissue, so X is overexpressed in the liver.

What is Nascent RNA?

Nascent RNA is newly formed RNA. Ie fresh from transcription. Nascent RNA is neither double stranded nor bound to proteins and as such it is highly reactive.

nascent pre-rRNA
Cover Nature insight 11 July 2002
Electron micrograph shows nascent pre-rRNA attached to its DNA template from a lysed yeast cell. (Image courtesy of Y. Osheim, K. Wehner, A. Beyer and S. Baserga)

Nascent RNA (wiggly black line), about to be freed from DNA (pair of straight lines) by PTRF.
PTRF Binds the Polymerase I Transcription Complex/Nascent Pre rRNA Complex paused at the TTF-I:Sal Box
Dissociation of paused ternary complexes requires the Polymerase I-transcript release factor (PTRF) a leucine zipper protein. PTRF is capable of dissociating ternary Pol I transcription complexes, interacting with both TTF-I and Pol I to mediate the release of both Pol I and nascent transcripts from the template.

What is RNAP II?

RNA polymerase II (Wikipedia). Also known as Pol II.


CTD stands for the C-terminal domain. Proteins are formed linearly from the N-terminal end to their C-terminal end. RNA polymerase II's C-terminal region varies between different species. It is composed of many (50 odd) repeats of the seven amino acid pattern YSPTSPS (in that sequence). Various of the amino acid residues in the CTD have phosphate groups dynamically attached and removed as RNAP II moves along the DNA transcribing RNA. This is important in controlling how fast RNA polymerase moves.


together with, at the same time, to accompany.



Typically a Ligand is a small molecule which reversibly binds with a larger one and thereby changes the larger one in some way.

What is an R-Loop

"An R loop is a structure in which an RNA molecule is partially or completely hybridised with one strand of a double-stranded DNA, leaving the other strand unpaired." (Li 2006).
Single RNA strand in red. DNA in green. DNA-RNA base pair links (hybridised, blue vertical bars)


Topoisomerases are enzymes which act on the topology of the DNA double helix. Specifically, they cut one of the strands (Topoisomerase I, Topo I), allowing the helix to unwind. Once the tension is removed, the topoisomerase relinks the DNA strand. (Topoisomerase II does the same, but cuts and reforms both DNA strands.)

XXX-Null Mutation

A genetic change which means gene XXX is no longer active.


Proteins are polypeptides. There does not seem to be a rigorous distinction between a polypeptide and a protein.


A ribozyme is a ribonucleic acid enzyme. Ie a bit of RNA that acts as an enzyme by catalysing a reaction.

Hammerhead Ribozyme

An RNA structure with three hairpin loops. Pictures of its secondary structure vaguely resemble the head of a hammer.

What is RNP?

ribonucleoprotein complexes. I.e. proteins sticking to newly formed RNA as it is transcribed from DNA.

What is mRNP?

In eukaryotes, mRNA is bound to proteins in the nucleus to form a messenger ribonucleoprotein complex.

What is snRNP?

Small nuclear RNP.

What is snRNA

Small nuclear RNA are usually about 150 bases long.

What are inverted repeats (IR)?

DNA palindromes with stuff in the middle. Eg:


Inverted repeats define the boundaries in transposons. Give rise to RNA hairpins?
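A Python sketch of what "palindrome with stuff in the middle" means: the last arm is the reverse complement of the first, with a spacer between them. The sequence and function names are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def is_inverted_repeat(seq, arm):
    """True if the first `arm` bases are the reverse complement of the last `arm`."""
    return len(seq) >= 2 * arm and seq[:arm] == reverse_complement(seq[-arm:])

# Made-up example: two 6-base arms around a 4-base spacer.
example = "AGCTTC" + "AAAA" + "GAAGCT"
print(is_inverted_repeat(example, 6))
```

In a single strand (e.g. the RNA transcript) the two arms are complementary to each other, which is why such a sequence could fold back into a hairpin with the spacer as the loop.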

What is an isoelectric point?

pH at which a molecule carries no net charge. See also zwitterions.

What is MIAME?

MIAME is short for the "Minimum Information About a Microarray Experiment".

18 May 2008