Molecular Biology — Concepts, Formulas & Examples

Molecular biology is the story of how genetic information flows from DNA to RNA to protein. CBSE Class 12 dedicates a major chapter to it, and NEET asks three to four questions a year. The topic is high-weightage and high-scoring once the central dogma is clear.

At its heart, molecular biology answers one question: how does a cell read the instructions written in DNA and turn them into functional proteins? The answer is a two-step relay — transcription copies the DNA message into RNA, and translation decodes that RNA message into a chain of amino acids. Every gene, every mutation, every genetic disease traces back to this flow.

Core Concepts

Central dogma

$$\text{DNA} \xrightarrow{\text{Transcription}} \text{RNA} \xrightarrow{\text{Translation}} \text{Protein}$$

Proposed by Francis Crick (1958). Information flows from nucleic acid to protein, never from protein back to nucleic acid. Reverse transcription (RNA → DNA by reverse transcriptase, in retroviruses like HIV) reverses the DNA → RNA step, but it does not break the overall rule, because information still never passes from protein back to nucleic acid.

DNA structure

Watson-Crick model (1953), based on X-ray data from Rosalind Franklin and Maurice Wilkins.

Double helix — two antiparallel polynucleotide chains wound around a common axis
Base pairing — A pairs with T (2 hydrogen bonds), G pairs with C (3 hydrogen bonds)
Dimensions: 10 base pairs per turn, 3.4 nm per turn, 0.34 nm between base pairs, 2 nm diameter
Antiparallel: one strand runs 5’ → 3’, the other 3’ → 5’
Major and minor grooves — proteins (like transcription factors) bind in these grooves to read the sequence
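
The dimensions in the list above turn into simple arithmetic. Here is a minimal Python sketch, assuming an idealised helix with exactly 10 base pairs per turn and 0.34 nm between base pairs; the function name `helix_geometry` is made up for illustration.

```python
# Idealised values taken from the dimensions listed above.
RISE_PER_BP_NM = 0.34   # distance between adjacent base pairs, in nm
BP_PER_TURN = 10        # base pairs in one full turn of the helix


def helix_geometry(base_pairs):
    """Return (length in nm, number of helical turns) for a DNA duplex."""
    length_nm = base_pairs * RISE_PER_BP_NM
    turns = base_pairs / BP_PER_TURN
    return length_nm, turns


length_nm, turns = helix_geometry(1000)
print(f"1000 bp: about {length_nm:.0f} nm long, {turns:.0f} turns")
```

The same multiplication is what length-of-DNA numericals ask for: number of base pairs × 0.34 nm.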

$$A = T, \quad G = C, \quad \frac{A + G}{T + C} = 1$$

In any double-stranded DNA, purines equal pyrimidines. This is a direct consequence of base pairing. If %A = 30%, then %T = 30%, and %G = %C = 20%.
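
The same reasoning can be written as a short Python sketch. It assumes double-stranded DNA (the rule does not hold for single-stranded genomes), and the function name `chargaff_percentages` is only illustrative.

```python
def chargaff_percentages(percent_a):
    """Given %A in double-stranded DNA, return the % of all four bases."""
    if not 0 <= percent_a <= 50:
        raise ValueError("%A must lie between 0 and 50 in double-stranded DNA")
    percent_t = percent_a                    # A pairs with T, so A = T
    percent_g = (100 - 2 * percent_a) / 2    # the remainder is split equally
    percent_c = percent_g                    # G pairs with C, so G = C
    return {"A": percent_a, "T": percent_t, "G": percent_g, "C": percent_c}


bases = chargaff_percentages(30)
print(bases)  # {'A': 30, 'T': 30, 'G': 20.0, 'C': 20.0}
```

Whatever value of %A you feed in, (A + G) and (T + C) each come to 50%, which is why the ratio is always 1.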

DNA replication

Semi-conservative: each new double helix has one parental strand and one newly synthesised strand. Proved by the Meselson-Stahl experiment (1958) using $^{15}$N-labelled DNA in E. coli.

Key enzymes and their roles:

Enzyme	Function
Helicase	Unwinds the double helix at the replication fork
Topoisomerase	Relieves supercoiling ahead of the fork
Primase	Synthesises a short RNA primer to initiate synthesis
DNA polymerase III	Main enzyme — synthesises new DNA strand (5’ → 3’ only)
DNA polymerase I	Removes RNA primers and replaces with DNA
Ligase	Seals nicks (joins Okazaki fragments)
SSB proteins	Stabilise single-stranded DNA and prevent re-annealing

Leading strand — synthesised continuously in the 5’ → 3’ direction toward the replication fork. Lagging strand — synthesised in short fragments (Okazaki fragments, ~1000-2000 nucleotides) away from the fork, then joined by ligase.

The semi-conservative nature of DNA replication is tested in NEET almost every year. Know the Meselson-Stahl experiment in detail: heavy ($^{15}$N) medium → shift to light ($^{14}$N) → after one generation, all DNA is hybrid density → after two generations, half hybrid, half light. This rules out conservative and dispersive models.
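
To see where those band fractions come from, here is a small Python simulation of semi-conservative replication. It is a sketch under simple assumptions: every duplex replicates once per generation, and each duplex is represented as a pair of strand labels. All names are illustrative.

```python
from collections import Counter
from fractions import Fraction


def replicate(duplexes, new_label="14N"):
    """Semi-conservative step: each parental strand pairs with one new strand."""
    daughters = []
    for strand_a, strand_b in duplexes:
        daughters.append((strand_a, new_label))
        daughters.append((strand_b, new_label))
    return daughters


def density_class(duplex):
    """Classify a duplex by the band it would form in a CsCl gradient."""
    heavy_strands = duplex.count("15N")
    return {2: "heavy", 1: "hybrid", 0: "light"}[heavy_strands]


def band_fractions(generations):
    """Fractions of heavy, hybrid and light DNA after growth in 14N medium."""
    population = [("15N", "15N")]          # start: fully heavy DNA
    for _ in range(generations):
        population = replicate(population)
    counts = Counter(density_class(d) for d in population)
    total = len(population)
    return {band: Fraction(n, total) for band, n in counts.items()}


for gen in range(4):
    print(gen, band_fractions(gen))
```

Running the loop gives generation 0 entirely heavy, generation 1 entirely hybrid, generation 2 half hybrid and half light, and generation 3 one quarter hybrid and three quarters light. A conservative model would keep a heavy band after the first generation, which is what the experiment ruled out.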

Transcription

The synthesis of RNA from a DNA template by RNA polymerase.

In prokaryotes: A single RNA polymerase transcribes all types of RNA. It binds to the promoter (a specific DNA sequence upstream of the gene), opens the DNA locally, and synthesises RNA 5’ → 3’ using the template (antisense) strand as a guide. The other strand is called the coding (sense) strand — it has the same sequence as the mRNA (with U instead of T). Termination occurs at a terminator sequence (rho-dependent or rho-independent).

In eukaryotes: Three RNA polymerases — Pol I (rRNA), Pol II (mRNA), Pol III (tRNA and 5S rRNA). Transcription produces pre-mRNA (hnRNA) which must be processed before translation:

5’ capping — a methylated guanine cap is added to the 5’ end (protects from degradation, aids ribosome binding)
3’ polyadenylation — a poly-A tail (~200 adenines) is added to the 3’ end (protects from degradation, aids nuclear export)
Splicing — introns (non-coding sequences) are removed and exons (coding sequences) are joined by the spliceosome
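
Splicing is easy to picture as string slicing. The Python sketch below uses a made-up toy transcript (exons in upper case, introns in lower case) and a hypothetical `splice` helper. It models only the removal of introns, not capping or polyadenylation.

```python
def splice(pre_mrna, exons):
    """Join exon segments of a pre-mRNA, discarding everything in between.

    `exons` is a list of (start, end) index pairs, 0-based and end-exclusive,
    listed in 5' to 3' order.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical toy transcript: exon, intron, exon, intron, exon.
pre_mrna = "AUGGCC" + "guaaguuuucag" + "UUCGAA" + "guaugcccuag" + "UGGUAA"
exons = [(0, 6), (18, 24), (35, 41)]

mature = splice(pre_mrna, exons)
print(mature)  # AUGGCCUUCGAAUGGUAA
```

Passing a different list of exon coordinates, for example leaving out the middle exon, gives a different mature mRNA from the same pre-mRNA.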

Genetic code

The triplet code that maps mRNA codons to amino acids:

64 codons in total: 61 code for the 20 amino acids and 3 are stop signals
Start codon: AUG (also codes for methionine)
Stop codons: UAA (ochre), UAG (amber), UGA (opal)
Degenerate — multiple codons can code for the same amino acid (e.g., leucine has 6 codons)
Unambiguous — each codon codes for only one amino acid
Universal — the same code is used by almost all organisms (with rare exceptions in mitochondria)
Non-overlapping — each nucleotide belongs to only one codon
Commaless — codons are read continuously without gaps
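
Several of these properties can be checked by building the codon table in Python. This is a sketch: the 64-letter string is the standard genetic code written in U, C, A, G order (one-letter amino acid symbols, `*` for stop), and the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acids for all 64 codons in UCAG order; "*" marks a stop.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

# Each codon maps to exactly one entry, which is what "unambiguous" means.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
codons_per_aa = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(len(CODON_TABLE))      # 64 codons in total
print(stop_codons)           # ['UAA', 'UAG', 'UGA']
print(len(codons_per_aa))    # 20 amino acids
print(codons_per_aa["L"])    # 6 codons for leucine (degenerate)
print(CODON_TABLE["AUG"])    # M (methionine, also the start codon)
```

Degeneracy shows up as counts greater than 1 in `codons_per_aa`; only methionine (M) and tryptophan (W) have a single codon each.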

Translation

The synthesis of protein from mRNA by ribosomes.

In eukaryotes, the small ribosomal subunit binds at the 5’ cap of the mRNA and scans to the first AUG (start codon); in prokaryotes it is positioned at the start codon by the Shine-Dalgarno sequence instead. The initiator tRNA carrying methionine (fMet in prokaryotes) pairs with AUG. The large subunit then joins, forming the complete ribosome with A, P and E sites.

A charged tRNA enters the A site (aminoacyl site) with its anticodon matching the mRNA codon. The peptide bond forms between the amino acid at the P site and the one at the A site (catalysed by peptidyl transferase, a ribozyme). The ribosome translocates one codon forward — the tRNA moves from A to P to E (exit). A new charged tRNA enters the A site. This cycle repeats.

When a stop codon (UAA, UAG, UGA) enters the A site, no tRNA can bind. A release factor binds instead, triggering hydrolysis of the peptide from the tRNA. The ribosome dissociates. The polypeptide folds into its functional shape (often with help from chaperone proteins).
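
The initiation, elongation and termination steps above can be mimicked in a few lines of Python. This is a simplified sketch, not a model of the ribosome: it finds the first AUG, reads codon by codon, and stops at the first in-frame stop codon. The mRNA sequence and the `translate` function are made up for illustration.

```python
from itertools import product

# Standard codon table built from the usual UCAG ordering; "*" marks a stop.
BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = dict(zip(("".join(c) for c in product(BASES, repeat=3)), AMINO_ACIDS))


def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""                      # no start codon, no translation
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":          # a release factor binds, not a tRNA
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Hypothetical mRNA: GGAC, then AUG GCC UUC GAA UGG, then the stop codon UAA.
print(translate("GGACAUGGCCUUCGAAUGGUAACC"))  # MAFEW
```

The bases before AUG and after UAA are never translated, which is the behaviour of the untranslated regions on a real mRNA.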

Gene regulation — Lac operon

The lac operon in E. coli is the classic example of gene regulation (Jacob and Monod, 1961).

Structure: promoter → operator → structural genes (lacZ, lacY, lacA).

lacZ encodes beta-galactosidase (breaks lactose into glucose + galactose)
lacY encodes permease (imports lactose into the cell)
lacA encodes transacetylase

Regulation:

Without lactose: the repressor protein (product of lacI gene) binds to the operator, blocking RNA polymerase. Genes are OFF.
With lactose: allolactose (an isomer of lactose) binds to the repressor, changing its shape so it can no longer bind the operator. RNA polymerase proceeds. Genes are ON.
This is a negative inducible system: the default is OFF, and the inducer (lactose, acting through its isomer allolactose) turns it ON.

No lactose: Repressor binds operator → genes OFF (negative regulation)

Lactose present: Allolactose binds repressor → repressor released → genes ON (induction)
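
The switch described above is simple enough to write as Boolean logic. The Python sketch below models only the repressor and operator; anything not covered in the text is left out on purpose. The function name and the `repressor_functional` flag are illustrative.

```python
def lac_operon_on(lactose_present, repressor_functional=True):
    """Return True if lacZ, lacY and lacA are transcribed.

    Only the repressor/operator switch described above is modelled.
    """
    if not repressor_functional:
        return True                     # nothing can block the operator
    # Allolactose (made from lactose) pulls the repressor off the operator.
    repressor_on_operator = not lactose_present
    return not repressor_on_operator


for lactose in (False, True):
    state = "ON" if lac_operon_on(lactose) else "OFF"
    print(f"lactose present={lactose}: genes {state}")
```

Calling `lac_operon_on(False, repressor_functional=False)` returns True, which corresponds to a lacI mutant: the genes stay ON even without lactose.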

Human Genome Project (HGP)

Completed in 2003, the project sequenced the ~3.2 billion base pairs of the human genome. Key findings:

About 20,000-25,000 genes (far fewer than expected)
Less than 2% of DNA codes for proteins — the rest is non-coding (regulatory, structural, repetitive sequences, introns)
99.9% of DNA is identical between any two humans — the 0.1% variation underlies individual differences

Worked Examples

The same codon codes for the same amino acid in bacteria, plants, animals and fungi. This suggests all life shares a common ancestor — the code was established early in evolution and has been conserved. The universality is practically useful: it allows expression of human genes in bacteria (e.g., human insulin gene in E. coli) for pharmaceutical production.

E. coli was grown in $^{15}$N (heavy) medium until all DNA was heavy, then shifted to $^{14}$N (light) medium. After one replication, all DNA showed hybrid density (one heavy strand + one light strand) on CsCl gradient centrifugation. After two replications, DNA was half hybrid, half light. This proved semi-conservative replication: each new helix retains one parental strand.

A protein has 300 amino acids. Each amino acid is coded by one codon (3 nucleotides). So the coding region of the mRNA = 300 × 3 = 900 nucleotides + 1 stop codon (3 nt) = 903 nucleotides minimum. The actual mRNA is longer due to 5’ UTR, 3’ UTR, cap and poly-A tail.
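
The arithmetic generalises to any protein length. A one-function Python sketch, assuming one codon per amino acid plus a single stop codon and ignoring the UTRs, cap and poly-A tail (the function name is illustrative):

```python
def min_coding_nucleotides(amino_acids):
    """Minimum coding-region length: 3 nt per amino acid plus one stop codon."""
    return amino_acids * 3 + 3


print(min_coding_nucleotides(300))  # 903
```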

Common Mistakes

Saying DNA polymerase can start synthesis on its own. It cannot — it needs an RNA primer made by primase. DNA polymerase can only add nucleotides to an existing 3’-OH end.

Confusing transcription and translation. Transcription = DNA to RNA (in the nucleus in eukaryotes; in the cytoplasm in prokaryotes, which have no nucleus). Translation = RNA to protein (on ribosomes in the cytoplasm). The names tell you the process: transcription is copying within the same language (nucleic acid), translation is converting to a different language (amino acids).

Writing that the genetic code is ambiguous. It is unambiguous — each codon specifies exactly one amino acid. It is degenerate — multiple codons can specify the same amino acid. These are opposite concepts.

Confusing the template strand and the coding strand. The template strand (antisense, 3’ → 5’) is read by RNA polymerase. The coding strand (sense, 5’ → 3’) has the same sequence as the mRNA (with T instead of U). NEET questions often give one strand and ask you to write the mRNA.

Saying introns are “junk DNA.” Introns are removed during splicing, but they play roles in gene regulation, alternative splicing (one gene can produce multiple proteins), and evolutionary flexibility. They are not functionless.

Exam Weightage and Strategy

Molecular Basis of Inheritance is among the highest-weightage chapters in NEET biology; expect 3-4 questions per year. CBSE boards give 6-8 marks. Previous-year questions (PYQs) cluster around: (1) DNA structure and Chargaff’s rules, (2) replication enzymes, (3) transcription differences between pro- and eukaryotes, (4) genetic code properties, (5) lac operon regulation.

Memorise: the start codon and the three stop codons, the key replication enzymes, the three post-transcriptional modifications in eukaryotes, and the lac operon mechanism. That set of facts answers most PYQs. For numerical problems, remember: 1 amino acid = 1 codon = 3 nucleotides.

Practice Questions

Q1. If one strand of DNA has the sequence 5’-ATGCCA-3’, write the mRNA sequence.

The question does not say which strand is given, so take it as the coding (sense) strand, written 5’ → 3’. The template strand is then 3’-TACGGT-5’. The mRNA is synthesised 5’ → 3’, complementary to the template strand: 5’-AUGCCA-3’ (the same as the coding strand, with U instead of T).
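
Both routes to the answer can be checked in Python. This is a minimal sketch; the helper names are illustrative, and the template strand is written 3’ → 5’ from left to right so that it lines up under the coding strand.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}


def template_from_coding(coding_5to3):
    """Complement each base; the result reads 3' -> 5' from left to right."""
    return "".join(COMPLEMENT[base] for base in coding_5to3)


def mrna_from_template(template_3to5):
    """Pair each template base with its RNA partner; result reads 5' -> 3'."""
    return "".join(RNA_PARTNER[base] for base in template_3to5)


def mrna_from_coding(coding_5to3):
    """Shortcut: the mRNA matches the coding strand, with U in place of T."""
    return coding_5to3.replace("T", "U")


coding = "ATGCCA"                          # 5'-ATGCCA-3' from Q1
template = template_from_coding(coding)    # written 3' -> 5'
print(template)                            # TACGGT
print(mrna_from_template(template))        # AUGCCA
print(mrna_from_coding(coding))            # AUGCCA
```

If a question gives the template strand instead, use the pairing route only; the replace-T-with-U shortcut applies to the coding strand alone.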

Q2. Why is the leading strand synthesised continuously but the lagging strand is not?

DNA polymerase can only synthesise in the 5’ → 3’ direction. On the leading strand, the template is oriented so that 5’ → 3’ synthesis points toward the replication fork, and the new strand grows continuously as the fork opens. On the lagging strand, the template has the opposite orientation, so 5’ → 3’ synthesis points away from the fork and polymerase must work in short stretches (Okazaki fragments), each requiring a new primer. Ligase then joins the fragments.

Q3. What would happen to the lac operon if the repressor gene (lacI) is mutated to produce a non-functional repressor?

The repressor cannot bind the operator, so the structural genes are constitutively expressed (always ON) regardless of whether lactose is present. Beta-galactosidase, permease and transacetylase are produced even when not needed. This is a loss of regulation, not a loss of gene expression.

Q4. A DNA molecule has 30% adenine. What are the percentages of the other bases?

By Chargaff’s rule: A = T = 30%. Since A + T + G + C = 100%, G + C = 40%. And G = C, so G = C = 20% each. Answer: T = 30%, G = 20%, C = 20%.

FAQs

What is the difference between DNA replication and transcription?

Replication copies the entire DNA molecule to produce two identical double helices (for cell division). Transcription copies only one gene at a time, producing an RNA molecule (for protein synthesis). Replication uses DNA polymerase; transcription uses RNA polymerase. Replication is semi-conservative; transcription produces a single-stranded RNA.

Why are there three stop codons but only one start codon?

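The single start codon (AUG) also codes for methionine, so every newly made polypeptide begins with methionine (often removed later). The three stop codons (UAA, UAG, UGA) have no matching tRNA; release factors recognise them and end translation. Having three of them is often described as redundancy: if a point mutation converts one stop codon into a sense codon, translation runs on only until the next in-frame stop, and with three stop triplets that next stop tends to appear sooner. This limits runaway translation.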

What is alternative splicing?

A mechanism where different combinations of exons from the same pre-mRNA are joined to produce different mature mRNAs, each encoding a different protein. This allows one gene to produce multiple proteins. About 95% of human multi-exon genes undergo alternative splicing, which is one reason humans can have ~100,000 different proteins from only ~20,000 genes.

Molecular biology is the mechanism layer under all of genetics. Once the central dogma clicks, everything from mutations to cancer becomes traceable to molecules.