prob*: Word Design for DNA Computing on Surfaces

proposed by Marc van Dongen
dongen@cs.ucc.ie

Results

A solution of size 108 is reported in the literature (Anthony G. Frutos, Qinghua Liu, Andrew J. Thiel, Anne Marie W. Sanner, Anne E. Condon, Lloyd M. Smith, and Robert M. Corn, “ Demonstration of a Word Design Strategy for DNA Computing on Surfaces,” Nucleic Acids Research, 25 4748-4757 (1997). The solution was obtained by construction.

Coding theorists at UCC have since then computed a solution of size 112. This solution was obtained using a greedy algorithm. I have understood that the authors of the original paper have also found such a solution. (Private communication).

I have managed to find the solution of size 112 listed at the end of this page with an implementation of an optimisation CSP in ECLiPSe. The solution was not easy to find in two ways. The first reason is that the CSP formulation is too large and that it is impossible to find a solution within reasonable time (at least with ECLiPSe) and without a linear programming tool. To overcome this problem I combined the greedy and optimisation approach. The second reason is that the problem (at least with my formulation) is very sensitive to heuristics. This sensitivity to heuristics (choice of bases, really) also manifests itself with the greedy algorithms. Here then is the solution of size 112:

AAGCCGTT TACGCGAT TAGCGCAT TTCGGCAA
ATGGGGAT TAGGGGTA TTCCGGTT TTGGCCTT
AGACCACT ACTCCTGT AGAGCTGA AGTCGAGA
TGAGGACA TCTGGTGA TGACGTGT TGTGCAGT
ACCTTGGA AGCATCGT TGCTACGA TGGAACCT
AGGTAGGT TCGTTCGT TGCTTGCT TGGATGGA
CAACTCCT CAAGACGA CTACAGCA GATCACCA
CATCAGGT CATGTGCA CTTCTCGA CTTGACCT
GAACTGGA GAAGAGCT GTACACGT GTAGTCCA
CTAGTGGT GATGTCGT GTTCTGCT GTTGAGGA
CTTCCTAG GAAGCTAG GAAGGATC GATCGAAG
CATGGTTG GTACGTTG GTTGCATG GTTGGTAC
CAGAAGTG CTCATGTC GAGTACAG GTCTTCAC
CTGTTGAG GAGTTGTC GTCTAGTG GTGATCTG
CACACACT CACTGTCA CAGAGAGA CTCACTGA
CTGTCACA GACTCAGA GAGACTCA GTCAGACA
CAGTCTGT CTCTGAGT CTGAGTCT GACAGTGT
GAGTGACT GTCTCTCT GTGACAGT GTGTGTGA
CGAAAACG CCATTAGG CGAATTGC GCTTAACG
GGTAATCC GCTTTTGC GGATATGG GGTATAGG
CCAACCTT CCAAGGAA CCTTCCAA GGAACCAA
CGATCGAT CGATGCTA CGTACGTA CGTAGCAT
GCATCGTA GCATGCAT GCTACGAT GCTAGCTA
CCTTGGTT GGAAGGTT GGTTCCTT GGTTGGAA
CCCCAAAA CCCGTATT CCGCTTTA CCGGATAT
CGCCTTAT CGCGATTA CGGCAATT CGGGTAAA
GCCCATTT GCCGTTAA GCGCTAAT GCGGAATA
GGCCTATA GGCGAAAT GGGCATAA GGGGTTTT