скачать Contents 1 Preface 1.1 On Tutorial 1.2 Computing v. Computation 2 Computing with Cells and Atoms 2.1 DNA Computing 2.2 Membrane Computing 2.3 Quantum Computing 3 Elements of Computer Science Related to Quantum Computing 3.1 Turing Machines 3.2 The ChurchTuring Thesis 3.3 Complexity Classes 4 Introduction to Quantum Computing 4.1 Introduction 4.2 Quantum Mechanics 4.3 Quantum Bits 4.4 Multiple Qubits 4.5 Measurement 4.6 Quantum Gates 4.7 Quantum Computer 4.8 Turing is Forever 4.9 Quantum Parallelism 4.10 Classical Computers vs. Quantum Computers 4.11 Bits vs. Qubits 4.12 Initialization, Execution and Termination 4.13 Power of Quantum Computer 4.14 Quantum Computing in Computational Complexity 4.15 Problems in Building Quantum Computers 4.16 Quantum Decoherence 4.17 Quantum Computer Candidates 5 Some Applications of Quantum Computing in Bioinformatics 5.1 Sequence Alignment 5.2 Quantum Evolutionary/Genetic Algorithm & MSA 5.3 Quantum Representation 5.4 Main Algorithms 5.5 Objective Functions 5.6 Optimization 5.7 Experimental Results 5.8 Conclusions 5.9 Open Problems 6 Further Readings  General reference  Introduction to Quantum Computation  Thermal ensembles  Using quantum computers to simulate quantum systems  Quantum cryptography  Universal quantum computer and the ChurchTuring thesis  Shor's factoring algorithm  Quantum database search  Quantum sorting  Quantum computer simulators  Quantum Programming Languages  Quantum error correction  Quantum error avoidance  Solving NPcomplete and #Pcomplete problems 7 Bibliography 8 Appendix A. Mathematics Used in this Paper B. Code for the Simulation of QEAlign B.1 header.h B.2 main.C B.3 initpopulation.C B.4 evaluation.C C. Sample output C.1 Sample output for 1aboA_ref1.msf with similarity 25 from BAliBASE C.2 Sample output for 1dox_ref1.msf with similarity 25 from BAliBASE C.3 Sample output for 1r69_ref2.msf from BAliBASE C.4 Sample output for 2trx_ref2.msf from BAliBASE C.5 Sample output for 1idy_ref3.msf from BAliBASE C.6 Sample output for 1pamA_ref3.msf from BAliBASE D. Selected Presenters’ Papers Related to Quantum and DNA Computing 1 Preface 1.1 On Tutorial The tutorial “Applications of Quantum Computing in Bioinformatics” covers:  Computing with Cells and Atoms  Elements of Computer Science Related to Quantum Computing  Quantum Computing  Some Applications of Quantum Computing in Bioinformatics  Conclusion and Open Questions  Further Readings  Bibliography The tutorial time is limited and some material will be just mentioned, skipped and left for homework.
Computing is any goaloriented activity requiring, benefiting from, or creating computers. Thus, computing includes designing and building hardware and software systems for a wide range of purposes; processing, structuring, and managing various kinds of information; doing scientific studies using computers; making computer systems behave intelligently; finding and gathering information relevant to any particular purpose, and so on. Computing is the systematic study of algorithmic processes that describe and transform information: their theory, analysis, design, efficiency, implementation, and application. Computation is a process following a well defined model that is understood and can be expressed by an algorithm, protocol, network topology, etc. Hongwei Huo Vojislav Stojkovic 2 Computing with Cells and Atoms ^ 2.1.1 DNA DNA, deoxyribonucleic acid, is a molecule found in every living cell, which directs the formation, growth, and reproduction of cells. DNA consists of nucleotides. Nucleotides contain compounds called phosphate, deoxyribose, and base. Within all nucleotides, phosphate and deoxyribose are the same, however, the bases vary. The four distinct bases are: adenine (A), guanine (G), thymine (T), and cytosine (C). The exact amount of each nucleotide and the order in which they are arranged are unique for every kind of living organism. DNA represents information as a pattern of molecules on a DNA strand. A DNA strand is a string of the alphabet {A, C, G, T}. The length of a DNA strand is equal to the length of the string that represents the DNA strand. 2.1.2. DNA Computer A DNA computer is a chemical instrument consisting of a system of connected test tubes and other auxiliary units. DNA computers use the chemical properties of DNA molecules by examining the patterns of combination or growth of the molecules or strings. DNA computers can do this through the manufacture of enzymes, which are biological catalysts that could be considered the ‘software’ used to execute the desired DNA computation. DNA computers represent information in terms of DNA. In DNA computers, deoxyribonucleic acids serve as the memory units that can take on four possible positions (A, C, G, or T). DNA computers do not have the vonNeumann architecture. DNA computers are massively parallel and are considered promising for complex problems that require multiple simultaneous computations. DNA computers perform computations by synthesizing particular sequences of DNA and allowing them to react in test tubes. The task of the DNA computer is to check each possible solution and remove those that are incorrect, using restrictive enzymes. When the chemical reactions are complete, the DNA strands can then be analyzed to find the solution. A super DNA computer is a programmable DNA computer. 2.1.3. DNA Computing DNA computing is a form of computing which uses DNA and biochemistry and molecular biology technologies instead of the traditional siliconbased computer technologies. In 1961 Feynman predicts in 1994 Adleman realized computations at a molecular level computing with DNA. DNA computing began in 1994 when Adleman showed that DNA computing was possible by solving the Traveling Salesman Problem on a DNA computer. Adelman used DNA polymerase and WatsonCrick complementary strands to do DNA computation. Since then, it has been a surge of research in the DNA computation field. DNA computation has emerged in an exciting new research field at the intersection of computer science, biology, mathematics, and engineering. DNA computation has been demonstrated to have the capability to solve problems considered to be computationally difficult for von Neumann machines. After the Hamiltonian Path problem was solved, several researchers proposed solutions to a spectrum of NPcomplete problems (such as Lipton) dealing with satisfiability, cryptography, as well as other search oriented problems. We have assumed that DNA computations are errorfree, i.e., they work perfectly without any errors. However, in reality DNA computations can be faulty because some DNA operations can introduce errors. DNA operations are constrained by biological feasibility. DNA operations may be: (i) realized by the present biotechnology or (ii) implemented by simulation on the conventional vonNeumann computers. 2.1.4. DNA Computation Model As computer components become smaller and/or more compact, scientists and engineers dream of a chemical, multiprocessor computer, whose processors are individual molecules involved in chemical processes. Following this thinking, we propose DNA computation model that involves the following three operations levels: (i) Basic DNA operations (DNA molecular interactions); (ii) Test tube operations (proposed in 1996 by Gibbons, Amos, and Hodgson) such as: remove, union, copy, select, and etc. (iii) High level operations A selection of Easel/Clike programming language statements such as: (i) beginend or {} (for grouping) (ii) ifthenelse (for selection) (iii) for (for loop) The basic DNA operations level is the chemical interactions between DNAs. It may be seen as machine programming and may be interpreted as executions of machine code. The basic DNA operations can be implemented at DNA computers or simulated at vonNeumann machines. The test tube operations level is an interface level that serves as an interface between vonNeumann machine and DNA machine. It may be seen as the hardware of a DNA computer. The test tube operations can be implemented at DNA computers or simulated at vonNeumann machines. The high level operations  the programming language level can be implemented using vonNeumann machines with standard processors, operating systems, and programming languages processors. In the last twelve years DNA computation has emerged as an exciting, fascinating, and important new research field at the intersection of computer science, mathematics, biology, chemistry, bioinformatics, and engineering. The main reasons for the interest in DNAcomputations are: (i) size and variety of available DNA molecular "tool boxes" (ii) massive parallelism inherent in laboratory and chemical operations on DNA molecule (iii) feasible and efficient models (iv) physical realizations of the models (v) performing computations in vivo. Unfortunately it is still not clear whether DNA computing can compete (or will be able to compete in the near future) with existing electronicdigital computing. We propose that in the near future it will be possible to join vonNeumann and DNA computer in a functional super biocomputer. We are confident that in 1020 years our desktop computers will be evolved into biocomputers. These machines will be able to perform calculations in seconds that take today’s PCs hours, and solve in hours problems that take today’s PCs years. A computational substrate – a substance that is acted upon by the implementation of DNA computational model is DNA. DNAs are represented by strings. DNA computational model operates upon sets of strings. A DNA computation starts and ends with a single set of strings. A DNA algorithm is composed of a sequence of operations upon one or more sets of strings. At the end of the DNA algorithm’s execution, a solution to the given problem is encoded as a string in the final set. Characterization of DNA computations using traditional measures of complexity, such as time and space is misleading due to the nature of the laboratory implementation of DNA computation. One way to quantify the time complexity of a DNAbased algorithm is to count the required numbers of “biological steps” to solve the problem. The biological steps include the creation of an initial library of strands, separation of subsets of strands, sorting strands by length, chopping and joining strands, and etc. 2.1.5. Basic DNA Operations An assignment is a finite sequence of unit assignments. A unit assignment is coded by a DNA strand. All unit assignments of an assignment have the same length. The most important basic DNA operations are: (i) Append (Concatenate, Rejoined)  appends two DNA strands with ‘sticky ends’ (ii) Melt (Anneal, Renaturation)  breaks two DNA strands with complementary sequences (iii) Cut  cuts a DNA strand with restriction enzymes. 2.1.6. Test Tube Operations A test tube contains an assignment. The most important test tube operations are: (i) Union (Merge, Create)  pours the context of more tubes into one tube. (ii) Copy (Duplicate, Amplify)  makes copies of a tube. (iii) Separate  separates an assignment into a finite sequence of assignments sorted by the length of unit assignments. (iv) Detect  confirms presence or absence of a unit assignment in a tube. (v) Select  selects on the uniformly random way from an assignment a unit assignment. (vi) Append (Concatenate, Rejoined)  appends an unit assignment to each unit assignment of an assignment. (vii) Melt (Anneal, Renaturation)  melts each unit assignment of an assignment with a unit assignment. (viii) Extract  extracts the context of one tube into two tubes using a pattern unit assignment. (ix) Remove  removes unit assignments that contain occurrence(s) of other unit assignments. (x) Cut  cuts each unit assignment of an assignment for the given length. (xi) Discard – empty the tube. 2.1.7. DNA Representations A DNA representation of a string c1 ... cm is a sequence c[1] ... c[m], where:  c[i], where i = 1, ..., m, is the character at the position i. Characters c[i] are uniquely encoded by DNA strands. If an unsigned integer number is not used for numerical calculations, then the unsigned integer number may be represented as a string of digits of some base. A DNA representation of an unsigned integer number d1...dm is a sequence d[1]...d[m], where:  d[i], where i = 1, ..., m, is the digit at the position i, and  0 <= d[i] <= base1. The base may be any integer number >= then 1. Digits d[i] are uniquely encoded by DNA strands. If an unsigned integer number is used for numerical calculations, then the given DNA representation of an unsigned integer number is not suitable because it does not care on carries what complicates implementations of arithmetic operations with unsigned integer binary numbers. 2.1.8. DNABased Algorithm for Creating a Set of Unsigned Integer Binary Numbers DNABased Algorithm for creating a set of unsigned integer binary numbers may be specified on the following way: procedure CreateInputSet(m, T) // m is the input data; // m is a unsigned integer number; // T is the input data and the output data; // T is the tube; // T as the input data // is the set of one of more empty strings. // The tube T contains a finite sequence of unit assignments // (DNA strands) that represents empty strings. // T as the output data // is the set of unsigned integer binary numbers of the length m; // T has maximum 2^{m} elements; // The tube T contains a finite sequence of unit assignments // (DNA strands) that represents // the set of unsigned integer binary numbers of the length m; { base = 2; T = {ε}; // ε is the empty string for (i = 1; i <= m; i++) { Copy(T, { T[base1] }); Parallèle for(j = 0; j <= base1; j++) { k = rand(base1); Append(T[j], k, T[j]); } Union({ T[base1] }, T); Discard( { T[base1] }); } } 2.1.9. DNABased Algorithm for Determining the Smallest Element of a Set of Unsigned Integer Binary Numbers DNABased Algorithm for determining the smallest element of a set of unsigned integer binary numbers of the given length may be specified on the following way: function Smallest(m, T) // The function Smallest returns // the smallest element of the set T; // It is the unit assignment (DNA strand). // m is the input data; // m is a unsigned integer number; // T is the input data; // T is the tube; // T as the input data // is the set of unsigned integer binary numbers of the length m; // T has maximum 2^{m} elements; // The tube T contains a finite sequence of unit assignments // (DNA strands) that represents // the set of unsigned integer binary numbers of the length m; { for i = m to 1 do { // Parallel // { // T > T[0]; // T > T[1]; // } Copy(T, { T[1] }); // It is a very important to empty the tube T. Discard(T); Parallel { Remove(T[1], {i 0}); Remove(T[0], {i 1}); } if(Detect(T[0])) then Union(T[0], T); else Union(T[1], T); } return Select(T); } Explanations i means the unsigned integer number j from the range 1 .. m. i j means the unsigned integer number j from the range 1 .. m at the position i. Remove(T[j], {i j}) removes from the tube T[j] all DNA strands which contain at least one occurrence of the DNA substrands i j. Remove(T[j], {i j}) saves in the tube T[j] only DNA strands which contain at the position i the value ¬ j. Parallel{A; B} means that operation A and B may be executed in parallel. Complexity Complexity of the DNABased Algorithm for determining the smallest element of a set of m unsigned integer binary numbers is O(m). Program The DNABased program for determining the smallest element of a set of unsigned integer binary numbers of the length m may be specified on the following way: program S { m=4; CreateInputSet(m, T); smallest = Smallest(m, T); }. ^ The program S may be executed: (i) step by step in a DNAlab or on a DNAcomputer (ii) automatically on a super DNAcomputer or on an electronicdigital computer (iii) for small (less then 10) number of elements. ^ The purpose of the test example is to "visualize" execution of the DNABased program S for determining the smallest element of a set of unsigned integer binary numbers of the length m. m=4; CreateInputSet(4, T); base = 2; T = {ε}; // ε is the empty string
i = 1; Copy(T, {T[1]});
Parallel { k = rand(1); for example k = 0 Append({ T[0], k }, T[0]); k = rand(1); for example k = 1; Append(T[1], k, T[1]); }
Union({ T[1] }, T);
Discard({ T[1] });
i = 2; Copy(T, {T[1]});
Parallel { k = rand(1); k = 0; Append({ T[0], k }, T[0]); k = rand(1); for example k = 0; Append(T[1], k, T[1]); }
Union({ T[1] }, T);
Discard({ T[1] });
And etc. k = rand(1); for example k = 1; k = rand(1); for example k = 0; k = rand(1); for example k = 0; k = rand(1); for example k = 1; ^
smallest = Smallest(4, T); i = 4; Copy(T, { T[1] });
Discard(T);
Parallel{Remove(T[1], {4 0}); Remove(T[0], {4 1});}
if(Detect(T[0])) then Union(T[0], T); else Union(T[1], T);
i = 3; Copy(T, { T[1] });
Parallel{Remove(T[1], {3 0}); Remove(T[0], {3 1});}
if(Detect(T[0])) then Union(T[0], T); else Union(T[1], T);
i = 2; Copy(T, { T[1] });
Parallel{Remove(T[1], {2 0}); Remove(T[0], {2 1});}
if(Detect(T[0])) then Union(T[0], T); else Union(T[1], T);
i = 1; Copy(T, { T[1] });
Parallel {Remove(T[1], {1 0}); Remove(T[0], {1 1});}
if(Detect(T[0])) then Union(T[0], T); else Union(T[1], T);
return Select(T); smallest = 0000ε; ^ Membrane computing is a branch of natural computing inspired by the structure and functions of the living cells. It connects:  distributed parallel computing models and  membrane systems. The field of membrane computing was initiated in 1998 by Gheorghe Paun. ^ Quantum computing is a multidisciplinary research area bringing together ideas from:  information theory,  computer science,  mathematics, and  quantum physics. 3 Elements of Computer Science Related to Quantum Computation 3.1 Turing Machines A Turing machine (TM) is a basic abstract symbolmanipulating machine which can simulate any computer that could possibly be constructed. A Turing machine that is able to simulate any other Turing machine is called a Universal Turing machine (UTM, or simply a universal machine). Studying abstract properties of TM and UTM yields many insights into computer science and complexity theory. Turing and others proposed mathematical computing models allow the study of algorithms independently of any particular computer hardware. This abstraction is invaluable. 3.1.1. Informal Description A Turing machine consists of:  A tape  A head  A table  A state register A tape is divided into cells, one next to the other. Each cell contains a symbol from some finite alphabet. The alphabet contains a special blank symbol (here written as 'B') and one or more other symbols. The tape is assumed to be arbitrarily extendable to the left and to the right. Cells that have not been written to before are assumed to be filled with the blank symbol. In some models:  the tape has a left end marked with a special symbol  the tape extends or is indefinitely extensible to the right A head can read and write symbols on the tape and move the tape left and right one (and only one) cell at a time. In some models the head moves and the tape is stationary. A table (transition function) of instructions (usually 5tuples but sometimes 4tuples) for:  the state the machine is currently in and  the symbol it is reading on the tape tells the machine what to do. In some models, if there is no entry in the table for the current combination of symbol and state then the machine will halt. Other models require all entries to be filled. In case of the 5tuple models: (i) either erase or write a symbol, and then (ii) move the head ('L' for one step left or 'R' for one step right), and then (iii) assume the same or a new state as prescribed. In the case of 4tuple models: (ia) erase or to write a symbol or (ib) move the head left or right, and then (ii) assume the same or a new state as prescribed, but not both actions (ia) and (ib) in the same instruction. A state register stores the state of the Turing table. The number of different states is always finite and there is one special start state with which the state register is initialized. 3.1.2. Formal Description A (onetape) Turing machine is a 7tuple where  Q is a finite set of states  Γ is a finite set of the tape alphabet/symbols  is the blank symbol  Σ, a subset of Γ not including b, is the set of input symbols  is the transition function, where L is left shift and R is right shift.  is the initial state  is the set of final or accepting states 3.1.3. Example – Copy String ^ A more mathematicallyoriented definition with a similar "universal" nature was introduced by Alonzo Church. Church’s work on lambda calculus intertwined with Turing's work in a formal theory of computation resulted as the ChurchTuring thesis. The ChurchTuring thesis states that Turing machines:  capture the informal notion of effective method in logic and mathematics  provide a precise definition of an algorithm or 'mechanical procedure'. Turing proposed the following hypothesis: Every ‘function which would naturally be regarded as computable’ can be computed by the universal Turing machine. It should be noted that there is ambiguity as to what, precisely, a function which would naturally be regarded as computable means. Due to this ambiguity, this statement is not subject to rigorous proof. There are strong evidences for this hypothesis:  Many diverse models of computation compute the same set of functions as a Turing machine  No counterexamples of the hypothesis. The ChurchTuring thesis gives the insight into the “power” of computing machines. If a computing machine can solve all problems a Turing machine can solve, then the computing machine is powerful as a Turing machine. ^ It is interesting and important to determine does a problem computable or not computable. But, this distinction is not fine enough to determine does a problem solvable by a physical computer in a reasonable amount of time. If a computation will take hundreds of years to get result, it may be considered as not computable from the perspective of a pragmatist An algorithm can be characterized by the number of operations and amount of memory it requires to compute a result for the given input of size N. These characterizations of the algorithm determine what is called the algorithms complexity. Specifically, the complexity of an algorithm is determined by looking/analyzing how the number of operations and memory usage influence to getting a result for the size of the input of the algorithm. Problems are grouped into the following complexity classes:  P: Polynomial time  NP: Nondeterministic polynomial time  NPcomplete P: Polynomial time, the running time of the given algorithm is in the worst case some polynomial function of the size of the input. For example if an algorithm with an input size of 10 bits needs 10^4+7*10^2+5 operations to get result, it is a polynomial time algorithm. NP: Nondeterministic polynomial time, a candidate for an answer can be verified as a correct answer or not in the polynomial time. NPcomplete: A set of problems such that if any member is in P, the set P is equal to the set NP. [Cormen, Leiserson, Rivest , and Cliff 2001] Problems which can be solved in polynomial time or less are generally deemed to be “tractable”. Problems which require more than polynomial time are usually considered to be “intractable”. For example an algorithm which would take 2^n operations for an input size of n would be considered intractable. It should be noted at this point that in Shor's Algorithm, we will be given n, a number to be factored. In n's most compact representation, it will take log n bits to be encoded. When discussing the running time of algorithms, we must always speak in reference to the encoded input size. For Shor's algorithm (or any algorithm whose input is an integer), the input size" grows logarithmically with the value of n to be factored. Whether or not P and NP describe the same set of problems is one of the most important open questions in computer science. If a polynomial time algorithm for any NPcomplete problem be discovered, then we would know that P and NP are the same set. Many NPcomplete problems have known algorithms which can compute their solutions in exponential time. It is unknown whether any polynomial time algorithms exist. When discussing the complexity of a problem it is important to distinguish between there being no known algorithm to compute it in polynomial time, and there being no algorithm to compute it in polynomial time. There are many problems whose best known algorithm requires an exponential number of steps to compute a solution. These problems are generally considered to be intractable. Determining the prime factors of a large number is one such problem, and its assumed intractability is the bases for most cryptography. The interest in quantum computing is twofold:  Does a quantum computer can solve problems which are uncomputable on a classical computer?  Does a quantum computer can make problems thought of as intractable, tractable? ^ 4.1 Introduction Richard Feynman [Feynman 1982] observed that certain quantum mechanical effects cannot be simulated efficiently on a computer. This observation led to speculation that perhaps computation in general could be done more efficiently if it made use of these quantum effects. But building quantum computers, computation machines that use such quantum effects, proved tricky, and no one was sure how to use the quantum effects to speed up computation, the field developed slowly. It wasn’t until 1994, when Peter Shor[Shor 1994; 1997] surprised the world by describing a polynomial time quantum algorithm for factoring integers, that the field of quantum computing came into its own. This discovery promoted a flurry of activity among experimentalists trying to build quantum computers and theoreticians trying to find other quantum algorithms. Additional interest in the subject has been created by the invention of quantum key distribution and, more recently, popular press accounts of experimental successes in quantum teleportation and the demonstration of a 3bit quantum computer. During the past forty years astounding advances have been made in the manufacture of computers. The number of atoms needed to represent a bit in memory has been decreasing exponentially since 1950. Likewise the number of transistors per chip, clock speed, and energy dissipated per logical operation have all followed their own improving exponential trends. This rate of improvement cannot be sustained much longer, at the current rate in the year 2020 one bit of information will requite only one atom to represent it. The problem is that at that size the behavior of a computer's components will be dominated by the principles of quantum physics [Williams, Clearwater 1998]. Classically, the time it takes to do certain computations can be decreased by using parallel processors. To achieve an exponential decrease in time requires an exponential increase in the number of processors, and hence an exponential increase in the amount of physical space needed. However, in quantum systems the amount of parallelism increases exponentially with the size of the system. Thus, an exponential increase in parallelism requires only a linear increase in the amount of physical space needed. This effect is called quantum parallelism [Deutsch and Jozsa 1992]. There is a big catch at that. While a quantum system can perform massive parallel computation, access to the results of the computation is equivalent to making a measurement, which disturbs the quantum state. This problem makes the situation, on the face of it, seem even worse than the classical situation; we can only read the result of one parallel thread, and because measurement is probabilistic, we cannot even choose which one we get. But in the past few years, various people have found clever ways of finessing the measurement problem to exploit the power of quantum parallelism. This sort of manipulation has no classical analog and requires nontraditional programming techniques. One technique manipulates the quantum state so that a common property of all of the output values such as the symmetry or period of a function can be read off. This technique is used in Shor’s factorization algorithm. Another technique transforms the quantum state to increase the likelihood that output of interest will be read. Grover’s search algorithm [Grover 1997] makes use of such an amplification technique and searches an unstructured list of n items in O(n) steps on a quantum computer. Classical computers can do no better than O(n), so unstructured search on a quantum computer is provably more efficient than search on a classical computer. However, the speedup is only polynomial, not exponential, and it has been shown that Grover’s algorithm is optimal for quantum computers. A quantum system can undergo two types of operations: measurement and quantum state transformation. Most quantum algorithms involve a sequence of quantum state transformations followed by a measurement. For classical computer there are sets of gates that are universal in the sense that any classical computation can be performed using a sequence of these gates. Similarly, there are sets of primitive quantum state transformation, called quantum gates, which are universal for quantum computation. Given enough quantum bits, it is possible to construct a universal quantum Turing machine.
