No, quantum computers are not appropriate for big data problems. Rather, they are best for problems with a fairly small amount of data which has a very large solution space — so called combinatorial explosions. So, rather than call it Big Data, I call it Little Data with a Big Solution Space. This informal paper introduces the notions of Little Data with a Big Solution Space.
Topics in this informal paper:
- In a nutshell
- Quantum computer as a coprocessor
- Centrality of quantum parallelism
- Quantum advantage is the whole point
- Quantum supremacy
- Combinatorial explosion
- The essential goal: exponential speedup
- Big solution space
- Little data — limited input data, limited results
- Little computation — quantum computations must be simple
- No, large computations are not appropriate for quantum computing
- Big data
- 3 Vs of big data
- Big data data models
- Structured big data is not amenable to quantum computing
- Big data graphs are not amenable to quantum computing either
- Big data images are not amenable to quantum computing
- Big data text is not amenable to quantum computing
- Semi-structured big data is not amenable to quantum computing
- Big data media is not amenable to quantum computing
- 3Vs of big data are not amenable to quantum computing
- Caveat — no good examples, yet
- Variational methods as little data
- Hybrid quantum/classical computing
- Shor’s algorithm exemplifies little data with big solution space
- Quantum computers have no I/O or database access
- Big data with a quantum computer
- Hadamard transform to achieve parallelism
- The basic model for quantum parallel computation
- The challenge: extracting results
- Extracting values from a large solution space
- Using interference to select values from a large solution space
- Quantum phase estimation (QPE) to extract a value
- Quantum Fourier transform (QFT) to extract a value
- How many qubits does a quantum computer need?
- Solution space size
- Shot count or circuit repetitions
- Circuit repetitions could dramatically reduce any quantum advantage
- Pay attention to net or effective quantum advantage
- Don’t forget wall clock time
- Classical sampling, Monte Carlo simulation
- Approximate solutions
- Input data and parameters for a quantum circuit
- Generally, at least twenty qubits will be needed for significant quantum advantage
- Generally, over fifty qubits will yield a dramatic quantum advantage — and even quantum supremacy
- But what about super-exponential problems?
- Will quantum error correction (QEC) be required?
- Where are we today?
- Will quantum computing ever be able to process big data?
- Caveat for D-Wave Systems
- What about IBM’s new quantum roadmap for one million qubits?
- Future advanced quantum computer architectures
- Universal quantum computers
In a nutshell
At its heart, quantum computing is all about quantum parallelism, which is the evaluation of a relatively simple computation over a very big solution space.
A quantum computer is a coprocessor invoked from a classical computer. A quantum program (quantum circuit) running on a quantum computer has no ability to perform I/O or access mass storage, files, databases, or a network, so any input data or parameters must be encoded in the gate structure of the quantum circuit. This dramatically limits the amount of input data as well as limiting the computation.
The logic of the computation must be relatively simple, but it will be executed in parallel over the full range of a potentially large solution space.
Once the parallel computation has been completed, a variety of clever tricks are used to extract a value from that solution space.
Then some additional clever classical code is used to massage the quantum result to get a real result that can finally be used by a classical application.
Even then, the application will generally need to re-execute the same quantum computation a significant number of times to get a statistically meaningful and effectively deterministic result since quantum computations are probabilistic in nature, not deterministic. This is referred to as the shot count or circuit repetitions.
Only a single result can be obtained per computation — a little data result.
- Little data in.
- Little computation in a big solution space. Quantum parallelism.
- Little data out.
- Repeat enough times to get a statistically-meaningful result.
The remainder of this paper will flesh out that essential skeleton.
Quantum computer as a coprocessor
Quantum computing is not expected to fully replace classical computing any time soon. In fact, there won’t ever be any applications which are 100% pure quantum. Rather, applications will be a combination of classical and quantum code, referred to as hybrid quantum/classical computing.
Typically, classical code will do all of the preparation for a quantum computation, including dynamic generation of the actual quantum circuit complete with any input data or parameters, execute that circuit on the quantum computer, and then process the quantum results with classical code as well.
The quantum computer will be used as a coprocessor, to execute small, self-contained chunks of the application, while larger portions of the application will continue to be performed using classical processing.
Hopefully those small chunks executed on the quantum computer won’t be too small, or else they won’t offer a true quantum advantage over a classical computer.
But even if those quantum chunks are not so small, they are still quite small relative to traditional classical applications.
Centrality of quantum parallelism
The essence of quantum parallelism is to perform a single computation once, but with an n-qubit register initialized to a superposition of all 2^n possible binary values for a range for a parameter of the computation. It takes some cleverness and tricks to extract a single result from that parallel computation, but that’s the essence of quantum computing.
The technique used on a quantum computer to achieve the quality of exponential speedup is called quantum parallelism. As already noted in the previous section, a single calculation can be executed once for all 2^n possible values all at once.
Some clever and sophisticated techniques are needed to select and extract the proper results after executing a calculation under quantum parallelism, but that’s the essence of what a quantum computer does for you.
Quantum advantage is the whole point
The overall goal of quantum computing is quantum advantage — that a quantum computer will complete a task in far less time than a classical computer. Not just 10% to 20% faster, not just 40% to 50% faster, not just 200% faster, not just 500% faster, not just 10 times faster or 50 or 100 times faster, but many 1,000’s of times faster or millions of times faster or even billions of times faster or even more.
In any case, whatever the actual, net quantum advantage, that’s the point of even considering a quantum computer over a classical computer.
This topic is covered in more depth in these two papers:
- What Is Quantum Advantage and What Is Quantum Supremacy?
- What Is the Quantum Advantage of Your Quantum Algorithm?
Taken to the extreme, some tasks take so long that they simply can’t be practically implemented on a classical computer, a very large cluster of classical servers, or even on a classical supercomputer — and complete in our lifetimes. If such a task can be implemented on a quantum computer, this is not only a quantum advantage but is also called quantum supremacy.
Generally, quantum supremacy won’t be relevant until fifty or more qubits are used in parallel for a single quantum computation.
Quantum supremacy is discussed in more detail in this paper:
Generally, problems are harder — take more time — to solve as they get bigger. The question is at what rate the time to complete a task grows as the amount of input data grows. As long as that rate of growth is fairly low, you can probably solve the problem on a classical computer, but if the rate of growth is high, the problem gets harder and harder to solve on a classical computer, to the point where eventually it takes too long to be practical or tolerable.
This rate of growth is called algorithmic complexity or computational complexity. For more on algorithmic complexity, read this paper:
The reason that many problems get so hard to solve is typically that the number of combinations of the input data grows at a very prodigious rate. This is known as a combinatorial explosion.
Numbers like 10, 20, 30, and 40 don’t sound very big, but if the difficulty of solving a problem grows exponentially, such as 2^n, the complexity or difficulty grows dramatically — 2¹⁰ is only 1,000 (approximately), 2²⁰ is over one million, 2³⁰ is over a billion, and 2⁴⁰ is over a trillion. Now try numbers like 50, 100, 500, 1,000, or one million and the exponential growth is simply too staggering to even contemplate, let alone implement on a classical computer.
The essential goal: exponential speedup
The quality that is needed to give a quantum computer a quantum advantage and to tackle problems with combinatorial explosions is called exponential speedup. This is the essential goal of quantum computing.
Given the right algorithm, a quantum computer can take one of these exponential problems and execute a calculation once for all 2^n possibilities all at once. This is exponential speedup.
It’s a little more complicated than that, but the overall principle is just that — an exponential reduction in the algorithmic complexity of the problem.
Big solution space
A combinatorial explosion of data produces what is called a big solution space. A solution space is simply the full range of possible combinations to be considered to solve a problem.
If a classical algorithm operated on a mere 50 items of input data with exponential complexity, that would be 2⁵⁰ steps (or more) — 10¹⁵ or a million billion steps. And with factorial complexity operating on those same mere 50 items of input data would take 50! (50 factorial) steps (or more) — 10⁶⁴ steps.
A mere 40 pieces of input data could generate a trillion combinations. A mere 300 pieces of input data could generate 2³⁰⁰ combinations, or 10⁹⁰ which is larger than the number of atoms in the entire universe. That’s an unfathomably big solution space.
And with a mere 75 items of input data, 75 factorial steps is far greater than all of the atoms in the entire universe.
These are big, huge, humongous, mind-boggling solution spaces, but with a very, very modest amount of input data. Not even remotely close to big data.
One example is the so-called Traveling Salesman Problem (TSP) — what is the cheapest and faster route to visit all 50 or 75 cities or customers. A very modest amount of data (Little Data) but a spectacularly big number of possible solutions to evaluate (Big Solution Space.)
TSP is just one of many optimization problems which can be considered for quantum computers.
These are the kinds of problems that quantum computing is best for.
Little data — limited input data, limited results
Even with the power of quantum parallelism it is not possible to process a large amount of input data or to produce a large amount of output data. This is why this paper refers to little data.
There are three data qualities here:
- Small amount of input data. And parameters.
- Relatively simple calculation applied in parallel to a very big number of combinations of the limited amount of input data. This is the big solution space.
- Extraction of a small amount of result data.
Any preprocessing of input data must be performed using classical code.
Similarly any postprocessing of result data must be performed using classical code.
Little computation — quantum computations must be simple
Although a quantum computer could evaluate a computation over all values in a big solution space in parallel, the actual computation itself must be relatively simple.
Quantum computing is not a good fit for complex logic. A quantum circuit is a linear sequence of quantum logic gates executed sequentially. Actually, it’s an ordered graph so that some gates can be executed in parallel, but the effect is the same — no complex logic. No conditionals, no looping, no function calls, no recursion, no rich data types. And no I/O, or file or database access.
The net complexity of the overall quantum circuit can be very complex (large), but the complexity is limited to quantum parallelism, with no complexity permitted in the core computation to be executed in parallel.
That said, hybrid quantum/classical algorithms and applications are possible, where core algorithms are executed on the quantum computer, while other logic which is less-appropriate for a quantum computer runs on a classical computer. The quantum computer is used as a coprocessor.
Quantum parallelism is great, but it does require that the core computation is fairly small — little computations.
Technically, current quantum computers and those projected over the coming decade will have a limited number of qubits and a very limited quantum program (quantum circuit) size, as well as being limited by coherence and inter-qubit connectivity.
A classical solution requiring thousands or even hundreds of lines of code and megabytes or even kilobytes of memory would exceed the capacity of all known quantum computers.
Quantum algorithms basically require that the application problem be mapped to the raw physics operations supported by the quantum computer. That severely constrains the computations which can be performed.
No, large computations are not appropriate for quantum computing
As already indicated, quantum computing is focused on little computations evaluated over a big solution space. Large computations consisting of large amounts of code and requiring significant memory or mass storage or I/O or database access are not going to be appropriate — or even possible — on a quantum computer alone.
As mentioned earlier, a quantum computer can only be used for relatively small chunks of a larger computation. At least for now. Ask me again in ten years.
Big data is a vague and ambiguous term, but for the purposes of discussion here, it refers to a large number of rows or records of data — millions or billions of rows or records, or gigabytes (billions of bytes) or terabytes (trillions of bytes) of data, or even larger.
Some of the hype around quantum computers has suggested that quantum computers could readily solve big data problems easily, but that’s just not the case, at all.
It’s not that quantum computers can’t be applied to big data at all, just that quantum computers may not offer any significant advantage for problems driven by the time to access large amounts of data.
Quantum computers are better suited for problems where there is a relatively small amount of data for which there is a very big solution space or combinatorial explosion of potential solutions.
Sure, you could use a classical computer to access large volumes of data and then feed the data in small amounts to a quantum computer, but unless the solution space for that small amount of data is self-contained to that small amount of data and is a very, very large solution space, it is unlikely that the quantum computer would offer a very big benefit compared to a small amount of computation on that small amount of data on a classical computer.
So, no, you can’t execute a single computation across a billion rows of big data in a fraction of a second on a quantum computer.
Classical big data commonly involves a big number of classical computers with distributed storage of the large volumes of data, with each classical computer processing a relatively small or manageable subset of the data, but that model doesn’t work well for quantum computers where there are a very small number of computers with essentially no data storage — any input data must be encoded in the quantum program itself and any output data must be processed by a classical computer.
3 Vs of big data
Just for reference, big data is commonly characterized with three terms which each begin with a V — the 3 Vs of big data:
- Volume. Lots of data.
- Velocity. At a high rate.
- Variety. In a lot of different formats.
Unfortunately, none of those Vs apply to quantum computing.
There are other formulations with four, five, or even seven Vs, but they apply even less to quantum computing.
Big data data models
Not all big data is created equal. Big data comes in a variety of forms:
- Rows. Large number of linear rows of structured data.
- Graphs. With large numbers of nodes and edges.
- Images. Large numbers of large images, possibly with deep color.
- Text. Large amounts of unstructured text. Or semi-structured text.
- Semi-structured data. A combination of any of the above. Such as structured documents or XML files.
- Media. Audio, video, etc. Large sizes, in complex data formats.
Structured big data is not amenable to quantum computing
As a coprocessor, a quantum computer is well-suited to performing a single computation, maybe in parallel for a large number of discrete values, but still only a single computation, while structured data involves a number of distinct data values, each requiring its own computation.
Technically you could execute a relatively small number of distinct computations in parallel at the same time on a quantum computer, but the number would be very limited and it would be essentially for a single row of structured big data, so there would be no significant quantum advantage unless the distinct computations were very expensive and involved a large solution space.
Big data graphs are not amenable to quantum computing either
Graph databases are a special category of big data, where the data is not organized as a linear sequence of rows, but as a mathematical graph with nodes and edges. A lot of specialized classical code can be optimized to traverse and process graphs, but none of that is relevant to quantum computing.
Sure, a quantum computer can process a small amount of data extracted from a graph, but lacking I/O, database access, network access, and even mass storage, a quantum computer would be limited to processing only a small amount of data at a time, not an entire, large graph.
Big data images are not amenable to quantum computing
Although very tiny images can be processed on a quantum computer — a few dozen pixels or very small thumbnail images, large images such as a typical photograph (megapixels) with significant color depth are well beyond the limited capabilities of both current quantum computers and those projected over the next decade.
Big data text is not amenable to quantum computing
Large quantities of text (unstructured data), such as log files, news, documents, books, etc. are not readily processed by current or projected quantum computers.
Relatively small amounts of text which has been preprocessed and transformed into structured codes might be processed on a quantum computer, but not raw text, especially in volume.
Semi-structured big data is not amenable to quantum computing
Since neither large quantities of structured data nor raw text are amenable to quantum computing, neither is semi-structured big data, such as large structured documents or large XML files.
Again, there may be individual computations which are very expensive and could be offloaded to a quantum computer, but that falls under the rubric of little data rather than big data. And even if the quantum computer offers a quantum advantage for such computations, there is still the risk that the overall overhead of processing the massive amounts of data could easily swamp the relatively small quantum computations.
Big data media is not amenable to quantum computing
Digital media, such as audio (music, podcasts), and video (YouTube videos, TV shows, feature-length movies) is completely inappropriate for processing on a quantum computer, being well beyond the limited capabilities of both current quantum computers and those projected over the next decade.
3Vs of big data are not amenable to quantum computing
None of the three Vs of big data are amenable to quantum computing:
- Volume. No quantum advantage for big data.
- Velocity. Again no quantum advantage for big data.
- Variety. Not appropriate for quantum computing.
There may be narrow subsets of big data which are amenable to quantum computing, but quantum computing has nothing to offer the three Vs of big data.
Caveat — no good examples, yet
Although this paper suggests that little data with a big solution space has real potential for quantum computing, unfortunately there are not yet any good or great examples of solutions which achieve this effect. The potential is there, but reality hasn’t yet caught up with the promise — and may not for quite a few years.
All existing quantum algorithm implementations use too few qubits to offer anything resembling a quantum advantage.
Variational methods as little data
Some problems, such as in quantum computational chemistry, are simply too difficult for even the best of current quantum computers. A compromise, hybrid approach is taken for such problems using what are called variational methods which is an iterative approach using classical code to help guide the iterations of the quantum logic.
As always, this is a problem with little data input, little computation over a big solution space, producing a little data result.
The general flow of a variational method is:
- Pick an initial trial value. Possibly at random or guided by some a priori domain knowledge about the problem. This is called an ansatz.
- Perform one iteration of the quantum calculation, producing an intermediate result.
- Use classical code to examine the intermediate quantum result and optimize it to guide it in a presumably more fruitful path — or decide whether it is close enough to a final solution to the problem. This optimization yields a revised ansatz.
- Rinse and repeat. Repeat steps 2 and 3 until either the classical code decides the quantum result is close enough to a final solution, or if so many iterations have been attempted without achieving an acceptable solution that continuing to search for a solution is fruitless — and the computation fails.
And if the computation fails after hitting a limit on iterations, the human user may review the intermediate results and decide to tweak input parameters — which control the classical optimization process — or try to come up with a more judicious ansatz that is more likely to lead to an acceptable solution.
For more information on the quantum variational method, read:
- Quantum Variational Algorithm
Hybrid quantum/classical computing
The proper term for what was just described is hybrid quantum/classical computing, where parts of the application are classical code while parts are quantum code.
Shor’s algorithm exemplifies little data with big solution space
I have my own reservations about the practicality of Shor’s algorithm for factoring large semiprime numbers (product of two prime numbers) — useful for cracking large public encryption keys — but it is the ultimate example of the concept of little data with a big solution space.
A very strong public encryption key today would have 4096 bits, which is the product of two large prime numbers, but it would take forever for a classical computer to try to factor such a large number using brute force methods.
Shor’s algorithm uses some clever tricks — and quantum parallelism — to arrive at the two factors of such a large semiprime number.
Shor’s original algorithm (there are derivative algorithms) would use two 8192-qubit registers for its computations.
The number of quantum states represented by one of those 8192-qubit registers would be 2⁸¹⁹², which is far greater than counting everything in the entire universe. That’s a big solution space.
4096 is large relative to the qubits in a typical quantum computer today, but is still a fairly small amount of input data relative to a lot of classical applications and algorithms.
The actual quantum computation for Shor’s algorithm is not so little — on the order of 32 million gates, but that’s still small relative to the number of operations performed in a lot of classical applications and algorithms.
So, as extreme as Shor’s algorithm is (or would be if it could be implemented today), it still exemplifies the little data with big solution space (and little computation) model discussed by this paper.
And Shor’s algorithm is a great example of hybrid quantum/classical computing as well, with parts which are classical code and part which is quantum code.
Quantum computers have no I/O or database access
It may seem shocking, but since a quantum computer operates as a simple coprocessor, it has no ability to perform I/O or to access files or a database, or to access any resources over a network.
Actually, the quantum computer system typically has network access and in fact is accessed as a service over the network, but the quantum processor itself, the heart (or brain) of the quantum computer has no I/O, no file access, no database access, and no access to the network.
A quantum program or quantum circuit is simply a sequence (or graph) of quantum logic gates in which any input data or input parameters must be encoded in the gate structure of the quantum circuit. Upon completion of execution of the quantum circuit the qubits will be measured, yielding a string of classical bits, binary 0’s and 1’s, which are then returned to the application for further processing by classical code. But that’s the only way data can be fed into the quantum program and the only way output can be stored or processed.
In short, a quantum computer can only handle a small amount of data at one time, in a single computation. The only way for a quantum computer to handle a large amount of data would be to invoke a quantum computation a large number of times, which would be no faster than invoking a classical computation the same number of times.
Big data with a quantum computer
Since a quantum computer has no I/O, file or database access, or network access, that means that it can only process a fairly small amount of data within a single quantum circuit. That doesn’t suggest a big opportunity for solving big data problems. Technically, one could in fact use a quantum computer to solve a big data problem, but since only a small amount of data can be processed at each step, there is little opportunity to achieve a true quantum advantage with exponential speedup. If there is no substantial quantum advantage over a classical computer, there’s no reason to use a quantum computer.
For quantum computing to make any sense in the context of big data, only a relatively small number of very tiny amounts of data can be processed by the quantum computer. To repeat:
- A very small amount of data for each computation.
- A relatively small number of those computations.
For example, Google’s Sycamore quantum computer can only process about 5,000 requests per second. Processing a mere one billion rows would take 56 hours (1,000,000,000 / 5,000 / 3,600). A distributed cluster of classical computers could easily beat that.
Only in extreme cases where the time for a single computation is very large for a classical computer — and is much larger than the time to access the data, such as Shor’s algorithm to factor a very large semiprime number used as a public encryption key, could a quantum computer be competitive for big data.
Hadamard transform to achieve parallelism
It’s actually a very simple operation. An H gate (H for Hadamard) turns a simple 0 or 1 into a superposition of 0 and 1. One qubit, two values. It’s that simple.
An H gate applied to each of the qubits of an n-qubit quantum register would obviously put all n qubits into a superposition of 0 and 1. Superficially, that’s two times n distinct quantum states, which doesn’t sound terribly exciting.
And if that’s all you did and then measured all n qubits you would get a bunch of random 0’s and 1’s. That’s actually a good random number generator — treat the measured qubits as the bits of a binary integer and you will get a random number between 0 and 2^n minus one.
But if instead of measuring the qubits of the register you perform a quantum computation using that n-qubit register as an input value, which may involve conditional entanglement and interference of various qubits of the register, the superposition has the interesting effect of evaluating the computation for all possible values of the qubits of the register in parallel. Parallel computation — quantum parallelism.
If you simply measured the n qubits of the register after the parallel computation you would get only one of the computed values, chosen at random. That’s not terribly useful either — another form of random number generator.
I won’t go into the details here, but clever tricks are used to then extract the interesting values from that parallel computation, such as using a quantum Fourier transform or other techniques exploiting quantum interference.
The point here is that the Hadamard transform sets the stage for quantum parallel computation.
The basic model for quantum parallel computation
The basic model for achieving exponential speedup is to evaluate a single computation in the context of a parameter which ranges over a big number of values. In particular, with n qubits, a total of 2^n values, in the form of 2^n quantum states, can be evaluated.
To be clear, n is not the total number of qubits in the quantum computer or even the total number used by a single quantum circuit, but the number actually used in a Hadamard transform to represent the range of values for a single calculation, and 2^n represents the total number of values evaluated in parallel.
For example, if only 6 qubits of a 53-qubit machine are used in a quantum circuit, the number of possible values is only 2⁶ or 64.
General outline of an algorithm:
- Initialize an n-qubit quantum register to a superposition of all 2^n discrete values of the range of the desired parameter to be varied using a Hadamard transform (H gate on each qubit.)
- Perform the desired computation on the n-qubit register. Any input values must be encoded in the gate structure of the quantum circuit.
- Transform the 2^n computation results (quantum states) to an n-qubit result register, such as using quantum phase estimation (QPE), a quantum Fourier transform (QFT), or some other method which exploits quantum interference to map the 2^n computed values (quantum states) to a single n-qubit value to be extracted from the solution space.
- Measure the n-qubit result register, producing an n-bit classical result.
- Perform classical processing on the n-bit classical measured result.
- Repeat the above to develop an expectation value (distribution) for the results. The shot count input parameter specifies how many circuit repetitions to perform. Exactly how to select the actual result from the distribution will depend on the application.
The challenge: extracting results
The great challenge when evaluating a computation across a big solution space is exactly how to extract results. There is no easy answer for all situations. The answer is simply that you have to be clever. That is the focus of countless academic papers on quantum computing.
Extracting values from a large solution space
Three of the main approaches to extracting a result value from a large solution space are:
- Quantum phase estimation (QPE)
- Quantum Fourier transform (QFT)
Using interference to select values from a large solution space
This paper won’t go into the details, but interference is one of the primary clever tricks used to select values from a large solution space. As with classical wave interference, two waves can reinforce or cancel each other — with cancellation causing a quantum state to be ignored or discounted, and reinforcement causing a quantum state to be selected or weighed more heavily.
For more on interference in quantum computing:
- Is Interference The Main Quantum Computing Advantage?
Quantum phase estimation (QPE) to extract a value
One specialized method to exploit quantum interference to extract a value from the large solution space of a quantum computation is quantum phase estimation (QPE). Details are beyond the scope of this paper.
Quantum Fourier transform (QFT) to extract a value
Another specialized method to exploit quantum interference to extract a value from the large solution space of a quantum computation is quantum Fourier transform (QFT). Details are beyond the scope of this paper.
How many qubits does a quantum computer need?
When we say that n qubits support 2^n quantum states or 2^n discrete values, that is not to say that the quantum computer needs only n qubits. These n qubits are the ones that participate in a parallel computation, but the quantum circuit may need more qubits, even many more qubits for its full computation.
A collection of n qubits used in a parallel computation are commonly referred to as a register or quantum register.
Sometimes two registers are needed, such as using one to serve as the superposition of all values for a parameter, and the second to be used as the output, such as for a quantum Fourier transform.
Further, although the application problem may have 2^n discrete values, it may be necessary to double the number of bits (qubits). For example, the value from the application may need to be squared, which doubles the number of bits (qubits.)
And finally, the quantum algorithm may need some extra qubits, called ancilla qubits, for intermediate results in the computation.
So, for example, if the application parameter had a range of 0 to 1,000, it would take 10 qubits (2¹⁰ = 1024), maybe doubled to 20, and maybe another 3 ancilla qubits, for a total of 23.
Or, if that same data required a second register, also doubled, that would be 10 * 2 + 10 * 2 + 3 or 43 qubits.
Or if the application parameter had one million discrete values, it would take 20 qubits, maybe doubled to 40, and maybe another 3 ancilla qubits, for a total of 43.
Or if that same data required a second register, also doubled, that would be 20 * 2 + 20 * 2 + 3 or 83 qubits.
Of course, it would all depend on the specific algorithm whether a doubling, a second register, or ancilla qubits are needed.
Solution space size
Some examples for solution space size based on the number of qubits used to represent a range of values:
- 4 qubits = 2⁴ = 16 quantum states (or discrete values.)
- 6 qubits = 2⁶ = 64 quantum states.
- 8 qubits = 2⁸ = 256 quantum states.
- 10 qubits = 2¹⁰ = 1024 = 1K quantum states.
- 12 qubits = 2¹² = 4096 = 4K quantum states.
- 16 qubits = 2¹⁶ = 64K quantum states.
- 18 qubits = 2¹⁸ = 256K quantum states.
- 20 qubits = 2²⁰ = 1M (one million) quantum states.
- 24 qubits = 2²⁴ = 16M (sixteen million) quantum states.
- 28 qubits = 2²⁸ = 256M (256 million) quantum states.
- 30 qubits = 2³⁰ = 1G (one billion) quantum states.
- 40 qubits = 2⁴⁰ = 1T (one trillion) quantum states.
- 50 qubits = 2⁵⁰ = 1Q (one quadrillion or one million billion) quantum states.
At least at present, more than 2⁵⁰ discrete values is generally viewed as beyond the capacity of even the largest supercomputers.
256 qubits could represent 2²⁵⁶ = 10⁷⁷ discrete values or quantum states which is a little less than the total number of atoms in the entire universe (10²⁷ to 10⁸² atoms).
276 qubits could represent more discrete values or quantum states (2²⁷⁶ = 10⁸³) than the total number of atoms in the entire universe.
300 qubits could represent a number of states comparable to one million times the total number of atoms in the entire universe.
These are truly huge solution spaces, which could not be feasibly evaluated one state at a time on even a massively large network of classical computers.
Again, “n” is not the total number of qubits on a quantum computer or even the total number of qubits used by an algorithm (circuit), but the number of qubits used for a single Hadamard transform to set up an n-qubit register for a simple computation for quantum parallelism. Additional qubits may be needed for the rest of the quantum circuit.
For example, Shor’s algorithm to factor a 4096-bit semiprime number (such as a public encryption key) requires roughly four times that many qubits, with two registers, each with double that (8192) qubits. So the solution space size would be 2⁸¹⁹², which is an incredibly large solution space. The overall quantum circuit would require 16K qubits (8K for each of the two registers), plus a few extra, ancilla qubits. There are other derivative algorithms, adaptations of Shor’s algorithm which use fewer qubits.
Shot count or circuit repetitions
Even with the quantum parallelism of n qubits evaluating a calculation over 2^n discrete values, that apparent advantage must be discounted to account for circuit repetitions (also called shot count) which are needed to account for the probabilistic nature of quantum computing.
Generally, quantum computers do not directly compute deterministic results. Rather, they compute probabilistic results.
But most applications require deterministic results or at least approximate results.
Statistics solves the problem. Statistics or statistical aggregation can be used to calculate the expectation value of the result of the quantum computation. Basically, perform the probabilistic computation enough times that the average of the probabilistic results will be a close approximation of the same result that you would get from a precise, deterministic classical computation.
The number of times that the quantum computation must be repeated is referred to as circuit repetitions, since the quantum circuit is being repeated that many times. This is also referred to as the shot count, as in how many shots do you need to take to assure that you will likely have hit the target.
The upside of statistical aggregation is that you get a reasonably decent result. The downside of statistical aggregation is that repetitions of the circuit effectively reduce the quantum advantage of using a quantum computer in the first place. Reduced, by how much? It will vary wildly from algorithm to algorithm — there’s not even a rough rule of thumb.
In some situations circuit repetitions may be a relatively minor effect, while in other situations they could be a major effect. And in some situations the overhead could completely swamp and completely negate any quantum advantage of the core quantum computation.
For more detail on shot counts and circuit repetitions, read my paper:
Circuit repetitions could dramatically reduce any quantum advantage
If you’re really lucky, circuit repetitions could be merely dozens or low hundreds. But that’s probably for only a relatively small number of qubits.
It’s possible that even for a computation using ten qubits — 2¹⁰ = 1024 possible values, a hundred or more circuit repetitions could be required, reducing a 1,000 to 1 advantage by a factor of one hundred or more to 10 to 1 — or even less.
In fact, it’s quite possible or even probable using noisy NISQ machines that circuit repetitions could completely swamp the quantum advantage.
Unfortunately, we don’t know a priori how much statistical aggregation might be required. In truth, at this stage, trial and error testing is needed to determine an acceptable shot count for a particular algorithm — and for particular input data.
As a more generous example, take 20 qubits representing 2²⁰ = one million possible values — that’s nominally a 1,000,000 to 1 quantum advantage. But that’s before any necessary statistical aggregation. Some possibilities and their impacts on the ideal quantum advantage:
- Shot count = 10. Very unrealistic, but to establish a baseline or best case. Reduces 1,000,000 to 1 advantage to 100,000 to 1.
- Shot count = 100. Also unrealistic. Reduces 1,000,000 to 1 advantage to 10,000 to 1.
- Shot count = 1,000. Possible, but not guaranteed. Reduces 1,000,000 to 1 advantage to 1,000 to 1.
- Shot count = 2,000. Possible, but not guaranteed. Reduces 1,000,000 to 1 advantage to 500 to 1.
- Shot count = 5,000. Reduces 1,000,000 to 1 advantage to 200 to 1.
- Shot count = 10,000. Reduces 1,000,000 to 1 advantage to 100 to 1.
- Shot count = 50,000. Reduces 1,000,000 to 1 advantage to 20 to 1.
- Shot count = 100,000. Reduces 1,000,000 to 1 advantage to 10 to 1.
- Shot count = 1,000,000. Reduces 1,000,000 to 1 advantage to 1 to 1 — no advantage at all.
Even with a shot count exactly matching the number of possible discrete values, there’s still no guarantee that a statistically meaningful result will be achieved.
As another example, take 30 qubits, which represents one billion discrete values, shot count could be 10,000, 25,000, 50,000, or 100,000 or… much larger, which yield a net quantum advantage of 100K, 40K, 20K, or 10K or worse.
10K is a nice quantum advantage, but clever techniques, including sampling, could achieve approximately the same result using a cluster of servers in some time which is not so much worse than what the quantum computer can achieve.
What’s likely in reality for such an example? Any of the above, really.
In any case, the point of this paper is not to warn that effective quantum advantage is minimal or elusive, but simply to urge analysts to actually use effective or net quantum advantage when evaluating the prospective benefits of quantum computing.
This topic is covered in more depth in this paper:
Pay attention to net or effective quantum advantage
Just to reemphasize the point of the preceding section, the raw quantum advantage of a quantum algorithm may be exponential — 2^n, but that raw advantage must be discounted by the number of circuit repetitions needed to achieve a statistically meaningful result.
Dividing the raw quantum advantage by the circuit repetitions gives you the net quantum advantage or effective quantum advantage.
For example, an algorithm computing 20 qubits in parallel could achieve a raw quantum advantage of 2²⁰ or one million to one, but if 10,000 circuit repetitions are needed that reduces to a net quantum advantage or effective quantum advantage of only 100 to one.
Don’t forget wall clock time
Many computations can be performed in such a tiny fraction of a second or minutes that nobody could care how much time it really is, but the combination of shot count or circuit repetitions coupled with practical limitations on how many circuit repetitions can be performed in a second (e.g., low thousands, not millions) could cause a quantum computation to take multiple seconds, tens of seconds, or even minutes or hours. The Google quantum supremacy experiment took over three minutes — and it wasn’t attempting to do anything practical.
Granted, some applications, such as real-time processing or involving network latency need to complete in a tiny fraction of a second.
A lot of the types of applications being considered for quantum computing seem less time-sensitive. Minutes, tens of minutes, even a few hours may be just fine.
The main thing we’re really looking at with quantum computing is to tackle problems which would take years, decades, centuries, even millennia to complete on even a supercomputer.
But there are other interesting problems, such as business process optimization where even a clever classical sampling algorithm might take eight hours or more, but the business may need to be able to optimize a process overnight in less than two hours. In such a case, even a quantum advantage of only four or five to one might make the difference between a viable and an impractical solution to a business problem.
The point is that raw quantum advantage is not necessarily the key determinant of whether a quantum solution is viable or acceptable. In some contexts even a 4–5x improvement is a fantastic win, while in other contexts (such as where thousands of cheap processors are readily available) even a 10,000x advantage is not a true advantage. And onerous circuit repetition requirements can wipe out even a billion to one advantage.
Classical sampling, Monte Carlo simulation
Even when the solution space is large, classical computing is not automatically out of the running. A variety of clever techniques can be used, such as sampling. Monte Carlo simulation is a popular sampling technique. For many problems a few hundred or a few thousand samples may be sufficient to gain a decent approximation of the overall solution space.
Sampling is one approach to achieve an approximate solution. Some problems really do require precise solutions, but approximate solutions are usually quite acceptable.
Even in the quantum realm, such as quantum computational chemistry, they speak of achieving chemical accuracy.
Input data and parameters for a quantum circuit
Since there is no way for a quantum circuit to read input data, any input data — our little data — as well as any input parameters must be encoded in the gate structure of the quantum circuit.
In reality, the algorithm developer will also develop classical code which can indeed access input data and parameters from any classical source of data. This classical code will then generate the quantum circuit, encoding the input data and parameters in the gate structure of the generated quantum circuit.
This may seem odd — and it is — but it’s a reasonable compromise that works reasonably well.
A lot of input parameters are either binary options indicating whether some logic should be included, a choice of which case of logic to use, or parameters for the classical code which preprocesses the input data and postprocesses the result data.
Generally, at least twenty qubits will be needed for significant quantum advantage
A computation involving twenty qubits would involve 2²⁰ or roughly one million quantum states. A one-million to one advantage would generally be considered a significant advantage that would not be easy to overcome using even very clever techniques on even a large cluster of classical computers.
Anything less than twenty qubits for a parallel computation seems unlikely to offer a dramatic quantum advantage
Shot count could reduce that advantage by some significant degree. For example, for these shot counts, here are the total qubits needed for the computation to achieve a one-million to one advantage:
- Shot count = 10. 24 qubits.
- Shot count = 100. 27 qubits.
- Shot count = 1,000. 30 qubits.
- Shot count = 2,500. 32 qubits.
- Shot count = 5,000. 33 qubits.
- Shot count = 10,000. 34 qubits.
- Shot count = 25,000. 35 qubits.
- Shot count = 50,000. 36 qubits.
- Shot count = 100,000. 37 qubits.
- Shot count = 250,000. 38 qubits.
- Shot count = 1,000,000. 40 qubits.
- Shot count = 10,000,000. 44 qubits.
At this stage, I’m not prepared to suggest that a quantum calculation requiring more than 10,000,000 shots is practical at this time. Even then, it’s far from clear that ten million shots would be enough to get a statistically valid result for 2⁴⁴ quantum states. But it’s at least a ballpark to consider.
As far as computations for fewer than twenty qubits, consider the net quantum advantage if 1,000 shots were required (or round to a power of 2–1024 shots):
- 19 qubits = 512K quantum states. Net quantum advantage 512.
- 18 qubits = 256K quantum states. Net advantage 256.
- 17 qubits = 128K quantum states. Net advantage 128.
- 16 qubits = 64K quantum states. Net advantage 64.
- 15 qubits = 32K quantum states. Net advantage 32.
- 14 qubits = 16K quantum states. Net advantage 16.
- 13 qubits = 8K quantum states. Net advantage 8.
- 12 qubits = 4K quantum states. Net advantage 4.
- 11 qubits = 2K quantum states. Net advantage 2.
- 10 qubits = 1K quantum states. Net advantage 1 — no net advantage.
Granted, all of these numbers are sensitive to the actual shot count.
Net advantage of 4 to 64 is hardly worth touting for most applications since running 4 to 64 classical machines in parallel is no big deal, so quantum algorithms implementing computations using 12 to 16 qubits are unlikely to deliver a true net quantum advantage.
So if a quantum computation using 20 qubits but requiring 1,000 shots will deliver a net quantum advantage of roughly 1,000, which if run in parallel on 100 classical machines would deliver a net quantum advantage of 10, you would need to have a use case which was so sensitive to wall clock time that a factor of 10 in wall clock time delivered a significant business advantage. That may indeed be possible, but seems less than likely.
Generally, over fifty qubits will yield a dramatic quantum advantage — and even quantum supremacy
Over fifty qubits in a single parallel computation is essentially a slam-dunk for quantum advantage — and even quantum supremacy as well.
An exception would be scenarios where there are some clever tricks for Monte Carlo simulation which enable a classical computer to get a “close enough” result with far less total calculations.
And don’t forget to factor shots or circuit repetitions needed to get a statistically meaningful quantum result.
Please note that this is not simply a quantum computer with fifty qubits, but fifty qubits used in a single quantum parallel computation. For example, if the overall quantum circuit also requires a separate output register, fifty qubits becomes one hundred qubits, plus possibly some ancilla qubits as well.
Regardless of how many total qubits are needed for the full quantum circuit, it’s the maximum number in a single quantum parallel computation which determines the quantum advantage — moderated by the number of circuit repetitions or shot count needed to get a statistically meaningful result.
But what about super-exponential problems?
A problem or algorithm with super-exponential complexity is one whose algorithmic complexity is worse than a problem or algorithm which has exponential complexity. The traveling salesman problem has factorial complexity, which is worse than exponential complexity.
The trick with very complex problems is to find some method to reduce the complexity by mapping the problem or a designated subset of the problem to a problem of simpler complexity. This is referred to as reduction. Sometimes this can be done, and sometimes it can’t.
Unless the problem or algorithm can be reduced to exponential complexity or less, any problem with super-exponential complexity is not a slam dunk or even a great candidate for a quantum computer. It all depends — on lots of factors.
That said, sometimes even if the raw algorithmic complexity is super-exponential, the particular problem being solved may have data values or parameters which are small enough that the net algorithmic complexity is small enough to actually be practical to solve on a quantum computer — or maybe even a classical computer. For example, for small values of n (10 or under), factorial(n) is still fairly small, even though for even modest values of n (20 or above) factorial(n) is impractical to implement.
Sometimes a super-exponential problem or algorithm can still offer a solution on a quantum computer even if it doesn’t offer an exponential speedup. It may still be slow, but it will still be much faster than a classical computer. But it may also be true that a statistical sampling approach could deliver an approximate solution in less time than a quantum computer which is good enough for application requirements. It all depends on the particular details of the problem, the algorithm, and the actual input data and parameter values. No guarantees either way.
In any case, great care is needed if the complexity of a problem or algorithm is worse than exponential. And one is well-advised not to blindly assume that any hard problem is readily and easily solved using a quantum computer.
Will quantum error correction (QEC) be required?
The model of computation discussed in this paper is independent of whether the physical quantum computer supports quantum error correction (QEC) or not.
A fault-tolerant quantum computer would certainly be a big boon all around, and certainly enable larger computations and eliminate the need for extra circuit repetitions needed to statistically compensate for uncorrected errors.
Achieving quantum advantage may indeed require quantum error correction. It’s so hard to say for sure since we don’t even have any credible proposed algorithms that theoretically might achieve quantum advantage.
Full exploitation of advanced quantum computing techniques such as quantum phase estimation (QPE) and quantum Fourier transform (QFT) will likely require full quantum error correction.
Shor’s algorithm or any algorithm of similar complexity would likely require full quantum error correction as well.
At our current stage of experimentation, with both hardware and algorithms still being mere laboratory curiosities, quantum error correction is not yet needed, but looking years into the future, production-scale quantum applications will most likely require either full-blown quantum error correction or at least much more reliable qubits which are not as susceptible to errors as qubits are today.
So, to be safe and sure, yes, quantum error correction is (will be) required for production-scale quantum applications, but that’s so far in the future as to not be very relevant to the types of experimentation currently envisioned for quantum computers in the next few years.
Ask me again in five years.
Where are we today?
These days, we’re ecstatic when somebody can simply figure out how to even solve a problem at all on a quantum computer, even if the solution is very restricted and doesn’t even provide any quantum advantage at all.
We’re a long way (years) from solving any really big problems using a quantum computer.
Will quantum computing ever be able to process big data?
Never say never, but at present there isn’t even a speculative theoretical prospect for a quantum computer to be able to deliver an exponential speedup for big data. As noted, relatively small computations can be performed on a relatively small amount of input data. Any exponential speedup comes when the computation has a large solution space.
But with big data the raw size of the raw input data is the limiting factor, not the resources or time needed to process each unit of input data. Quantum computing can speed up the processing of each unit, but not the raw processing of the vast bulk of the input data.
Sure, you can process all of the input data classically and then feed one unit at a time to a quantum algorithm, but any exponential speedup will come only from processing each unit, and will not provide any exponential speedup for processing all of the raw input data before the quantum processing is performed.
Who knows, maybe someday somebody will come up with a novel form of quantum storage capable of organizing a very large amount of information in a form which can be operated on in parallel at quantum speeds, but nobody is currently proposing such a fantastic storage model.
Caveat for D-Wave Systems
D-Wave Systems has a specialized quantum computer which comes with a hardwired application algorithm which is focused on a fairly narrow niche of optimization problems, which it handles fairly well, but doesn’t have the gate-level flexibility or generality of other quantum computers.
If your application problem — including input data size — cleanly maps to the D-Wave data model, you’re set. Otherwise, all of the issues discussed in this paper apply.
What about IBM’s new quantum roadmap for one million qubits?
Just as I was about to finalize this paper IBM released its roadmap to one million qubits:
- IBM’s Roadmap For Scaling Quantum Technology
- 5 Things to Know About the IBM Roadmap to Scaling Quantum Technology
So, does this open up the door to big data on a quantum computer? No, not really, at least as far as the roadmap goes.
I won’t go into a detailed critique of the roadmap here (I may address a future paper to it), but a few comments:
- One million qubits or even 5, 10, 20, 50, or 100 million qubits just won’t cut it for truly big data. And even a single million qubits is a whole decade away, or more.
- It’s not clear if one million qubits is physical qubits or logical qubits. Either way is still not enough for big data. I suspect it is one million physical qubits, which at 16 to 25 physical qubits per logical qubit would have a capacity of 40,000 to 62,500 logical qubits, which is certainly not appropriate for big data.
- The 1,121-qubit Condor system planned for late 2023 still has a relatively limited capacity for data.
- Still no prospect of I/O, file or database access, or network connectivity to access large amounts of data from within a quantum circuit.
- Still no prospect for rich data and media types.
- Still no prospect for structured data.
- Still no prospect for complex processing, especially for large amounts of data. Still focused on the little data with a big solution space model detailed in this paper. The roadmap does refer to “feedback and feed-forward system capabilities, where we’ll be able to control qubits based on classical conditions while the quantum circuit runs”, but that is too vague and sounds too limited to suggest true complex processing, let alone for large amounts of data.
I welcome IBM’s transparency and commitment to progress, but they’re only going so far. Granted, we don’t know what progress might bring us in the decade or two after this decade, but I’m focused on the here and now and near-term future, the next 2–5 years, seven years, maybe ten years.
Future advanced quantum computer architectures
The discussion in this paper is limited to existing notions of the design and architecture of quantum computers as known and practiced today and likely in the near future, but those notions may evolve in a dramatic manner in the coming years and decades, so that quantum computers 5, 10, 15, and 20 years from now may operate in a radically different manner than they do today.
We can’t predict what the architecture of quantum computers will look like many years from now. Existing quantum computing technologies may evolve incrementally or even by occasional leaps and bounds over the coming years, but new and innovative technologies may occasionally appear which completely eclipse those existing technologies, not unlike the evolution of classical computing from the 1930’s, 1940’s, 1950’s, 1960’s, 1970’s, 1980’s, 1990’s, and beyond.
The purpose of this section is not to attempt to predict the details of such advances, but simply to establish a placeholder in our thinking and planning so that we can better adapt to and exploit such new technologies as they appear.
The notion of little data with a big solution space is unlikely to be made obsolete in the near future. The notion may expand somewhat as quantum computers evolve and become more capable over the next few to five to seven years, but the overall essential principle will remain the same.
The notions of quantum parallelism, superposition, entanglement, interference, etc. are also unlikely to disappear over the next few years, but the technologies for implementing qubits may change so that newer and improved notions may become available for exploitation, such as more than two basis states or the qumodes and squeezed states of photonic qubits, or even more exotic and dramatic innovations. Even so, the core notion of little data with a big solution space is likely to remain intact and very vital.
Whether or when quantum computing architectures evolve so dramatically that the notion of little data with a big solution space becomes obsolete remains to be seen. I wouldn’t bet on it or against it, at least in the long-term, even if it remains vital for years to come. But for now, it’s the only real game in town.
Other approaches to quantum architecture include integration of quantum computing with the other areas of quantum information science:
- Quantum communication.
- Quantum networking.
- Quantum measurement.
- Quantum sensing.
For example, maybe having qubits directly connected to quantum sensors, so that real-time data, images, and media could be directly processed as raw quantum information rather than sent through analog to digital conversion, and then converted to quantum state again, where they came from in the first place. Large arrays or grids of quantum sensors directly connected to quantum processors could enable a quantum version of big data.
Quantum networking has great potential as well, allowing quantum computers to directly communicate and even directly share quantum state — via entanglement — rather than the clumsy mechanisms used today to prepare and measure state to move data between classical and quantum computers. This could allow the construction of large clusters of quantum computers with very low or even zero latency for sharing data (again due to entanglement), raising the prospect for something resembling big data.
Some form of quantum mass storage is another possibility, storing very large quantities of information directly as quantum state rather than classical bits. That would enable a quantum form of big data. But for now that is just a pie in the sky notion with no realistic prospect even on the distant horizon.
For now, we’re limited to little data with a big solution space.
Universal quantum computers
Someday we may see the advent of advanced universal quantum computers which truly and fully integrate the capabilities of both classical and quantum computers, possibly opening the door to quantum computing in the context of big data.
But even then, true conceptual integration which offers exponential speedup for all of classical big data processing is not a slam dunk. Such a machine may offer only a factor of improvement by dramatically reducing or even eliminating latency to switch between classical and quantum computing rather than exponentially speeding up the classical processing itself.
It would probably dramatically speed up the ability to move data in and out of the quantum computer, which is a big deal, but not an exponential factor in that movement. Maybe ten to a hundred times faster, but not a million to a billion times faster. Maybe a thousand times faster, but not a trillion times faster.
Only time will tell, and not in the very near future.
My thoughts on the notion of a universal quantum computer:
- What Is a Universal Quantum Computer?
The key factors to consider:
- A quantum computation using anything less than 20 qubits will not deliver a substantial quantum advantage — like one million to one.
- Shot count significantly reduces the quantum advantage. If a shot count of 25,000 is needed, you would need a computation using at least 35 qubits for a single parallel computation before you would see a net quantum advantage of one million to one.
- Shot count may not scale well for some algorithms so that the net quantum advantage might be relatively small — 4 to 64 — even when a significant number of qubits are used.
- Some algorithms may always take longer than on a classical computer if the shot count is too high.
- A net quantum advantage of less than 256 may not be very useful since it is relatively easy to run a quantum algorithm in parallel on that many classical machines.
- Sure, you may be able to demonstrate quantum advantage on under 20 qubits, but it won’t be very impressive for computations using less than about 32 to 40 qubits.
- Although 50 qubits or so (maybe 53–60) is nominally enough to get to quantum supremacy, shot count requirements may add another 10 to 20 qubits to achieve quantum supremacy. 60–80 is a safer bet. And this is the number of qubits used in a single parallel computation, not total qubits on the machine or used by the overall algorithm.
- There may be applications which are very sensitive to wall clock time, so that even a very modest quantum advantage could be significant. But the potential of running the classical algorithm in parallel on dozens or hundreds of classical machines should be taken into account as well before deriving the true net quantum advantage.
- Even with IBM’s ten-year roadmap, we’re still limited to relatively small amounts of input data.
- The discussion here is based on current conceptions of quantum computer architectures — all bets are off for more advanced notions of quantum computer architectures, including some form of universal quantum computer which may in fact be perfect for super-big data.