# A Quantum distance-based classifier (part 2)¶

Robert Wezeman, TNO

$$\newcommand{\ket}{\left|{#1}\right\rangle}$$

## Introduction¶

In the first part of this notebook series on a distance-based classifier, we looked at how to implement a distance-based classifier on the quantum inspire using QASM code. We looked at the simplest possible case, that is: using two data points, each with two features, to assign a label (classify) to a random test point, see image. In this notebook we will extend the previous classifier, by increasing the number of data points to four. In this notebook we demonstrate how to use the projectQ framework to do this. ## Problem¶

Again we have the following binary classification problem: Given the data set $$\mathcal{D} = \Big\{ ({\bf x}_1, y_1), \ldots ({\bf x}_M , y_M) \Big\},$$

consisting of $M$ data points $x_i\in\mathbb{R}^n$ and corresponding labels $y_i\in \{-1, 1\}$, give a prediction for the label $\tilde{y}$ corresponding to an unlabeled data point $\bf\tilde{x}$. The classifier we shall implement with our quantum circuit is a distance-based classifier and is given by \begin{equation}\newcommand{\sgn}{{\rm sgn}}\newcommand{\abs}{\left\lvert#1\right\rvert}\label{eq:classifier} \tilde{y} = \sgn\left(\sum_{m=0}^{M-1} y_m \left[1-\frac{1}{4M}\abs{{\bf\tilde{x}}-{\bf x}_m}^2\right]\right). \hspace{3cm} (1)\end{equation}

This is a typical $M$-nearest-neighbor model where each data point is given a weight related to the distance measure. To implement this classifier on a quantum computer we use amplitude encoding. For the details see the previous notebook.

Back to Contents

## Theory¶

The algorithm requires a $n$-qubit quantum system to be in the following state initially \begin{equation}\label{eq:prepstate} \ket{\mathcal{D}} = \frac{1}{\sqrt{2M}} \sum_{m=0}^{M-1} \ket{m}\Big(\ket{0}\ket{\psi_{\bf\tilde{{x}}}} + \ket{1}\ket{\psi_{\bf{x}_m}}\Big)\ket{y_m}.\hspace{3cm} (2) \end{equation}

Here $\ket{m}$ is the $m^{th}$ state of the computational basis used to keep track of the $m^{th}$ training input. The second register is a single ancillary qubit entangled with the third register. The excited state of the ancillary qubit is entangled with the $m^{th}$ training state $\ket{\psi_{{x}_m}}$, while the ground state is entangled with the new input state $\ket{\psi_{\tilde{x}}}$. The last register encodes the label of the $m^{th}$ training data point by \begin{equation} \begin{split} y_m = -1 \Longleftrightarrow& \ket{y_m} = \ket{0},\\ y_m = 1 \Longleftrightarrow& \ket{y_m} = \ket{1}. \end{split} \end{equation} Once in this state the algorithm only consists of the following three operations:

1. Apply a Hadamard gate on the second register.

2. Measure the second register. We restart the algorithm if we measure a $\ket{1}$ and only continue when we are in the $\ket{0}$ branch.

3. Measure the last qubit $\ket{y_m}$.

In the special case where the amount of training data for both labels is the same, this last measurement relates to the classifier as described in previous section by \begin{equation} \tilde{y} = \left\{ \begin{array}{lr} -1 & : p(q_4 = 0 ) > p(q_4 = 1)\\ +1 & : p(q_4 = 0 ) < p(q_4 = 1) \end{array} \right. \end{equation} By setting $\tilde{y}$ to be the most likely outcome of many measurement shots, we obtain the desired distance-based classifier.

In the previous notebook we saw the implementation for $N=2$ data points, each with $M=2$ features. Now we will consider the case for two datapoints with $M=4$ features.

## Algorithm¶

To describe the desired initial state for $M = 4$ and $N = 2$ we need 5 qubits:

• Two qubits for the index register $\ket{m}$
• One ancillary qubit
• One qubit to store the information of the two features of the data points
• One qubit to store the information of the classes of the data points

Furthermore, these $4$ require us to implement a triple controlled-$R_y$ gate, or $CCCR_y$. ProjectQ does this automatically for us, at the cost of two extra ancillary qubits, resulting in a total of 7 qubits. The algorithm is divided in different parts:

• Part A: In this part the index register is initialized, as is the ancillary qubit. Part A consists of three Hadamard gates. After this step the system is in the state $$\ket{\mathcal{D}_A} =\frac{1}{\sqrt{8}} \sum_{m=0}^{3}\ket{m}\Big(\ket{0}+\ket{1}\Big)\ket{0}\ket{0}$$ • Part B: In this part we encode the unlabeled data point $\bf\tilde{x}$ by means of a controlled rotation. The encoding of the data will first require some preprocessing on the data, see the previous notebook for the details. We entangle the fourth qubit with the ancillary qubit. The angle $\theta$ of the rotation should be chosen such that ${\bf\tilde{x}}=R_y(\theta)\ket{0}$. After this step the system is in the state $$\ket{\mathcal{D}_B} =\frac{1}{\sqrt{8}} \sum_{m=0}^{3}\ket{m}\Big(\ket{0}\ket{\tilde{{\bf x}}}+\ket{1}\ket{0}\Big)\ket{0}$$ • Part C: In this step the encoding of the data points $\bf x_m$ is done. So far it has been almost analogous to the $M =2$ case, however, this step is a bit more involved. First, use a $CCCR_y$-rotation so that the datapoint $\bf x_m$ is connected to $\ket{m} = \ket{11}$ and entangled with the ancillary qubit $\ket{1}$. Next, we rotate $\ket{m}$ cyclic around such that we can reuse this $CCCR_y$-rotation for next data point, see the figure \begin{equation} C2 : \hspace{1cm}\ldots \rightarrow \ket{00} \rightarrow \ket{01} \rightarrow \ket{10}\rightarrow \ket{11} \rightarrow \ldots \end{equation} After doing step $C1$ four times with the right angles(${\bf x_m}=R_y\ket{0}$) and in the right order alternated with step $C2$ the system will be in the state $$\ket{\mathcal{D}_C} =\frac{1}{\sqrt{8}} \sum_{m=0}^{3}\ket{m}\Big(\ket{0}\ket{\tilde{{\bf x}}}+\ket{1}\ket{\bf x_m}\Big)\ket{0}$$

• Part D: In this part we encode the known labels of the data in the last qubit. This can be done easily using a Toffoli-gate with the controls on the first two qubits and the target on the fifth. Note that this requires only one rotation through $\ket{m}$ of the labels if we choose the data points such that the two points labeled with $y_m = 1$ are $\ket{00}$ and $\ket{11}$. The combination of twice circuit $D1$ with circuit $D2$ inbetween does the job. The desired state is now produced: \begin{equation} \ket{\mathcal{D}_D} = \frac{1}{\sqrt{8}} \sum_{m=0}^{3} \ket{m}\Big(\ket{0}\ket{\bf\tilde{{x}}} + \ket{1}\ket{\bf{x}_m}\Big)\ket{y_m}. \end{equation} • Part E: In this part the actual distance-based classifier part of the algorithm is done. This part is independent of number of data points, which is precisely the strength of this quantum classifier. This step consists of a simple Hadamard-gate. Results are then obtained by post-processing the measurement results. ## Implementation¶

We will implement the above algorithm using the projectQ framework. First we need to import some modules and set up the authentication for connecting to the Quantum Inspire API.

Before we consider the four point classifier, let us first review the 2 point classifier again. Consider the following data points from the Iris flower dataset: \begin{equation} \begin{split} &\ket{\psi_{\tilde{x}}} = \;\;\;0.9999 \ket{0} -0.0011\ket{1}, \hspace{2cm}y = 1,\\ &\ket{\psi_{x_0}} = -0.4583 \ket{0} - 0.8889\ket{1}, \hspace{2cm}y = 1,\\ &\ket{\psi_{x_1}} = -0.3728 \ket{0} +0.9279\ket{1}, \hspace{2cm}y = 0. \end{split} \end{equation}

The code for this implementation is shown below and treated in detail in the previous notebook.

The classifier predicts the wrong label 0 for the test point with this combination of data points as: \begin{equation} 0.04 + 0.1043719 > 0.0366663 + 0.1019124 \rightarrow \text{Assign label 0 to } \tilde{y} \end{equation} The left figure gives intuition why the classifier fails to predict the correct label. For this specific combination of data points, the test point is closer to the data point with label 0 than to the data point with label 1. As the prediction is based on only these two data points, it is expected to give this wrong prediction. Note, this problem has nothing to do with the quantum computer used for the calculation. The same results would be obtained with a classical distance-based classifier.

By adding two more data points, one for each label, we improve the classifier so that it is better able to give the right label as prediction. Add the following two points to the calculation: \begin{equation} \begin{split} &\ket{\psi_{x_2}} = -0.3728 \ket{0} +0.9279\ket{1}, \hspace{2cm}y = 1,\\ &\ket{\psi_{x_3}} = -0.4583 \ket{0} - 0.8889\ket{1}, \hspace{2cm}y = 0. \end{split} \end{equation}

Consider the quantum circuit for the four point classifier below.

Just as we did in previous notebook, we need to count only those outcomes where the third qubit is equal to a 0. That is, we only consider the 16 outcomes in the set: $$\{00000, 01000, 10000, 11000, 00001, 01001, 10001, 11001, 00010, 01010, 10010, 11010, 00011, 01011, 10011, 11011\}$$

The label $\tilde{y}$ is then given by a majority vote:

\begin{equation} \begin{split} \tilde{y}_{0} = \# \{00000, 01000, 10000, 11000, 00010, 01010, 10010, 11010\}\\ \tilde{y}_{1} = \#\{00001, 01001, 10001, 11001, 00011, 01011, 10011, 11011\} \end{split} \end{equation}\begin{equation} \tilde{y} = \left\{ \begin{array}{lr} -1 & : \tilde{y}_{0} > \tilde{y}_{1}\\ +1 & : \tilde{y}_{1} > \tilde{y}_{0} \end{array} \right. \end{equation}

By adding these two points we see that the classifier now gives the correct label for the test data. To see how well this 4-point distance-based classifier performs we also apply it to randomly selected training and test data.

## Conclusion and further work¶

In general, a distance-based classifier gets better in predicting classes once more data is included. In this notebook we showed how to extend the previous algorithm to 4 data points. We saw an example where the 2-point distance-based classifier fails to predict the right label, while, when extended to four data points, it managed to classify the test point correctly. This is what we expect classically, however, the key takeaway here is that the quantum algorithm itself (step f) did not change. It is independent of the size of dataset and it is from this that we can expect huge speedups.

Extending the classifier to more data points is now analogus. Extending the classifier to $8$ data points will be a nice challenge for the reader to test their understanding, a solution is given below.

In this notebook we used the projectQ backend to generate the quantum algorithm. Note that for controlled gate operations in generall ancillary qubits are required. ProjectQ code does not minimize the amount of ancillary qubits needed in the algorithm, this could be improved.

The next notebook in this series will look at how to include more than two features for the data.