.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/tutorial_getting_started.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_tutorials_tutorial_getting_started.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_tutorial_getting_started.py:

Getting started
===============

This getting started tutorial shows how to use the PyntBCI library for analysing code-modulated responses. It makes
use of a small dataset of EEG data, recorded from one participant during a single session. These data are already
minimally preprocessed: a spectral band-pass filter at 2-30 Hz and a downsample from 2 kHz to 240 Hz. The stimuli
were circularly shifted versions of a modulated Gold code (shift register length 6, shift between stimuli 2 bits),
presented at 60 Hz. The participant focused on each of the 32 stimuli once in a 4x8 matrix speller, where each
presentation lasted 4.2 seconds (2 code cycles) after a 0.8 second cue.

In this notebook, the reconvolution CCA (rCCA) method for decoding EEG is demonstrated, see [1]_ and [2]_.
Additionally, rCCA is compared to eCCA, the so-called reference pipeline as discussed in the c-VEP review [3]_.

References
----------
.. [1] Thielen, J., Marsman, P., Farquhar, J., & Desain, P. (2021). From full calibration to zero training for a
       code-modulated visual evoked potentials for brain–computer interface. Journal of Neural Engineering, 18(5),
       056007. DOI: https://doi.org/10.1088/1741-2552/abecef
.. [2] Thielen, J., van den Broek, P., Farquhar, J., & Desain, P. (2015). Broad-Band visually evoked potentials:
       re(con)volution in brain-computer interfacing. PLOS ONE, 10(7), e0133797. DOI:
       https://doi.org/10.1371/journal.pone.0133797
.. [3] Martínez-Cagigal, V., Thielen, J., Santamaria-Vazquez, E., Pérez-Velasco, S., Desain, P., & Hornero, R. (2021).
       Brain–computer interfaces based on code-modulated visual evoked potentials (c-VEP): A literature review.
       Journal of Neural Engineering, 18(6), 061002. DOI: https://doi.org/10.1088/1741-2552/ac38cf

.. GENERATED FROM PYTHON SOURCE LINES 27-38

.. code-block:: Python

    import os

    import matplotlib.pyplot as plt
    import numpy as np
    import seaborn

    import pyntbci

    seaborn.set_context("paper", font_scale=1.5)

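
The band-pass filter and downsample mentioned above were already applied to the tutorial data, so nothing needs to be
done here. For readers starting from raw recordings, a rough sketch of such a preprocessing step with SciPy could look
as follows. Note that this is an illustration only: the filter order, the zero-phase filtering, and the use of
`resample_poly` are assumptions, not the exact pipeline used for this dataset.

.. code-block:: Python

    from scipy.signal import butter, sosfiltfilt, resample_poly


    def preprocess(raw, fs_raw=2000, fs_new=240, band=(2.0, 30.0), order=4):
        """Band-pass filter and downsample raw EEG (channels x samples).

        Illustrative sketch only: filter order and zero-phase filtering are
        assumptions, not the exact settings used for the tutorial data.
        """
        # Zero-phase band-pass filter (2-30 Hz) using second-order sections
        sos = butter(order, band, btype="bandpass", fs=fs_raw, output="sos")
        filtered = sosfiltfilt(sos, raw, axis=-1)
        # Downsample 2000 Hz -> 240 Hz by the rational factor 3/25
        return resample_poly(filtered, up=3, down=25, axis=-1)


    # Example with random data standing in for a raw recording
    raw = np.random.randn(64, 10 * 2000)  # 64 channels, 10 seconds at 2 kHz
    X_pre = preprocess(raw)
    print(X_pre.shape)  # (64, 2400), i.e. 10 seconds at 240 Hz
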

.. GENERATED FROM PYTHON SOURCE LINES 39-45

The data
--------
The dataset consists of: (1) the EEG data X, a matrix of k trials, c channels, and m samples; (2) the labels y, a
vector of k trials; (3) the pseudo-random noise-codes V, a matrix of n classes and m samples. Note that the stimuli
are upsampled to the EEG sampling frequency and contain only one stimulus-cycle. During a trial, however, the
stimulation was repeated 2 times (2 stimulus cycles).

.. GENERATED FROM PYTHON SOURCE LINES 45-107

.. code-block:: Python

    # Path to pyntbci (to read the tutorial data and standard cap files)
    path = os.path.join(os.path.dirname(pyntbci.__file__))

    # Load tutorial data
    tmp = np.load(os.path.join(path, "data", "tutorial.npz"))
    X = tmp["X"]
    y = tmp["y"]
    V = tmp["V"]
    fs = int(tmp["fs"])
    fr = 60
    print("X", X.shape, "(trials x channels x samples)")  # EEG
    print("y", y.shape, "(trials)")  # labels
    print("V", V.shape, "(classes, samples)")  # codes
    print("fs", fs, "Hz")  # sampling frequency
    print("fr", fr, "Hz")  # presentation rate

    # Extract data dimensions
    n_trials, n_channels, n_samples = X.shape
    n_classes = V.shape[0]

    # Read cap file
    capfile = os.path.join(path, "capfiles", "biosemi64.loc")
    with open(capfile, "r") as fid:
        channels = []
        for line in fid.readlines():
            channels.append(line.split("\t")[-1].strip())
    print("Channels:", ", ".join(channels))

    # Visualize EEG data
    i_trial = 0  # the trial to visualize
    plt.figure(figsize=(15, 15))
    plt.plot(np.arange(0, n_samples) / fs, 25e-6 * np.arange(n_channels) + X[i_trial, :, :].T)
    plt.xlim([0, 1])  # limit to 1 second EEG data
    plt.yticks(25e-6 * np.arange(n_channels), channels)
    plt.xlabel("time [s]")
    plt.ylabel("channel")
    plt.title(f"Single-trial multi-channel EEG time-series (trial {i_trial})")
    plt.tight_layout()

    # Visualize labels
    plt.figure(figsize=(15, 3))
    hist = np.histogram(y, bins=np.arange(n_classes + 1))[0]
    plt.bar(np.arange(n_classes), hist)
    plt.xticks(np.arange(n_classes))
    plt.xlabel("label")
    plt.ylabel("count")
    plt.title("Single-trial labels")
    plt.tight_layout()

    # Visualize stimuli
    Vup = V.repeat(20, axis=1)  # upsample to better visualize the sharp edges
    plt.figure(figsize=(15, 8))
    plt.plot(np.arange(Vup.shape[1]) / (20 * fs), 2 * np.arange(n_classes) + Vup.T)
    for i in range(1 + int(V.shape[1] / (fs / fr))):
        plt.axvline(i / fr, c="k", alpha=0.1)
    plt.yticks(2 * np.arange(n_classes), np.arange(n_classes))
    plt.xlabel("time [s]")
    plt.ylabel("code")
    plt.title("Code time-series")
    plt.tight_layout()

.. rst-class:: sphx-glr-horizontal

    *

      .. image-sg:: /tutorials/images/sphx_glr_tutorial_getting_started_001.png
         :alt: Single-trial multi-channel EEG time-series (trial 0)
         :srcset: /tutorials/images/sphx_glr_tutorial_getting_started_001.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /tutorials/images/sphx_glr_tutorial_getting_started_002.png
         :alt: Single-trial labels
         :srcset: /tutorials/images/sphx_glr_tutorial_getting_started_002.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /tutorials/images/sphx_glr_tutorial_getting_started_003.png
         :alt: Code time-series
         :srcset: /tutorials/images/sphx_glr_tutorial_getting_started_003.png
         :class: sphx-glr-multi-img

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    X (32, 64, 1008) (trials x channels x samples)
    y (32,) (trials)
    V (32, 504) (classes, samples)
    fs 240 Hz
    fr 60 Hz
    Channels: Fp1, AF7, AF3, F1, F3, F5, F7, FT7, FC5, FC3, FC1, C1, C3, C5, T7, TP7, CP5, CP3, CP1, P1, P3, P5, P7, P9, PO7, PO3, O1, Iz, Oz, POz, Pz, CPz, Fpz, Fp2, AF8, AF4, AFz, Fz, F2, F4, F6, F8, FT8, FC6, FC4, FC2, FCz, Cz, C2, C4, C6, T8, TP8, CP6, CP4, CP2, P2, P4, P6, P8, P10, PO8, PO4, O2

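
As described above, the 32 codes are circular shifts of a single modulated Gold code, shifted by 2 bits of the
underlying code (4 bits of the modulated sequence) between stimuli. The sketch below merely illustrates this
structure: it estimates, for each code in `V`, the circular lag at which it best matches code 0, using an FFT-based
circular cross-correlation. These lags express the relationship between the classes that the `lags` parameter of
eCCA (used further below) relies on.

.. code-block:: Python

    # The 32 codes are circular shifts of one modulated Gold code. Estimate the
    # lag of each code relative to code 0 with a circular cross-correlation,
    # computed efficiently via the FFT.
    Vc = V - V.mean(axis=1, keepdims=True)  # zero-mean copies of the codes
    xcorr = np.fft.irfft(
        np.fft.rfft(Vc, axis=1) * np.conj(np.fft.rfft(Vc[:1, :], axis=1)),
        n=V.shape[1], axis=1)
    lags = np.argmax(xcorr, axis=1)  # lag in samples at which each code best matches code 0
    print("Lag of each code relative to code 0 [samples]:", lags)
    print("Lag of each code relative to code 0 [seconds]:", lags / fs)
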

.. GENERATED FROM PYTHON SOURCE LINES 108-120

The event matrix
----------------
The first step for reconvolution is to find the repetitive events within the sequences. This can be imposed
"manually" by choosing the event definition that we believe the brain responds to. Here, the so-called "duration"
event is used, which marks the length of a flash as the important piece of information. As the sequences in this
dataset were modulated, there are only two events: a short and a long flash. Additionally, a third event is added
that accounts for the onset of a trial, at which the screen suddenly starts flashing. The event matrix is a matrix of
n classes, e events, and m samples.

Please note that more event definitions exist, which can be explored with the `event` parameter of `rCCA`. For
instance, `event="contrast"` is a useful event definition as well, which looks at rising and falling edges,
generalising over the length of a flash.

.. GENERATED FROM PYTHON SOURCE LINES 120-144

.. code-block:: Python

    # Create event matrix
    E, events = pyntbci.utilities.event_matrix(V, event="duration", onset_event=True)
    print("E:", E.shape, "(classes x events x samples)")
    print("Events:", ", ".join([str(event) for event in events]))

    # Visualize event time-series
    i_class = 0  # the class to visualize
    fig, ax = plt.subplots(1, 1, figsize=(15, 3))
    pyntbci.plotting.eventplot(V[i_class, ::int(fs/fr)], E[i_class, :, ::int(fs/fr)], fs=fr, ax=ax, events=events)
    ax.set_title(f"Event time-series (code {i_class})")
    plt.tight_layout()

    # Visualize event matrix
    i_class = 0
    plt.figure(figsize=(15, 3))
    plt.imshow(E[i_class, :, :], cmap="gray")
    plt.gca().set_aspect(10)
    plt.xticks(np.arange(0, E.shape[2], 60), np.arange(0, E.shape[2], 60) / fs)
    plt.yticks(np.arange(E.shape[1]), events)
    plt.xlabel("time [s]")
    plt.title(f"Event matrix (class {i_class})")
    plt.tight_layout()

.. rst-class:: sphx-glr-horizontal

    *

      .. image-sg:: /tutorials/images/sphx_glr_tutorial_getting_started_004.png
         :alt: Event time-series (code 0)
         :srcset: /tutorials/images/sphx_glr_tutorial_getting_started_004.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /tutorials/images/sphx_glr_tutorial_getting_started_005.png
         :alt: Event matrix (class 0)
         :srcset: /tutorials/images/sphx_glr_tutorial_getting_started_005.png
         :class: sphx-glr-multi-img

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    E: (32, 3, 504) (classes x events x samples)
    Events: 4, 8, onset

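
As an aside, the alternative `event="contrast"` definition mentioned above can be inspected in exactly the same way.
The sketch below mirrors the cell above; the number of events and their names are whatever
`pyntbci.utilities.event_matrix` returns for this event definition.

.. code-block:: Python

    # Create an event matrix using the "contrast" event definition instead
    E_con, events_con = pyntbci.utilities.event_matrix(V, event="contrast", onset_event=True)
    print("E:", E_con.shape, "(classes x events x samples)")
    print("Events:", ", ".join([str(event) for event in events_con]))

    # Visualize the contrast event matrix for one class
    i_class = 0  # the class to visualize
    plt.figure(figsize=(15, 3))
    plt.imshow(E_con[i_class, :, :], cmap="gray")
    plt.gca().set_aspect(10)
    plt.xticks(np.arange(0, E_con.shape[2], 60), np.arange(0, E_con.shape[2], 60) / fs)
    plt.yticks(np.arange(E_con.shape[1]), events_con)
    plt.xlabel("time [s]")
    plt.title(f"Event matrix with contrast events (class {i_class})")
    plt.tight_layout()
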

.. GENERATED FROM PYTHON SOURCE LINES 145-155

The structure matrix
--------------------
The second step for reconvolution is to model the expected responses associated with each of the events and their
overlap. This is done in the so-called structure matrix (or design matrix). The structure matrix is essentially a
Toeplitz version of the event matrix. It allows the c-VEP of the ith class to be modelled as the dot product of r
(the transient response to an event) and M (the structure matrix for that class). The structure matrix is a matrix of
n classes, l response samples, and m samples.

An important parameter here is the `encoding_length` argument. An easy abstraction is to assume the same length for
the responses to each of the events. However, one could also set different lengths for each of the events.

.. GENERATED FROM PYTHON SOURCE LINES 155-172

.. code-block:: Python

    # Create structure matrix
    encoding_length = int(0.3 * fs)  # 300 ms responses
    M = pyntbci.utilities.encoding_matrix(E, encoding_length)
    print("M: shape:", M.shape, "(classes x encoding_length*events x samples)")

    # Plot structure matrix
    i_class = 0  # the class to visualize
    plt.figure(figsize=(15, 6))
    plt.imshow(M[i_class, :, :], cmap="gray")
    plt.xticks(np.arange(0, M.shape[2], 60), np.arange(0, M.shape[2], 60) / fs)
    plt.yticks(np.arange(0, E.shape[1] * encoding_length, 12), np.tile(np.arange(0, encoding_length, 12) / fs, E.shape[1]))
    plt.xlabel("time [s]")
    plt.ylabel(events[::-1])
    plt.title(f"Structure matrix (class {i_class})")
    plt.tight_layout()

.. image-sg:: /tutorials/images/sphx_glr_tutorial_getting_started_006.png
   :alt: Structure matrix (class 0)
   :srcset: /tutorials/images/sphx_glr_tutorial_getting_started_006.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    M: shape: (32, 216, 504) (classes x encoding_length*events x samples)

.. GENERATED FROM PYTHON SOURCE LINES 173-185

Reconvolution CCA
-----------------
The full reconvolution CCA (rCCA) pipeline is implemented as a scikit-learn compatible class in PyntBCI:
`pyntbci.classifiers.rCCA`. All it needs are the binary sequences `stimulus`, the sampling frequency `fs`, the event
definition `event`, the transient response size `encoding_length`, and whether to include an event for the onset of a
trial `onset_event`. Calling `rCCA.fit(X, y)` with training data `X` and labels `y` performs the full decomposition
and yields a spatial filter `rCCA.w_` and a temporal filter `rCCA.r_`.

Please note that the transient responses are concatenated in this temporal filter `rCCA.r_`. One can use
`rCCA.events_` to disentangle these and find which response is associated with which event.

.. GENERATED FROM PYTHON SOURCE LINES 185-206

.. code-block:: Python

    # Perform CCA decomposition with duration event
    encoding_length = 0.3  # 300 ms responses
    rcca = pyntbci.classifiers.rCCA(stimulus=V, fs=fs, event="duration", encoding_length=encoding_length, onset_event=True)
    rcca.fit(X, y)
    print("w: ", rcca.w_.shape, "(channels)")
    print("r: ", rcca.r_.shape, "(encoding_length*events)")

    # Plot CCA filters
    fig, ax = plt.subplots(1, 2, figsize=(15, 3))
    pyntbci.plotting.topoplot(rcca.w_, capfile, ax=ax[0])
    ax[0].set_title("Spatial filter")
    tmp = np.reshape(rcca.r_, (len(rcca.events_), -1))
    for i in range(len(rcca.events_)):
        ax[1].plot(np.arange(int(encoding_length * fs)) / fs, tmp[i, :])
    ax[1].legend(rcca.events_)
    ax[1].set_xlabel("time [s]")
    ax[1].set_ylabel("amplitude [a.u.]")
    ax[1].set_title("Transient responses")
    fig.tight_layout()

.. image-sg:: /tutorials/images/sphx_glr_tutorial_getting_started_007.png
   :alt: Spatial filter, Transient responses
   :srcset: /tutorials/images/sphx_glr_tutorial_getting_started_007.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    w: (64, 1) (channels)
    r: (216, 1) (encoding_length*events)

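
As explained in the structure matrix section, the c-VEP template of a class is modelled as the dot product of the
transient responses r and that class's structure matrix M. With the fitted model above, this can be made explicit by
recombining `rcca.r_` with the structure matrix `M` computed earlier. This is a sketch for illustration only: it
assumes that rCCA concatenates its responses in the same event order and with the same encoding length as `M` above,
and the classifier performs this reconstruction internally anyway. If the model fits well, the two curves should look
similar up to sign and scale.

.. code-block:: Python

    # Recombine the learned transient responses with the structure matrix to
    # obtain a predicted c-VEP template for one class. This assumes that rCCA
    # orders its concatenated responses in r_ the same way as the structure
    # matrix M computed above (same events, same encoding length).
    i_class = 0  # the class to visualize
    template = rcca.r_.flatten() @ M[i_class, :, :]  # (samples,)

    # Compare with the spatially filtered EEG of a trial of that class,
    # averaged over the two stimulus cycles within the trial
    i_trials = np.where(y == i_class)[0]
    filtered = rcca.w_.flatten() @ X[i_trials[0], :, :]  # (samples,)
    cycle_avg = filtered.reshape(2, -1).mean(axis=0)  # average the 2 cycles

    plt.figure(figsize=(15, 3))
    plt.plot(np.arange(V.shape[1]) / fs, template / np.linalg.norm(template), label="predicted template")
    plt.plot(np.arange(V.shape[1]) / fs, cycle_avg / np.linalg.norm(cycle_avg), label="spatially filtered EEG")
    plt.xlabel("time [s]")
    plt.ylabel("amplitude [a.u.]")
    plt.legend()
    plt.title(f"Predicted template versus filtered EEG (class {i_class})")
    plt.tight_layout()
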

.. GENERATED FROM PYTHON SOURCE LINES 207-218

Cross-validation
----------------
To perform decoding, one can call `rCCA.fit(X_trn, y_trn)` on training data `X_trn` and labels `y_trn`, and
`rCCA.predict(X_tst)` on testing data `X_tst`. In this section, a chronological cross-validation is set up to
evaluate the performance of rCCA.

Additionally, a second classifier is introduced, `eCCA`, which is the so-called "reference" method for c-VEP
decoding. Instead of using reconvolution for template generation (as rCCA does), eCCA obtains templates by averaging
the responses to repeated trials. As a single circularly shifted code was used in this dataset, one template can be
computed for this code and circularly shifted to generate the templates for all other classes. Therefore, eCCA
requires a `lags` parameter that specifies the relationship between the different classes.

.. GENERATED FROM PYTHON SOURCE LINES 218-255

.. code-block:: Python

    # Chronological cross-validation
    n_folds = 4
    n_trials = int(X.shape[0] / n_folds)
    folds = np.repeat(np.arange(n_folds), n_trials)

    # Loop folds
    accuracy = np.zeros((2, n_folds))
    for i_fold in range(n_folds):

        # Split data to train and test set
        X_trn, y_trn = X[folds != i_fold, ...], y[folds != i_fold]
        X_tst, y_tst = X[folds == i_fold, ...], y[folds == i_fold]

        # rCCA
        rcca = pyntbci.classifiers.rCCA(stimulus=V, fs=fs, event="contrast", encoding_length=0.3, onset_event=True)
        rcca.fit(X_trn, y_trn)
        yh_tst = rcca.predict(X_tst)
        accuracy[0, i_fold] = np.mean(yh_tst == y_tst)

        # eCCA
        ecca = pyntbci.classifiers.eCCA(lags=np.arange(0, 2 * 63, 4) / 60, fs=fs, cycle_size=2 * 63 / 60)
        ecca.fit(X_trn, y_trn)
        yh_tst = ecca.predict(X_tst)
        accuracy[1, i_fold] = np.mean(yh_tst == y_tst)

    # Plot accuracy (over folds)
    plt.figure(figsize=(15, 3))
    plt.bar(-0.2 + np.arange(n_folds), accuracy[0, :], 0.4, label="rCCA")
    plt.bar(0.2 + np.arange(n_folds), accuracy[1, :], 0.4, label="eCCA")
    plt.axhline(1 / n_classes, color="k", linestyle="--", label="chance", alpha=0.5)
    plt.xticks(np.arange(n_folds))
    plt.xlabel("(test) fold")
    plt.ylabel("accuracy")
    plt.legend()
    plt.title("Chronological cross-validation")
    plt.tight_layout()

.. image-sg:: /tutorials/images/sphx_glr_tutorial_getting_started_008.png
   :alt: Chronological cross-validation
   :srcset: /tutorials/images/sphx_glr_tutorial_getting_started_008.png
   :class: sphx-glr-single-img

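
Since rCCA is described above as a scikit-learn compatible classifier, the same evaluation can also be expressed with
scikit-learn's model selection tools. The sketch below is an alternative formulation of the loop above and assumes
that `rCCA` implements the standard scikit-learn estimator interface; a `KFold` without shuffling corresponds to the
chronological split used here.

.. code-block:: Python

    from sklearn.model_selection import KFold, cross_val_score

    # A chronological split: KFold without shuffling keeps the trial order intact
    cv = KFold(n_splits=4, shuffle=False)

    # Assuming rCCA behaves as a scikit-learn estimator, it can be scored directly
    rcca = pyntbci.classifiers.rCCA(stimulus=V, fs=fs, event="contrast", encoding_length=0.3, onset_event=True)
    scores = cross_val_score(rcca, X, y, cv=cv, scoring="accuracy")
    print("Accuracy per fold:", scores)
    print("Mean accuracy:", scores.mean())
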

.. GENERATED FROM PYTHON SOURCE LINES 256-262

Learning curve
--------------
When comparing eCCA and rCCA, one can appreciate that rCCA typically requires less training data than eCCA. The
reason is that rCCA reduces the number of free parameters to those of the transient responses instead of the full
c-VEP, which at the same time increases the amount of data that is effectively averaged over. This can be observed in
the so-called learning curve, which shows the performance as a function of the amount of training data.

.. GENERATED FROM PYTHON SOURCE LINES 262-308

.. code-block:: Python

    # Chronological cross-validation
    n_folds = 4
    n_trials = int(X.shape[0] / n_folds)
    folds = np.repeat(np.arange(n_folds), n_trials)

    # Loop folds
    accuracy = np.zeros((2, n_trials * (n_folds - 1), n_folds))
    for i_fold in range(n_folds):

        # Split data to train and test set
        X_trn, y_trn = X[folds != i_fold, ...], y[folds != i_fold]
        X_tst, y_tst = X[folds == i_fold, ...], y[folds == i_fold]

        # Loop trials for the learning curve
        for i_trial in range(n_trials * (n_folds - 1)):

            # rCCA
            rcca = pyntbci.classifiers.rCCA(stimulus=V, fs=fs, event="duration", encoding_length=0.3, onset_event=True)
            rcca.fit(X_trn[:1 + i_trial, ...], y_trn[:1 + i_trial])
            yh_tst = rcca.predict(X_tst)
            accuracy[0, i_trial, i_fold] = np.mean(yh_tst == y_tst)

            # eCCA
            ecca = pyntbci.classifiers.eCCA(lags=np.arange(0, 2 * 63, 4) / 60, fs=fs, cycle_size=63 / 60)
            ecca.fit(X_trn[:1 + i_trial, ...], y_trn[:1 + i_trial])
            yh_tst = ecca.predict(X_tst)
            accuracy[1, i_trial, i_fold] = np.mean(yh_tst == y_tst)

    # Plot learning curve
    plt.figure(figsize=(15, 3))
    avg = accuracy[0, ...].mean(axis=-1)
    std = accuracy[0, ...].std(axis=-1)
    plt.plot(np.arange(n_trials * (n_folds - 1)), avg, label="rCCA")
    plt.fill_between(np.arange(n_trials * (n_folds - 1)), avg + std, avg - std, alpha=0.2)
    avg = accuracy[1, ...].mean(axis=-1)
    std = accuracy[1, ...].std(axis=-1)
    plt.plot(np.arange(n_trials * (n_folds - 1)), avg, label="eCCA")
    plt.fill_between(np.arange(n_trials * (n_folds - 1)), avg + std, avg - std, alpha=0.2)
    plt.axhline(1 / n_classes, color="k", linestyle="--", label="chance", alpha=0.5)
    plt.xlabel("train trials [#]")
    plt.ylabel("accuracy")
    plt.legend()
    plt.title("Learning curve")
    plt.tight_layout()

    # plt.show()

.. image-sg:: /tutorials/images/sphx_glr_tutorial_getting_started_009.png
   :alt: Learning curve
   :srcset: /tutorials/images/sphx_glr_tutorial_getting_started_009.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 47.574 seconds)


.. _sphx_glr_download_tutorials_tutorial_getting_started.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: tutorial_getting_started.ipynb <tutorial_getting_started.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: tutorial_getting_started.py <tutorial_getting_started.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: tutorial_getting_started.zip <tutorial_getting_started.zip>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_