dc.description.abstract |
Ambulatory EEG is a widespread test used in hospitals for the neurological evaluation
of patients. EEG waveforms are typically reviewed by a trained neurologist
to classify EEG into clinical categories. Methodologically, there is a need to classify
EEG recordings automatically. Ideally, the classification models should be interpretable,
able to deal with EEG of varying durations, and robust to various artifacts.
We aimed to test and validate a framework for EEG classification that satisfies
these requirements by symbolizing EEG signals and adapting a method previously
proposed in natural language processing (NLP). We considered a large sample
of routine clinical EEG recordings (n=5’850) from patients aged between 0 and 100 years.
We symbolized the multivariate EEG time series and applied a byte-pair encoding
(BPE) algorithm to extract a dictionary of the most frequent patterns (tokens)
reflecting the variability of EEG waveforms. To demonstrate the performance of
this approach, we used the newly reconstructed EEG features to predict the biological
age of patients with a Random Forest model. We also correlated the relative frequencies
of tokens with age. The age prediction model achieved a mean absolute
error of 15.9 years. The correlation between actual and predicted age was
0.56. The most significant correlations between the frequencies of tokens and age
were observed at frontal and occipital EEG channels. Our findings demonstrate the
feasibility of an approach based on applying NLP methods to time series classification.
Notably, the proposed algorithms could be instrumental in classifying clinical
EEG with minimal preprocessing while remaining sensitive to short events,
such as epileptic spikes. |
uk |