Classification of Auditory-Driven EEG Signals Using Machine Learning: A Stanford Inspirit AI project - 09/2022
Overview
This programming project focused on the classification of electroencephalography (EEG) signals generated in response to auditory stimuli. The objective was to explore how well various machine learning models could differentiate between brainwave patterns evoked by stimuli in the participants' native language and those evoked by non-native stimuli. This investigation is rooted in computational neuroscience and demonstrates the potential of EEG-based classification systems for understanding auditory processing and real-time linguistic cognition.
Project Objective
The goal of this study was to:
Preprocess EEG signals into a form suitable for machine learning models.
Apply and compare traditional machine learning classifiers (Logistic Regression and Support Vector Machines) with deep learning models (Convolutional Neural Networks).
Assess model performance in identifying sound-stimulus categories based on temporal and spatial EEG features.
By isolating a specific channel of interest (F8 electrode, associated with auditory and decision-making regions), the study aimed to investigate whether auditory event-related potentials (ERPs) can be used to distinguish the linguistic familiarity of the auditory stimuli.
Data and Preprocessing
Data Source and Context
This project employed the Auditory Evoked Potential EEG-Biometric Dataset published by Abo Alzahab et al. (2021) on PhysioNet. The dataset consists of over 240 two-minute EEG recordings from 20 participants, collected at Marche Polytechnic University (UNIVPM) using the OpenBCI Ganglion Board. EEG data was recorded at a sampling rate of 200 Hz using four electrodes positioned at T7, F8, Cz, and P4 according to the international 10/10 system.
The experiments included both resting-state recordings (with eyes open and closed) and auditory stimulation tasks involving music in participants' native and non-native languages, as well as neutral instrumental music. These stimuli were delivered through both in-ear and bone-conduction headphones. For the purposes of this project, analysis was restricted to Experiment 7, which featured auditory stimulation using in-ear headphones—a condition chosen for its clear linguistic relevance and consistent setup across participants.
To ensure a clean signal and reduce electrical interference, the filtered version of the dataset was used. This version includes data processed through a 1–40 Hz first-order Butterworth bandpass filter and a 50 Hz notch filter. The recordings were provided in .csv format, and each file corresponded to a unique combination of subject, stimulus type, and session.
Channel Selection
Among the four EEG channels available in the dataset, the analysis focused exclusively on the F8 electrode. Positioned over the right prefrontal cortex, F8 is commonly associated with executive control, auditory attention, and language-related processing. This choice was guided both by neuroscientific literature and by practical considerations, as previous studies have shown that the F8 site is particularly sensitive to auditory stimuli and exhibits distinguishable patterns in response to linguistic content. By isolating this channel, the project aimed to streamline feature extraction while preserving meaningful signal dynamics relevant to the classification task.
Preprocessing
Given the high sampling rate and duration of the recordings, each EEG file was initially too large for efficient real-time processing. To address this, each file was split into smaller segments of 4000 rows using Python’s pandas library. This chunking strategy not only helped manage memory usage but also allowed for standardized input lengths across subjects and conditions.
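A minimal sketch of this chunking step is shown below; the file name and the "F8" column label are assumptions about the CSV layout rather than guaranteed details of the dataset.

```python
import pandas as pd

CHUNK_SIZE = 4000  # rows per segment (20 seconds at 200 Hz)

def chunk_recording(csv_path, chunk_size=CHUNK_SIZE):
    """Yield fixed-length F8 segments from one filtered EEG recording."""
    df = pd.read_csv(csv_path)          # one recording per file
    f8 = df["F8"].to_numpy()            # assumed column name for the F8 electrode
    n_chunks = len(f8) // chunk_size    # drop the trailing partial segment
    for i in range(n_chunks):
        yield f8[i * chunk_size:(i + 1) * chunk_size]

# Hypothetical file name for one subject/stimulus/session combination
segments = list(chunk_recording("subject01_exp07_native.csv"))
```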
Following segmentation, the F8 signals were normalized to a consistent scale to minimize amplitude-related variability across participants. Each segment was then vectorized into a fixed-length one-dimensional array, preserving the temporal structure of the EEG signal while making it suitable for use in machine learning models. This preprocessing pipeline ensured that the data was clean, consistent, and computationally manageable, laying the groundwork for effective feature extraction and model training.
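The normalization and vectorization steps might look like the following sketch, in which each segment is z-scored and the segments are stacked into a feature matrix; the label vector is assumed to have been collected while iterating over the files.

```python
import numpy as np

def normalize_segment(segment):
    """Z-score one segment so amplitude differences between participants are reduced."""
    segment = np.asarray(segment, dtype=float)
    return (segment - segment.mean()) / (segment.std() + 1e-8)

# X: (n_segments, 4000) feature matrix; y: one label per segment
# (e.g. 0 = native, 1 = non-native), gathered while reading the files (assumed).
X = np.stack([normalize_segment(s) for s in segments])
y = np.array(labels)
```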
Modeling Approaches
To explore the classification of auditory-evoked EEG signals, three modeling pipelines were developed: Logistic Regression, Support Vector Machines (SVM), and Convolutional Neural Networks (CNN). Each model was designed to evaluate a different level of representational complexity in the data, from simple linear separability to deeper, hierarchical feature extraction.
The Logistic Regression model served as a baseline. It was trained on vectorized EEG sequences extracted from the F8 channel and offered a benchmark for evaluating the linear separability of the data. While computationally efficient and interpretable, it struggled to capture the temporal dynamics inherent in EEG signals.
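A baseline of this kind can be expressed in a few lines of scikit-learn; the split ratio and solver settings below are illustrative choices, not the exact configuration used in the project.

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Hold out a test split, stratified so both stimulus classes are represented.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

log_reg = LogisticRegression(max_iter=1000)  # raise max_iter for long feature vectors
log_reg.fit(X_train, y_train)
print("Logistic regression accuracy:", accuracy_score(y_test, log_reg.predict(X_test)))
```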
The second approach involved Support Vector Machines, using both linear and Radial Basis Function (RBF) kernels. The RBF kernel, in particular, was better suited to the non-linear characteristics of EEG responses to auditory stimuli. Although this model improved upon the logistic regression baseline, it exhibited sensitivity to noise and had difficulty distinguishing between overlapping classes.
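Both kernels are available through scikit-learn's SVC; the regularization and kernel-width values below are placeholders rather than the tuned settings.

```python
from sklearn.svm import SVC

svm_linear = SVC(kernel="linear", C=1.0)
svm_rbf = SVC(kernel="rbf", C=1.0, gamma="scale")  # RBF handles non-linear structure

for name, model in [("linear", svm_linear), ("RBF", svm_rbf)]:
    model.fit(X_train, y_train)
    print(f"SVM ({name}) accuracy:", model.score(X_test, y_test))
```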
Finally, a Convolutional Neural Network (CNN) was implemented to leverage the temporal and spatial patterns in the EEG signal. The architecture consisted of one-dimensional convolutional layers designed to extract localized signal features, followed by pooling layers to reduce dimensionality, and dense layers to perform the final classification. This model was better able to capture complex, time-dependent features in the data. However, the limited dataset size introduced overfitting risks, suggesting the potential benefit of further regularization or data augmentation strategies.
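A Keras sketch of such a 1D-CNN is given below; it follows the described architecture (convolution, pooling, dense classification), but the specific filter counts, kernel sizes, and training settings are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(input_length=4000):
    model = models.Sequential([
        layers.Input(shape=(input_length, 1)),        # one EEG channel (F8)
        layers.Conv1D(16, kernel_size=7, activation="relu"),
        layers.MaxPooling1D(pool_size=4),
        layers.Conv1D(32, kernel_size=5, activation="relu"),
        layers.MaxPooling1D(pool_size=4),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),        # binary: native vs. non-native
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

cnn = build_cnn()
# Conv1D expects a channel axis: reshape (n_segments, 4000) -> (n_segments, 4000, 1)
history = cnn.fit(X_train[..., None], y_train, validation_split=0.2,
                  epochs=20, batch_size=32)
```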
Results and Evaluation
The performance of the models was evaluated based on classification accuracy and qualitative observations of their generalization ability:
Logistic Regression achieved an accuracy of approximately 60%, revealing its limitations in modeling the subtle and overlapping patterns in EEG signals.
SVM with an RBF kernel reached about 68% accuracy, indicating an improvement in handling non-linearity but still facing challenges with generalization across classes.
CNN models achieved the highest accuracy, between 75% and 80%, and demonstrated a superior ability to learn from the sequential nature of EEG inputs.
The results clearly demonstrated that deep learning models, particularly CNNs, are more effective at modeling auditory-evoked EEG patterns than traditional linear approaches. Their success is attributed to their capacity to extract localized and hierarchical features, which are crucial in capturing the dynamics of brain activity. Nevertheless, signs of overfitting were observed, emphasizing the need for additional training data and the integration of regularization techniques such as dropout layers, early stopping, or L2 penalties to enhance generalization.
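One way to add such regularization, shown purely as an illustration rather than the configuration actually used, is to combine dropout and an L2 penalty on the dense layer with early stopping on the validation loss.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, callbacks

def build_regularized_cnn(input_length=4000):
    model = models.Sequential([
        layers.Input(shape=(input_length, 1)),
        layers.Conv1D(16, 7, activation="relu"),
        layers.MaxPooling1D(4),
        layers.Conv1D(32, 5, activation="relu"),
        layers.MaxPooling1D(4),
        layers.Flatten(),
        layers.Dense(64, activation="relu",
                     kernel_regularizer=tf.keras.regularizers.l2(1e-4)),  # L2 penalty
        layers.Dropout(0.5),                                              # dropout
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                     restore_best_weights=True)           # early stopping
model = build_regularized_cnn()
model.fit(X_train[..., None], y_train, validation_split=0.2,
          epochs=50, batch_size=32, callbacks=[early_stop])
```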
Key Insights
This study demonstrates that EEG signals, even from a single electrode (F8), contain meaningful discriminative features capable of distinguishing between native and non-native auditory stimuli. The findings underscore the importance of modeling the temporal dynamics of brain activity, as traditional classifiers like logistic regression are limited in capturing sequential dependencies. In contrast, models such as CNNs—and potentially other time-aware architectures like recurrent neural networks (RNNs) or long short-term memory networks (LSTMs)—are better suited to extract and interpret these patterns.
The decision to focus on the F8 channel proved effective, as it consistently yielded reliable features for classification. However, it is likely that incorporating signals from additional electrodes and employing multi-channel fusion techniques would enhance overall model performance by providing richer spatial context.
Technical Tools and Implementation
The entire modeling pipeline was developed in Python, utilizing key libraries such as NumPy, pandas, and scikit-learn for data preprocessing and classical machine learning, and TensorFlow/Keras for deep learning implementation. Training and experimentation were conducted on Google Colab, which provided accessible GPU acceleration and an efficient collaborative environment.
The modeling approaches included logistic regression and SVMs—both evaluated with hyperparameter tuning via grid search—as well as a 1D convolutional neural network tailored to the structure of EEG data. The dataset was processed through a custom CSV-based pipeline that chunked each EEG recording into fixed-size windows, ensuring memory efficiency and consistent input shapes for training.
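For the classical models, the grid search can be set up with scikit-learn's GridSearchCV; the parameter grid below is illustrative rather than the exact grid explored in the project.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.001], "kernel": ["rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy", n_jobs=-1)
search.fit(X_train, y_train)
print("Best parameters:", search.best_params_)
print("Cross-validated accuracy:", search.best_score_)
```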
Challenges Encountered
One of the primary challenges was overfitting in the CNN models. While these models achieved high training accuracy, their performance on unseen data sometimes deteriorated, reflecting the limited size and variability of the dataset. Incorporating additional regularization techniques, such as dropout or early stopping, only partially mitigated this issue.
Another hurdle was class imbalance: the slight disparity between the number of native and non-native segments introduced bias, particularly affecting the SVM’s decision boundary. Additionally, inherent EEG signal noise, including movement artifacts and eye blinks, further complicated classification, suggesting that future iterations should integrate more advanced filtering and artifact removal techniques.
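A common remedy for this kind of imbalance, offered here as a possible mitigation rather than a step taken in the project, is to weight classes inversely to their frequency when fitting the SVM.

```python
from sklearn.svm import SVC

# class_weight="balanced" scales each class by the inverse of its frequency,
# so the minority class contributes equally to the decision boundary.
svm_balanced = SVC(kernel="rbf", C=1.0, gamma="scale", class_weight="balanced")
svm_balanced.fit(X_train, y_train)
```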
Future Directions
To build upon the results of this study, several future directions are proposed. First, multi-channel modeling would allow the integration of spatial information across all available electrodes, potentially improving both accuracy and robustness. Secondly, exploring recurrent architectures such as LSTMs or gated recurrent units (GRUs) could better capture long-range temporal dependencies within the EEG signal.
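A recurrent variant of the pipeline might look like the sketch below, which downsamples each segment with a strided convolution before an LSTM layer; all layer sizes are hypothetical.

```python
from tensorflow.keras import layers, models

def build_lstm(input_length=4000):
    model = models.Sequential([
        layers.Input(shape=(input_length, 1)),
        layers.Conv1D(16, kernel_size=7, strides=4, activation="relu"),  # shorten the sequence
        layers.LSTM(64),                                                 # long-range temporal context
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Downsampling before the recurrent layer keeps training tractable, since an LSTM run directly over 4000 raw samples per segment would be slow to train.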
Another promising avenue is the application of transfer learning, where pretrained models on similar EEG tasks are fine-tuned on auditory classification tasks to improve generalization and reduce training time. Finally, deploying the trained models in a real-time streaming context could pave the way for practical brain-computer interface (BCI) applications in language comprehension and auditory cognition.
Conclusion
This project highlights the feasibility of classifying auditory-evoked EEG signals using machine learning, with deep learning models—particularly CNNs—demonstrating superior performance. By effectively capturing both temporal and spatial aspects of EEG data, these models contribute to advancing neural decoding techniques and pave the way for future BCI applications.
The results validate the informativeness of the F8 electrode for auditory processing and illustrate how convolutional architectures outperform traditional classifiers when applied to temporal neurophysiological data. While challenges such as overfitting and signal noise remain, this work sets the foundation for more advanced and scalable EEG-based auditory cognition systems.