Perception of sounds sequences: predictions for behavioural measurements generated with a computational model of auditory cortex

Dublin Core

Title

Creator

Zsofia Belteki

Date

2015

Description

Behavioural and neuroscientific research into sound perception shows that the auditory system can represent the temporal structure of sounds over a wide range of time windows, a process labelled temporal binding. Recent computational modelling work suggests that synaptic depression in auditory cortex, which is responsible for the adaptation of neural responses to repeated stimuli, is also the memory mechanism that allows the temporal structure of sounds to be represented. This project aimed to derive behavioural predictions from this account of temporal binding. Simulations examined how well the model cortex discriminates between sound sequences that differ from each other in the timing, amplitude, or frequency of the sequence elements. Along with the total duration of the sequences, the lifetime of neural adaptation was manipulated. The results predict that the thresholds for discriminating sound sequences should be tuned to a given sequence duration. These findings are discussed in light of previous research on how the dynamics and anatomical structure of auditory cortex may facilitate neural adaptation.

Subject

Cartesian state space difference calculations

Source

The current project investigated the processing of sound sequences by applying a computational model of auditory cortex, described in more detail by May and Tiitinen (2013). All coding, including stimulus design, experimental design and data analysis, was carried out in MATLAB.

Model structure and dynamics

Structure. The model is made up of 14 areas: 13 cortical areas, comprising three core, eight belt and two parabelt areas of the auditory cortex, and one subcortical area simulating the thalamus. Each area is made up of 16 computational units representing cortical columns, and each column comprises one excitatory (pyramidal) and one inhibitory population of neurons. Altogether, the model therefore contains 224 columns. Structural connections exist at three levels: within columns, between columns of the same area, and between areas.
Structural connections are expressed through connection matrices that describe the synaptic connections between excitatory populations (Wee), from inhibitory to excitatory populations (Wei), and from excitatory to inhibitory populations (Wie). Within-column connections are assumed to be the strongest: the synaptic weight of the within-column excitatory feedback (i.e., the diagonal values of Wee) is set to 6, and the within-column values of Wei and Wie (the connections targeting the excitatory and the inhibitory population, respectively) are set to 3.5.
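As a quick check on the numbers above, the sketch below counts areas and columns and builds the within-column (diagonal) parts of Wee, Wei and Wie for one 16-column area. It is a minimal Python illustration, not the project's MATLAB code. The names `AREA_COUNTS`, `diagonal` and `within_column_weights` are hypothetical, and the text does not say whether the thalamic area uses the same diagonal values.

```python
# Hypothetical bookkeeping for the model layout described in the text.
COLUMNS_PER_AREA = 16
AREA_COUNTS = {"thalamus": 1, "core": 3, "belt": 8, "parabelt": 2}

n_areas = sum(AREA_COUNTS.values())                  # 14 areas
n_columns = n_areas * COLUMNS_PER_AREA               # 224 columns in total
n_cortical_columns = (n_areas - AREA_COUNTS["thalamus"]) * COLUMNS_PER_AREA  # 208


def diagonal(n, value):
    """n x n matrix (list of lists) with `value` on the diagonal and 0 elsewhere."""
    return [[value if i == j else 0.0 for j in range(n)] for i in range(n)]


def within_column_weights(n=COLUMNS_PER_AREA, w_ee=6.0, w_ei=3.5, w_ie=3.5):
    """Within-column parts of Wee, Wei and Wie for one area (values from the text)."""
    return diagonal(n, w_ee), diagonal(n, w_ei), diagonal(n, w_ie)


wee, wei, wie = within_column_weights()
print(n_areas, n_columns, n_cortical_columns)  # 14 224 208
```

The 208 cortical columns are the ones whose excitatory firing rates later form the 208-dimensional state-space vector.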
The excitatory population of each column made lateral connections to the excitatory populations of neighboring columns within the same area, extending two columns on either side. Similarly, the excitatory population connected to neighboring inhibitory populations across a distance of five columns, and these connections accounted for lateral inhibition (see Figure 1). In both cases, weight strength dropped off with distance following a Gaussian profile, and a 10% random jitter was added to the weights. These features are modifications of the original model of May & Tiitinen (2013) and are described in Hajizadeh, Matysiak, May & König (in preparation).
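The lateral connectivity can be sketched as a banded matrix with a Gaussian profile. In this Python sketch the peak weight and Gaussian width are hypothetical (the text gives neither), the five-column excitatory-to-inhibitory reach is assumed to apply on either side, and the "10% random jitter" is assumed to be a uniform multiplicative factor between 0.9 and 1.1. The helper name `lateral_weights` is illustrative only.

```python
import math
import random


def lateral_weights(n=16, reach=2, peak=1.0, sigma=1.0, jitter=0.1, seed=0):
    """Lateral weights within one area; w[i][j] is the weight from column j to column i.

    Connections reach `reach` columns on either side with a Gaussian drop-off.
    `peak` and `sigma` are hypothetical values; the jitter is assumed to be a
    uniform multiplicative factor in [1 - jitter, 1 + jitter].
    The diagonal (within-column weight) is left at 0 and set separately.
    """
    rng = random.Random(seed)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = abs(i - j)
            if 0 < d <= reach:
                base = peak * math.exp(-d ** 2 / (2 * sigma ** 2))
                w[i][j] = base * (1.0 + rng.uniform(-jitter, jitter))
    return w


w_ee_lateral = lateral_weights(reach=2)  # excitatory -> excitatory, two columns either side
w_ie_lateral = lateral_weights(reach=5)  # excitatory -> inhibitory, five columns (assumed either side)
```

With these hypothetical defaults, a column one step away receives about 0.61 of `peak` before jitter and a column two steps away about 0.14, so the Gaussian profile already makes the outer connections weak.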
Connections made by the inhibitory population were assumed to be local and therefore targeted only the excitatory population in the same column (see Figure 1). Inter-area connections were based on anatomical research in primates (Kaas and Hackett, 2000) and were contained entirely in Wee. The tonotopically organized afferent input Iaff targeted the thalamus, where each column functioned as a frequency channel that organized the input spectrally. The thalamus was connected to the three core areas; these were interconnected with the eight belt areas, which were in turn interconnected with the two parabelt areas. The model thus had a serial structure, with no direct connections between the core and the parabelt (see Figure 2). Core and belt connections occurred only between neighboring areas, resulting in multiple core-belt-parabelt streams with a rough rostral and caudal subdivision (de la Mothe, Blumell, Kajikawa & Hackett, 2006). Connections between areas were topographic: within each inter-area subdivision of Wee, most connections lay near the diagonal, with a Gaussian drop-off in weight strength away from it (Hajizadeh et al., in preparation).
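A topographic inter-area connection can be pictured as a 16 x 16 block of Wee whose weights peak on the diagonal and fall off as a Gaussian of tonotopic distance. The Python sketch below builds such a block and copies it into a full 224 x 224 Wee. The peak, the width, and the area indices (0 for the thalamus, 1 for one core area) are hypothetical choices for illustration. The text specifies only the serial thalamus-core-belt-parabelt hierarchy and the Gaussian drop-off.

```python
import math

COLUMNS_PER_AREA = 16
N_AREAS = 14


def topographic_block(n=COLUMNS_PER_AREA, peak=1.0, sigma=2.0):
    """Inter-area block: strongest on the diagonal, Gaussian drop-off with tonotopic distance.

    `peak` and `sigma` are hypothetical; block[i][j] is the weight from column j
    of the presynaptic area to column i of the postsynaptic area.
    """
    return [[peak * math.exp(-((i - j) ** 2) / (2 * sigma ** 2)) for j in range(n)]
            for i in range(n)]


def place_block(wee, post_area, pre_area, block, n=COLUMNS_PER_AREA):
    """Copy `block` into the (post_area, pre_area) subdivision of the full Wee matrix."""
    for i in range(n):
        for j in range(n):
            wee[post_area * n + i][pre_area * n + j] = block[i][j]


size = N_AREAS * COLUMNS_PER_AREA  # 224
wee = [[0.0] * size for _ in range(size)]
# Hypothetical area indices: 0 = thalamus, 1 = one of the core areas.
place_block(wee, post_area=1, pre_area=0, block=topographic_block())
```

Keeping each area pair as a separate block mirrors the description of Wee as a set of inter-area subdivisions. Any pair of areas that is not connected in the hierarchy, such as core and parabelt, simply keeps a zero block.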
Dynamics. The dynamical unit of the model was the cortical microcolumn, made up of a population of excitatory cells and a population of inhibitory cells, whose mean activity was expressed by the state variables u and v, respectively. The mean firing rate g of each excitatory population depended on its state variable u through a non-linear, monotonically increasing function,

$$g(u) = \begin{cases} \tanh\left[\tfrac{2}{3}(u - \theta)\right], & u > \theta \\ 0, & \text{otherwise} \end{cases}$$

where θ = 0.1 is a threshold constant. The mean firing rate of each inhibitory population was determined in the same way as g(v). Collecting the states of the excitatory and inhibitory populations into the vectors u = [u1, …, uN] and v = [v1, …, vN], the dynamics of the neural interactions were described by

$$\tau_m \frac{d\mathbf{u}}{dt} = -\mathbf{u} + W_{ee}\, g(\mathbf{u}) - W_{ei}\, g(\mathbf{v}) + I_{aff}$$

$$\tau_m \frac{d\mathbf{v}}{dt} = -\mathbf{v} + W_{ie}\, g(\mathbf{u})$$

where τm = 30 ms is the membrane time constant and Iaff describes the afferent input targeting the thalamus.
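To make the rate function and the excitatory-inhibitory interaction concrete, here is a forward-Euler sketch of one isolated column in Python. It assumes the Wilson-Cowan-style form τm du/dt = -u + Wee g(u) - Wei g(v) + I and τm dv/dt = -v + Wie g(u). It uses only the within-column weights (6, 3.5, 3.5) and omits lateral, inter-area and depression terms. Feeding the input directly into a cortical column (rather than through the thalamus), the 1-ms step, and the pulse strength are all illustrative assumptions.

```python
import math

THETA = 0.1                        # firing threshold (from the text)
TAU_M = 30.0                       # ms, membrane time constant (from the text)
W_EE, W_EI, W_IE = 6.0, 3.5, 3.5   # within-column weights (from the text)


def g(x, theta=THETA):
    """Mean firing rate: tanh((2/3)(x - theta)) above threshold, 0 otherwise."""
    return math.tanh((2.0 / 3.0) * (x - theta)) if x > theta else 0.0


def simulate_column(i_in, dt=1.0):
    """Forward-Euler integration (step dt, in ms) of one isolated column.

    Assumed form: tau_m du/dt = -u + W_EE g(u) - W_EI g(v) + I,
                  tau_m dv/dt = -v + W_IE g(u).
    Lateral, inter-area and depression terms are omitted.
    """
    u, v = 0.0, 0.0
    us, vs = [], []
    for i_t in i_in:
        du = (-u + W_EE * g(u) - W_EI * g(v) + i_t) / TAU_M
        dv = (-v + W_IE * g(u)) / TAU_M
        u, v = u + dt * du, v + dt * dv
        us.append(u)
        vs.append(v)
    return us, vs


# Hypothetical input: a 50-ms pulse of strength 1 fed directly into the column, then silence.
us, vs = simulate_column([1.0] * 50 + [0.0] * 450)
```

A 1-ms step is small compared with the 30-ms membrane time constant, which keeps the explicit Euler scheme stable for this illustration.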

Adaptation. The underlying mechanism for neural adaptation on the time scale of seconds is short-term synaptic depression (Wehr & Zador, 2005). To simulate this, all excitatory connections in cortex (i.e., the elements ij of Wee and Wie) were modulated by a time-dependent depression term aij(t), where i and j are the indices of the postsynaptic and presynaptic population, respectively. This term depended on the presynaptic firing rate through the equation

$$\frac{da_{ij}(t)}{dt} = \frac{1 - a_{ij}(t)}{\tau_{rec}} - \frac{a_{ij}(t)\, g(u_j(t))}{\tau_{on}}$$

Here, τon = 100 ms is the onset time constant and τrec is the time constant of recovery from depression, which therefore expresses the lifetime of adaptation. In the current experiments, τrec was varied from 800 to 2000 ms in steps of 200 ms, giving seven values. This range reflects electrophysiological findings that the adaptation of the N1m response (the MEG equivalent of the N1) can be captured by a time constant that varies across participants in the range of 1-4 seconds (Lu, Williamson & Kaufman, 1992).
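The role of τrec can be illustrated for a single synapse. This Python sketch assumes the form da/dt = (1 - a)/τrec - a·g/τon, where a = 1 means a fully recovered synapse. It integrates this equation with forward Euler at 1 ms and drives it with a hypothetical 50-ms presynaptic burst at rate g = 1, followed by silence. It also enumerates the seven τrec values used in the experiments. The function name `simulate_depression` is illustrative.

```python
import math

TAU_ON = 100.0                                # ms, onset time constant (from the text)
TAU_REC_VALUES = list(range(800, 2001, 200))  # ms: 800, 1000, ..., 2000 (seven values)


def simulate_depression(g_pre, tau_rec, tau_on=TAU_ON, dt=1.0):
    """Forward-Euler trace of one depression term a(t), starting fully recovered (a = 1).

    g_pre is the presynaptic firing rate at each time step of length dt (ms).
    """
    a = 1.0
    trace = []
    for g in g_pre:
        a += dt * ((1.0 - a) / tau_rec - a * g / tau_on)
        trace.append(a)
    return trace


burst = [1.0] * 50 + [0.0] * 3000  # hypothetical 50-ms burst, then 3 s of silence
fast = simulate_depression(burst, tau_rec=1000.0)
slow = simulate_depression(burst, tau_rec=2000.0)
print(TAU_REC_VALUES)  # [800, 1000, 1200, 1400, 1600, 1800, 2000]
```

During silence the equation reduces to exponential recovery towards 1 with time constant τrec. This is why τrec sets the lifetime of adaptation: one second after the burst, the synapse with τrec = 2000 ms is still further from full recovery than the one with τrec = 1000 ms.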
Stimuli and Procedures
Stimulus sets comprised sequences of three consecutively presented tones (50 ms duration, 5 ms linear onset and offset ramps). Each sequence was characterized by its total duration, measured from the onset of the first tone to the onset of the third tone. For each measurement, two sequences of the same duration were presented to the model. The third tone was always the same in both sequences (amplitude = 1; input via thalamic frequency channel 7, in the middle of the tonotopic map), whereas the two sequences differed in their first two tones, that is, in the stimulation history of the final tone (see Figure 3). Simulations were carried out in three experiments, in which the sequences differed in the timing, amplitude, or frequency of the first tones. In each experiment, all other aspects of the sequences were kept constant, so that any difference between the responses to the two sequences depended solely on the manipulated (independent) variable. In each experiment, the total duration of the sequence was varied in the range of 500-4000 ms in steps of 200 ms, giving 18 different sequence durations. As explained above, the lifetime of adaptation was also varied (800-2000 ms) to simulate a population of participants. For a diagram of the stimulus sets, see Figure 3.
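The tone envelope and the duration grid can be written down directly. This Python sketch samples a 50-ms tone with 5-ms linear ramps at a hypothetical 1-ms resolution and lists the 18 total durations. It also places three regularly spaced onsets so that the first and third tones are the stated total duration apart. The helper names `tone_envelope` and `regular_onsets` are illustrative, not taken from the original MATLAB code.

```python
def tone_envelope(amplitude=1.0, dur_ms=50.0, ramp_ms=5.0, dt_ms=1.0):
    """Trapezoidal envelope: linear onset/offset ramps of ramp_ms, sampled every dt_ms."""
    n = int(round(dur_ms / dt_ms))
    env = []
    for k in range(n):
        t = k * dt_ms
        rise = min(1.0, t / ramp_ms)
        fall = min(1.0, (dur_ms - t) / ramp_ms)
        env.append(amplitude * min(rise, fall))
    return env


# Total duration D = onset of tone 1 to onset of tone 3: 500, 700, ..., 3900 ms.
SEQUENCE_DURATIONS_MS = list(range(500, 4001, 200))


def regular_onsets(total_ms):
    """Onsets (ms) of three regularly spaced tones spanning total_ms."""
    return [0.0, total_ms / 2.0, float(total_ms)]


env = tone_envelope()
print(len(SEQUENCE_DURATIONS_MS), SEQUENCE_DURATIONS_MS[-1])  # 18 3900
```

Stepping by 200 ms from 500 ms, the 18 durations end at 3900 ms, just inside the stated 500-4000 ms range.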

Experiment 1: variations in timing. This experiment tested the model's ability to discriminate between temporal patterns, using two sequences of three identical tones (amplitude = 1; frequency channel 7). The sequences were identical except for the presentation time of the middle tone. In the first sequence, the onset of the middle tone was jittered away from regular presentation (i.e., from the midpoint of the sequence) by an amount equal to 5% of the total sequence duration (see Figure 3). The second sequence was a time-reversed version of the first.
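The two timing sequences follow from the total duration alone. In this Python sketch, the first sequence's middle tone is shifted by 5% of the total duration. Shifting it *earlier* is an assumption, since the text does not give the direction. The second sequence is obtained by mirroring every onset in time (t becomes D - t). The function `timing_pair` is a hypothetical name.

```python
def timing_pair(total_ms, jitter_frac=0.05):
    """Tone onsets (ms) for the two Experiment 1 sequences.

    The middle tone is shifted by jitter_frac * total_ms from the regular
    midpoint. Shifting it earlier in the first sequence is an assumption;
    the second sequence is the time-reversed first one (t -> total_ms - t).
    """
    shift = jitter_frac * total_ms
    first = [0.0, total_ms / 2.0 - shift, float(total_ms)]
    second = [total_ms - t for t in reversed(first)]
    return first, second


first, second = timing_pair(1000)  # middle tones at about 450 ms and 550 ms
```

Because the first and last onsets are fixed at 0 and D, time reversal only moves the middle tone to the other side of the midpoint. The two sequences therefore differ by twice the jitter, which scales with the total duration.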

Experiment 2: variations in amplitude. Here, the two sequences differed in the amplitude of the first two tones. In the first sequence, the amplitudes of the first and second tones were 1.05 and 0.95, respectively; in the second sequence, these values were reversed. The third tone had a fixed amplitude of 1. The three tones were presented at regular intervals, and all were delivered via frequency channel 7 of the thalamic tonotopic map.
Experiment 3: variations in frequency. Here, the frequency history preceding the third tone was varied. In the first sequence, the first tone was delivered via frequency channel 6 and the second via channel 8; in the second sequence, the order was reversed. The three tones were presented at regular intervals.
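Experiments 2 and 3 share one construction: the first two tones are swapped between the sequences, while the third tone stays fixed. This Python sketch represents each tone as a hypothetical (amplitude, thalamic channel) tuple, with onsets assumed to be regular (0, D/2, D). The Experiment 3 amplitudes are not stated in the text, so a value of 1.0 is assumed. The helper `swapped_pair` is illustrative.

```python
# Each tone is (amplitude, thalamic frequency channel); the third tone is always (1.0, 7).
FINAL_TONE = (1.0, 7)


def swapped_pair(tone1, tone2, final=FINAL_TONE):
    """Two sequences that differ only in the order of their first two tones."""
    return [tone1, tone2, final], [tone2, tone1, final]


# Experiment 2: amplitudes 1.05 and 0.95, all on channel 7.
amp_first, amp_second = swapped_pair((1.05, 7), (0.95, 7))
# Experiment 3: channels 6 and 8; amplitude 1.0 is an assumption (not stated in the text).
freq_first, freq_second = swapped_pair((1.0, 6), (1.0, 8))
```

Building both experiments from the same helper guarantees that the final tone is identical within each pair. The analysis relies on this when it attributes response differences to stimulation history alone.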

Analysis

The third tone was kept constant both within the two-sequence stimulus sets and across the experiments, to ensure that variations in the response to this final tone reflected changes in the stimulation history only. Thus, the ability of the model to discriminate between the temporal structures of two sequences could be analyzed by examining the activity elicited by the third tone of each sequence.
Accordingly, the firing rates of the excitatory populations in the cortical areas were treated as coding the preceding stimulation history. The response to the third tone was quantified by averaging the firing rate of each excitatory population over a 200-ms time window following the onset of the third tone (see Figure 3). This resulted in a 208-dimensional vector, that is, a point in a 208-dimensional state space in which each axis represents the activity of one cortical column. The difference between the responses to the two sequences was then quantified as the Cartesian (Euclidean) distance between the two corresponding points in state space, computed with MATLAB's norm function. This distance measure, denoted Dstate, was taken to represent the ability of cortex to discriminate between tone sequences.
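The Dstate computation consists of a window average followed by a Euclidean norm. This Python sketch (Python 3.8+ for `math.dist`) mirrors the MATLAB expression norm(x - y). It assumes the rates are stored as one row per 1-ms time step, with one column per excitatory cortical population. That storage layout and the names `mean_response` and `d_state` are hypothetical.

```python
import math


def mean_response(rates, onset_ms, window_ms=200.0, dt_ms=1.0):
    """Average each population's rate over [onset, onset + window).

    rates[t][k] is the firing rate of excitatory population k at time step t.
    """
    start = int(round(onset_ms / dt_ms))
    stop = start + int(round(window_ms / dt_ms))
    window = rates[start:stop]
    n_pop = len(window[0])
    return [sum(step[k] for step in window) / len(window) for k in range(n_pop)]


def d_state(x, y):
    """Euclidean (Cartesian) distance between two state-space points, like norm(x - y) in MATLAB."""
    return math.dist(x, y)
```

In the model, each call to `mean_response` would return 208 values (one per cortical column). `d_state` would then be applied to the responses to the third tone of the two sequences in a pair.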
For each experiment, the analysis determined how Dstate changed as a function of the total duration of the sequence, and how this dependence varied across the different adaptation lifetimes.
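The full design is a grid of 3 experiments × 7 adaptation lifetimes × 18 durations, and each cell needs one pair of simulations. This Python sketch enumerates that grid. It also shows one hypothetical way to read duration tuning out of a {duration: Dstate} mapping: pick the duration with the largest Dstate. Treating the largest Dstate as the lowest predicted discrimination threshold is an assumption of this sketch, not a rule stated in the text.

```python
EXPERIMENTS = ("timing", "amplitude", "frequency")
DURATIONS_MS = list(range(500, 4001, 200))   # 18 sequence durations
TAU_REC_MS = list(range(800, 2001, 200))     # 7 adaptation lifetimes


def conditions():
    """Every (experiment, tau_rec, duration) combination; each needs one pair of simulations."""
    return [(exp, tau, dur)
            for exp in EXPERIMENTS for tau in TAU_REC_MS for dur in DURATIONS_MS]


def best_duration(dstate_by_duration):
    """Duration (ms) with the largest Dstate in a {duration: Dstate} mapping.

    Reading the largest Dstate as the lowest predicted threshold is an
    assumption of this sketch.
    """
    return max(dstate_by_duration, key=dstate_by_duration.get)
```

The grid contains 378 conditions (3 × 7 × 18). Applying `best_duration` separately for each experiment and each τrec value would show whether the preferred duration shifts with the lifetime of adaptation.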

Publisher

Lancaster University

Format

Data/MATLAB

Identifier

Belteki2015

Contributor

Ellie Ball

Rights

Open

Relation

None

Language

English

Type

Data

Coverage

LA1 4YF

LUSTRE

Supervisor

Perception of sounds sequences: predictions for behavioural measurements generated with a computational model of auditory cortex

Project Level

MSc

Topic

Modelling (Computational)

Sample Size

Unknown

Statistical Analysis Type

Cartesian distance


Citation

Zsofia Belteki, “Perception of sounds sequences: predictions for behavioural measurements generated with a computational model of auditory cortex,” LUSTRE, accessed August 1, 2026, https://www.johnntowse.com/LUSTRE/items/show/93.