Fitting a model#
We can fit a model to individual participant data in a free-recall dataset by maximizing the probability of the data according to the model. This involves using a search algorithm to adjust the model parameters until the probability, or likelihood (see Evaluating a model), of the data is maximized.
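Conceptually, the search minimizes the negative log likelihood of the data as a function of the free parameters. The sketch below illustrates the idea with a toy Gaussian likelihood and SciPy's differential evolution optimizer; the toy data and likelihood function are stand-ins for illustration only, not part of the cymr API (cymr handles the search internally).

import numpy as np
from scipy.optimize import differential_evolution

# toy "data": samples from a Gaussian with unknown mean
rng = np.random.default_rng(1)
obs = rng.normal(0.7, 0.1, size=100)

def neg_logl(x):
    # negative log likelihood (up to an additive constant) of the toy
    # data given the candidate parameter value x[0]
    return np.sum(0.5 * ((obs - x[0]) / 0.1) ** 2)

# minimizing the negative log likelihood over the search range gives
# the maximum-likelihood estimate (here, close to 0.7)
result = differential_evolution(neg_logl, bounds=[(0, 1)])
print(result.x)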
First, load some sample data to fit:
In [1]: from cymr import fit, cmr
In [2]: data = fit.sample_data('Morton2013_mixed').query('subject <= 3')
Search Definition#
Next, we need to define our search parameters. There are two types of parameters used specifically for searches:
- fixed
Parameters that have a fixed value. These parameters are not searched.
- free
Parameters that may vary to fit a dataset. For a search, you must specify a range to search over.
We’ll also use two other types of parameters that set properties of the model based on a given parameter set:
- dependent
Parameters that are derived from other parameters. These parameters are specified using an expression that generates them from other parameters.
- weights
Parameters that define weighting of different patterns in the model.
We can organize these things by creating a Parameters object. To run a simple and fast search, we'll fix almost all parameters and fit just one, \(\beta_\mathrm{enc}\). For a real project, you may also want to free other parameters to fit individual differences in the primacy effect, temporal clustering, and other effects.
In [3]: par = cmr.CMRParameters()
In [4]: par.set_fixed(T=0.1, Lfc=0.15, Lcf=0.15, P1=0.2, P2=2,
...: B_start=0.3, B_rec=0.9, X1=0.001, X2=0.25)
...:
In [5]: par.set_free(B_enc=(0, 1))
In [6]: par.set_dependent(Dfc='1 - Lfc', Dcf='1 - Lcf')
To simulate free recall using the context maintenance and retrieval (CMR) model, we must first define pre-experimental weights for the network. For this example, we'll define localist patterns, which are distinct for each presented item and can be represented by an identity matrix with one row per item. See Evaluating a model for details.
In [7]: import numpy as np
...: n_items = 768
...:
In [8]: study = data.query("trial_type == 'study'")
In [9]: items = study.groupby('item_index')['item'].first().to_numpy()
In [10]: patterns = {'items': items, 'vector': {'loc': np.eye(n_items)}}
In [11]: par.set_sublayers(f=['task'], c=['task'])
In [12]: weights = {(('task', 'item'), ('task', 'item')): 'loc'}
In [13]: par.set_weights('fc', weights)
In [14]: par.set_weights('cf', weights)
We can print the parameter definition to get an overview of the settings.
In [15]: print(par)
fixed:
T: 0.1
Lfc: 0.15
Lcf: 0.15
P1: 0.2
P2: 2
B_start: 0.3
B_rec: 0.9
X1: 0.001
X2: 0.25
free:
B_enc: (0, 1)
dependent:
Dfc: 1 - Lfc
Dcf: 1 - Lcf
dynamic:
sublayers:
f: ['task']
c: ['task']
weights:
fc: {(('task', 'item'), ('task', 'item')): 'loc'}
cf: {(('task', 'item'), ('task', 'item')): 'loc'}
sublayer_param:
The to_json() method of CMRParameters can be used to save out parameter definitions to a file. The output file uses JSON format, which is both human- and machine-readable, and can be loaded later to restore search settings:
In [16]: par.to_json('parameters.json')
In [17]: restored = cmr.read_config('parameters.json')
Parameter Search#
Finally, we can run the search. Parameters will be optimized separately for each participant. For speed, we'll set the tolerance to be pretty high (0.1); normally this should be much lower to ensure that the search converges. It is also often a good idea to run multiple replications of each search using the optional n_rep input, to ensure that the search converges on the best-fitting parameter set (see the sketch after the results below). Here, we'll just run one search for each participant.
In [18]: model = cmr.CMR()
In [19]: results = model.fit_indiv(data, par, patterns=patterns, tol=0.1)
In [20]: best = fit.get_best_results(results)
In [21]: best[['B_enc', 'logl', 'n', 'k']]
Out[21]:
B_enc logl n k
subject
1 0.693528 -954.976072 373 1
2 0.615342 -1109.066717 426 1
3 0.743079 -983.635155 379 1
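As noted above, running multiple replications per participant can guard against the search stopping at a local optimum. A minimal sketch of such a call, using the same model and data as above (the tolerance of 0.01 is an illustrative choice):

# run three independent search replications per participant and keep
# the best-fitting result for each
results = model.fit_indiv(data, par, patterns=patterns, tol=0.01, n_rep=3)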
The results give the complete set of parameters, including fixed parameters, optimized free parameters, and dependent parameters. They also include fields with statistics relevant to the search:
- logl
Total log likelihood for each participant. Greater (i.e., less negative) values indicate better fit.
- n
Number of data points fit.
- k
Number of free parameters.
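Because logl and k are both reported, penalized fit statistics for model comparison are easy to compute from the results table. For example, a minimal sketch of Akaike's information criterion (AIC), where lower values indicate a better fit after penalizing for the number of free parameters:

# AIC = 2 * k - 2 * logl penalizes the fit for each free parameter
aic = 2 * best['k'] - 2 * best['logl']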
Using search output#
To use the output from the search for evaluating the model on new data or running simulations, we must first convert the results DataFrame into a dictionary.
In [22]: subj_param = best.T.to_dict()
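The resulting dictionary maps each subject identifier to that subject's complete parameter set, with one entry per column of the results table. Abbreviated, using the values fit above:

{1: {'B_enc': 0.693528, 'T': 0.1, ...},
 2: {'B_enc': 0.615342, 'T': 0.1, ...},
 3: {'B_enc': 0.743079, 'T': 0.1, ...}}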
As an example of using the best-fitting parameters, we can use them to confirm the likelihood values from the search.
The group_param input to likelihood() sets parameters that are the same for all participants, while the subj_param input sets subject-specific parameters. Here, we'll just set everything through the subj_param input.
In [23]: group_param = {}
In [24]: model.likelihood(
....: data,
....: group_param,
....: subj_param=subj_param,
....: param_def=par,
....: patterns=patterns,
....: )
....:
Out[24]:
logl n
subject
1 -954.976072 373
2 -1109.066717 426
3 -983.635155 379
In Generating simulated data, we’ll use a set of parameter values to generate simulated data for analysis.