Generating simulated data#
While fitting a model can be accomplished by evaluating the log likelihood of the data (see Fitting a model), to interpret the behavior of a model with a given set of parameters it’s important to generate simulated data. These data can then be analyzed the same way as real data.
Loading data to simulate#
First, we need an experiment to run. This is specified using psifr DataFrame format. We’ll use data from a sample experiment. Only the study trials are needed in this case. They’ll specify the order in which items are presented in each list during the simulation.
In [1]: from cymr import fit, cmr
In [2]: from psifr import fr
In [3]: data = fit.sample_data('Morton2013_mixed')
In [4]: items = (data
...: .query('trial_type == "study"')
...: .groupby('item_index')
...: .first()['item']
...: .to_numpy()
...: )
...:
In [5]: data = fr.filter_data(data, subjects=1)
In [6]: fr.filter_data(data, trial_type='study')
Out[6]:
subject list position ... response response_time list_category
0 1 2 1 ... 3.0 1.255 mixed
1 1 2 2 ... 3.0 1.040 mixed
2 1 2 3 ... 2.0 1.164 mixed
3 1 2 4 ... 2.0 0.829 mixed
4 1 2 5 ... 3.0 0.872 mixed
... ... ... ... ... ... ... ...
1062 1 48 20 ... 3.0 0.641 mixed
1063 1 48 21 ... 3.0 0.997 mixed
1064 1 48 22 ... 2.0 0.589 mixed
1065 1 48 23 ... 3.0 0.733 mixed
1066 1 48 24 ... 3.0 0.495 mixed
[720 rows x 12 columns]
Note
It’s also possible to use columns of the DataFrame to set dynamic parameter values that vary over trial. For example, if some lists have a distraction task, you could have context integration rate vary with the amount of distraction. See dynamic parameter methods in Parameters.
Setting parameters#
Next, we define parameters for the simulation. Often these will be
taken from a parameter fit (see Fitting a model). Here, we’ll
just define the parameters we want directly. We also need to create
a CMRParameters
object to define how the model
patterns are used.
In [7]: param = {
...: 'B_enc': 0.6,
...: 'B_start': 0.3,
...: 'B_rec': 0.8,
...: 'Afc': 0,
...: 'Dfc': 0.85,
...: 'Acf': 1,
...: 'Dcf': 0.85,
...: 'Aff': 0,
...: 'Dff': 1,
...: 'Lfc': 0.15,
...: 'Lcf': 0.15,
...: 'P1': 0.8,
...: 'P2': 1,
...: 'T': 0.1,
...: 'X1': 0.001,
...: 'X2': 0.35
...: }
...:
In [8]: patterns = {'items': items, 'vector': {'loc': np.eye(768)}}
In [9]: param_def = cmr.CMRParameters()
In [10]: param_def.set_sublayers(f=['task'], c=['task'])
In [11]: weights = {(('task', 'item'), ('task', 'item')): 'loc'}
In [12]: param_def.set_weights('fc', weights)
In [13]: param_def.set_weights('cf', weights)
Running a simulation#
We can then use the data, which define the items to study and recall on each list, with the parameters and patterns, to general simulated data using the CMR model. We’ll repeat the simulation five times to get a stable estimate of the model’s behavior in this task.
In [14]: model = cmr.CMR()
In [15]: sim = model.generate(data, param, param_def=param_def, patterns=patterns, n_rep=5)
Analying simulated data#
We can then use the Psifr package to score and analyze the simulated data just as we would real data. First, we score the data to prepare it for analysis. This generates a new DataFrame that merges study and recall events for each list:
In [16]: sim_data = fr.merge_free_recall(sim)
In [17]: sim_data
Out[17]:
subject list item ... intrusion prior_list prior_input
0 1 2 SEAN PENN ... False NaN NaN
1 1 2 AUDREY HEPBURN ... False NaN NaN
2 1 2 ST PATRICKS CATHEDRAL ... False NaN NaN
3 1 2 LES INVALIDES ... False NaN NaN
4 1 2 GREAT ZIMBABWE RUINS ... False NaN NaN
... ... ... ... ... ... ... ...
3595 1 240 CHE GUEVARA ... False NaN NaN
3596 1 240 OAHU BEACH ... False NaN NaN
3597 1 240 GATEWAY ARCH ... False NaN NaN
3598 1 240 WHITE HOUSE ... False NaN NaN
3599 1 240 WRIGLEY BUILDING ... False NaN NaN
[3600 rows x 11 columns]
Next, we can plot recall as a function of serial position:
In [18]: recall = fr.spc(sim_data)
In [19]: g = fr.plot_spc(recall)
We can also analyze the order in which items are recalled by calculating conditional response probability as a function of lag:
In [20]: crp = fr.lag_crp(sim_data)
In [21]: g = fr.plot_lag_crp(crp)
Peaks at short lags (e.g., -1, +1) indicate a tendency for items in nearby serial positions to be recalled successively.
See psifr.fr
for more analyses that you can run using Psifr.