Spiketimes Pandas Tutorial
Pandas dataframes (dfs) provide a convenient way to store and manipulate complex data. Dataframes are a natural way to store grouped data such as collections of spiketrains belonging to different trials, neurons and animals.
Spiketimes provides a wide range of functions to analyse neuroscience data stored in pandas dataframes. Data is always assumed to follow tidy data principles: one row per observation and one column per variable.
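As a minimal hand-built illustration of that layout (plain pandas, no spiketimes functions), two short spiketrains in tidy form look like this:

```python
import pandas as pd

# two short spiketrains in tidy form:
# one row per spike, one column per variable
df = pd.DataFrame({
    "spiketrain": [0, 0, 0, 1, 1],
    "spiketimes": [0.12, 0.45, 0.98, 0.33, 0.71],
})

# per-train operations then become simple groupbys
spikes_per_train = df.groupby("spiketrain")["spiketimes"].count()
print(spikes_per_train.tolist())  # [3, 2]
```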
Generating Dataframes of Spiketrains
Conversion from Numpy Arrays
The spiketimes.df.conversion module provides various functions to convert between numpy arrays and dfs.
To convert between a list of arrays and a df:
>>> from spiketimes.df.conversion import list_to_df
>>> from spiketimes.simulate import homogeneous_poisson_process
>>>
>>> # generate list of 20 spiketrains
>>> st_list = [homogeneous_poisson_process(rate=3, t_stop=60) for _ in range(20)]
>>> df = list_to_df(st_list)
>>> df.sample(3)
spiketrain spiketimes
91 11 31.853600
55 14 19.636683
159 7 48.439054
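The reverse conversion (tidy df back to a list of arrays) needs nothing beyond a pandas groupby. A sketch, using a small hand-built df in place of the list_to_df output above:

```python
import pandas as pd

# stand-in for a list_to_df result: two spiketrains in tidy form
df = pd.DataFrame({
    "spiketrain": [0, 0, 1, 1, 1],
    "spiketimes": [0.5, 1.2, 0.3, 0.9, 1.8],
})

# recover one numpy array of spiketimes per spiketrain
arrays = [group["spiketimes"].to_numpy() for _, group in df.groupby("spiketrain")]
print([a.tolist() for a in arrays])  # [[0.5, 1.2], [0.3, 0.9, 1.8]]
```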
Simulate Spiketrains
The spiketimes.df.simulate module provides functions for simulating spiketrains.
To simulate 50 spiketrains as homogeneous Poisson processes:
>>> from spiketimes.df.simulate import homogeneous_poisson_processes
>>>
>>> df = homogeneous_poisson_processes(rate=3, t_start=0, t_stop=20, n=50)
>>> df.sample(3)
spiketrain spiketimes
13 5 2.965396
38 9 14.640657
34 28 8.546893
To simulate 50 spiketrains with fluctuating firing rates:
>>> from spiketimes.df.simulate import imhomogeneous_poisson_processes
>>>
>>> time_rate = [
...     (60, 2),
...     (10, 10),
...     (50, 3)
... ]
>>> df = imhomogeneous_poisson_processes(time_rate=time_rate, n=50, t_start=0)
>>> df.sample(3)
spiketrain spiketimes
320 42 109.499891
98 3 52.532897
349 13 106.532938
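Each (duration, rate) tuple extends the simulation, so the segment boundaries and the expected spike count per train follow directly from the tuple list. A quick check in plain Python:

```python
from itertools import accumulate

time_rate = [(60, 2), (10, 10), (50, 3)]

# cumulative segment end times
edges = list(accumulate(duration for duration, _ in time_rate))
# expected spikes per train: 60*2 + 10*10 + 50*3
expected_spikes = sum(duration * rate for duration, rate in time_rate)
print(edges, expected_spikes)  # [60, 70, 120] 370
```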
Generate Surrogate Spiketrains
The spiketimes.df.surrogates module contains functions for generating spiketrain surrogates. Surrogates are spiketrains that share some properties with a parent spiketrain but are otherwise different. They are often used in resampling statistics to generate bootstrap replicates.
To generate multiple surrogates for each spiketrain in a dataframe:
>>> from spiketimes.df.surrogates import shuffled_isi_spiketrains_by
>>>
>>> # simulate many parent spiketrains
>>> df_parents = homogeneous_poisson_processes(rate=20, t_stop=20, n=10)
>>> df_parents.sample(3)
spiketrain spiketimes
246 2 13.239410
323 9 14.699663
192 4 10.124832
>>> # generate 500 shuffled ISI surrogates for each spiketrain
>>> df_surrogates = shuffled_isi_spiketrains_by(
...     df_parents,
...     spiketimes_col="spiketimes",
...     by_col="spiketrain",
...     n=500
... )
>>> df_surrogates.sample(3)
spiketrain surrogate spiketimes
1130311 5 395 3.975811
738878 3 434 12.215798
719863 3 386 17.599182
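The idea behind ISI shuffling is short enough to sketch in NumPy: take the inter-spike intervals, permute them, and cumulatively sum them back into spike times. This preserves the ISI distribution and spike count while destroying temporal structure. A sketch of the idea, not the library's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
spikes = np.sort(rng.uniform(0, 20, size=50))  # a parent spiketrain

# intervals, including the gap from t=0 to the first spike
isis = np.diff(spikes, prepend=0.0)
# shuffle the intervals, then rebuild spike times
surrogate = np.cumsum(rng.permutation(isis))

# the ISI distribution and total span are unchanged
print(np.allclose(np.sort(np.diff(surrogate, prepend=0.0)), np.sort(isis)))  # True
```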
To generate 50 jitter spiketrains for each parent spiketrain:
>>> from spiketimes.df.surrogates import jitter_spiketrains_by
>>>
>>> df_surrogates = jitter_spiketrains_by(
...     df_parents,
...     jitter_window_size=1,
...     spiketimes_col="spiketimes",
...     by_col="spiketrain",
...     n=50)
>>> df_surrogates.sample(3)
spiketrain surrogate spiketimes
45771 2 25 18.159243
172165 9 9 17.885836
155980 8 18 11.078205
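Jitter surrogates instead keep each spike near its original time: the recording is divided into windows of jitter_window_size seconds and each spike is redrawn uniformly within its own window, preserving per-window spike counts. A NumPy sketch of the idea, not the library's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
spikes = np.sort(rng.uniform(0, 20, size=50))  # a parent spiketrain
window = 1.0  # jitter window size in seconds

# redraw each spike uniformly within the window it falls in
window_starts = np.floor(spikes / window) * window
surrogate = np.sort(window_starts + rng.uniform(0, window, size=spikes.size))

# per-window spike counts are preserved
print(np.array_equal(window_starts, np.floor(surrogate / window) * window))  # True
```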
Aligning Data
The spiketimes.df.alignment module contains functions for aligning data to events.
To align all data to an array of events:
>>> from spiketimes.df.alignment import align_around
>>> from spiketimes.df.simulate import homogeneous_poisson_processes
>>> import numpy as np
>>>
>>> # generate events
>>> events = np.cumsum(np.random.random(40) * 5)
>>>
>>> # generate spiketrains
>>> df = homogeneous_poisson_processes(rate=10, t_stop=120, n=50)
>>>
>>> # align the spiketrains to the events
>>> df = align_around(df, data_colname="spiketimes", events=events, t_before=1)
>>>
>>> df.sample(3)
spiketrain spiketimes aligned
297 33 28.858288 0.837890
453 49 44.039706 1.588492
274 42 27.450466 -0.569932
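The alignment itself is a small searchsorted operation: each spike is expressed relative to the latest event no more than t_before seconds after it. A sketch of the idea on toy data (not the library's exact implementation; spikes more than t_before before the first event would need extra handling):

```python
import numpy as np

events = np.array([10.0, 20.0, 30.0])
spikes = np.array([9.5, 12.0, 19.2, 25.0])
t_before = 1.0

# index of the latest event starting no later than spike + t_before
idx = np.searchsorted(events, spikes + t_before, side="right") - 1
aligned = spikes - events[idx]
print(aligned)  # -0.5, 2.0, -0.8, 5.0
```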
To align different sets of data to different sets of events:
>>> from spiketimes.df.alignment import align_around_by
>>> import pandas as pd
>>>
>>> # generate spiketrains recorded across 5 sessions
>>> df_data = pd.concat(
...     [
...         homogeneous_poisson_processes(rate=3, t_stop=100, n=20).assign(session=i)
...         for i in range(1, 6)
...     ]
... )
>>>
>>> # generate 100 events in each of the 5 sessions
>>> df_events = pd.concat(
...     [
...         pd.DataFrame({"session": i, "spiketimes": np.random.randint(0, 100, size=100)})
...         for i in range(1, 6)
...     ]
... )
>>>
>>> print(df_data.sample(3), "\n")
spiketrain spiketimes session
271 10 92.705551 5
281 7 90.418194 5
227 19 85.093771 4
>>> print(df_events.sample(3))
session spiketimes
19 4 80
0 3 80
85 3 68
>>> # align data from each session to events from the corresponding session
>>> df_data = align_around_by(df_data=df_data,
...                           df_data_data_colname="spiketimes",
...                           df_data_group_colname="session",
...                           df_events=df_events.sort_values("spiketimes"),
...                           df_events_event_colname="spiketimes",
...                           df_events_group_colname="session")
>>>
>>> # each spike is now aligned to events from its own session
>>> df_data.sample(3)
spiketrain spiketimes session aligned
117 2 41.899730 5 1.899730
271 12 75.683014 4 0.683014
19 13 6.800494 5 0.800494
Binning Data
The spiketimes.df.binning module contains functions for binning data.
To bin events into counts along a regular interval:
>>> from spiketimes.df.binning import binned_spiketrain
>>> from spiketimes.df.simulate import homogeneous_poisson_processes
>>>
>>> # Generate some random spiketrains
>>> df_data = homogeneous_poisson_processes(rate=4, t_stop=120, n=20)
>>> df_data.head(3)
spiketrain spiketimes
0 0 0.509930
1 0 0.643373
2 0 0.751396
>>> # Count the number of spikes occurring every 0.5 seconds (2 Hz sampling rate) per neuron
>>> df_binned = binned_spiketrain(
...     df=df_data, spiketimes_col="spiketimes", by_col="spiketrain", fs=2, t_start=0
... )
>>> df_binned.tail(3)
spiketrain time spike_count
4777 19 118.0 3
4778 19 118.5 4
4779 19 119.0 2
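Under the hood this is per-train histogramming on a regular grid of 1/fs-second bins starting at t_start, which can be reproduced with plain numpy and pandas as a sanity check. A sketch on a tiny hand-built df:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "spiketrain": [0, 0, 0, 1, 1],
    "spiketimes": [0.1, 0.6, 0.7, 0.2, 1.4],
})
fs, t_start, t_stop = 2, 0, 2  # 0.5 s bins over [0, 2)
edges = np.arange(t_start, t_stop + 1 / fs, 1 / fs)

# spike counts per bin, per spiketrain (indexed by bin left edge)
counts = (
    df.groupby("spiketrain")["spiketimes"]
      .apply(lambda s: pd.Series(np.histogram(s, bins=edges)[0], index=edges[:-1]))
)
print(counts.loc[0].tolist())  # [1, 2, 0, 0]
```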
To get event counts at user-specified bins per spiketrain:
>>> from spiketimes.df.binning import binned_spiketrain_bins_provided
>>> import numpy as np
>>>
>>> # specify bins
>>> bins = np.arange(0.5, 110, 5)
>>>
>>> # get counts of events in each bin by spiketrain
>>> binned = binned_spiketrain_bins_provided(df_data, bins=bins)
>>> binned.head()
spiketrain bin counts
0 0 0.5 17
1 0 5.5 19
2 0 10.5 14
3 0 15.5 19
4 0 20.5 15
To find which bin each spike falls into (useful for assigning each spike to a trial):
>>> from spiketimes.df.binning import which_bin
>>>
>>> # get bin value and idx for corresponding bin for each event
>>> binned = which_bin(df=df_data, bin_edges=bins)
>>> binned.head()
spiketrain bin_idx bin_values spiketimes
0 0 NaN NaN 0.204892
1 0 NaN NaN 0.243031
2 0 NaN NaN 0.343491
3 0 0.0 0.5 1.166362
4 0 0.0 0.5 1.172659
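The NaN rows are spikes that fall before the first bin edge (here 0.5 s) and so belong to no bin. The core lookup can be sketched with np.searchsorted (toy data; spikes beyond the last edge are not handled in this sketch):

```python
import numpy as np

bin_edges = np.array([0.5, 5.5, 10.5])
spikes = np.array([0.2, 1.2, 6.0, 9.9])

# index of the last edge at or before each spike
idx = np.searchsorted(bin_edges, spikes, side="right") - 1
# spikes before the first edge fall in no bin
values = np.where(idx >= 0, bin_edges[np.clip(idx, 0, None)], np.nan)
print(idx.tolist(), values.tolist())  # [-1, 0, 1, 1] [nan, 0.5, 5.5, 5.5]
```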
To get spike counts following events:
>>> from spiketimes.df.binning import spike_count_around_event
>>>
>>> # generate 5 spiketrains
>>> df_data = homogeneous_poisson_processes(rate=2, t_stop=120, n=5)
>>> # generate some events
>>> events = np.arange(5, 120, 5)
>>>
>>> # get spike counts 0.5 seconds after each event per spiketrain
>>> df_counts = spike_count_around_event(df=df_data, events=events, binsize=0.5, spiketimes_col="spiketimes")
>>> df_counts.head(4)
spiketrain event counts
0 0 5 1
1 0 10 0
2 0 15 0
3 0 20 4
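The per-event counts can be sketched with two searchsorted calls on a sorted spiketrain: the number of spikes in [event, event + binsize) is the difference between the insertion points of the window's two ends. A toy example of the idea:

```python
import numpy as np

spikes = np.array([5.1, 5.3, 5.9, 10.2, 15.05, 15.4])  # sorted spiketimes
events = np.array([5.0, 10.0, 15.0])
binsize = 0.5

# spikes falling in [event, event + binsize) for each event
counts = (np.searchsorted(spikes, events + binsize)
          - np.searchsorted(spikes, events))
print(counts.tolist())  # [2, 1, 2]
```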
To get spike counts following events where different spiketrains have different sets of events (for example, event times and spiketrains recorded in different sessions):
>>> from spiketimes.df.binning import spike_count_around_event_by
>>> import pandas as pd
>>>
>>> # generate spiketrains recorded across 5 sessions
>>> df_data = pd.concat(
...     [
...         homogeneous_poisson_processes(rate=3, t_stop=100, n=20).assign(session=i)
...         for i in range(1, 6)
...     ]
... )
>>> df_data.head(3)
spiketrain spiketimes session
0 0 0.048724 1
1 0 0.821620 1
2 0 1.283268 1
>>> # generate events at slightly different times across the 5 sessions
>>> df_events = pd.concat(
...     [
...         pd.DataFrame({"session": i, "spiketimes": np.arange(2, 100, 3) + np.random.random() * i})
...         for i in range(1, 6)
...     ]
... )
>>> df_events.tail(3)
session spiketimes
30 5 96.450331
31 5 99.450331
32 5 102.450331
>>> # get the spike count in the 0.2 s after each event, per spiketrain recorded in that session
>>> df_counts = spike_count_around_event_by(df_data=df_data,
...                                         binsize=0.2,
...                                         df_data_data_colname="spiketimes",
...                                         df_data_group_colname="session",
...                                         df_data_spiketrain_colname="spiketrain",
...                                         df_events=df_events,
...                                         df_events_event_colname="spiketimes",
...                                         df_events_group_colname="session")
>>> df_counts.head(4)
spiketrain event counts session
0 0 2.972297 0 1
1 0 5.972297 0 1
2 0 8.972297 0 1
3 0 11.972297 1 1
Statistics¶
The spiketimes.df.statistics
module contains functions for calculating statistics on groups of spiketrains.
To calculate the mean firing rate of each spiketrain in a DataFrame:
>>> from spiketimes.df.statistics import mean_firing_rate_by
>>> from spiketimes.df.simulate import homogeneous_poisson_processes
>>>
>>> df_spikes = homogeneous_poisson_processes(rate=5, t_stop=120, n=5)
>>> df_mfr = mean_firing_rate_by(df=df_spikes, t_start=0, t_stop=120)
>>> df_mfr.head(3)
spiketrain mean_firing_rate
0 0 5.166667
1 1 5.316667
2 2 4.758333
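Per train, the mean firing rate is just the spike count divided by the recording duration, so a pandas groupby reproduces it on a toy df:

```python
import pandas as pd

df = pd.DataFrame({
    "spiketrain": [0, 0, 0, 1],
    "spiketimes": [1.0, 2.0, 3.0, 5.0],
})
t_start, t_stop = 0, 10  # recording window in seconds

# spikes per train divided by duration
mfr = df.groupby("spiketrain")["spiketimes"].count() / (t_stop - t_start)
print(mfr.tolist())  # [0.3, 0.1]
```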
To calculate mean firing rate excluding periods of silence:
>>> from spiketimes.df.statistics import mean_firing_rate_ifr_by
>>> from spiketimes.df.simulate import imhomogeneous_poisson_processes
>>>
>>> # generate 20 spiketrains firing at 10 Hz, separated by a 320-second near-silent period
>>> time_rate = [
...     (120, 10),
...     (320, 0.2),
...     (120, 10)
... ]
>>> df_spikes2 = imhomogeneous_poisson_processes(time_rate=time_rate, n=20)
>>>
>>> # calculate mean firing rate by spiketrain excluding silent periods
>>> df_mfr2 = mean_firing_rate_ifr_by(df=df_spikes2,
...                                   fs=1,
...                                   exclude_below=0.5,
...                                   t_start=0)
>>> df_mfr2.head(3)
spiketrain mean_firing_rate_ifr
0 0 9.631736
1 1 9.669442
2 2 9.486311
To estimate “instantaneous” firing rate at a regular interval:
>>> from spiketimes.df.statistics import ifr_by
>>> from spiketimes.df.simulate import homogeneous_poisson_processes
>>>
>>> # simulate 40 spiketrains with a 10Hz firing rate
>>> df_spikes = homogeneous_poisson_processes(rate=10, t_stop=120, n=40)
>>>
>>> # estimate the firing rate of each neuron twice every second from 0 to 120 seconds
>>> df_ifr = ifr_by(df=df_spikes,
...                 fs=2,
...                 t_start=0,
...                 t_stop=120)
>>> df_ifr.head(3)
spiketrain time ifr
0 0 0.0 10.579114
1 0 0.5 10.578063
2 0 1.0 10.575974
To calculate the coefficient of variation of inter-spike-intervals for each spiketrain in a dataframe:
>>> from spiketimes.df.statistics import cv_isi_by
>>> from spiketimes.df.simulate import homogeneous_poisson_processes
>>>
>>> df_spikes = homogeneous_poisson_processes(rate=5, t_stop=120, n=5)
>>> df_cv = cv_isi_by(df_spikes)
>>> df_cv.head(3)
spiketrain cv_isi
0 0 0.964865
1 1 0.971645
2 2 1.010274
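The CV of ISIs is the standard deviation of the inter-spike intervals divided by their mean (around 1 for a Poisson process, near 0 for regular firing). A groupby sketch on a toy df:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "spiketrain": [0, 0, 0, 0, 1, 1, 1],
    "spiketimes": [1.0, 2.0, 3.0, 4.0, 0.5, 1.0, 2.5],
})

# std(ISI) / mean(ISI) per spiketrain
cv = (
    df.sort_values("spiketimes")
      .groupby("spiketrain")["spiketimes"]
      .apply(lambda s: np.std(np.diff(s)) / np.mean(np.diff(s)))
)
print(cv.tolist())  # [0.0, 0.5] -- train 0 fires perfectly regularly
```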
Correlating Spiketrains
The spiketimes.df.correlate module contains functions for correlating spiketrains.
To calculate the autocorrelation histogram for each spiketrain in a DataFrame:
>>> from spiketimes.df.simulate import homogeneous_poisson_processes
>>> from spiketimes.df.correlate import auto_corr
>>>
>>> df_spikes = homogeneous_poisson_processes(rate=3, t_stop=1200, n=10)
>>> df_auto = auto_corr(df=df_spikes,
...                     num_lags=50,
...                     spiketrain_col="spiketrain",
...                     spiketimes_col="spiketimes")
>>> df_auto.head(3)
spiketrain time_bin autocorrelation
0 0 -0.50 119
1 0 -0.49 108
2 0 -0.48 92
To calculate the cross-correlation histogram between each pair of spiketrains in a DataFrame:
>>> from spiketimes.df.correlate import cross_corr
>>>
>>> df_cc = cross_corr(df_spikes,
...                    binsize=0.1,
...                    num_lags=50)
>>> df_cc.head(3)
spiketrain_1 spiketrain_2 time_bin crosscorrelation
0 0 1 -5.0 995
1 0 1 -4.9 999
2 0 1 -4.8 1068
To calculate spike count correlations between all pairs of neurons in a DataFrame:
>>> from spiketimes.df.correlate import spike_count_correlation
>>>
>>> df1 = spike_count_correlation(df_spikes, binsize=0.1)
>>> df2 = spike_count_correlation(df_spikes, binsize=0.1, use_multiprocessing=True)
>>> df1.head()
spiketrain_1 spiketrain_2 R_spike_count
0 0 1 -0.009033
1 0 2 -0.009751
2 0 3 0.015005
3 0 4 0.000109
4 0 5 -0.013520
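A spike count correlation for one pair is the Pearson correlation between the two trains' binned spike counts. A NumPy sketch of the idea for two independent trains:

```python
import numpy as np

rng = np.random.default_rng(2)
train_a = np.sort(rng.uniform(0, 60, 300))  # two independent spiketrains
train_b = np.sort(rng.uniform(0, 60, 300))

# bin both trains into 0.1 s counts, then correlate the counts
edges = np.linspace(0, 60, 601)
counts_a = np.histogram(train_a, bins=edges)[0]
counts_b = np.histogram(train_b, bins=edges)[0]
r = np.corrcoef(counts_a, counts_b)[0, 1]
print(abs(r) < 0.5)  # True: near zero for independent trains
```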
To test significance of correlations:
>>> from spiketimes.df.correlate import spike_count_correlation_test
>>>
>>> df1 = spike_count_correlation_test(df_spikes, binsize=0.01, use_multiprocessing=True, max_cores=10, adjust_p=True)
>>> df1.head()
spiketrain_1 spiketrain_2 R_spike_count p
0 0 1 -0.003057 2.0
1 0 2 -0.001368 2.0
2 0 3 0.001874 2.0