
YSPH Biostatistics Seminar: “Estimation and Inference for Networks of Multi-Experiment Point Processes”

October 06, 2022
  • 00:00<v ->Today it's my pleasure to introduce,</v>
  • 00:02Professor Ali Shojaie.
  • 00:05Professor Shojaie holds master's degrees
  • 00:07in industrial engineering, statistics,
  • 00:10applied math, and human genetics.
  • 00:13He earned his PhD in statistics
  • 00:14from the University of Michigan.
  • 00:17His research focuses on high-dimensional data,
  • 00:19longitudinal data, computational biology,
  • 00:23network analysis, and neuroimaging.
  • 00:26Professor Shojaie is a 2022 fellow
  • 00:29of the American Statistical Association
  • 00:32and 2022 winner of their Leo Breiman Award.
  • 00:36He's a full professor of biostatistics,
  • 00:38adjunct professor of statistics,
  • 00:40and the associate chair for strategic research affairs
  • 00:43in the department of biostatistics
  • 00:45at the University of Washington.
  • 00:47Let's welcome Professor Shojaie.
  • 00:52<v ->Thanks for having me.</v>
  • 00:54Sometimes I misjudge the volume of my voice.
  • 00:57You guys, can you hear me at the back, okay?
  • 01:00I'm not gonna use the microphone yet,
  • 01:01and I'd rather not use the microphone at all.
  • 01:06Well, it's a pleasure to be here
  • 01:08and to talk to you about some work that I've been doing
  • 01:12for the past couple of years.
  • 01:15I'm using machine learning tools for different types of data
  • 01:21so that we can understand better how the brain works.
  • 01:29The question really is how do we process
  • 01:32information in our brains?
  • 01:34How is information processed?
  • 01:41We know that the brain processes information through neurons,
  • 01:43and that neurons interact with each other.
  • 01:46Neurons do process information.
  • 01:51This is of course related to my broader interests
  • 01:54in networks and understanding how things interact
  • 01:57with each other.
  • 01:59Naturally I was drawn into this area,
  • 02:03but when I talk to scientist colleagues,
  • 02:06then a lot of times I'm asked,
  • 02:08what is the goal of understanding that network?
  • 02:10How do we use it?
  • 02:11How do we
  • 02:15take advantage of that network that we learned?
  • 02:17Here's an example of some recent work that we've been doing
  • 02:21that indicates that learning something about these networks
  • 02:26is actually important.
  • 02:30I should say that this is joint work
  • 02:32with a bunch of colleagues at the University of Washington
  • 02:38and the main group that has been running these experiments
  • 02:43is in biomedical engineering.
  • 02:47And then I'm collaborating with E Shea-Brown,
  • 02:49who's in computational neuroscience,
  • 02:51and Z Harchaoui, a computer scientist slash statistician,
  • 02:56and we've been working on this project together.
  • 02:59The lab behind this project
  • 03:02works on neurostimulation.
  • 03:05What they wanna do is to see if they could stimulate
  • 03:08different regions of the brain to make, in this case,
  • 03:12a monkey do certain things
  • 03:14or to restore function that the monkey might have lost.
  • 03:18And it's a really interesting platform
  • 03:22that they've developed.
  • 03:24It's basically small implants that they put
  • 03:28in a region of the brain on these monkeys.
  • 03:31And the implant has two areas where the laser
  • 03:35beams shine in and, in this case, about 96
  • 03:41electrodes that collect data
  • 03:43in that small region of the brain.
  • 03:47This is made possible by optogenetics
  • 03:51meaning that the neurons have been made sensitive to these lasers.
  • 03:55When neurons
  • 04:00receive the laser, they basically get excited,
  • 04:03get activated.
  • 04:05The goal in this research eventually
  • 04:08is to see how the activation of neurons,
  • 04:11through plasticity, would change
  • 04:14the connectivity of the neurons
  • 04:18and would later on result in changing function.
  • 04:23That's the eventual goal of this.
  • 04:24This research is at the very beginning of that.
  • 04:28We are not there yet in terms of understanding function,
  • 04:32understanding the link between connectivity and function.
  • 04:35The collaboration with this lab started
  • 04:37when they wanted to predict how the connectivity changes
  • 04:41as a result of this activation.
  • 04:44We wanted to understand whether, by changing various factors
  • 04:49in the experiments, the distance between the two lasers,
  • 04:52the duration of the laser,
  • 04:54they could accurately predict the change in connectivity.
  • 05:01The way that the experiment is set up
  • 05:02is that they basically have these periods where they have
  • 05:07activation, and then a latency period,
  • 05:10followed by observation.
  • 05:12They basically observe the activity of these brain regions,
  • 05:20those 96
  • 05:22electrodes in this small region, over time.
  • 05:25That's the data that they collect.
  • 05:31Here's a look at this functional connectivity,
  • 05:35and that's what they were trying to predict.
  • 05:40Basically the heat map shows
  • 05:46the links between the various brain regions,
  • 05:50all 96 of them.
  • 05:56The connectivity is defined based on coherence,
  • 06:01which is basically a correlation measure in the frequency domain,
  • 06:05and we have coherence in four different frequency bands.
  • 06:08These are the standard bands in signal processing,
  • 06:11and they're thought to measure activity
  • 06:14at different spatial resolutions.
  • 06:16We have theta band, the beta band, the gamma band,
  • 06:18and the high gamma band.
  • 06:20And we wanna see how the connectivity
  • 06:22in these different bands changes
  • 06:25as an effect of stimulating these neurons.
  • 06:31And what...
  • 06:37This is not working.
  • 06:38The clicker stopped working.
  • 06:40We'll figure that out.
  • 06:51Let's go on full screen again to see where this goes.
  • 07:00What basically we have
  • 07:01is that we have the baseline connectome
  • 07:03and we have these experimental protocols,
  • 07:07and we're trying to predict how the connectivity changes.
  • 07:10What the lab was doing before was that
  • 07:12they were looking at trying to predict connectivity
  • 07:14based on experimental protocols.
  • 07:18And what they were getting
  • 07:19was actually really bad prediction.
  • 07:22These are test R squares.
  • 07:26And what they were getting was about 5% test R square
  • 07:30when they were using these protocol features
  • 07:32to predict how the connectivity changes.
  • 07:34And you see it's really bad;
  • 07:36if that's the prediction that you're getting,
  • 07:38then it's a really bad prediction.
  • 07:43The first thing that we noticed in this research
  • 07:46was that it's actually important to incorporate
  • 07:50the features of the current state of connectivity
  • 07:53in order to predict how it's going to change.
  • 07:56What we did was that in addition to those protocol features,
  • 07:59we added some network features,
  • 08:01the current state of the network in order to predict
  • 08:03how it's gonna change.
  • 08:04And this is, to me, this is really interesting
  • 08:06because it basically says that our prediction
  • 08:10has to be subject specific
  • 08:13depending on the current state of each monkey's
  • 08:14connectivity; how their connectivity
  • 08:18is going to change will be different.
  • 08:21And what we saw was that when we incorporated
  • 08:24these network features, we were able to improve quite a bit
  • 08:28in terms of prediction.
  • 08:29We're still not doing hugely well,
  • 08:33we're only getting a test R squared of, what, 25%.
  • 08:36But what you see is that the prediction
  • 08:38of how the connectivity changes
  • 08:41is now much better.
  • 08:43And also in terms of the pictures, you see that going from,
  • 08:46so say this is the true,
  • 08:48the first part in d is the true change in connectivity,
  • 08:52e is what you would get from just the protocol features,
  • 08:56and you see that prediction is really bad,
  • 08:57and f is what you get when you combine protocol features
  • 09:01and the network features.
  • 09:03That prediction is closer to the true
  • 09:09change in connectivity than just using the protocol features.
  • 09:12This was the first thing that we learned from this research.
  • 09:15The second part of what we learned is that
  • 09:18it also matters which approach you use for prediction.
  • 09:21What they had done was that they were using some simple
  • 09:24like linear model for prediction.
  • 09:26And then we realized that we need to use something more
  • 09:30expressive and then we sort of ended up using
  • 09:32these non-linear additive models
  • 09:34that we had previously developed,
  • 09:36partly because while they have a lot of expressive power,
  • 09:40they're still easy to interpret.
  • 09:43Interpretation for these additive models is still easy
  • 09:46and in particular we can see what the shapes
  • 09:51of these functions basically are.
  • 09:52For example, with the distance we see how the function
  • 09:55changes, and that helps with the design of these experiments.
  • 09:58I'm not gonna spend too much time
  • 10:00talking about the details of this
  • 10:01given that we only have 50 minutes
  • 10:03and I wanna get to the main topic,
  • 10:05but basically these additive models
  • 10:08are built by combining these features.
  • 10:11Think of a Taylor expansion in a very simple sense,
  • 10:14where you have a linear term, a quadratic term,
  • 10:17a cubic term.
  • 10:18And the way that we form these additive models
  • 10:21is that we automatically select the degree of complexity
  • 10:26of each additive feature,
  • 10:28whether it's, say, linear, or quadratic, or cubic, etcetera.
  • 10:32We also allow some features to be present in the model
  • 10:36and other features not to be present.
  • 10:37What we end up with are these patterns
  • 10:41where some features are really complex and others are simple,
  • 10:43and that's automatically decided from data.
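To make the idea concrete, here is a minimal, hypothetical sketch: expand each feature in a polynomial basis and let a sparsity-inducing fit decide which basis terms survive, so the data choose both whether a feature enters and how nonlinear its effect is. The feature names and the use of a plain lasso (rather than the structured penalty of the actual method) are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Toy data: two protocol features and one network feature (all hypothetical).
n = 500
X = rng.uniform(-1, 1, size=(n, 3))          # columns: distance, duration, coherence_diff
y = 0.5 * X[:, 0] - 1.2 * X[:, 2] ** 2 + rng.normal(scale=0.3, size=n)

def poly_basis(X, degree=3):
    """Expand each feature into [x, x^2, ..., x^degree] columns."""
    cols = [X[:, j] ** d for j in range(X.shape[1]) for d in range(1, degree + 1)]
    return np.column_stack(cols)

B = StandardScaler().fit_transform(poly_basis(X, degree=3))

# Sparsity-inducing fit: basis terms with nonzero coefficients tell us which
# features enter the model and how nonlinear their effects are. The method in
# the talk uses a structured/hierarchical penalty; plain lasso is a stand-in.
fit = LassoCV(cv=5).fit(B, y)
print(fit.coef_.reshape(3, 3))   # rows = features, columns = degrees 1..3
```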
  • 10:47This model does well in this prediction task
  • 10:51and it allows us to come up with these sorts of predictions.
  • 10:53We see now, for example, for coherence difference,
  • 10:58which is the network feature,
  • 10:59that's the coherence difference,
  • 11:01and network distance, that's the distance
  • 11:03between the two stimulation sites,
  • 11:04the two laser points.
  • 11:05We get these two patterns estimated,
  • 11:07and then when we combine them, we get this surface basically
  • 11:10that determines how the change in connectivity
  • 11:15could be predicted
  • 11:17based on these two features.
  • 11:18And all of this is done automatically based on data.
  • 11:23This approach, again, sort of the key feature of it
  • 11:25is that it combines the network features
  • 11:28of the current state of connectivity with protocol features
  • 11:30in order to do a better job of prediction.
  • 11:33This is research that we just started
  • 11:36and we will continue this research
  • 11:39for the next at least five years.
  • 11:42But the goal of it is eventually to see
  • 11:44if we could predict the function
  • 11:46and ultimately if we could build a controller
  • 11:49that we could determine how to change function
  • 11:52based on various features of the experiment.
  • 11:57I mentioned all of this to say that knowing
  • 11:59and learning the network matters.
  • 12:01We need to learn the current state of connectivity,
  • 12:04for example, in this work in order to be able to design
  • 12:07experiments that would hopefully help
  • 12:12and restore function.
  • 12:15Now in this particular work,
  • 12:17what we did was that we used a very simple
  • 12:20notion of connectivity.
  • 12:21We used coherence, which is basically correlation,
  • 12:24but we know that that's not always the best
  • 12:28way to define connectivity between regions.
  • 12:32And so what I wanna talk about for the remaining
  • 12:3640 minutes or so is how do we learn connectivity
  • 12:40between neurons?
  • 12:42And this is using a different type of data
  • 12:45that I hadn't thought about before,
  • 12:46and I'm hoping I can show you this clip,
  • 12:51which shows the actual raw data.
  • 12:55The data is actually a video.
  • 12:58And this is activity of individual neurons
  • 13:00in a small region of the brain.
  • 13:03These dots that you see popping up,
  • 13:04these are individual neurons firing over time.
  • 13:10And you see that one neuron fires
  • 13:12and another neuron fires, et cetera, et cetera.
  • 13:15That's the raw data that we're getting.
  • 13:18And the goal is to understand
  • 13:21based on this pattern of activation of neurons,
  • 13:24how neurons talk to each other basically.
  • 13:27Now I'm gonna go back here.
  • 13:34And so the data of that video that I showed you,
  • 13:38basically, here's some snapshot of that data.
  • 13:41Here's one frame.
  • 13:43And there are a lot of steps in getting this data
  • 13:46into a form we can work with.
  • 13:50We're not gonna talk about this,
  • 13:52but we need to first identify where the neurons are.
  • 13:55No one tells us where the neurons are in that video.
  • 13:58We need to first identify where the neurons are.
  • 14:00We need to identify when they spike, when they fire.
  • 14:03No one tells us that either.
  • 14:05There are a lot of pre-processing steps that happen.
  • 14:09The first task is called segmentation,
  • 14:11identifying where the neurons are,
  • 14:13then spike detection, when the neurons fire over time,
  • 14:15which individual neuron fires at which time.
  • 14:17None of these is a trivial task,
  • 14:19and a lot of smart people are working on these,
  • 14:22including some of my colleagues.
  • 14:25After a lot of pre-processing,
  • 14:26for each individual neuron
  • 14:28you end up with a data set like this
  • 14:31that basically has these ticks
  • 14:35whenever the neuron has fired.
  • 14:39For a given neuron, you have the times at which the neuron fired,
  • 14:42like this.
  • 14:45These are the time points when the neuron spiked.
  • 14:47Now, you can do something fancier,
  • 14:49you can look at the magnitude
  • 14:51of the signal that you're detecting at each neuron.
  • 14:53You could deal with that, but for now we're ignoring that.
  • 14:55We're just looking at when they fire.
  • 14:58This is called the spike train for each neuron.
  • 15:01That's the data that we're using.
  • 15:05These are the neurons' firing times.
  • 15:07And if we combine them, in this cartoon,
  • 15:09we get something like this.
  • 15:10We get a sequence of activation patterns.
  • 15:13This is color coded based on that sort of five neuron
  • 15:16sort of cartoon network.
  • 15:18And you see that different neurons activate
  • 15:19at different times.
  • 15:23And what I'll talk about is a notion of connectivity
  • 15:25that tries to predict the activation pattern of one neuron
  • 15:29from the others in a network, basically.
  • 15:31Maybe neuron one tells us something
  • 15:34about the activation patterns of neuron two,
  • 15:36so that if we knew when neuron one activated or fired,
  • 15:39we could predict when neuron two fires,
  • 15:41and maybe neuron two will tell us something
  • 15:43about activations of neurons three and four, et cetera.
  • 15:46And that's the notion of connectivity that we're
  • 15:49after; we're trying to estimate those edges
  • 15:51in this network.
  • 15:53Now, please.
  • 15:55<v ->Could you say just a few words informally</v>
  • 15:57about the direction of connectivity?
  • 15:58<v ->Yeah.</v>
  • 15:59<v ->Maybe drawing arrow forward in time.</v>
  • 16:00<v ->Yes.</v>
  • 16:01I'll get to this, maybe in the next two slides.
  • 16:06The framework that we're gonna work with
  • 16:09is called the Hawkes process.
  • 16:11This goes back to seminal work by Alan Hawkes
  • 16:14in the '70s, where he looked at spectral properties
  • 16:19of point processes.
  • 16:20What are point processes? Basically, they're like activations
  • 16:23over time,
  • 16:24zeros and ones over time.
  • 16:26A Poisson process is one example.
  • 16:29What the Hawkes process does in particular
  • 16:31is that it uses the past history of one neuron
  • 16:37to predict the future.
  • 16:39And this goes back to Forest's question
  • 16:42of what that edge is in this case.
  • 16:44This notion is closely related to, and is a special case
  • 16:48of, what is known to econometricians as Granger causality,
  • 16:52that is, using the past to predict the future.
  • 16:55And that's the notion of connectivity
  • 16:57that we're after in this particular case.
  • 17:03And what makes this Hawkes process
  • 17:05convenient for this is that
  • 17:07it's already set up to do this.
  • 17:08I'm gonna present the Hawkes process
  • 17:10in its simplest form; this is the linear Hawkes process.
  • 17:13And what it is, is a counting process.
  • 17:17It's just counting the events.
  • 17:20And so that's the event process N.
  • 17:25And that event process has an intensity lambda i
  • 17:31for each neuron i,
  • 17:33which is a combination of two terms:
  • 17:37a nu i, that's the baseline intensity of that neuron.
  • 17:40That means that if you had nothing else,
  • 17:43this neuron would fire at this rate, basically at random,
  • 17:47it would fire at a random rate,
  • 17:51plus the effect that that neuron
  • 17:53gets from the other neurons.
  • 17:55Every time there's an activation in a neuron,
  • 17:58any neuron j from one to p, including neuron i itself,
  • 18:03depending on how long it's been since that activation,
  • 18:05the time between the current time t
  • 18:08and the time of activation of the previous neuron,
  • 18:09the firing of the previous neuron,
  • 18:11some weight function determines how much influence
  • 18:15that neuron i gets.
  • 18:17This has a flavor of causality,
  • 18:20which is why econometricians call it Granger causality.
  • 18:24This is work by Granger,
  • 18:29but it's really not causality.
  • 18:30We know that there's more to it,
  • 18:32and so there's a lot of work on this;
  • 18:33it's only causality
  • 18:34under fairly restrictive assumptions
  • 18:37that I won't talk about in general,
  • 18:38but nonetheless it predicts the future.
  • 18:41It's prediction of the future.
  • 18:43And again, in this case this dN i
  • 18:47is our point process, lambda i is our intensity process,
  • 18:52a stochastic process itself.
  • 18:54Nu i is the background intensity
  • 18:56and the t jk's are the times when the other neurons
  • 19:01fired in the past.
  • 19:03And this omega ij is the transfer function.
  • 19:06It determines how much information is passed
  • 19:09from the firing of one neuron
  • 19:11to the firing of other neurons in the future.
  • 19:14And usually you think that the further
  • 19:16you go in the past, the less information carries over.
  • 19:19Usually the types of functions that you consider
  • 19:21for these transfer functions are decaying;
  • 19:23they have a decaying form,
  • 19:25so that if you go too far in the past,
  • 19:27there's no information, there's no useful information.
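For reference, the intensity just described can be written compactly; this is the standard form of the linear Hawkes process, in the notation of the talk (nu for the baseline, omega for the transfer function):

```latex
% Linear Hawkes process: intensity of neuron i at time t
\lambda_i(t) \;=\; \nu_i \;+\; \sum_{j=1}^{p} \;\sum_{k \,:\, t_{jk} < t} \omega_{ij}\!\left(t - t_{jk}\right),
\qquad i = 1, \dots, p ,
```

where nu_i is the baseline intensity, the t_jk are the past firing times of neuron j, and omega_ij is the (typically decaying) transfer function.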
  • 19:30Any questions on the basics of this linear Hawkes process?
  • 19:33because I'm not gonna present the more complicated version,
  • 19:38but I think this will suffice for our conversation.
  • 19:41I wanna make sure that we're all good
  • 19:43with this simple version.
  • 19:48Okay, so no question on this.
  • 19:51But if we agree with this, then this Hawkes process
  • 19:55gives us a very convenient way
  • 19:56of defining that connectivity.
  • 19:59What is meant by connectivity now basically means
  • 20:02that if this function omega ij is non-zero,
  • 20:06then that means that there's an edge
  • 20:07between neuron j and neuron i.
  • 20:09And that's basically what I was showing you
  • 20:11in that earlier picture.
  • 20:13It all comes down to estimating
  • 20:15whether omega ij is zero or not for this Hawkes process.
  • 20:21Okay.
  • 20:23Let me show you a very simple example
  • 20:25with two neurons.
  • 20:26In this case, neuron one has no other influences.
  • 20:32It's only its past history and baseline intensity.
  • 20:36Neuron two has an edge from neuron one.
  • 20:40Let's see what we would expect for the intensity
  • 20:43of neuron one.
  • 20:44If we think about neuron one,
  • 20:47then it's basically a baseline intensity, that nu one.
  • 20:51And it's gonna fire at random times from some process.
  • 20:56It's gonna fire at random times with the same intensity.
  • 20:59The intensity is not gonna change because it's fixed;
  • 21:02we could allow that intensity to be time varying, et cetera,
  • 21:05to make it more complicated, but in its simplest form
  • 21:08that neuron is just gonna fire randomly,
  • 21:11whenever it sort of wants.
  • 21:15Now, neuron two would have a different story
  • 21:19because neuron two depends on activation of neuron one.
  • 21:22Any time that neuron one fires, the intensity of neuron two
  • 21:28goes up; let's say the baseline is zero for neuron two,
  • 21:31but every time that neuron one fires,
  • 21:33the intensity of neuron two becomes non-zero
  • 21:36because it got excitation from neuron one.
  • 21:38It responds to that.
  • 21:40Neuron two would fire in response, and then when you have,
  • 21:42like, three activations, you can get
  • 21:45a convolution of effects that would make neuron two
  • 21:48more likely to activate as well, or to spike as well.
  • 21:54And so this is the pattern: basically
  • 21:56what we are doing here is that we're taking
  • 21:58this to be omega
  • 22:02two-one, and you see it has a decay form,
  • 22:05and these get convolved; if you have more activations
  • 22:09of neuron one, that increases the intensity
  • 22:12of neuron two, meaning that we have more of a chance
  • 22:16for neuron two to fire, and so on.
  • 22:20In this simple example, this could be the intensity
  • 22:23of neuron two.
  • 22:24And in fact all we observe in this case
  • 22:29are these two spike trains for neuron one and neuron two.
  • 22:32We don't observe the network;
  • 22:35in this case there are four possible edges,
  • 22:37and one of them is the right edge.
  • 22:38We don't observe the intensity processes.
  • 22:41All we observe is just the point process, the spikes.
  • 22:45And the goal is to estimate the network
  • 22:47based on that spike train.
  • 22:49And in fact,
  • 22:53as part of that, we also need to estimate that process.
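As a rough illustration of this two-neuron example, here is a minimal simulation sketch using Ogata-style thinning with an exponential transfer function; the baseline rates, edge weight, and decay constant are made-up values, not numbers from the talk.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two-neuron linear Hawkes process with a single edge from neuron 1 to neuron 2.
# Intensities: lambda_1(t) = nu_1
#              lambda_2(t) = nu_2 + sum_{t_1k < t} beta_21 * exp(-(t - t_1k)/tau)
nu = np.array([0.5, 0.05])   # baseline rates (hypothetical)
beta_21 = 1.0                # excitatory effect of neuron 1 on neuron 2 (hypothetical)
tau = 0.5                    # decay time of the transfer function (hypothetical)
T = 100.0                    # length of the observation window

def intensity(t, spikes1):
    """Intensities of both neurons at time t, given neuron 1's past spikes."""
    past = spikes1[spikes1 < t]
    lam2 = nu[1] + beta_21 * np.exp(-(t - past) / tau).sum()
    return np.array([nu[0], lam2])

# Ogata thinning: propose events at an upper-bound rate, keep with prob lambda/bound.
spikes = [[], []]
t = 0.0
while t < T:
    lam = intensity(t, np.array(spikes[0]))
    bound = lam.sum() + beta_21          # conservative upper bound on total intensity
    t += rng.exponential(1.0 / bound)    # candidate next event time
    if t >= T:
        break
    lam = intensity(t, np.array(spikes[0]))
    u = rng.uniform(0, bound)
    if u < lam[0]:
        spikes[0].append(t)              # neuron 1 fires
    elif u < lam[0] + lam[1]:
        spikes[1].append(t)              # neuron 2 fires

print(len(spikes[0]), "spikes for neuron 1,", len(spikes[1]), "spikes for neuron 2")
```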
  • 23:01That estimation problem is not actually that complicated.
  • 23:06If you think of it, it's trying to predict
  • 23:10the present based on the past.
  • 23:13We could do prediction.
  • 23:14We could use basically penalized regression.
  • 23:18It's a penalized Poisson regression,
  • 23:20something along those lines.
  • 23:21A little bit more complicated,
  • 23:22but basically it's a penalized Poisson regression,
  • 23:24and we could use an approach similar
  • 23:27to what is known as neighborhood selection,
  • 23:28meaning that we regress each neuron
  • 23:31on the past of all other neurons,
  • 23:33including that neuron itself.
  • 23:34These are simple regression problems.
  • 23:36And then we use regularization to select a subset of them
  • 23:39that are more informative, et cetera.
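A very rough sketch of this neighborhood-selection idea (not the estimator from the talk): discretize time, summarize each neuron's past spikes with an exponentially decaying filter, and regress each neuron's spiking on those summaries with an L1 penalty. The bin width, kernel, and the use of a penalized linear (rather than Poisson) fit are simplifying assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def spikes_to_bins(spike_times, T, dt):
    """Binary indicator of spiking per time bin for one neuron."""
    bins = np.zeros(int(T / dt))
    idx = np.minimum((np.asarray(spike_times) / dt).astype(int), len(bins) - 1)
    bins[idx] = 1.0
    return bins

def past_activity(bins, dt, tau):
    """Exponentially decaying summary of past spikes (excludes the current bin)."""
    x = np.zeros_like(bins)
    decay = np.exp(-dt / tau)
    for t in range(1, len(bins)):
        x[t] = decay * x[t - 1] + bins[t - 1]
    return x

def estimate_neighborhood(all_spike_times, T, dt=0.05, tau=0.5, lam=0.01):
    """Regress each neuron's spiking on the past activity of all neurons."""
    Y = np.column_stack([spikes_to_bins(s, T, dt) for s in all_spike_times])
    X = np.column_stack([past_activity(Y[:, j], dt, tau) for j in range(Y.shape[1])])
    p = Y.shape[1]
    B = np.zeros((p, p))                      # B[i, j]: estimated effect of j on i
    for i in range(p):
        fit = Lasso(alpha=lam).fit(X, Y[:, i])
        B[i, :] = fit.coef_
    return B

# Hypothetical usage with the simulated spike trains from the sketch above:
# B_hat = estimate_neighborhood(spikes, T=100.0)
# print(np.round(B_hat, 2))   # nonzero B_hat[i, j] suggests an edge j -> i
```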
  • 23:42And there's been quite a bit of work on this,
  • 23:45including some work that we've done.
  • 23:47The work that we've done was focused more
  • 23:49on extending the theory of these Hawkes processes
  • 23:55to a setting that is more useful
  • 23:58for neuroscience applications.
  • 24:00In particular, the theory that existed was focused mostly
  • 24:06on simple linear transfer functions, and also on the case
  • 24:11where we had non-negative transfer functions.
  • 24:14And this was purely an artifact of
  • 24:17the theoretical analysis approach that Hawkes had taken,
  • 24:22using what is known as the cluster representation.
  • 24:28What Hawkes and Oakes had done was that they were
  • 24:33representing each neuron as a sum of, sorry,
  • 24:39representing the activation pattern of each neuron
  • 24:42as a sum,
  • 24:44a superposition of homogeneous Poisson processes.
  • 24:46And because it was a sum, it could not allow
  • 24:48the omega ij's to be negative,
  • 24:51'cause things would cancel and the representation would break down.
  • 24:56What we did, and this was the work of my former student,
  • 25:00Shizhe Chen, who's now at Davis, was to
  • 25:06come up with an alternative framework,
  • 25:09a theoretical framework motivated by the fact that
  • 25:10we know that in neuroscience activations are not just positive,
  • 25:15they're not all excitation,
  • 25:18there are also inhibitions happening.
  • 25:21In neuroscience, and in any other biological system really,
  • 25:24we can't have biological systems being stable
  • 25:28without negative feedback.
  • 25:29These negative feedback loops are critical.
  • 25:32We wanted to allow for negative effects,
  • 25:36or the effects of inhibition.
  • 25:38And so we came up with a different representation
  • 25:40based on what is known as the thinning process representation,
  • 25:44which then allowed us to get a concentration result
  • 25:48for general transfer functions.
  • 25:48I won't go into the details of this,
  • 25:50but basically we can show
  • 25:53that for a general class of functions,
  • 25:59we get a concentration around the mean, in a sense.
  • 26:03And using this,
  • 26:06you could show that, with high probability,
  • 26:08we get to estimate the network correctly
  • 26:11using this neighborhood selection type approach.
  • 26:16This is estimation but we don't really
  • 26:20have any sense of whether...
  • 26:27Let's skip over this for the sake of time.
  • 26:29You don't really have any sense of whether
  • 26:31the edges that we estimate are true edges or not.
  • 26:33We don't have a measure of uncertainty.
  • 26:35We have theory that shows that
  • 26:37the recovered graph should be correct,
  • 26:39but we want to maybe get a sense of uncertainty about this.
  • 26:43And so the work that we've been doing more recently
  • 26:48focused on trying to quantify the uncertainty
  • 26:50of these estimates.
  • 26:52And so there's been a lot of work over the past
  • 26:55almost 10 years on trying to develop inference
  • 26:59for these regularized estimation procedures.
  • 27:03And so we're building on this
  • 27:05existing work; in particular,
  • 27:06we're building on work on
  • 27:11inference for vector autoregressive processes.
  • 27:14However, there are some differences,
  • 27:17most importantly that vector autoregressive processes capture a fixed
  • 27:24and pre-specified lag, whereas in the Hawkes process case,
  • 27:28we basically have dependence over the entire history.
  • 27:34We don't have a fixed lag that's pre-specified.
  • 27:38And another difference
  • 27:40is that vector autoregressive processes
  • 27:42are observed over discrete time,
  • 27:44at pre-specified time points,
  • 27:45whereas the Hawkes process is observed
  • 27:48over continuous time.
  • 27:50It's a continuous-time process,
  • 27:50and that adds a little bit of challenge,
  • 27:52but nonetheless, so we use this de-correlated
  • 27:56score testing work
  • 27:57which is based on the work of Ning and Liu.
  • 28:01And what I'm gonna talk about in the next couple of slides
  • 28:07is an inference framework for these Hawkes processes.
  • 28:11Again, I showed you before
  • 28:14the simple form of the linear Hawkes process,
  • 28:16and motivated by the neuroscience applications,
  • 28:19what we consider is something quite simple,
  • 28:22although we could generalize that,
  • 28:24and that generalization is in the paper.
  • 28:26The simple case is to consider something like omega ij
  • 28:30as beta ij times some known function of the lag,
  • 28:34where that function is simply a decay function over time,
  • 28:40like an exponentially decaying function,
  • 28:43a classic decay function.
  • 28:46That's a natural transition function for neuroscience applications.
  • 28:49And so if we go with this framework, then that
  • 28:54beta ij coefficient determines the connectivity for us:
  • 28:58if this beta ij is positive,
  • 29:01that means there's an excitatory effect,
  • 29:03if it's negative, there's an inhibitory effect,
  • 29:05and if it's zero, there's no influence from neuron j on neuron i.
  • 29:08All we need to do really is to develop inference
  • 29:11for this beta ij.
  • 29:14And so that is our goal.
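In symbols, this parametrization reads as below; the exponential kernel is just one example of the kind of decay function being described, and tau is an illustrative constant, not a value from the talk:

```latex
% Transfer function as a known decaying kernel scaled by a single coefficient
\omega_{ij}(\Delta) \;=\; \beta_{ij}\, \kappa(\Delta),
\qquad \text{e.g. } \kappa(\Delta) = e^{-\Delta/\tau}, \;\; \tau > 0 ,
```

so that beta_ij > 0 corresponds to excitation, beta_ij < 0 to inhibition, and beta_ij = 0 to no edge from neuron j to neuron i.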
  • 29:17And to do that, I'll go into a little bit of the technicalities
  • 29:23and details, but not too much.
  • 29:25Please stop me if there are any questions.
  • 29:27The first thing we do is that we realize
  • 29:29that we can represent that linear Hawkes process
  • 29:34as a form of basically a regression almost.
  • 29:38The first thing we do is we turn it into this
  • 29:44integrated stochastic process.
  • 29:46We integrate over all of the past,
  • 29:49that form that seemed ugly;
  • 29:51we integrate it so that it becomes
  • 29:53a little bit more compact.
  • 29:55And once we do that, we can write it pretty similarly
  • 29:59to a regression.
  • 29:59We do a change of variables, basically.
  • 30:01We write that point process dNi as our outcome Yi,
  • 30:07and then we write epsilon i to be Yi minus lambda i,
  • 30:11so we add and subtract lambda i in a sense.
  • 30:15And that allows us to write things
  • 30:18as a simple form of regression.
  • 30:22Now this is something that's easy
  • 30:24and that we're able to deal with.
  • 30:25The main complication is that this is a regression
  • 30:28with heteroscedastic noise.
  • 30:32Sigma i t squared depends on the past
  • 30:36and also on the time period;
  • 30:38it depends on beta and lambda.
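A sketch of the regression form being described, with the integrated covariates written as x_j; the notation here is schematic and may differ from the paper:

```latex
% Integrated covariates and the regression representation
x_j(t) \;=\; \int_{0}^{t^-} \kappa(t - s)\, dN_j(s),
\qquad
\lambda_i(t) \;=\; \nu_i + \sum_{j=1}^{p} \beta_{ij}\, x_j(t),
\\[6pt]
dN_i(t) \;=\; \lambda_i(t)\, dt \;+\; d\varepsilon_i(t),
\qquad
d\varepsilon_i(t) \;:=\; dN_i(t) - \lambda_i(t)\, dt ,
```

so epsilon_i plays the role of a (martingale) regression error whose conditional variance depends on lambda_i(t), which is exactly the heteroscedasticity mentioned above.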
  • 30:42Okay, so once we do this
  • 30:49we could then develop a test for beta ij,
  • 30:55and this could also be extended to testing multiple betas,
  • 31:00allowing for basis expansions, et cetera,
  • 31:03and even a nonstationary baseline.
  • 31:06But the test is basically
  • 31:09now based on this de-correlated score test.
  • 31:11Once we write things in this regression form,
  • 31:13we can take this de-correlated score test,
  • 31:15and I'll skip over the details here,
  • 31:19but basically we form this set of orthogonal columns
  • 31:23and define a score test based on this
  • 31:26that looks something like this,
  • 31:28where you're looking at the effect of the de-correlated column j
  • 31:32against basically the noise term, epsilon i.
  • 31:36Both of these are derived from data based on some parameters,
  • 31:40but once you have this Sij,
  • 31:43then you could actually define a test
  • 31:47that basically looks at the magnitude of that Sij.
  • 31:53And that's the test statistic that we could use.
  • 31:59And under the null, we can show that this test statistic
  • 32:02converges to a chi-square distribution
  • 32:05and we could use that for testing.
  • 32:08In practice, you need to estimate these parameters.
  • 32:10We estimate them, we ensure that things still work
  • 32:13with the estimated parameters
  • 32:15so that you still have convergence to chi-squared.
  • 32:19And you can also do confidence intervals and all of this.
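Schematically, and in the spirit of Ning and Liu's de-correlated score construction (this is a sketch, not the exact statistic from the paper):

```latex
% De-correlated score for the target coefficient beta_ij:
% project the covariate of interest off the nuisance covariates, then pair the
% residual with the estimated noise process.
S_{ij} \;=\; \frac{1}{T}\int_{0}^{T} \Big( x_j(t) - \widehat{\mathbf{w}}^{\top}\mathbf{x}_{-j}(t) \Big)\, d\widehat{\varepsilon}_i(t),
\qquad
\text{under } H_0:\ \beta_{ij} = 0, \quad
\frac{T\, S_{ij}^{\,2}}{\widehat{\sigma}_{ij}^{\,2}} \;\xrightarrow{\;d\;}\; \chi^2_1 ,
```

where w-hat comes from a regularized regression of x_j on the remaining covariates and sigma-hat squared estimates the variance of the de-correlated score.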
  • 32:24Maybe I'll just briefly mention
  • 32:26that this also has the usual power that we expect
  • 32:29that you can study the power of this against local alternatives,
  • 32:35and this gives us basically the behavior that we would expect.
  • 32:41In simulations it also behaves very close
  • 32:45to the oracle procedure that knows which neurons
  • 32:47interact with each other.
  • 32:50What we've done here is that
  • 32:51we've looked at increasing sample size
  • 32:54or the length of the sequence, from 200 to 2,000,
  • 32:58and we see that the type one error
  • 33:01becomes pretty well controlled as time increases.
  • 33:05The pink here is oracle.
  • 33:06The blue is our procedure.
  • 33:08The power also increases as the sample size increases.
  • 33:14We also look at the coverage of the confidence intervals.
  • 33:18Both for the zeros and non zeros,
  • 33:21the coverage also seems to be well behaved.
  • 33:26This is a simple simulation setting, but it looks like
  • 33:32it's not too far from the actual application
  • 33:35that we've also looked at.
  • 33:38And in particular we've looked at some data from a
  • 33:42paper that was published in 2018 in Nature,
  • 33:45where they had looked at activation patterns of neurons
  • 33:50and how they would change with and without laser.
  • 33:54And at the time this was like the largest;
  • 33:57they had multiple devices that they had looked at,
  • 34:00and the largest region
  • 34:02that they had looked at had 25 neurons.
  • 34:04The technology has improved quite a bit.
  • 34:06Now there's a couple of hundred neurons
  • 34:08that they could measure,
  • 34:09but this was 25 neurons.
  • 34:10And then what I'm showing you are the activation patterns
  • 34:14without laser and with laser
  • 34:16and I'm not showing the edges that are common
  • 34:19between the two networks;
  • 34:20I'm just showing the edges that are different
  • 34:21between these networks.
  • 34:23And we see that these betas,
  • 34:25some of them are clearly different:
  • 34:28in one condition the confidence interval covers zero
  • 34:32and in the other condition it does not.
  • 34:33And that's why you're seeing these differences in the networks.
  • 34:36And that's similar to what they had observed
  • 34:39based on basically correlations, that as you activate,
  • 34:43there's more connectivity among these neurons.
  • 34:49Now in the actual experiments,
  • 34:51and this is maybe the last 15 minutes or so of my talk,
  • 34:57in the actual experiments, they don't do just a simple
  • 35:00one shot experiment because they have to implant
  • 35:03this device.
  • 35:06This is data from a mouse.
  • 35:08They have to implant this device on the mouse's brain.
  • 35:11And so what they do is that they actually,
  • 35:13once they do that and sort of now with that camera,
  • 35:16they just measure activities of neurons.
  • 35:18But once they do that, they actually run
  • 35:20a sequence of experiments.
  • 35:23It's never just a single experiment or two experiments.
  • 35:25What they do is that they, for example,
  • 35:28they show different images, the mouse
  • 35:31and they see the activation patterns of neurons
  • 35:34as the mouse processes different images.
  • 35:36And what they usually do is that sort they show an image
  • 35:38with one orientation and then they have a washout period.
  • 35:42They show an image with different orientation,
  • 35:44they have a washout period.
  • 35:45They show an image with a different orientation
  • 35:47and then they might use laser
  • 35:50in combination of these different images et cetera.
  • 35:53What they ended up doing
  • 35:54is that they have many, many experiments.
  • 35:56And what we expect is that the networks
  • 35:59in these different experiments
  • 36:00to be different from each other
  • 36:02but maybe share some commonalities as well.
  • 36:04We don't expect completely different networks
  • 36:06but we expect somewhat related networks.
  • 36:09And over different time segments
  • 36:13the network might change.
  • 36:15In one segment it might be one thing, and in the next segment
  • 36:19it might change to something different,
  • 36:20but maybe some parts of the network structure are alike.
  • 36:25What this does is that it sort of motivates us
  • 36:27to think about jointly estimating these networks,
  • 36:29because each one of these time segments
  • 36:31might not have enough observations to estimate accurately.
  • 36:35And this goes back to the simulation results
  • 36:36that I showed you, that in order to get to good control
  • 36:41of type one error and good power,
  • 36:43we need to have decent number of observations.
  • 36:45And in each one of these time segments
  • 36:47might not have enough observations.
  • 36:50In order to make sure that we get high quality estimates
  • 36:54and valid inference,
  • 36:57we need to maybe join the estimations
  • 37:00in order to get better quality estimates and influence.
  • 37:11That's the idea of the second part
  • 37:13of what I wanna talk about going beyond
  • 37:17the single experiment and trying to do estimation
  • 37:19and inference, and multiple experiments of similar.
  • 37:22And in fact in the case of this paper by and Franks
  • 37:26they had, for every single mouse,
  • 37:30they had 80 different experimental setups
  • 37:33with laser and different durations
  • 37:35and different strengths.
  • 37:37It's not a single experiment for each mouse.
  • 37:39It's 80 different experiments for each mouse.
  • 37:42And you would expect that many of these experiments
  • 37:44are similar to each other
  • 37:45and they might have different degrees of similarity
  • 37:47with each other that we might need to take into account.
  • 37:53The goal of the second part is to do joint estimation
  • 37:56and inference for settings where we have multiple experiments
  • 37:59and not just a single experiment.
  • 38:02To do this, we went back to basically
  • 38:05that estimation procedure that we had,
  • 38:07and previously what we had was a sparsity-type penalty.
  • 38:11What we do now is that we add
  • 38:12a fusion-type penalty.
  • 38:14Now we combine the estimates in different experiments.
  • 38:19And this is based on past work that I had done
  • 38:22with a postdoc,
  • 38:24but the main difference in this work is that
  • 38:28now we wanna allow these estimates
  • 38:32to be similar to each other
  • 38:33based on a data-driven notion of similarity.
  • 38:36We don't know which experiments
  • 38:37are more similar to each other.
  • 38:40And we basically want the data to tell us which experiments
  • 38:43should be more similar to each other, should be combined
  • 38:46rather than specifying that a priori; researchers
  • 38:51usually don't have that information.
  • 38:53These data-driven weights are critical here,
  • 38:57and we derive these data-driven weights
  • 38:59based on just simple correlations.
  • 39:01We calculate simple correlations.
  • 39:02In the first step we look to see which of these conditions
  • 39:05are more correlated with each other,
  • 39:09more similar to each other,
  • 39:11based on these correlations.
  • 39:13And we use these cross-correlations to then define weights
  • 39:17for which experiments' estimates should be more closely fused
  • 39:20with each other
  • 39:21and which experiments' estimates
  • 39:22should be fused less closely.
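Schematically, the joint estimation objective being described looks something like the following sketch (the exact loss and penalty in the paper may differ):

```latex
% Joint estimation across M experiments: per-experiment loss, sparsity penalty,
% and a fusion penalty with data-driven weights w_{m,m'} that pull the estimates
% of similar experiments toward each other.
\min_{\beta^{(1)},\dots,\beta^{(M)}}\;
\sum_{m=1}^{M} L_m\!\big(\beta^{(m)}\big)
\;+\; \lambda_1 \sum_{m=1}^{M} \big\lVert \beta^{(m)} \big\rVert_1
\;+\; \lambda_2 \sum_{m < m'} w_{m,m'}\, \big\lVert \beta^{(m)} - \beta^{(m')} \big\rVert_1 ,
```

with the weights w_{m,m'} taken larger for pairs of experiments whose spike trains are more cross-correlated.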
  • 39:25And I'll leave out the details,
  • 39:29but in a similar setting
  • 39:32to what I had explained before
  • 39:34in terms of the experimental setup for this,
  • 39:37I'm sorry, in terms of the simulation setup,
  • 39:39there are 50 neurons in the network
  • 39:42from three different experiments in this case,
  • 39:44of three different lengths,
  • 39:45and we use different estimators.
  • 39:48And what we see is that sort of when we do this fusion,
  • 39:51we do better in terms of the number of true positives
  • 39:54for any given number of estimated edges
  • 39:57compared to separately estimating
  • 39:59or compared to sort of other types of fusions
  • 40:02that what one might consider.
  • 40:06Now, estimation is somewhat easy.
  • 40:10The main challenge was to come up
  • 40:12with these data-driven weights.
  • 40:14The main issue is that if you wanted to come up with
  • 40:19valid inference in these settings,
  • 40:21when we have many, many experiments,
  • 40:24then we would have very low power if we're adjusting,
  • 40:27for example, for all comparisons using FDR or FWER,
  • 40:31false discovery rate or family-wise error rate;
  • 40:35we have p squared times M tests,
  • 40:37and so we have low power.
  • 40:40To deal with this setting, what we have done
  • 40:42is that we've come up with a hierarchical testing procedure
  • 40:45that avoids testing
  • 40:50all these p squared times M coefficients.
  • 40:52And the idea is this,
  • 40:53the idea is that if you have a sense of which conditions
  • 40:57are more similar to each other,
  • 40:59we construct a very specific type of binary tree,
  • 41:03which basically always has a single node
  • 41:07on the left side in this case.
  • 41:09And then we start on the top of that tree
  • 41:11and test each coefficient.
  • 41:13We first test across all the experiments.
  • 41:16If we don't reject, then we stop there.
  • 41:18If we reject, then we test experiment one, and experiments two,
  • 41:22three, and four, separately.
  • 41:25If we reject experiment one, then we've identified
  • 41:28the non-zero edge.
  • 41:30If we reject two, three, four, then we go down.
  • 41:34If we don't reject two, three, four, we stop there.
  • 41:36This way we stop at the level that is appropriate
  • 41:39based on data.
  • 41:42And this ends up, especially in sparse networks,
  • 41:44saving us a lot of tests
  • 41:49and gives us significant improvement in power.
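A minimal sketch of this hierarchical scheme for a single coefficient, written generically: `pvalue_for_group` is a placeholder for whatever group-level test is available (e.g., the de-correlated score test applied to a subset of experiments), and the multiplicity-correction details are omitted.

```python
from typing import Callable, List

def hierarchical_test(experiments: List[int],
                      pvalue_for_group: Callable[[List[int]], float],
                      alpha: float = 0.05) -> List[int]:
    """Test one coefficient across experiments on a 'single node on the left' tree.

    Start by testing the coefficient jointly over all experiments; if that
    rejects, split off the first experiment and recurse on the rest.
    Returns the experiments in which the coefficient is declared non-zero.
    """
    rejected = []
    group = list(experiments)
    while group:
        if pvalue_for_group(group) > alpha:   # cannot reject for this group: stop
            break
        if len(group) == 1:                   # a single experiment is rejected
            rejected.append(group[0])
            break
        # Joint null rejected: test the left leaf, then continue down the right branch.
        if pvalue_for_group([group[0]]) <= alpha:
            rejected.append(group[0])
        group = group[1:]
    return rejected

# Hypothetical usage: p-values would come from the group-level test on subsets
# of experiments; here they are faked for illustration.
fake_pvals = {(1, 2, 3, 4): 0.001, (1,): 0.2, (2, 3, 4): 0.01,
              (2,): 0.03, (3, 4): 0.4, (3,): 0.5, (4,): 0.6}
print(hierarchical_test([1, 2, 3, 4], lambda g: fake_pvals[tuple(g)]))
```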
  • 41:51And that's shown in the simulations:
  • 41:53if you don't do this,
  • 41:57your power decreases as the number of experiments increases.
  • 42:01And in this case we've gone up to 50 experiments;
  • 42:04as I mentioned,
  • 42:04the Bolding and Franks paper has about 80.
  • 42:07Whereas if you do this
  • 42:09and your network is sparse,
  • 42:12you see that by combining experiments
  • 42:15you actually gain power
  • 42:16because you're incorporating more data.
  • 42:19And this is while controlling the family-wise error rate.
  • 42:22Both methods control the family-wise error rate.
  • 42:25We haven't developed anything for FDR.
  • 42:27We haven't developed theory for FDR
  • 42:29but the method also seems to be controlling FDR
  • 42:32in a very stringent way actually.
  • 42:35But we just don't have theory for FDR control
  • 42:38'cause that becomes more complicated.
  • 42:46I'm going very fast because of time
  • 42:47but I'll pause for a minute.
  • 42:49Any questions?
  • 42:53Please.
  • 42:54<v ->What do you think about stationarity</v>
  • 42:56of the Hawkes process in this context?
  • 42:58With the exogenous experimental forcing,
  • 43:01over what timescale does that happen,
  • 43:03and is stationarity reasonable?
  • 43:04<v ->Yeah, that's a really good question.</v>
  • 43:11To be honest, I think these Hawkes processes
  • 43:13are most likely non-stationary.
  • 43:14There are two mechanisms of non-stationarity that could happen.
  • 43:20One, we try to account for.
  • 43:22I skipped over it but we tried to account
  • 43:25for one aspect of it by allowing the baseline rate
  • 43:28to be time varying.
  • 43:38Basically we allow this nu i to be a function of time.
  • 43:43The baseline rate for each neuron varies over time.
  • 43:48And the hope is that, that would capture
  • 43:49some of the exogenous factors that might influence the overall rate.
  • 43:56It could also be that the dynamics are changing over time;
  • 44:00that we haven't done. Or it could in fact be that
  • 44:06we have abrupt changes
  • 44:10in patterns of either activation or the baseline over time,
  • 44:15where all of a sudden something completely changes.
  • 44:17Then we have piecewise stationarity, not smooth,
  • 44:22not continuous, non-stationary changes;
  • 44:24we have piecewise changes.
  • 44:26We have an experiment that's happening,
  • 44:28something happening, and then all of a sudden
  • 44:30something else is happening.
  • 44:31This eventually would maybe capture plasticity
  • 44:35in these neurons, neuroplasticity, to some extent,
  • 44:39that sort of allows for changes of activity over time,
  • 44:42but beyond that we haven't done anything.
  • 44:45There's actually one paper that has looked
  • 44:47at piecewise stationarity for these Hawkes processes.
  • 44:52It becomes a computationally very, very difficult problem;
  • 44:56especially the inference becomes a very difficult problem.
  • 44:59But I think it's a very good question.
  • 45:03Aside from that one paper, not much else has been done.
  • 45:11<v ->Hi, thank you, professor, for sharing.</v>
  • 45:13I have a question regarding the segmentation
  • 45:17'cause on the video you showed us,
  • 45:19the image is generally very shaky.
  • 45:23From the computer vision perspective,
  • 45:25it's very hard to isolate which neuron actually fired
  • 45:28and make sure that it's the same neuron firing over time.
  • 45:32And also the second question is that in the mouse
  • 45:36olfactory model you've mentioned there are like 20 neurons,
  • 45:39but in the picture you showed us there are probably
  • 45:42thousands of neurons.
  • 45:42How do you identify which 20 neurons to look at?
  • 45:46<v ->Very good questions.</v>
  • 45:48First of all, before they even get to segmentation,
  • 45:51they need to do what is known as,
  • 45:55and this is actually common in
  • 45:59time series and sort of (indistinct),
  • 46:03registration.
  • 46:07What this means is that you first need to register
  • 46:09the images so that they're basically aligned correctly.
  • 46:13Then you can do segmentation.
  • 46:14If you remember the earlier slide,
  • 46:17if you remember, it had a couple of dots
  • 46:20before getting to segmentation.
  • 46:21There are a couple of steps that need to happen
  • 46:23before we even get to segmentation.
  • 46:25And part of that is registration.
  • 46:27Registration is actually a nontrivial task
  • 46:29to make sure that the locations don't change.
  • 46:32You have to get it right, otherwise the algorithm
  • 46:36will get confused.
  • 46:37First there's some correction that needs to happen,
  • 46:41some background correction
  • 46:43and sort of dealing with noise correctly and everything,
  • 46:45and then there's registration.
  • 46:47And then after that you could do segmentation,
  • 46:49identifying neurons.
  • 46:50Now, the data in the video that I showed you was data
  • 46:52from a different recording, actually; it's different from
  • 46:56this Bolding and Franks data that I'm showing you here.
  • 47:00This one had 25 neurons.
  • 47:03This is an older technology.
  • 47:04It's an older paper, so they only had 25 neurons;
  • 47:07they had smaller regions that they were capturing.
  • 47:10With the newer technologies, they capture
  • 47:11a larger region, a couple hundred neurons.
  • 47:14I think the most I've seen
  • 47:16was about a thousand or so neurons.
  • 47:17I haven't seen more than a thousand neurons.
  • 47:20<v ->Thank you.</v>
  • 47:25<v ->Okay, so I'm close to the end of my time.</v>
  • 47:29Maybe in the remaining minutes or so
  • 47:34I'll basically mention that
  • 47:37we have applied this joint estimation
  • 47:42to the data from Bolding and Franks.
  • 47:43And then we also see, something that is perhaps not surprising,
  • 47:48that in the no laser condition
  • 47:51the network is more different
  • 47:53than between the two different magnitudes of laser,
  • 47:55maybe 10 and 20 milliwatts per square millimeter or so.
  • 48:02You see that those two are more similar to each other
  • 48:05than to the no laser condition.
  • 48:10And I'm probably gonna stop here
  • 48:12and sort of leave a couple of minutes for questions,
  • 48:14additional questions, but I'll mention that
  • 48:15the last part I didn't talk about was to see if we could
  • 48:19go beyond prediction.
  • 48:20Could we use this, as I mentioned, since Granger causality
  • 48:23is not really causality, it's prediction,
  • 48:27could we go beyond prediction
  • 48:31and get a sense of which neurons are impacting other neurons?
  • 48:35And I'll briefly mention that sort of there are two issues
  • 48:39in general in going beyond prediction to causality.
  • 48:45We have a review paper that talks about this. One
  • 48:47issue is subsampling,
  • 48:48that you don't have enough temporal resolution.
  • 48:51And the other issue is that you might have
  • 48:53latent processes that make it difficult
  • 48:55to answer these questions.
  • 48:57Fortunately the issue of subsampling,
  • 49:00which is a difficult issue in general,
  • 49:04is not very prominent in these calcium
  • 49:09imaging data,
  • 49:10because you have continuous-time videos,
  • 49:14and subsampling should not be a big deal in this case.
  • 49:19However, we observe a tiny fraction
  • 49:23of the connections of the brain.
  • 49:25The question is, can we somehow account
  • 49:27for all the other neurons that we don't see?
  • 49:31The last part of this work is about that.
  • 49:34And I'll sort of jump to the end
  • 49:38because I'll put a reference to that work.
  • 49:41That one is published, in case you're interested;
  • 49:43it's a paper that looks at
  • 49:49whether we could go beyond prediction,
  • 49:51whether we can actually identify causal links
  • 49:54between particular neurons.
  • 49:56And I think I'm gonna stop here and thank you guys
  • 50:00and I'm happy to take more questions.
  • 50:17<v ->Naive question.</v>
  • 50:19Biologically, what is a network connection here?
  • 50:24Because they're not, I'm assuming they're not
  • 50:27growing synapses or not based on the laser.
  • 50:33(indistinct)
  • 50:36(group chattering)