
YSPH Biostatistics Seminar: “Estimation and Inference for Networks of Multi-Experiment Point Processes”

October 06, 2022
  • 00:00<v ->Today it's my pleasure to introduce,</v>
  • 00:02Professor Ali Shojaie.
  • 00:05Professor Shojaie holds master's degrees
  • 00:07in industrial engineering, statistics,
  • 00:10applied math, and human genetics.
  • 00:13He earned his PhD in statistics
  • 00:14from the University of Michigan.
  • 00:17His research focuses on high-dimensional data,
  • 00:19longitudinal data, computational biology,
  • 00:23network analysis, and neuroimaging.
  • 00:26Professor Shojaie is a 2022 fellow
  • 00:29of the American Statistical Association
  • 00:32and 2022 winner of their Leo Breiman Award.
  • 00:36He's a full professor of biostatistics,
  • 00:38adjunct professor of statistics,
  • 00:40and the associate chair for strategic research affairs
  • 00:43in the department of biostatistics
  • 00:45at the University of Washington.
  • 00:47Let's welcome Professor Shojaie.
  • 00:52<v ->Thanks for having me.</v>
  • 00:54Sometimes I misjudge the volume of my voice.
  • 00:57You guys, can you hear me at the back, okay?
  • 01:00I'm not gonna use the microphone yet,
  • 01:01and I'd rather not use the microphone at all.
  • 01:06Well, it's a pleasure to be here
  • 01:08and to talk to you about some work that I've been doing
  • 01:12for the past couple of years.
  • 01:15I'm using machine learning tools for different types of data
  • 01:21so that we can understand better how the brain works.
  • 01:29The question really is how do we process
  • 01:32information in our brains?
  • 01:34How is information processed?
  • 01:41We know that the brain processes information through neurons,
  • 01:43and that neurons interact with each other.
  • 01:46Neurons do process information.
  • 01:51This is of course related to my broader interests
  • 01:54in networks and understanding how things interact
  • 01:57with each other.
  • 01:59Naturally I was drawn into this area,
  • 02:03but when I talk to scientist colleagues,
  • 02:06then a lot of times I'm asked,
  • 02:08what is the goal of understanding that network?
  • 02:10How do we use it?
  • 02:11How do we
  • 02:15take advantage of that network that we learned?
  • 02:17Here's an example of some recent work that we've been doing
  • 02:21that indicates that learning something about these networks
  • 02:26is actually important.
  • 02:30I should say that this is joint work
  • 02:32with a bunch of colleagues at the University of Washington
  • 02:38and the main group that has been running these experiments
  • 02:43is in biomedical engineering.
  • 02:47And then I'm collaborating with E Shea-Brown,
  • 02:49who's in computational neuroscience,
  • 02:51and Z Harchaoui, a computer scientist slash statistician,
  • 02:56and we've been working on this project together.
  • 02:59The lab behind this project
  • 03:02works on neurostimulation.
  • 03:05What they wanna do is to see if they could stimulate
  • 03:08different regions of the brain to make, in this case,
  • 03:12a monkey do certain things
  • 03:14or to restore function that the monkey might have lost.
  • 03:18And it's a really interesting platform
  • 03:22that they've developed.
  • 03:24It's basically small implants that they put
  • 03:28in a region of the brain on these monkeys.
  • 03:31And the implant has two areas where the laser
  • 03:35beams shine in and, in this case, about 96
  • 03:41electrodes that collect data
  • 03:43in that small region of the brain.
  • 03:47This is made possible by optogenetics
  • 03:51meaning that the neurons have been made sensitive to these lasers.
  • 03:55When neurons
  • 04:00receive the laser, they basically get excited,
  • 04:03get activated.
  • 04:05The goal in this research eventually
  • 04:08is to see how the activation of neurons,
  • 04:11through plasticity, would change
  • 04:14the connectivity of the neurons
  • 04:18and would later on result in changing function.
  • 04:23That's the eventual goal of this.
  • 04:24This research is at the very beginning of that.
  • 04:28We are not there yet in terms of understanding function,
  • 04:32understanding the link between connectivity and function.
  • 04:35The collaboration with this lab started
  • 04:37when they wanted to predict how the connectivity changes
  • 04:41as a result of this activation.
  • 04:44We wanted to understand whether, by changing various factors
  • 04:49in the experiments, the distance between the two lasers,
  • 04:52the duration of the laser,
  • 04:54they could accurately predict the change in connectivity.
  • 05:01The way that the experiment is set up
  • 05:02is that they basically have these periods where they have
  • 05:07activation, and then a latency period,
  • 05:10followed by observation.
  • 05:12They basically observe the activity of these brain regions,
  • 05:20those 96
  • 05:22electrodes in this small region, over time.
  • 05:25That's the data that they collect.
  • 05:31Here's a look at this functional connectivity,
  • 05:35and that's what they were trying to predict.
  • 05:40Basically the heat map shows
  • 05:46the links between the various brain regions,
  • 05:50all 96 of them.
  • 05:56The connectivity is defined based on coherence,
  • 06:01which is basically a correlation measure in the frequency domain,
  • 06:05and we have coherence in four different frequency bands.
  • 06:08These are the standard bands in signal processing,
  • 06:11and they're thought to measure activity
  • 06:14at different spatial resolutions.
  • 06:16We have theta band, the beta band, the gamma band,
  • 06:18and the high gamma band.
  • 06:20And we wanna see how the connectivity
  • 06:22in these different bands changes
  • 06:25as an effect of stimulating these neurons.
  • 06:31And what...
  • 06:37This is not working.
  • 06:38The clicker stopped working.
  • 06:40We'll figure that out.
  • 06:51Let's go on full screen again to see where this goes.
  • 07:00What basically we have
  • 07:01is that we have the baseline connectome
  • 07:03and we have these experimental protocols,
  • 07:07and we're trying to predict how the connectivity changes.
  • 07:10What the lab was doing before was that
  • 07:12they were looking at trying to predict connectivity
  • 07:14based on experimental protocols.
  • 07:18And what they were getting
  • 07:19was actually really bad prediction.
  • 07:22These are test R squares.
  • 07:26And what they were getting was about 5% test R square
  • 07:30when they were using these protocol features
  • 07:32to predict how the connectivity changes.
  • 07:34And you see it's really bad;
  • 07:36if that's the prediction that you're getting,
  • 07:38then it's a really bad prediction.
  • 07:43The first thing that we noticed in this research
  • 07:46was that it's actually important to incorporate
  • 07:50the features of the current state of connectivity
  • 07:53in order to predict how it's going to change.
  • 07:56What we did was that in addition to those protocol features,
  • 07:59we added some network features,
  • 08:01the current state of the network in order to predict
  • 08:03how it's gonna change.
  • 08:04And this is, to me, this is really interesting
  • 08:06because it basically says that our prediction
  • 08:10has to be subject specific
  • 08:13depending on the current state of each monkey's
  • 08:14connectivity; how their connectivity
  • 08:18is going to change will be different.
  • 08:21And what we saw was that when we incorporated
  • 08:24these network features, we were able to improve quite a bit
  • 08:28in terms of prediction.
  • 08:29We're still not doing hugely well,
  • 08:33we're only getting a test R squared of, what, 25%.
  • 08:36But what you see is that the prediction
  • 08:38of how the connectivity changes
  • 08:41is now much better.
  • 08:43And also in terms of the pictures, you see that going from,
  • 08:46so say this is the true,
  • 08:48the first part in d is the true change in connectivity,
  • 08:52e is what you would get from just the protocol features,
  • 08:56and you see that prediction is really bad,
  • 08:57and f is what you get when you combine protocol features
  • 09:01and the network features.
  • 09:03That prediction is closer to the true
  • 09:09change in connectivity than just using the protocol features.
  • 09:12This was the first thing that we learned from this research.
  • 09:15The second part of what we learned is that
  • 09:18it also matters which approach you use for prediction.
  • 09:21What they had done was that they were using some simple
  • 09:24like linear model for prediction.
  • 09:26And then we realized that we need to use something more
  • 09:30expressive and then we sort of ended up using
  • 09:32these non-linear additive models
  • 09:34that we had previously developed,
  • 09:36partly because while they have a lot of expressive power,
  • 09:40they're still easy to interpret.
  • 09:43Interpretation for these additive models is still easy
  • 09:46and in particular we can see what the shapes
  • 09:51of these functions basically are.
  • 09:52For example, with the distance we see how the function
  • 09:55changes, and that helps with the design of these experiments.
  • 09:58I'm not gonna spend too much time
  • 10:00talking about the details of this
  • 10:01given that we only have 50 minutes
  • 10:03and I wanna get to the main topic,
  • 10:05but basically these additive models
  • 10:08are built by combining these features.
  • 10:11Think of a Taylor expansion in a very simple sense,
  • 10:14where you have a linear term, a quadratic term,
  • 10:17a cubic term.
  • 10:18And the way that we form these additive models
  • 10:21is that we automatically select the degree of complexity
  • 10:26of each additive feature,
  • 10:28whether it's, say, linear, or quadratic, or cubic, etcetera.
  • 10:32We also allow some features to be present in the model
  • 10:36and other features not to be present.
  • 10:37What we end up with are these patterns
  • 10:41where some features are really complex and others are simple,
  • 10:43and that's automatically decided from data.
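To make the idea concrete, here is a minimal, hypothetical sketch: expand each feature in a polynomial basis and let a sparsity-inducing fit decide which basis terms survive, so the data choose both whether a feature enters and how nonlinear its effect is. The feature names and the use of a plain lasso (rather than the structured penalty of the actual method) are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Toy data: two protocol features and one network feature (all hypothetical).
n = 500
X = rng.uniform(-1, 1, size=(n, 3))          # columns: distance, duration, coherence_diff
y = 0.5 * X[:, 0] - 1.2 * X[:, 2] ** 2 + rng.normal(scale=0.3, size=n)

def poly_basis(X, degree=3):
    """Expand each feature into [x, x^2, ..., x^degree] columns."""
    cols = [X[:, j] ** d for j in range(X.shape[1]) for d in range(1, degree + 1)]
    return np.column_stack(cols)

B = StandardScaler().fit_transform(poly_basis(X, degree=3))

# Sparsity-inducing fit: basis terms with nonzero coefficients tell us which
# features enter the model and how nonlinear their effects are. The method in
# the talk uses a structured/hierarchical penalty; plain lasso is a stand-in.
fit = LassoCV(cv=5).fit(B, y)
print(fit.coef_.reshape(3, 3))   # rows = features, columns = degrees 1..3
```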
  • 10:47This model does well in this prediction task
  • 10:51and it allows us to come up with these sorts of predictions.
  • 10:53We see now, for example, for coherence difference,
  • 10:58which is the network feature,
  • 10:59that's the coherence difference,
  • 11:01and network distance, that's the distance
  • 11:03between the two stimulation sites,
  • 11:04the two laser points.
  • 11:05We get these two patterns estimated,
  • 11:07and then when we combine them, we get this surface basically
  • 11:10that determines how the change in connectivity
  • 11:15could be predicted
  • 11:17based on these two features.
  • 11:18And all of this is done automatically based on data.
  • 11:23This approach, again, sort of the key feature of it
  • 11:25is that it combines the network features
  • 11:28of the current state of connectivity with protocol features
  • 11:30in order to do a better job of prediction.
  • 11:33This is research that we just started
  • 11:36and we will continue this research
  • 11:39for the next at least five years.
  • 11:42But the goal of it is eventually to see
  • 11:44if we could predict the function
  • 11:46and ultimately if we could build a controller
  • 11:49that we could determine how to change function
  • 11:52based on various features of the experiment.
  • 11:57I mentioned all of this to say that knowing
  • 11:59and learning the network matters.
  • 12:01We need to learn the current state of connectivity,
  • 12:04for example, in this work in order to be able to design
  • 12:07experiments that would hopefully help
  • 12:12and restore function.
  • 12:15Now in this particular work,
  • 12:17what we did was that we used a very simple
  • 12:20notion of connectivity.
  • 12:21We used coherence, which is basically correlation,
  • 12:24but we know that that's not always the best
  • 12:28way to define connectivity between regions.
  • 12:32And so what I wanna talk about for the remaining
  • 12:3640 minutes or so is how do we learn connectivity
  • 12:40between neurons?
  • 12:42And this is using a different type of data
  • 12:45that I hadn't thought about before,
  • 12:46and I'm hoping I can show you this clip,
  • 12:51which shows the actual raw data.
  • 12:55The data is actually a video.
  • 12:58And this is activity of individual neurons
  • 13:00in a small region of the brain.
  • 13:03These dots that you see popping up,
  • 13:04these are individual neurons firing over time.
  • 13:10And you see that one neuron fires
  • 13:12and another neuron fires, et cetera, et cetera.
  • 13:15That's the raw data that we're getting.
  • 13:18And the goal is to understand
  • 13:21based on this pattern of activation of neurons,
  • 13:24how neurons talk to each other basically.
  • 13:27Now I'm gonna go back here.
  • 13:34And so the data of that video that I showed you,
  • 13:38basically, here's some snapshot of that data.
  • 13:41Here's one frame.
  • 13:43And there are a lot of steps in getting this data
  • 13:46into a form we can work with.
  • 13:50We're not gonna talk about this,
  • 13:52but we need to first identify where the neurons are.
  • 13:55No one tells us where the neurons are in that video.
  • 13:58We need to first identify where the neurons are.
  • 14:00We need to identify when they spike, when they fire.
  • 14:03No one tells us that either.
  • 14:05There are a lot of pre-processing steps that happen.
  • 14:09The first task is called segmentation,
  • 14:11identifying where the neurons are,
  • 14:13then spike detection, when the neurons fire over time,
  • 14:15which individual neuron fires at which time.
  • 14:17None of these is a trivial task,
  • 14:19and a lot of smart people are working on these,
  • 14:22including some of my colleagues.
  • 14:25After a lot of pre-processing,
  • 14:26for each individual neuron
  • 14:28you end up with a data set like this
  • 14:31that basically has these ticks
  • 14:35whenever the neuron has fired.
  • 14:39For a given neuron, you have the times at which the neuron fired,
  • 14:42like this.
  • 14:45These are the time points when the neuron spiked.
  • 14:47Now, you can do something fancier,
  • 14:49you can look at the magnitude
  • 14:51of the signal that you're detecting at each neuron.
  • 14:53You could deal with that, but for now we're ignoring that.
  • 14:55We're just looking at when they fire.
  • 14:58This is called the spike train for each neuron.
  • 15:01That's the data that we're using.
  • 15:05These are the neurons' firing times.
  • 15:07And if we combine them, in this cartoon,
  • 15:09we get something like this.
  • 15:10We get a sequence of activation patterns.
  • 15:13This is color coded based on that sort of five neuron
  • 15:16sort of cartoon network.
  • 15:18And you see that different neurons activate
  • 15:19at different times.
  • 15:23And what I'll talk about is a notion of connectivity
  • 15:25that tries to predict the activation pattern of one neuron
  • 15:29from the others in a network, basically.
  • 15:31Maybe neuron one tells us something
  • 15:34about the activation patterns of neuron two,
  • 15:36so that if we knew when neuron one activated or fired,
  • 15:39we could predict when neuron two fires,
  • 15:41and maybe neuron two will tell us something
  • 15:43about activations of neurons three and four, et cetera.
  • 15:46And that's the notion of connectivity that we're
  • 15:49after; we're trying to estimate those edges
  • 15:51in this network.
  • 15:53Now, please.
  • 15:55<v ->Could you say just a few words informally</v>
  • 15:57about the direction of connectivity?
  • 15:58<v ->Yeah.</v>
  • 15:59<v ->Maybe drawing arrow forward in time.</v>
  • 16:00<v ->Yes.</v>
  • 16:01I'll get to this, maybe in the next two slides.
  • 16:06The framework that we're gonna work with
  • 16:09is called the Hawkes process.
  • 16:11This goes back to seminal work by Alan Hawkes
  • 16:14in the '70s, where he looked at spectral properties
  • 16:19of point processes.
  • 16:20What are point processes? Basically, they're like activations
  • 16:23over time,
  • 16:24zeros and ones over time.
  • 16:26A Poisson process is one example.
  • 16:29What the Hawkes process does in particular
  • 16:31is that it uses the past history of one neuron
  • 16:37to predict the future.
  • 16:39And this goes back to Forest's question
  • 16:42of what that edge is in this case.
  • 16:44This notion is closely related to, and is a special case
  • 16:48of, what is known to econometricians as Granger causality,
  • 16:52that is, using the past to predict the future.
  • 16:55And that's the notion of connectivity
  • 16:57that we're after in this particular case.
  • 17:03And what makes this Hawkes process
  • 17:05convenient for this is that
  • 17:07it's already set up to do this.
  • 17:08I'm gonna present the Hawkes process
  • 17:10in its simplest form; this is the linear Hawkes process.
  • 17:13And what it is, is a counting process.
  • 17:17It's just counting the events.
  • 17:20And so that's the event process N.
  • 17:25And that event process has an intensity lambda i
  • 17:31for each neuron i,
  • 17:33which is a combination of two terms:
  • 17:37a nu i, that's the baseline intensity of that neuron.
  • 17:40That means that if you had nothing else,
  • 17:43this neuron would fire at this rate, basically at random,
  • 17:47it would fire at a random rate,
  • 17:51plus the effect that that neuron
  • 17:53gets from the other neurons.
  • 17:55Every time there's an activation in a neuron,
  • 17:58any neuron j from one to p, including neuron i itself,
  • 18:03depending on how long it's been since that activation,
  • 18:05the time between the current time t
  • 18:08and the time of activation of the previous neuron,
  • 18:09the firing of the previous neuron,
  • 18:11some weight function determines how much influence
  • 18:15that neuron i gets.
  • 18:17This has a flavor of causality,
  • 18:20which is why econometricians call it Granger causality.
  • 18:24This is work by Granger,
  • 18:29but it's really not causality.
  • 18:30We know that there's more to it,
  • 18:32and so there's a lot of work on this;
  • 18:33it's only causality
  • 18:34under fairly restrictive assumptions
  • 18:37that I won't talk about in general,
  • 18:38but nonetheless it predicts the future.
  • 18:41It's prediction of the future.
  • 18:43And again, in this case this dN i
  • 18:47is our point process, lambda i is our intensity process,
  • 18:52a stochastic process itself.
  • 18:54Nu i is the background intensity
  • 18:56and the t jk's are the times when the other neurons
  • 19:01fired in the past.
  • 19:03And this omega ij is the transfer function.
  • 19:06It determines how much information is passed
  • 19:09from the firing of one neuron
  • 19:11to the firing of other neurons in the future.
  • 19:14And usually you think that the further
  • 19:16you go in the past, the less information carries over.
  • 19:19Usually the types of functions that you consider
  • 19:21for these transfer functions are decaying;
  • 19:23they have a decaying form,
  • 19:25so that if you go too far in the past,
  • 19:27there's no information, there's no useful information.
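For reference, the intensity just described can be written compactly; this is the standard form of the linear Hawkes process, in the notation of the talk (nu for the baseline, omega for the transfer function):

```latex
% Linear Hawkes process: intensity of neuron i at time t
\lambda_i(t) \;=\; \nu_i \;+\; \sum_{j=1}^{p} \;\sum_{k \,:\, t_{jk} < t} \omega_{ij}\!\left(t - t_{jk}\right),
\qquad i = 1, \dots, p ,
```

where nu_i is the baseline intensity, the t_jk are the past firing times of neuron j, and omega_ij is the (typically decaying) transfer function.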
  • 19:30Any questions on the basics of this linear Hawkes process?
  • 19:33because I'm not gonna present the more complicated version,
  • 19:38but I think this will suffice for our conversation.
  • 19:41I wanna make sure that we're all good
  • 19:43with this simple version.
  • 19:48Okay, so no question on this.
  • 19:51But if we agree with this, then this Hawkes process
  • 19:55gives us a very convenient way
  • 19:56of defining that connectivity.
  • 19:59What is meant by connectivity now basically means
  • 20:02that if this function omega ij is non-zero,
  • 20:06then that means that there's an edge
  • 20:07between neuron j and neuron i.
  • 20:09And that's basically what I was showing you
  • 20:11in that earlier picture.
  • 20:13It all comes down to estimating
  • 20:15whether omega ij is zero or not for this Hawkes process.
  • 20:21Okay.
  • 20:23Let me show you a very simple example
  • 20:25with two neurons.
  • 20:26In this case, neuron one has no other influences.
  • 20:32It's only its past history and baseline intensity.
  • 20:36Neuron two has an edge from neuron one.
  • 20:40Let's see what we would expect for the intensity
  • 20:43of neuron one.
  • 20:44If we think about neuron one,
  • 20:47then it's basically a baseline intensity, that nu one.
  • 20:51And it's gonna fire at random times from some process.
  • 20:56It's gonna fire at random times with the same intensity.
  • 20:59The intensity is not gonna change because it's fixed;
  • 21:02we could allow that intensity to be time varying, et cetera,
  • 21:05to make it more complicated, but in its simplest form
  • 21:08that neuron is just gonna fire randomly,
  • 21:11whenever it sort of wants.
  • 21:15Now, neuron two would have a different story
  • 21:19because neuron two depends on activation of neuron one.
  • 21:22Any time that neuron one fires, the intensity of neuron two
  • 21:28goes up; let's say the baseline is zero for neuron two,
  • 21:31but every time that neuron one fires,
  • 21:33the intensity of neuron two becomes non-zero
  • 21:36because it got excitation from neuron one.
  • 21:38It responds to that.
  • 21:40Neuron two would fire in response, and then when you have,
  • 21:42like, three activations, you can get
  • 21:45a convolution of effects that would make neuron two
  • 21:48more likely to activate as well, or to spike as well.
  • 21:54And so this is the pattern: basically
  • 21:56what we are doing here is that we're taking
  • 21:58this to be omega
  • 22:02two-one, and you see it has a decay form,
  • 22:05and these get convolved; if you have more activations
  • 22:09of neuron one, that increases the intensity
  • 22:12of neuron two, meaning that we have more of a chance
  • 22:16for neuron two to fire, and so on.
  • 22:20In this simple example, this could be the intensity
  • 22:23of neuron two.
  • 22:24And in fact all we observe in this case
  • 22:29are these two spike trains for neuron one and neuron two.
  • 22:32We don't observe the network;
  • 22:35in this case there are four possible edges,
  • 22:37and one of them is the right edge.
  • 22:38We don't observe the intensity processes.
  • 22:41All we observe is just the point process, the spikes.
  • 22:45And the goal is to estimate the network
  • 22:47based on that spike train.
  • 22:49And in fact,
  • 22:53as part of that, we also need to estimate that process.
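As a rough illustration of this two-neuron example, here is a minimal simulation sketch using Ogata-style thinning with an exponential transfer function; the baseline rates, edge weight, and decay constant are made-up values, not numbers from the talk.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two-neuron linear Hawkes process with a single edge from neuron 1 to neuron 2.
# Intensities: lambda_1(t) = nu_1
#              lambda_2(t) = nu_2 + sum_{t_1k < t} beta_21 * exp(-(t - t_1k)/tau)
nu = np.array([0.5, 0.05])   # baseline rates (hypothetical)
beta_21 = 1.0                # excitatory effect of neuron 1 on neuron 2 (hypothetical)
tau = 0.5                    # decay time of the transfer function (hypothetical)
T = 100.0                    # length of the observation window

def intensity(t, spikes1):
    """Intensities of both neurons at time t, given neuron 1's past spikes."""
    past = spikes1[spikes1 < t]
    lam2 = nu[1] + beta_21 * np.exp(-(t - past) / tau).sum()
    return np.array([nu[0], lam2])

# Ogata thinning: propose events at an upper-bound rate, keep with prob lambda/bound.
spikes = [[], []]
t = 0.0
while t < T:
    lam = intensity(t, np.array(spikes[0]))
    bound = lam.sum() + beta_21          # conservative upper bound on total intensity
    t += rng.exponential(1.0 / bound)    # candidate next event time
    if t >= T:
        break
    lam = intensity(t, np.array(spikes[0]))
    u = rng.uniform(0, bound)
    if u < lam[0]:
        spikes[0].append(t)              # neuron 1 fires
    elif u < lam[0] + lam[1]:
        spikes[1].append(t)              # neuron 2 fires

print(len(spikes[0]), "spikes for neuron 1,", len(spikes[1]), "spikes for neuron 2")
```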
  • 23:01That estimation problem is not actually that complicated.
  • 23:06If you think of it, it's trying to predict
  • 23:10the present based on the past.
  • 23:13We could do prediction.
  • 23:14We could use basically penalized regression.
  • 23:18It's a penalized Poisson regression,
  • 23:20something along those lines.
  • 23:21A little bit more complicated,
  • 23:22but basically it's a penalized Poisson regression,
  • 23:24and we could use an approach similar
  • 23:27to what is known as neighborhood selection,
  • 23:28meaning that we regress each neuron
  • 23:31on the past of all other neurons,
  • 23:33including that neuron itself.
  • 23:34These are simple regression problems.
  • 23:36And then we use regularization to select a subset of them
  • 23:39that are more informative, et cetera.
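A very rough sketch of this neighborhood-selection idea (not the estimator from the talk): discretize time, summarize each neuron's past spikes with an exponentially decaying filter, and regress each neuron's spiking on those summaries with an L1 penalty. The bin width, kernel, and the use of a penalized linear (rather than Poisson) fit are simplifying assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def spikes_to_bins(spike_times, T, dt):
    """Binary indicator of spiking per time bin for one neuron."""
    bins = np.zeros(int(T / dt))
    idx = np.minimum((np.asarray(spike_times) / dt).astype(int), len(bins) - 1)
    bins[idx] = 1.0
    return bins

def past_activity(bins, dt, tau):
    """Exponentially decaying summary of past spikes (excludes the current bin)."""
    x = np.zeros_like(bins)
    decay = np.exp(-dt / tau)
    for t in range(1, len(bins)):
        x[t] = decay * x[t - 1] + bins[t - 1]
    return x

def estimate_neighborhood(all_spike_times, T, dt=0.05, tau=0.5, lam=0.01):
    """Regress each neuron's spiking on the past activity of all neurons."""
    Y = np.column_stack([spikes_to_bins(s, T, dt) for s in all_spike_times])
    X = np.column_stack([past_activity(Y[:, j], dt, tau) for j in range(Y.shape[1])])
    p = Y.shape[1]
    B = np.zeros((p, p))                      # B[i, j]: estimated effect of j on i
    for i in range(p):
        fit = Lasso(alpha=lam).fit(X, Y[:, i])
        B[i, :] = fit.coef_
    return B

# Hypothetical usage with the simulated spike trains from the sketch above:
# B_hat = estimate_neighborhood(spikes, T=100.0)
# print(np.round(B_hat, 2))   # nonzero B_hat[i, j] suggests an edge j -> i
```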
  • 23:42And there's been quite a bit of work on this,
  • 23:45including some work that we've done.
  • 23:47The work that we've done was focused more
  • 23:49on extending the theory of these Hawkes processes
  • 23:55to a setting that is more useful
  • 23:58for neuroscience applications.
  • 24:00In particular, the theory that existed was focused mostly
  • 24:06on simple linear transfer functions, and also on the case
  • 24:11where we had non-negative transfer functions.
  • 24:14And this was purely an artifact of
  • 24:17the theoretical analysis approach that Hawkes had taken,
  • 24:22using what is known as the cluster representation.
  • 24:28What Hawkes and Oakes had done was that they were
  • 24:33representing each neuron as a sum of, sorry,
  • 24:39representing the activation pattern of each neuron
  • 24:42as a sum,
  • 24:44a superposition of homogeneous Poisson processes.
  • 24:46And because it was a sum, it could not allow
  • 24:48the omega ij's to be negative,
  • 24:51'cause things would cancel and the representation would break down.
  • 24:56What we did, and this was the work of my former student,
  • 25:00Shizhe Chen, who's now at Davis, was to
  • 25:06come up with an alternative framework,
  • 25:09a theoretical framework motivated by the fact that
  • 25:10we know that in neuroscience activations are not just positive,
  • 25:15they're not all excitation,
  • 25:18there are also inhibitions happening.
  • 25:21In neuroscience, and in any other biological system really,
  • 25:24we can't have biological systems being stable
  • 25:28without negative feedback.
  • 25:29These negative feedback loops are critical.
  • 25:32We wanted to allow for negative effects,
  • 25:36or the effects of inhibition.
  • 25:38And so we came up with a different representation
  • 25:40based on what is known as the thinning process representation,
  • 25:44which then allowed us to get a concentration result
  • 25:48for general transfer functions.
  • 25:48I won't go into the details of this,
  • 25:50but basically we can show
  • 25:53that for a general class of functions,
  • 25:59we get a concentration around the mean, in a sense.
  • 26:03And using this,
  • 26:06you could show that, with high probability,
  • 26:08we get to estimate the network correctly
  • 26:11using this neighborhood selection type approach.
  • 26:16This is estimation but we don't really
  • 26:20have any sense of whether...
  • 26:27Let's skip over this for the sake of time.
  • 26:29You don't really have any sense of whether
  • 26:31the edges that we estimate are true edges or not.
  • 26:33We don't have a measure of uncertainty.
  • 26:35We have theory that shows that
  • 26:37the recovered graph should be correct,
  • 26:39but we want to maybe get a sense of uncertainty about this.
  • 26:43And so the work that we've been doing more recently
  • 26:48focused on trying to quantify the uncertainty
  • 26:50of these estimates.
  • 26:52And so there's been a lot of work over the past
  • 26:55almost 10 years on trying to develop inference
  • 26:59for these regularized estimation procedures.
  • 27:03And so we're building on this
  • 27:05existing work; in particular,
  • 27:06we're building on work on
  • 27:11inference for vector autoregressive processes.
  • 27:14However, there are some differences,
  • 27:17most importantly that vector autoregressive processes capture a fixed
  • 27:24and pre-specified lag, whereas in the Hawkes process case,
  • 27:28we basically have dependence over the entire history.
  • 27:34We don't have a fixed lag that's pre-specified.
  • 27:38And another difference
  • 27:40is that vector autoregressive processes
  • 27:42are observed over discrete time,
  • 27:44at pre-specified time points,
  • 27:45whereas the Hawkes process is observed
  • 27:48over continuous time.
  • 27:50It's a continuous-time process,
  • 27:50and that adds a little bit of challenge,
  • 27:52but nonetheless, so we use this de-correlated
  • 27:56score testing work
  • 27:57which is based on the work of Ning and Liu.
  • 28:01And what I'm gonna talk about in the next couple of slides
  • 28:07is an inference framework for these Hawkes processes.
  • 28:11Again, I showed you before
  • 28:14the simple form of the linear Hawkes process,
  • 28:16and motivated by the neuroscience applications,
  • 28:19what we consider is something quite simple,
  • 28:22although we could generalize that,
  • 28:24and that generalization is in the paper.
  • 28:26The simple case is to consider something like omega ij
  • 28:30as beta ij times some known function of the lag,
  • 28:34where that function is simply a decay function over time,
  • 28:40like an exponentially decaying function,
  • 28:43a classic decay function.
  • 28:46That's a natural transition function for neuroscience applications.
  • 28:49And so if we go with this framework, then that
  • 28:54beta ij coefficient determines the connectivity for us:
  • 28:58if this beta ij is positive,
  • 29:01that means there's an excitatory effect,
  • 29:03if it's negative, there's an inhibitory effect,
  • 29:05and if it's zero, there's no influence from neuron j on neuron i.
  • 29:08All we need to do really is to develop inference
  • 29:11for this beta ij.
  • 29:14And so that is our goal.
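In symbols, this parametrization reads as below; the exponential kernel is just one example of the kind of decay function being described, and tau is an illustrative constant, not a value from the talk:

```latex
% Transfer function as a known decaying kernel scaled by a single coefficient
\omega_{ij}(\Delta) \;=\; \beta_{ij}\, \kappa(\Delta),
\qquad \text{e.g. } \kappa(\Delta) = e^{-\Delta/\tau}, \;\; \tau > 0 ,
```

so that beta_ij > 0 corresponds to excitation, beta_ij < 0 to inhibition, and beta_ij = 0 to no edge from neuron j to neuron i.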
  • 29:17And to do that, I'll go into a little bit of the technicalities
  • 29:23and details, but not too much.
  • 29:25Please stop me if there are any questions.
  • 29:27The first thing we do is that we realize
  • 29:29that we can represent that linear Hawkes process
  • 29:34as a form of basically a regression almost.
  • 29:38The first thing we do is we turn it into this
  • 29:44integrated stochastic process.
  • 29:46We integrate over all of the past,
  • 29:49that form that seemed ugly;
  • 29:51we integrate it so that it becomes
  • 29:53a little bit more compact.
  • 29:55And once we do that, we can write it pretty similarly
  • 29:59to a regression.
  • 29:59We do a change of variables, basically.
  • 30:01We write that point process dNi as our outcome Yi,
  • 30:07and then we write epsilon i to be Yi minus lambda i,
  • 30:11so we add and subtract lambda i in a sense.
  • 30:15And that allows us to write things
  • 30:18as a simple form of regression.
  • 30:22Now this is something that's easy
  • 30:24and that we're able to deal with.
  • 30:25The main complication is that this is a regression
  • 30:28with heteroscedastic noise.
  • 30:32Sigma i t squared depends on the past
  • 30:36and also on the time period;
  • 30:38it depends on beta and lambda.
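A sketch of the regression form being described, with the integrated covariates written as x_j; the notation here is schematic and may differ from the paper:

```latex
% Integrated covariates and the regression representation
x_j(t) \;=\; \int_{0}^{t^-} \kappa(t - s)\, dN_j(s),
\qquad
\lambda_i(t) \;=\; \nu_i + \sum_{j=1}^{p} \beta_{ij}\, x_j(t),
\\[6pt]
dN_i(t) \;=\; \lambda_i(t)\, dt \;+\; d\varepsilon_i(t),
\qquad
d\varepsilon_i(t) \;:=\; dN_i(t) - \lambda_i(t)\, dt ,
```

so epsilon_i plays the role of a (martingale) regression error whose conditional variance depends on lambda_i(t), which is exactly the heteroscedasticity mentioned above.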
  • 30:42Okay, so once we do this
  • 30:49we could then develop a test for beta ij,
  • 30:55and this could also be extended to testing multiple betas,
  • 31:00allowing for basis expansions, et cetera,
  • 31:03and even a nonstationary baseline.
  • 31:06But the test is basically
  • 31:09now based on this de-correlated score test.
  • 31:11Once we write things in this regression form,
  • 31:13we can take this de-correlated score test,
  • 31:15and I'll skip over the details here,
  • 31:19but basically we form this set of orthogonal columns
  • 31:23and define a score test based on this
  • 31:26that looks something like this,
  • 31:28where you're looking at the effect of the de-correlated column j
  • 31:32against basically the noise term, epsilon i.
  • 31:36Both of these are derived from data based on some parameters,
  • 31:40but once you have this Sij,
  • 31:43then you could actually define a test
  • 31:47that basically looks at the magnitude of that Sij.
  • 31:53And that's the test statistic that we could use.
  • 31:59And under the null, we can show that this test statistic
  • 32:02converges to a chi-square distribution
  • 32:05and we could use that for testing.
  • 32:08In practice, you need to estimate these parameters.
  • 32:10We estimate them, we ensure that things still work
  • 32:13with the estimated parameters
  • 32:15so that you still have convergence to chi-squared.
  • 32:19And you can also do confidence intervals and all of this.
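Schematically, and in the spirit of Ning and Liu's de-correlated score construction (this is a sketch, not the exact statistic from the paper):

```latex
% De-correlated score for the target coefficient beta_ij:
% project the covariate of interest off the nuisance covariates, then pair the
% residual with the estimated noise process.
S_{ij} \;=\; \frac{1}{T}\int_{0}^{T} \Big( x_j(t) - \widehat{\mathbf{w}}^{\top}\mathbf{x}_{-j}(t) \Big)\, d\widehat{\varepsilon}_i(t),
\qquad
\text{under } H_0:\ \beta_{ij} = 0, \quad
\frac{T\, S_{ij}^{\,2}}{\widehat{\sigma}_{ij}^{\,2}} \;\xrightarrow{\;d\;}\; \chi^2_1 ,
```

where w-hat comes from a regularized regression of x_j on the remaining covariates and sigma-hat squared estimates the variance of the de-correlated score.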
  • 32:24Maybe I'll just briefly mention
  • 32:26that this also has the usual power that we expect
  • 32:29that you can study the power of this against local alternatives,
  • 32:35and this gives us basically the behavior that we would expect.
  • 32:41In simulations it also behaves very close
  • 32:45to the oracle procedure that knows which neurons
  • 32:47interact with each other.
  • 32:50What we've done here is that
  • 32:51we've looked at increasing sample size
  • 32:54or the length of the sequence, from 200 to 2,000,
  • 32:58and we see that the type one error
  • 33:01becomes pretty well controlled as time increases.
  • 33:05The pink here is oracle.
  • 33:06The blue is our procedure.
  • 33:08The power also increases as the sample size increases.
  • 33:14We also look at the coverage of the confidence intervals.
  • 33:18Both for the zeros and non zeros,
  • 33:21the coverage also seems to be well behaved.
  • 33:26This is a simple simulation setting, but it looks like
  • 33:32it's not too far from the actual application
  • 33:35that we've also looked at.
  • 33:38And in particular we've looked at some data from a
  • 33:42paper that was published in 2018 in Nature,
  • 33:45where they had looked at activation patterns of neurons
  • 33:50and how they would change with and without laser.
  • 33:54And at the time this was like the largest;
  • 33:57they had multiple devices that they had looked at,
  • 34:00and the largest region
  • 34:02that they had looked at had 25 neurons.
  • 34:04The technology has improved quite a bit.
  • 34:06Now there's a couple of hundred neurons
  • 34:08that they could measure,
  • 34:09but this was 25 neurons.
  • 34:10And then what I'm showing you are the activation patterns
  • 34:14without laser and with laser
  • 34:16and I'm not showing the edges that are common
  • 34:19between the two networks;
  • 34:20I'm just showing the edges that are different
  • 34:21between these networks.
  • 34:23And we see that these betas,
  • 34:25some of them are clearly different:
  • 34:28in one condition the confidence interval covers zero
  • 34:32and in the other condition it does not.
  • 34:33And that's why you're seeing these differences in the networks.
  • 34:36And that's similar to what they had observed
  • 34:39based on basically correlations, that as you activate,
  • 34:43there's more connectivity among these neurons.
  • 34:49Now in the actual experiments,
  • 34:51and this is maybe the last 15 minutes or so of my talk,
  • 34:57in the actual experiments, they don't do just a simple
  • 35:00one shot experiment because they have to implant
  • 35:03this device.
  • 35:06This is data from a mouse.
  • 35:08They have to implant this device on the mouse's brain.
  • 35:11And so what they do is that they actually,
  • 35:13once they do that and sort of now with that camera,
  • 35:16they just measure activities of neurons.
  • 35:18But once they do that, they actually run
  • 35:20a sequence of experiments.
  • 35:23It's never just a single experiment or two experiments.
  • 35:25What they do is that they, for example,
  • 35:28they show different images, the mouse
  • 35:31and they see the activation patterns of neurons
  • 35:34as the mouse processes different images.
  • 35:36And what they usually do is that sort they show an image
  • 35:38with one orientation and then they have a washout period.
  • 35:42They show an image with different orientation,
  • 35:44they have a washout period.
  • 35:45They show an image with a different orientation
  • 35:47and then they might use laser
  • 35:50in combination of these different images et cetera.
  • 35:53What they ended up doing
  • 35:54is that they have many, many experiments.
  • 35:56And what we expect is that the networks
  • 35:59in these different experiments
  • 36:00to be different from each other
  • 36:02but maybe share some commonalities as well.
  • 36:04We don't expect completely different networks
  • 36:06but we expect somewhat related networks.
  • 36:09And over different time segments
  • 36:13the network might change.
  • 36:15In one segment it might be one thing, and in the next segment
  • 36:19it might change to something different,
  • 36:20but maybe some parts of the network structure are alike.
  • 36:25What this does is that it sort of motivates us
  • 36:27to think about jointly estimating these networks,
  • 36:29because each one of these time segments
  • 36:31might not have enough observations to estimate accurately.
  • 36:35And this goes back to the simulation results
  • 36:36that I showed you, that in order to get to good control
  • 36:41of type one error and good power,
  • 36:43we need to have decent number of observations.
  • 36:45And in each one of these time segments
  • 36:47might not have enough observations.
  • 36:50In order to make sure that we get high quality estimates
  • 36:54and valid inference,
  • 36:57we need to maybe join the estimations
  • 37:00in order to get better quality estimates and influence.
  • 37:11That's the idea of the second part
  • 37:13of what I wanna talk about going beyond
  • 37:17the single experiment and trying to do estimation
  • 37:19and inference, and multiple experiments of similar.
  • 37:22And in fact in the case of this paper by and Franks
  • 37:26they had, for every single mouse,
  • 37:30they had 80 different experimental setups
  • 37:33with laser and different durations
  • 37:35and different strengths.
  • 37:37It's not a single experiment for each mouse.
  • 37:39It's 80 different experiments for each mouse.
  • 37:42And you would expect that many of these experiments
  • 37:44are similar to each other
  • 37:45and they might have different degrees of similarity
  • 37:47with each other that we might need to take into account.
  • 37:53The goal of the second part is to do joint estimation
  • 37:56and inference for settings where we have multiple experiments
  • 37:59and not just a single experiment.
  • 38:02To do this, we went back to basically
  • 38:05that estimation procedure that we had,
  • 38:07and previously what we had was a sparsity-type penalty.
  • 38:11What we do now is that we add
  • 38:12a fusion-type penalty.
  • 38:14Now we combine the estimates in different experiments.
  • 38:19And this is based on past work that I had done
  • 38:22with a postdoc,
  • 38:24but the main difference in this work is that
  • 38:28now we wanna allow these estimates
  • 38:32to be similar to each other
  • 38:33based on a data-driven notion of similarity.
  • 38:36We don't know which experiments
  • 38:37are more similar to each other.
  • 38:40And we basically want the data to tell us which experiments
  • 38:43should be more similar to each other, should be combined
  • 38:46rather than specifying that a priori; researchers
  • 38:51usually don't have that information.
  • 38:53These data-driven weights are critical here,
  • 38:57and we derive these data-driven weights
  • 38:59based on just simple correlations.
  • 39:01We calculate simple correlations.
  • 39:02In the first step we look to see which of these conditions
  • 39:05are more correlated with each other,
  • 39:09more similar to each other,
  • 39:11based on these correlations.
  • 39:13And we use these cross-correlations to then define weights
  • 39:17for which experiments' estimates should be more closely fused
  • 39:20with each other
  • 39:21and which experiments' estimates
  • 39:22should be fused less closely.
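Schematically, the joint estimation objective being described looks something like the following sketch (the exact loss and penalty in the paper may differ):

```latex
% Joint estimation across M experiments: per-experiment loss, sparsity penalty,
% and a fusion penalty with data-driven weights w_{m,m'} that pull the estimates
% of similar experiments toward each other.
\min_{\beta^{(1)},\dots,\beta^{(M)}}\;
\sum_{m=1}^{M} L_m\!\big(\beta^{(m)}\big)
\;+\; \lambda_1 \sum_{m=1}^{M} \big\lVert \beta^{(m)} \big\rVert_1
\;+\; \lambda_2 \sum_{m < m'} w_{m,m'}\, \big\lVert \beta^{(m)} - \beta^{(m')} \big\rVert_1 ,
```

with the weights w_{m,m'} taken larger for pairs of experiments whose spike trains are more cross-correlated.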
  • 39:25And I'll leave out the details,
  • 39:29but in a similar setting
  • 39:32to what I had explained before
  • 39:34in terms of the experimental setup for this,
  • 39:37I'm sorry, in terms of the simulation setup,
  • 39:39there are 50 neurons in the network
  • 39:42from three different experiments in this case,
  • 39:44of three different lengths,
  • 39:45and we use different estimators.
  • 39:48And what we see is that sort of when we do this fusion,
  • 39:51we do better in terms of the number of true positives
  • 39:54for any given number of estimated edges
  • 39:57compared to separately estimating
  • 39:59or compared to sort of other types of fusions
  • 40:02that what one might consider.
  • 40:06Now, estimation is somewhat easy.
  • 40:10The main challenge was to come up
  • 40:12with these data-driven weights.
  • 40:14The main issue is that if you wanted to come up with
  • 40:19valid inference in these settings,
  • 40:21when we have many, many experiments,
  • 40:24then we would have very low power if we're adjusting,
  • 40:27for example, for all comparisons using FDR or FWER,
  • 40:31false discovery rate or family-wise error rate;
  • 40:35we have p squared times M tests,
  • 40:37and so we have low power.
  • 40:40To deal with this setting, what we have done
  • 40:42is that we've come up with a hierarchical testing procedure
  • 40:45that avoids testing
  • 40:50all these p squared times M coefficients.
  • 40:52And the idea is this,
  • 40:53the idea is that if you have a sense of which conditions
  • 40:57are more similar to each other,
  • 40:59we construct a very specific type of binary tree,
  • 41:03which basically always has a single node
  • 41:07on the left side in this case.
  • 41:09And then we start on the top of that tree
  • 41:11and test each coefficient.
  • 41:13We first test across all the experiments.
  • 41:16If we don't reject, then we stop there.
  • 41:18If we reject, then we test experiment one, and experiments two,
  • 41:22three, and four, separately.
  • 41:25If we reject experiment one, then we've identified
  • 41:28the non-zero edge.
  • 41:30If we reject two, three, four, then we go down.
  • 41:34If we don't reject two, three, four, we stop there.
  • 41:36This way we stop at the level that is appropriate
  • 41:39based on data.
  • 41:42And this ends up, especially in sparse networks,
  • 41:44saving us a lot of tests
  • 41:49and gives us significant improvement in power.
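A minimal sketch of this hierarchical scheme for a single coefficient, written generically: `pvalue_for_group` is a placeholder for whatever group-level test is available (e.g., the de-correlated score test applied to a subset of experiments), and the multiplicity-correction details are omitted.

```python
from typing import Callable, List

def hierarchical_test(experiments: List[int],
                      pvalue_for_group: Callable[[List[int]], float],
                      alpha: float = 0.05) -> List[int]:
    """Test one coefficient across experiments on a 'single node on the left' tree.

    Start by testing the coefficient jointly over all experiments; if that
    rejects, split off the first experiment and recurse on the rest.
    Returns the experiments in which the coefficient is declared non-zero.
    """
    rejected = []
    group = list(experiments)
    while group:
        if pvalue_for_group(group) > alpha:   # cannot reject for this group: stop
            break
        if len(group) == 1:                   # a single experiment is rejected
            rejected.append(group[0])
            break
        # Joint null rejected: test the left leaf, then continue down the right branch.
        if pvalue_for_group([group[0]]) <= alpha:
            rejected.append(group[0])
        group = group[1:]
    return rejected

# Hypothetical usage: p-values would come from the group-level test on subsets
# of experiments; here they are faked for illustration.
fake_pvals = {(1, 2, 3, 4): 0.001, (1,): 0.2, (2, 3, 4): 0.01,
              (2,): 0.03, (3, 4): 0.4, (3,): 0.5, (4,): 0.6}
print(hierarchical_test([1, 2, 3, 4], lambda g: fake_pvals[tuple(g)]))
```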
  • 41:51And that's shown in the simulations:
  • 41:53if you don't do this,
  • 41:57your power decreases as the number of experiments increases.
  • 42:01And in this case we've gone up to 50 experiments;
  • 42:04as I mentioned,
  • 42:04the Bolding and Franks paper has about 80.
  • 42:07Whereas if you do this
  • 42:09and your network is sparse,
  • 42:12you see that by combining experiments
  • 42:15you actually gain power
  • 42:16because you're incorporating more data.
  • 42:19And this is while controlling the family-wise error rate.
  • 42:22Both methods control the family-wise error rate.
  • 42:25We haven't developed anything for FDR.
  • 42:27We haven't developed theory for FDR
  • 42:29but the method also seems to be controlling FDR
  • 42:32in a very stringent way actually.
  • 42:35But we just don't have theory for FDR control
  • 42:38'cause that becomes more complicated.
  • 42:46I'm going very fast because of time
  • 42:47but I'll pause for a minute.
  • 42:49Any questions?
  • 42:53Please.
  • 42:54<v ->What do you think about stationarity</v>
  • 42:56of the Hawkes process in this context?
  • 42:58With the exogenous experimental forcing,
  • 43:01over what timescale does that happen,
  • 43:03and is stationarity reasonable?
  • 43:04<v ->Yeah, that's a really good question.</v>
  • 43:11To be honest, I think these Hawkes processes
  • 43:13are most likely non-stationary.
  • 43:14There are two mechanisms of non-stationarity that could happen.
  • 43:20One, we try to account for.
  • 43:22I skipped over it but we tried to account
  • 43:25for one aspect of it by allowing the baseline rate
  • 43:28to be time varying.
  • 43:38Basically we allow this nu i to be a function of time.
  • 43:43The baseline rate for each neuron varies over time.
  • 43:48And the hope is that, that would capture
  • 43:49some of the exogenous factors that might influence the overall rate.
  • 43:56It could also be that the dynamics are changing over time;
  • 44:00that we haven't done. Or it could in fact be that
  • 44:06we have abrupt changes
  • 44:10in patterns of either activation or the baseline over time,
  • 44:15where all of a sudden something completely changes.
  • 44:17Then we have piecewise stationarity, not smooth,
  • 44:22not continuous, non-stationary changes;
  • 44:24we have piecewise changes.
  • 44:26We have an experiment that's happening,
  • 44:28something happening, and then all of a sudden
  • 44:30something else is happening.
  • 44:31This eventually would maybe capture plasticity
  • 44:35in these neurons, neuroplasticity, to some extent,
  • 44:39that sort of allows for changes of activity over time,
  • 44:42but beyond that we haven't done anything.
  • 44:45There's actually one paper that has looked
  • 44:47at piecewise stationarity for these Hawkes processes.
  • 44:52It becomes a computationally very, very difficult problem;
  • 44:56especially the inference becomes a very difficult problem.
  • 44:59But I think it's a very good question.
  • 45:03Aside from that one paper, not much else has been done.
  • 45:11<v ->Hi, thank you, professor, for sharing.</v>
  • 45:13I have a question regarding the segmentation
  • 45:17'cause on the video you showed us,
  • 45:19the image is generally very shaky.
  • 45:23From the computer vision perspective,
  • 45:25it's very hard to isolate which neuron actually fired
  • 45:28and make sure that it's the same neuron firing over time.
  • 45:32And also the second question is that in the mouse
  • 45:36olfactory model you've mentioned there are like 20 neurons,
  • 45:39but in the picture you showed us there are probably
  • 45:42thousands of neurons.
  • 45:42How do you identify which 20 neurons to look at?
  • 45:46<v ->Very good questions.</v>
  • 45:48First of all, before they even get to segmentation,
  • 45:51they need to do what is known as,
  • 45:55and this is actually common in
  • 45:59time series and sort of (indistinct),
  • 46:03registration.
  • 46:07What this means is that you first need to register
  • 46:09the images so that they're basically aligned correctly.
  • 46:13Then you can do segmentation.
  • 46:14If you remember the earlier slide,
  • 46:17if you remember, it had a couple of dots
  • 46:20before getting to segmentation.
  • 46:21There are a couple of steps that need to happen
  • 46:23before we even get to segmentation.
  • 46:25And part of that is registration.
  • 46:27Registration is actually a nontrivial task
  • 46:29to make sure that the locations don't change.
  • 46:32You have to get it right, otherwise the algorithm
  • 46:36will get confused.
  • 46:37First there's some correction that needs to happen,
  • 46:41some background correction
  • 46:43and sort of dealing with noise correctly and everything,
  • 46:45and then there's registration.
  • 46:47And then after that you could do segmentation,
  • 46:49identifying neurons.
  • 46:50Now, the data in the video that I showed you was data
  • 46:52from a different recording, actually; it's different from
  • 46:56this Bolding and Franks data that I'm showing you here.
  • 47:00This one had 25 neurons.
  • 47:03This is an older technology.
  • 47:04It's an older paper, so they only had 25 neurons;
  • 47:07they had smaller regions that they were capturing.
  • 47:10With the newer technologies, they capture
  • 47:11a larger region, a couple hundred neurons.
  • 47:14I think the most I've seen
  • 47:16was about a thousand or so neurons.
  • 47:17I haven't seen more than a thousand neurons.
  • 47:20<v ->Thank you.</v>
  • 47:25<v ->Okay, so I'm close to the end of my time.</v>
  • 47:29Maybe in the remaining minutes or so
  • 47:34I'll basically mention that
  • 47:37we have applied this joint estimation
  • 47:42to the data from Bolding and Franks.
  • 47:43And then we also see, something that is perhaps not surprising,
  • 47:48that in the no laser condition
  • 47:51the network is more different
  • 47:53than between the two different magnitudes of laser,
  • 47:55maybe 10 and 20 milliwatts per square millimeter or so.
  • 48:02You see that those two are more similar to each other
  • 48:05than to the no laser condition.
  • 48:10And I'm probably gonna stop here
  • 48:12and sort of leave a couple of minutes for questions,
  • 48:14additional questions, but I'll mention that
  • 48:15the last part I didn't talk about was to see if we could
  • 48:19go beyond prediction.
  • 48:20Could we use this, as I mentioned, since Granger causality
  • 48:23is not really causality, it's prediction,
  • 48:27could we go beyond prediction
  • 48:31and get a sense of which neurons are impacting other neurons?
  • 48:35And I'll briefly mention that sort of there are two issues
  • 48:39in general in going beyond prediction to causality.
  • 48:45We have a review paper that talks about this. One
  • 48:47issue is subsampling,
  • 48:48that you don't have enough temporal resolution.
  • 48:51And the other issue is that you might have
  • 48:53latent processes that make it difficult
  • 48:55to answer these questions.
  • 48:57Fortunately the issue of subsampling,
  • 49:00which is a difficult issue in general,
  • 49:04is not very prominent in these calcium
  • 49:09imaging data,
  • 49:10because you have continuous-time videos,
  • 49:14and subsampling should not be a big deal in this case.
  • 49:19However, we observe a tiny fraction
  • 49:23of the connections of the brain.
  • 49:25The question is, can we somehow account
  • 49:27for all the other neurons that we don't see?
  • 49:31The last part of this work is about that.
  • 49:34And I'll sort of jump to the end
  • 49:38because I'll put a reference to that work.
  • 49:41That one is published, in case you're interested;
  • 49:43it's a paper that looks at
  • 49:49whether we could go beyond prediction,
  • 49:51whether we can actually identify causal links
  • 49:54between particular neurons.
  • 49:56And I think I'm gonna stop here and thank you guys
  • 50:00and I'm happy to take more questions.
  • 50:17<v ->Naive question.</v>
  • 50:19Biologically, what is a network connection here?
  • 50:24Because they're not, I'm assuming they're not
  • 50:27growing synapses or not based on the laser.
  • 50:33(indistinct)
  • 50:36(group chattering)