
YSPH Biostatistics Seminar: “Enhancing Biostatistics and Health Informatics Research Through Collaborative Cloud-Based Data Science Tools”

October 12, 2023
  • 00:02<v ->All right.</v>
  • 00:04In the interest of time, let's go ahead and get started.
  • 00:08Hey everybody,
  • 00:09thank you so much for coming today to this week's seminar.
  • 00:14It's my pleasure to introduce Stephen Larsson
  • 00:16and Adria Haimann from MetaCell.
  • 00:20Just a few words of context here.
  • 00:24We've talked about, we've had people,
  • 00:26we started this semester with somebody from the hospital.
  • 00:28We've had people from academia,
  • 00:30we've had people from pharmaceutical companies.
  • 00:33And so very excited to present something different.
  • 00:37So MetaCell is a company that works
  • 00:40in sort of the research space.
  • 00:43Near and dear to my heart.
  • 00:44They've been, from their beginning, I think,
  • 00:46very active in the computational neuroscience community.
  • 00:52We both contributed to a project called NetPyNE
  • 00:56for building models of computational neurons.
  • 01:01But more broadly, they work in the greater
  • 01:04health informatics space.
  • 01:06And they're going to tell us a little bit
  • 01:10about how we can enhance biostatistics
  • 01:12and health informatics research
  • 01:13through collaborative cloud-based data science tools.
  • 01:16So let's welcome them.
  • 01:20<v ->Thank you very much. Good afternoon everyone.</v>
  • 01:22I can see some of the back of your heads,
  • 01:24so I can imagine that I'm also, you know,
  • 01:26virtually looking at your faces.
  • 01:28Thanks so much for having us.
  • 01:30I'm Adria Haimann and I work alongside Stephen at MetaCell.
  • 01:33And as already mentioned, today we're gonna share with you
  • 01:35some insights into how academics are using cloud-based
  • 01:39collaboration tools to enhance their research.
  • 01:42But before I kind of begin with this,
  • 01:43I wanna provide you with some context.
  • 01:45So, 10 years ago I was in your position,
  • 01:48I was studying health economics
  • 01:50at the London School of Economics,
  • 01:52and I had joined a research team
  • 01:54at the European Observatory for Health.
  • 01:56And I was relatively new to this field
  • 01:57and kind of found myself in a Catch-22
  • 02:00that maybe you can relate to.
  • 02:02So I wanted to know how can someone or a student or postdoc
  • 02:05or researcher discover the best way to collaborate
  • 02:08on their research and use new tools
  • 02:10if you have fairly minimal experience,
  • 02:12either in academia or in industry.
  • 02:14So that's essentially what we want to show you today
  • 02:17and what we'd love to share with you,
  • 02:19if you could go to the next slide,
  • 02:21which is kind of a collection of key topics
  • 02:24of how researchers are doing just that,
  • 02:27while also getting the most out of their data.
  • 02:29So during this seminar,
  • 02:31we're gonna cover different methods that you can share
  • 02:33data analysis and introduce you to a specific cloud-based
  • 02:36collaboration platform
  • 02:38that we've created called Cloud Workspaces.
  • 02:41And then we'll run you through some examples
  • 02:43of how researchers are using this platform,
  • 02:45as well as how we've formed an industry partnership.
  • 02:48And then lastly, we wanna show you kind of other ways
  • 02:50that this tool can be used in academic settings.
  • 02:53And then of course, we'll open it up to you guys
  • 02:55and encourage you to ask us questions
  • 02:57on any of these topics.
  • 02:59So I'll hand over to Stephen now.
  • 03:02<v ->Thanks Adria for that great introduction.</v>
  • 03:04And hello to all of you.
  • 03:07I currently see you as tiny, tiny pixels on my screen
  • 03:12because of the way this is viewed.
  • 03:13So as much as I'd love to be there in person
  • 03:16and looking into the whites of your eyes,
  • 03:17I'm not gonna get that chance.
  • 03:18But, I think we have a really good robust discussion
  • 03:23for you guys that I hope you'll find very interesting.
  • 03:27And thank you very much again to Robert for the invitation.
  • 03:30So similar backstory on myself,
  • 03:35I went through undergraduate training at MIT
  • 03:40in computer science, did a master's in AI
  • 03:43before it was cool again,
  • 03:45and then shipped off to UCSD for a PhD
  • 03:51in neuroscience with a computational specialization.
  • 03:54So very much familiar with the academic experience
  • 03:59and I'm really excited to share with you
  • 04:06some of the things that I've learned since leaving academia.
  • 04:09And one of those things
  • 04:10has been to start this company, MetaCell,
  • 04:14which I basically started as I was wrapping up my PhD
  • 04:16and I kind of realized that I wanted to serve science
  • 04:22in a different way than was gonna be possible
  • 04:27just within the confines of academia
  • 04:29because I realized that I was a builder
  • 04:31and to build software that could,
  • 04:36software tools that could be useful to, you know,
  • 04:40tools that I would have wanted to have myself
  • 04:43as a graduate student.
  • 04:44I would need to kind of put a professional team of folks
  • 04:48together that, you know, really came outta industry
  • 04:51and that are kind of hard to hire in academia.
  • 04:54So the story of this slide is, since then,
  • 04:58all the different great groups
  • 04:59that we've had a chance to work with,
  • 05:01and you'll see a really kind of motley crew of logos
  • 05:05that are present here from, you know,
  • 05:08really, really big pharma companies
  • 05:12to universities like Yale, you guys are on here,
  • 05:14other universities that we've had the chance to work with,
  • 05:18and then biotech companies,
  • 05:21med device companies that we work with,
  • 05:25some in the US, lots internationally.
  • 05:29And realizing that, you know,
  • 05:31the core thing that unifies all the work
  • 05:34that we've been doing over time is the way
  • 05:36that sort of math and computation can help us
  • 05:40understand the life sciences.
  • 05:41So hence I come to you today in a biostatistics seminar
  • 05:46to talk about, you know,
  • 05:47some of the other pieces of the puzzle
  • 05:50that go into advancing the life sciences in that way.
  • 05:56So, let's start with a really simple, simple example, right?
  • 06:04So let's say you're doing some kind of analysis
  • 06:08on some kind of bio data, okay?
  • 06:13Perhaps in the statistics context, you're using SAS.
  • 06:17In a computational neuroscience context,
  • 06:20you may be using Python and the Python suite of tools.
  • 06:26Some in the statistics field are using R open source,
  • 06:29you know, statistics packages.
  • 06:30Whatever it is, you've got some data, you know,
  • 06:33maybe you're analyzing it on behalf of yourself,
  • 06:35maybe you're analyzing on behalf of your lab,
  • 06:37the group that you're working with.
  • 06:38Maybe you're analyzing it in terms of a company.
  • 06:41Whatever it is,
  • 06:42you wanna share that data analysis with somebody else.
  • 06:44You're probably gonna have to gather
  • 06:47some history of those commands together.
  • 06:50Maybe it's packaged up as a script, maybe not.
  • 06:53You're gonna send that file
  • 06:54to somebody else very often.
  • 06:57And then you're also gonna wanna somehow
  • 06:59collect the outputs of that, right?
  • 07:01The figures, the diagrams, the summary statistics,
  • 07:05the result of T-tests, you know,
  • 07:08things like this, right?
  • 07:09And send that output somewhere, right?
  • 07:12So, you know, that is a problem from time immemorial.
  • 07:16And you know, as long as I've been, you know,
  • 07:20working in this space still, you know,
  • 07:23it's very common to just do this
  • 07:25and maybe send this over email, right?
  • 07:29It's still a practice that I'm sure you know, happens.
  • 07:32And so, and that's probably just fine, you know,
  • 07:35in many small circumstances.
  • 07:37But as that scales up, there's problems of reproducibility,
  • 07:42there's problems of, you know,
  • 07:44keeping track of who sent what.
  • 07:46Email is not a great file management system.
  • 07:48So we've been thinking a lot over the course of our company,
  • 07:55which is, we've been around now,
  • 07:56this is our 13th year about how, you know,
  • 08:00the cloud and the internet basically can come into that
  • 08:02in any better way than sending email along.
  • 08:05And so we've thought a lot about, you know,
  • 08:08what starts to happen when there's a computer that lives
  • 08:11in the cloud that multiple people can jump into and join.
  • 08:15And what is, you know, how does that work in general?
  • 08:18It's not something that only we are doing, right?
  • 08:22This is an idea that's been there for a while.
  • 08:24Anybody familiar with like, say Python Notebooks, right,
  • 08:27are aware of this idea.
  • 08:29There's tools like Google Colab,
  • 08:31and then we've even been talking to major universities,
  • 08:34like we've been having a conversation
  • 08:35with Harvard Medical School,
  • 08:37where they've been working in collaboration with Amazon
  • 08:39to kind of work together with them to set up computers
  • 08:43that are in the cloud.
  • 08:44Similarly, of course, there's gonna be what happens with,
  • 08:49at like, at your local university
  • 08:50with your local computing infrastructure.
  • 08:52Typically that's based around supercomputers that are there
  • 08:56for doing like really powerful computations or calculations.
  • 08:59Things that are very data intensive.
  • 09:01A workspace in the cloud is sort of in between.
  • 09:02So it's kind of like, you know,
  • 09:05just a laptop that isn't your physical laptop,
  • 09:09but it's like a laptop that's somewhere else in the cloud
  • 09:11that you can log into and do some analysis with.
  • 09:14And it basically lives as long as you wanna do that analysis
  • 09:16and then it goes away
  • 09:18if you don't need that analysis anymore
  • 09:20or it can stay there as long as your lab is around, right?
  • 09:22And then go away if you don't need it anymore.
  • 09:25So the idea is then in this story,
  • 09:27instead of just gathering the history of commands,
  • 09:29sending the file and sending the output of the file,
  • 09:31what if, right you could do all that in the context
  • 09:34of a computer that multiple people
  • 09:37can join and look at, right?
  • 09:39Work in that same environment.
  • 09:40When you log out,
  • 09:41it's exactly where you left it, right?
  • 09:43Like if you know your computer gets misplaced
  • 09:47or you drop it, you know, off a bridge into a river,
  • 09:50like, doesn't matter 'cause
  • 09:51all this stuff is preserved, right?
  • 09:54So, how does that idea start to change the basic practice
  • 09:57of interacting with data and doing analysis like this
  • 10:02if you were to change that one variable okay?
  • 10:05So that's sort of the starting premise for our chat today.
  • 10:09So, you know, what that might look like is, you know,
  • 10:13a session one-on-one or two-on-one with multiple people
  • 10:16where you get, you know, perhaps one of you in the future.
  • 10:22In the case that we've been doing in our company,
  • 10:24one of our staff members, who has experience
  • 10:28in doing a different kind of data analysis.
  • 10:32In our case, we work on a variety of problems,
  • 10:36but one of the major ones we worked on
  • 10:37is like the imaging of calcium signals
  • 10:42in neural tissue okay?
  • 10:45But you know, you might be on a call like this one and just
  • 10:49the same way that you might meet with your lab members on a
  • 10:50Zoom call, you might meet with someone
  • 10:54with experience in data analysis or biostatistics
  • 10:56that is not in your lab or not even in your organization.
  • 11:01It might be somewhere remote,
  • 11:02maybe at another university or in a company like ours.
  • 11:06But what they might get as the experience of that is
  • 11:13jointly logging into this workspace that lives in the cloud.
  • 11:17And if SAS is the thing you wanna use,
  • 11:20you might find a whole SAS instance there
  • 11:22in a desktop that you can log into.
  • 11:25But the point being that multiple people now can type on it
  • 11:27as opposed to like physically handing your laptop around
  • 11:30in the lab or even just screen sharing it
  • 11:33in some kind of a lab meeting, right?
  • 11:35It's actually allowing for people to jump into the same
  • 11:38application and literally like trade off
  • 11:40on like typing commands into it.
  • 11:43Kind of like what you get with a Google Document
  • 11:46or a Google Spreadsheet, right?
  • 11:48That real-time collaboration,
  • 11:49but now for any kind of application.
  • 11:52So that's one experience you might have.
  • 11:54Not just SAS, right?
  • 11:56So a Jupyter Notebook, as I mentioned before,
  • 11:58is another thing that you can use.
  • 11:59And those of you who might be using,
  • 12:01again, the more open source technologies,
  • 12:03if you might be using R Statistics or using Python
  • 12:05or whatnot, you'd be familiar with, you know,
  • 12:08a Jupyter Notebook.
  • 12:11So it's based around, you know,
  • 12:13this idea of putting a computer in the cloud,
  • 12:16multiple folks logging into it,
  • 12:18and then being able to sort of transport
  • 12:21your expertise around the world.
  • 12:25Because in addition to the knowledge of doing analysis
  • 12:31being shipped around,
  • 12:32data can also come into this workspace
  • 12:34as an intermediate space that's private to a given lab,
  • 12:39but allows for a different kind of model on sharing data
  • 12:43where it sort of stays under the control of the lab,
  • 12:47you know, whoever puts it there can take it back,
  • 12:49that kind of thing.
  • 12:51Okay so we've been exploring this model
  • 12:54and we've also been talking to other organizations
  • 12:57and universities about this model and how to use it,
  • 13:00how to implement it, right?
  • 13:02As I mentioned, we've been talking to folks like
  • 13:05at Harvard Medical School that partner with Amazon
  • 13:08to bring these sorts of instances into their
  • 13:11labs and what can be done with it.
  • 13:13So I'm gonna wanna talk a little bit
  • 13:14about like some of those details,
  • 13:16and I'm saying it here in the context of our product,
  • 13:19but I'm not trying to sell you anything.
  • 13:20I'm really trying to talk about it
  • 13:21more in the context of what can be done.
  • 13:24So thinking about it, like,
  • 13:28so I mentioned SAS as an example.
  • 13:29I mentioned Jupyter Notebooks as an example,
  • 13:31but there might be other kinds of software
  • 13:34that are more particular to a use case,
  • 13:36like MATLAB's another one that could be installed.
  • 13:38But there might be even more specific software
  • 13:40that might need to be set up or run.
  • 13:44Sometimes, for example, survey software
  • 13:47where you might collect data from a very particular kind of
  • 13:52survey system and you need something to work with it.
  • 13:54So imagine that,
  • 13:55like for the use case that you might have, right,
  • 13:58you could have a workspace that is set up
  • 14:02so that all that software comes pre-built
  • 14:03once you set it up.
  • 14:05Much like, you know, having laptops
  • 14:07that have come pre-configured with a certain set of tools,
  • 14:10but instead of handing out physical laptops,
  • 14:12it's on the cloud.
  • 14:14The virtual collaboration,
  • 14:15I think I've gone through a lot, the multiple workspace,
  • 14:18I think I mentioned also.
  • 14:20Data security I kinda mentioned, you know,
  • 14:23anybody who's doing data analysis
  • 14:26with data from anybody else, you know,
  • 14:29where they weren't the ones
  • 14:30to collect it, I'm sure has run into challenges
  • 14:32where folks are reticent to, you know, share data.
  • 14:37So that's why in this context,
  • 14:38it's really important to note that like, you know,
  • 14:41we can lock that environment down
  • 14:42and make sure that only the people that can log into it
  • 14:44have access to it, that's a really important point.
  • 14:47So it's not really like the data
  • 14:49are going out of somebody's control.
  • 14:51Again, they're kept in a place
  • 14:52where anybody who wants to can remove
  • 14:53that data again and delete it.
  • 14:57And then if there were to be very computationally aggressive
  • 15:01things to do, it's very easy to scale it up.
  • 15:05And that's something that folks also like.
  • 15:10So how, you know, how are ways that this kind of workspace
  • 15:14can support biostatistics research
  • 15:17and data analysis in general.
  • 15:18So I mentioned data science as a service
  • 15:20a little bit in this example.
  • 15:22So this would be the case where any organization
  • 15:26who say doesn't have biostatistics
  • 15:29or data science expertise local to them
  • 15:32might be interested in sort of renting time
  • 15:36or having some part-time person come in to help with that.
  • 15:40And that's a model that we've seen work well
  • 15:42both for labs and for companies.
  • 15:44One way in which labs really like it is new PIs
  • 15:49with a startup package that just, you know,
  • 15:51first few weeks into their appointment
  • 15:54at an R1, right, no staff yet.
  • 15:57Nobody, but they're coming in with data from their previous,
  • 16:03you know, from their postdoc basically.
  • 16:06And what do they do, right?
  • 16:07They need to write grants, they need to like hire staff,
  • 16:10they need to do all these things.
  • 16:12So we've actually found labs are very happy
  • 16:15in that circumstance just to get going, you know,
  • 16:19to be like, "Hey, I have this data,
  • 16:20I haven't analyzed it yet.
  • 16:21I really wanna put it in my grant proposals.
  • 16:23I just need somebody to kind of sit with me virtually
  • 16:27and run through this data,
  • 16:30so that I can get these figures
  • 16:33made and get my grant out, right?
  • 16:34And I just don't have time
  • 16:36to bring on a full person to do that."
  • 16:37So data science as a service can be very useful for that.
  • 16:40Data standardization and sharing as a service.
  • 16:42So, you know, I'm not sure how much it's affecting folks
  • 16:46in the room, but the NIH over time
  • 16:48has gotten increasingly serious about making data sharing
  • 16:55happen for real for real,
  • 16:56and not for fake for real, right?
  • 16:58And so this year in particular,
  • 17:01a new policy from NIH has come out, the DMS policy,
  • 17:05where they're really, really asking for even, you know,
  • 17:09grant proposals to have a whole data management
  • 17:11strategy figured out upon submission.
  • 17:15And even, you know, saying you need to set aside
  • 17:19some budget for that
  • 17:20'cause it turns out data sharing doesn't happen for free,
  • 17:22doesn't happen for free, you know,
  • 17:24for PIs for their time, right?
  • 17:26So that's also something where, okay,
  • 17:29I don't have the expertise to figure out
  • 17:30which of the billion databases I might share my data in.
  • 17:34Could somebody come in and help do that?
  • 17:36Well how do you do that?
  • 17:37You know, when I did work in the neuroinformatics
  • 17:41space as a graduate student
  • 17:43and I was trying to help figure out for neuroscientists
  • 17:47how to get data that they had, you know, collected
  • 17:50in a very laborious process of experimental collection,
  • 17:55was trying to help them share their data
  • 17:57'cause they wanted to comply with these policies
  • 17:59even back then, you know, very frequently I would
  • 18:04get the challenge of like,
  • 18:05"Yeah, it's in a hard drive under my desk, right?
  • 18:08Physical hard drive sitting under my desk, right?"
  • 18:10Like, okay, so you can go pick it up and like take it away
  • 18:14and do something with it.
  • 18:15But you know, they don't have the expertise, you know,
  • 18:19locally to even know, okay, now we're gonna plug it in
  • 18:22and we gotta look through it
  • 18:23and like, oh, the PhD student left three years ago.
  • 18:27And like, how do I do that?
  • 18:27So the idea of, okay, if all we can do is like take that
  • 18:31hard drive from under the desk
  • 18:33and like plug it in the cloud, share it on Dropbox,
  • 18:37okay, something like this or you know,
  • 18:39have a conduit to get it to the cloud,
  • 18:41share that folder in a workspace online
  • 18:43and then have somebody else that does this all the time
  • 18:47like go through all that and do their best to start,
  • 18:49you know, documenting what they find,
  • 18:51maybe raising questions that they might find, you know,
  • 18:54to present to the PI,
  • 18:55"Hey, I know your PhD student left three years ago,
  • 18:58but you know, can you tell me a little bit
  • 18:59about this experimental methodology?"
  • 19:01There's now at least a hope that you can start,
  • 19:03you know, standardizing that data,
  • 19:05sharing it in a better way,
  • 19:06making the NIH not come kick down your door
  • 19:09with the data sharing police force
  • 19:11that I'm sure they're setting up now.
  • 19:14Okay probably not.
  • 19:16Okay a third way is through workshops.
  • 19:21And I'll have some specific examples
  • 19:23a little bit later about this one.
  • 19:25But if you think about, you know,
  • 19:27the experience of either physically traveling
  • 19:30or doing what we're doing here
  • 19:31and then being exposed to software, right?
  • 19:36It's one thing to have slides show
  • 19:37you pretty pictures of what software looks like.
  • 19:39And it's another thing to say basically like,
  • 19:43"Hey, log into, like go right now on your laptops
  • 19:47and go hit this address"
  • 19:50and like, here's your login and like while I'm explaining it
  • 19:53to you, check it out, play with it, right?
  • 19:57So we've actually found that also to be a really valuable
  • 20:00way to do an extra level of education and demonstration,
  • 20:05especially for tools built in academia,
  • 20:09which generally have a pretty small audience, right?
  • 20:11Not a lot of people use them maybe necessarily,
  • 20:14or it's like a very niche community.
  • 20:16So the total number of humans is not great.
  • 20:18So to have the ability right now in a live session
  • 20:21to be like, let me show you this software, you log in right
  • 20:24now, play with it, can move the needle a lot on getting folks
  • 20:27to use stuff that will really be tools
  • 20:31that will actually help them a lot.
  • 20:33And then lastly, you know,
  • 20:35collaborations between labs, right?
  • 20:38Hey, we just set up a consortium,
  • 20:40it's a five-lab consortium
  • 20:41and we're all studying this thing, right?
  • 20:44It's a collaboration between the folks that are generating
  • 20:46the data and the folks who are gonna analyze the data.
  • 20:48Okay, great, we got this really smart set of mathematicians
  • 20:50who are gonna do all these great statistics, awesome.
  • 20:53How do you get the data from point A to point B?
  • 20:55Well email, right?
  • 20:58So what if you can improve that, right?
  • 21:01Or you know, the context of, you know,
  • 21:04we also find companies wanna collaborate with each other
  • 21:06and then universities and companies wanna collaborate
  • 21:08with each other also, right?
  • 21:10So in ways that I haven't already listed,
  • 21:13but just collaborations of whatever variety.
  • 21:17So when it comes down to those things, right,
  • 21:19it's one step better than just sharing on Dropbox
  • 21:22and being like, here are the data, go check it out
  • 21:24'cause you're keeping the analysis all together, right?
  • 21:29It adds a layer of reproducibility
  • 21:31to those kinds of collaborations,
  • 21:32which is hard to match, in addition to all the other things,
  • 21:36all the great best practices for reproducibility.
  • 21:40Okay so that's four ways to use cloud workspaces
  • 21:43to support biostatistics research.
  • 21:47So let's, you know, I think I've kind of walked through this
  • 21:51example already verbally,
  • 21:52but I did have a slide specifically for it.
  • 21:54So like this happens in research all the time.
  • 21:57There's a lab that needs a particular analysis completed
  • 22:00and they don't have the expertise in lab.
  • 22:01What can be done?
  • 22:02So typically the alternatives are, you know,
  • 22:04bring in some student or a postdoc or collaborate
  • 22:07with a lab that has some mathematical expertise
  • 22:09to perform analysis.
  • 22:11But that can be quite time consuming, you know,
  • 22:13that might not deliver the results you're looking for.
  • 22:16Secondly, right for folks who might, you know,
  • 22:20be in a position, like I mentioned
  • 22:21with early lab set up, right?
  • 22:25Engaging some part-time data scientists from industry
  • 22:27could help work on particular problems as needed.
  • 22:31And that's interesting both perhaps
  • 22:33from the perspective of me as a company,
  • 22:35but also maybe interesting for yourselves
  • 22:38thinking about a path through industry
  • 22:41where you might be able to do biostatistics
  • 22:45for multiple organizations at once, not just one at a time.
  • 22:50And then it's also interesting,
  • 22:51as I mentioned, from the perspective of folks
  • 22:53that have the problem that need to get the analysis done.
  • 22:57Okay so some case studies, does this happen?
  • 23:03I sort of mentioned abstractly, it does,
  • 23:05but these are five cases that we've worked on in our company
  • 23:10and they are, many of them have a,
  • 23:14well they all have the theme
  • 23:15of being calcium imaging data, okay?
  • 23:18So here, you know, swap out biostatistics
  • 23:20for looking at data that comes from a microscope.
  • 23:23But at the end of the day,
  • 23:25that data from a microscope is basically a video stream,
  • 23:31generally black and white images
  • 23:33that then have to be post-processed.
  • 23:36And from that video stream there's a spatial component
  • 23:39of looking at a field of neurons under a microscope
  • 23:44and a time component.
  • 23:46Like how did those, you know,
  • 23:49neurons' activity change over time.
  • 23:51But there's a lot of like statistical challenges
  • 23:54that have to go into that.
  • 23:55You need to separate the neurons out from each other, okay?
  • 23:58They kind of overlap on each other.
  • 24:00So looking at a video stream, you're not always sure, right?
  • 24:04If I'm looking at one neuron or two neurons.
  • 24:06So you have to do some spatial analysis
  • 24:08to separate those out.
  • 24:09And then you wanna do some sort of peak finding over time.
  • 24:13What you kind of wanna extract out is a time series
  • 24:15of however many neurons you've detected
  • 24:17in your field of view
  • 24:19and then start to do some additional analysis.
  • 24:21And that additional analysis will be based on
  • 24:24the specifics of the experimental setup
  • 24:26and like, you know, what part of the brain were you looking at?
  • 24:30What was your protocol that you applied
  • 24:33and what kind of expectations
  • 24:37do you have about the time series that you extracted?
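The peak-finding step described above can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the projects discussed: it assumes a single fluorescence trace has already been extracted for one neuron, and the function name `find_peaks`, the `n_sigma` threshold, and the synthetic trace values are all hypothetical.

```python
from statistics import mean, stdev


def find_peaks(trace, n_sigma=1.5):
    """Return indices of local maxima that exceed mean + n_sigma * stdev.

    `trace` is a list of fluorescence values sampled at regular intervals.
    The threshold rule is an assumption chosen for simplicity.
    """
    threshold = mean(trace) + n_sigma * stdev(trace)
    peaks = []
    # Skip the first and last samples: they have no neighbour on one side.
    for i in range(1, len(trace) - 1):
        is_local_max = trace[i] > trace[i - 1] and trace[i] >= trace[i + 1]
        if is_local_max and trace[i] > threshold:
            peaks.append(i)
    return peaks


# Synthetic trace (made-up values): a flat baseline with two transient events.
trace = [0.1, 0.1, 0.2, 0.1, 3.0, 0.2, 0.1, 0.1, 0.2, 2.8, 0.1, 0.1]
peak_indices = find_peaks(trace)
print(peak_indices)
```

In practice this would run once per neuron after the spatial separation step, and a real analysis would likely add baseline correction and smoothing, which this sketch leaves out.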
  • 24:41So these organizations that we work with, I guess, you know,
  • 24:45four out of five are universities.
  • 24:48So DGIST is an Institute of Science and Technology
  • 24:51in South Korea, McGill University in Canada,
  • 24:58University of Penn, UPenn and University of Alabama.
  • 25:04And then Maze, which is a small pharma company
  • 25:09in San Francisco and they're all doing calcium imaging work.
  • 25:14And I think we served all of these organizations
  • 25:18within the same span of about six months.
  • 25:22Each one of them had brought different data to the table.
  • 25:27They're all generally in this form of video data
  • 25:29with the calcium imaging to extract.
  • 25:33All five of them were served
  • 25:34by the same data scientist on our side,
  • 25:38the gentleman whose picture you saw earlier,
  • 25:41but they had very different scientific protocols, right?
  • 25:44So it wasn't necessary that one person full-time
  • 25:47over six months worked on each of these projects, right?
  • 25:50Instead we have one individual,
  • 25:52who's able to jump from project to project
  • 25:54and check back in with multiple PIs/business leaders,
  • 26:01managers to check in on the results of that, right?
  • 26:05And that person never left their home, right?
  • 26:08So our company is also fully remote, which is nice.
  • 26:13And so I think that's a really powerful demonstration
  • 26:17of what's possible for this kind of analysis,
  • 26:19whereby, you know, essentially organizations
  • 26:25in multiple different countries
  • 26:27and a different continent in one case, right,
  • 26:29can all be served by the same person doing roughly
  • 26:33having roughly the same skillset of data analysis
  • 26:36but working on data that addresses very different scientific
  • 26:40questions all at the same time.
  • 26:43Okay, so that's a thing.
  • 26:47And each one of these, I should say,
  • 26:49has been done in this collaboration model that I mentioned
  • 26:51where there's one workspace per organization, right?
  • 26:57So each organization has their own workspace,
  • 26:59they log into it, they can see the results
  • 27:01of the data science work that happens.
  • 27:04They have all in one way or the other,
  • 27:06put data into the workspace, right?
  • 27:09And, they've all sort of been able to pull figures back out
  • 27:13again and direct the flow of analysis in the direction
  • 27:19that they wanted through Zoom calls,
  • 27:22like the one that I mentioned
  • 27:23generally on like a weekly basis
  • 27:25or every couple weeks check in.
  • 27:28So yeah, a little bit more about the team behind that
  • 27:34in terms of thinking about like what it takes
  • 27:35to make that happen.
  • 27:37While there is a little bit of like finding those labs
  • 27:39and figuring out that they have that problem,
  • 27:42which are not taken care of
  • 27:45by the individuals on this screen.
  • 27:46But I mentioned, I mentioned Phil, the PhD;
  • 27:50another PhD, who's worked with us
  • 27:52as a data scientist is Marcus.
  • 27:55And then kind of orchestrating behind the scenes,
  • 27:57the standing up of these workspaces
  • 27:59is a software architect, Zoran.
  • 28:04Phil is in the New York area, New York City area.
  • 28:07Marcus is in China and Zoran is in the Netherlands.
  • 28:13So again, interesting to think about the different
  • 28:16geographies where folks come from being able to serve people
  • 28:19in different geographies,
  • 28:21but all of them when it comes to a project,
  • 28:23like the central organizing node is a workspace.
  • 28:27That is the thing that helps
  • 28:28coordinate a lot of this together.
  • 28:31There are a few other technologies that help.
  • 28:34Those of you familiar with like a Kanban board
  • 28:37or just really any kind of task driven software,
  • 28:39you know, you can bring that to bear as well.
  • 28:42So one of the ways you can organize work a little bit better
  • 28:44than just sending emails back and forth
  • 28:46is to encapsulate each task,
  • 28:50break each task down into a card on a Kanban board.
  • 28:53We like the tool called Trello,
  • 28:56but there's lots of them out there
  • 28:58that can be used for such things.
  • 29:00And then, you know, one card per task
  • 29:02is a nice way to organize things.
  • 29:04And then using a practice from software engineering,
  • 29:07you can actually sort of estimate
  • 29:09in roughly how many hours, you know,
  • 29:12the data scientists might think it would take
  • 29:15to do a given task
  • 29:16and then use that as a way to figure out
  • 29:18like how long it's gonna take
  • 29:20to do a certain kind of analysis.
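The estimating practice described here can be sketched in a few lines. This is only an illustration: the `Card` fields, the column names, and the hour figures are invented for the example and do not come from Trello or from the speakers' actual boards.

```python
from dataclasses import dataclass


@dataclass
class Card:
    """One task on the board; every field here is illustrative."""
    title: str
    column: str            # "to-do", "doing", or "done"
    estimate_hours: float  # the data scientist's rough guess


def remaining_hours(cards):
    """Sum the estimates of every card that is not yet done."""
    return sum(c.estimate_hours for c in cards if c.column != "done")


# Hypothetical board for a small analysis project.
board = [
    Card("Load survey data into workspace", "done", 2.0),
    Card("T-test on survey data", "doing", 3.0),
    Card("Draft figures for check-in", "to-do", 4.0),
]

print(remaining_hours(board))  # 7.0
```

Adding up the open cards gives a rough answer to "how long is this analysis going to take" without anyone having to ask.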
  • 29:21This is a practice that we actually use
  • 29:23across my company for all sorts of tasks,
  • 29:25not just data science,
  • 29:26really organizing kind of everything that we do
  • 29:28on the basis of making cards like this
  • 29:31and moving things across.
  • 29:32And I'm still surprised
  • 29:33how many organizations don't use this.
  • 29:36I have lots of friends in academia
  • 29:38that do this just for their labs.
  • 29:39You guys might do this in your labs, I don't know.
  • 29:40But for organizing oneself,
  • 29:44even if you do meet in person,
  • 29:46having this sort of set up in the cloud
  • 29:48can be very helpful for organizing work.
  • 29:52Not sure how new or not new this is
  • 29:54to those of you in the room, but something we use.
  • 29:57And then of course there's Slack,
  • 29:58which I think has pretty good adoption amongst academia.
  • 30:03We do find almost every lab that we talk to
  • 30:06pretty much is on Slack or some version of it.
  • 30:10Companies are using Microsoft Teams,
  • 30:12which I personally like less,
  • 30:13but you know, but we use that too.
  • 30:17But basically, you know,
  • 30:20one thing that we do that maybe others don't do
  • 30:23is to connect a Kanban board like
  • 30:26the one that you saw to spit out notifications
  • 30:28in a Slack channel at the same time,
  • 30:31which can be really nice if you are a Slack based person
  • 30:35to just like be able to see how tasks are changing
  • 30:37and evolving in the feed,
  • 30:40which then doesn't require an extra conversation, right?
  • 30:42Like "Hey, so we agreed on Monday that you were gonna,
  • 30:45you know, do that t-test on this survey data,
  • 30:50how's that going right?"
  • 30:52Well if they've moved that card,
  • 30:55which was like T-test on survey data from the to-do column
  • 30:58to the doing column,
  • 30:59a little notification's gonna pop up in Slack.
  • 31:02And then when they write a comment like, "Yep, you know,
  • 31:04I ran the test and wasn't statistically significant,"
  • 31:07then that's gonna pop up also.
  • 31:09That comment will then be relayed into Slack.
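A rough sketch of that relay, under stated assumptions: the shape of the `event` dictionary is hypothetical (real Kanban tools each have their own event format), and nothing is actually sent anywhere. The only external fact relied on is that Slack incoming webhooks accept a JSON body with a `text` field.

```python
import json


def card_event_to_slack_payload(event):
    """Turn a board event (hypothetical dict shape) into a Slack-style message body."""
    if event["type"] == "moved":
        text = f'{event["who"]} moved "{event["card"]}" from {event["from"]} to {event["to"]}'
    elif event["type"] == "comment":
        text = f'{event["who"]} commented on "{event["card"]}": {event["comment"]}'
    else:
        text = f'{event["who"]} updated "{event["card"]}"'
    # A JSON object with a "text" field is what a Slack incoming webhook expects.
    return json.dumps({"text": text})


# Example event mirroring the t-test card from the talk.
moved = {
    "type": "moved",
    "who": "Phil",
    "card": "T-test on survey data",
    "from": "to-do",
    "to": "doing",
}
print(card_event_to_slack_payload(moved))
```

In practice the integration between the board and the chat tool usually does this for you; the point of the sketch is only that each card change becomes one short message in the channel.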
  • 31:11So then when you go back to check in,
  • 31:13you don't have to ask that question.
  • 31:13It's like, "Yep, I saw that it happened
  • 31:15and by the way I saw that it happened on Tuesday,
  • 31:18you know, now it's Wednesday, you know.
  • 31:20I forgot to check back in with you about it."
  • 31:23So like that idea of asynchronous work can happen
  • 31:25in this cloud-based context also, which again,
  • 31:29like we use also in all other parts
  • 31:31of our company can be really helpful
  • 31:33for moving projects along in lots of ways.
  • 31:37So yeah I've told you a lot
  • 31:42about a particular example then of doing work.
  • 31:44I wanna call Adria back in here
  • 31:47to extend a little bit more in a partnership example
  • 31:52that we've had some experience with.
  • 31:53So back to you Adria.
  • 31:55<v ->Thanks, so one thing that Stephen mentioned was, you know,</v>
  • 31:58another challenge we might face is,
  • 32:00okay, where do we go find people who have data that
  • 32:03they might need help with?
  • 32:04And we were thinking about where does data come from, right?
  • 32:08And so one area that data's generated
  • 32:12from is through devices and manufacturers
  • 32:15make devices that are sitting in labs.
  • 32:17So we thought of the idea of let's have discussions
  • 32:20with these manufacturers
  • 32:21and see if we could form some sort of partnership.
  • 32:24Now when you're forming a partnership in industry,
  • 32:27you need to think about why that would benefit both sides
  • 32:29in order to kind of engage your prospective partner
  • 32:33as to why they should talk to you right?
  • 32:34So one thing that we identified was that
  • 32:37a key aim of manufacturers
  • 32:39is to provide additional support
  • 32:41to their customers or make sure,
  • 32:43hey, I have a customer or a lab that has data
  • 32:45and then what if there's an aspect of their data
  • 32:48they don't know how to do something
  • 32:51or they don't know what to do,
  • 32:52maybe they'll stop using my device down the line
  • 32:54because the data's just not useful to them at this point
  • 32:57'cause they're lacking a skillset.
  • 32:59So we thought of an idea whereby
  • 33:01we could approach device manufacturers
  • 33:03and kind of explain what Stephen explained
  • 33:05about our data science as a service offering and say,
  • 33:09"Hey look, we could form a partnership with you,
  • 33:11whereby as an offering, in addition to extending a warranty
  • 33:15on your device, you could offer custom analysis support
  • 33:19or data science support to any interested customers,
  • 33:22whereby they could use cloud workspaces
  • 33:24to put their data that they're collecting
  • 33:26and then they could work with someone like Phil
  • 33:28to solve a challenge that they might have."
  • 33:31And so we actually successfully
  • 33:33did form such a partnership quite recently.
  • 33:36And if you go to the next slide,
  • 33:38you'll see, so we are now working
  • 33:40with a company called Neurophotometrics.
  • 33:43They produce a device that does the imaging
  • 33:46that Stephen previously described.
  • 33:48And what our partnership involves is we essentially offer
  • 33:53cloud workspaces as a solution to their customers,
  • 33:56whereby when they collect their data,
  • 33:59they can then work on our cloud workspaces alongside Phil
  • 34:02or ourselves and we can work with them
  • 34:03to solve any challenges they might need.
  • 34:06Now who are these customers of Neurophotometrics?
  • 34:08They are a bunch of different labs kind of
  • 34:11all over the world as well.
  • 34:12Mostly academics, some in industry as well.
  • 34:14And so it's that way for us as an organization
  • 34:17to kind of find potential labs
  • 34:20we didn't even know had the challenge.
  • 34:22And then it's also solving the problem
  • 34:25for Neurophotometrics of how do you keep your
  • 34:26customers happy if you don't really offer a service
  • 34:29they're already kind of asking of you
  • 34:31as a follow-on for providing this device.
  • 34:33So, so far the partnership is fairly new.
  • 34:37It seems to be working quite well so far
  • 34:40and we're meeting new people
  • 34:41and already getting kind of more projects
  • 34:43like Stephen described for Phil to work on.
  • 34:45So we'll see how it goes.
  • 34:46But this is just one way to show you
  • 34:47that it's not just about kind
  • 34:49of solving a problem for a customer,
  • 34:51it's about where do you find your customers
  • 34:53and that could be through an industry partnership.
  • 34:57<v ->Awesome, thanks for that.</v>
  • 35:02So I mentioned one other model earlier, which is workshops.
  • 35:08I think I talked about that example for a bit.
  • 35:11And we have done a few of them actually as well
  • 35:17in the computational neuroscience space.
  • 35:18So now the space near and dear
  • 35:21to our work with Robert.
  • 35:25So one of those projects was a collaboration
  • 35:28actually with Brown University on something
  • 35:31called the Human Neocortical Neurosolver.
  • 35:34We have kind of a neuroscience bias in the company.
  • 35:38We like doing those sorts of things.
  • 35:39So we did a workshop also.
  • 35:44We helped facilitate a workshop
  • 35:46that allowed a software tool
  • 35:49that came out of this particular collaboration to be shown.
  • 35:56And, let me show you a little bit more.
  • 36:00So in this case, I'm actually gonna switch
  • 36:04away from the Human Neocortical Neurosolver
  • 36:05and also show you an example with NetPyNE,
  • 36:07which is the thing that Robert mentioned earlier
  • 36:09that we work with as well.
  • 36:11It's similar to HNN.
  • 36:13In both cases there's a computational model
  • 36:15of a neuron, okay?
  • 36:16Just think of like, you know,
  • 36:18a spatial model of a neuron that has a cell body
  • 36:22and has an axon and dendrite, that kind of thing.
  • 36:25And you wanna simulate something about it.
  • 36:28And so you have a specialized piece of software
  • 36:34that knows how to look at the model of a neuron,
  • 36:38the way that it's shaped
  • 36:40and how to get signals out of it basically, right?
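To give a feel for what "getting signals out of a model neuron" means, here is a toy sketch. It is a much simpler point model (a leaky integrate-and-fire cell), not the spatial cell-body-and-dendrite models that NetPyNE or HNN simulate, and it uses none of their APIs. All parameter values are illustrative assumptions.

```python
def simulate_lif(input_current, steps=1000, dt=0.1, tau=10.0,
                 v_rest=-65.0, v_thresh=-50.0, v_reset=-65.0, resistance=10.0):
    """Euler-integrate a leaky integrate-and-fire cell.

    Returns (voltage trace in mV, list of spike times in ms).
    """
    v = v_rest
    trace, spikes = [], []
    for i in range(steps):
        # dv/dt = (-(v - v_rest) + R * I) / tau
        v += dt * (-(v - v_rest) + resistance * input_current) / tau
        if v >= v_thresh:
            spikes.append(i * dt)  # record the spike time, then reset
            v = v_reset
        trace.append(v)
    return trace, spikes


# With these made-up parameters, a current of 2.0 drives the cell past threshold.
trace, spikes = simulate_lif(2.0)
print(len(spikes), "spikes in", len(trace), "time steps")
```

In a workshop workspace, "changing parameters around" amounts to editing values like `input_current` or `tau` and rerunning to see how the output signal changes.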
  • 36:44So in collaboration with NetPyNE also a software platform
  • 36:50called Open Source Brain at UCL
  • 36:52that we've been partnering with for a while.
  • 36:54You might have something that looks like this.
  • 36:58So what you can do in a workshop context
  • 37:03with something like a workspace that's really exciting,
  • 37:05as I mentioned to you before is have people
  • 37:07put hands on with the software itself.
  • 37:09And this is one of those pictures
  • 37:11from one of those workshops that we did,
  • 37:14I think this one was specifically NetPyNE
  • 37:16where you can kind of see what everybody's looking at.
  • 37:18So everybody brought laptops in, right?
  • 37:20And they're able to launch in this case
  • 37:23they're literally, you can see several of 'em,
  • 37:25like this one up in front and this one over here,
  • 37:27they literally have exactly the same screen up
  • 37:29that is being shown, you know, in the screen share,
  • 37:33not because they're logged into a Zoom,
  • 37:34but 'cause they're actually logged into essentially
  • 37:37a workspace environment where they can also like, you know,
  • 37:40change parameters around.
  • 37:41So you can get this hands-on tutorial effect
  • 37:43in a workshop, in this context.
  • 37:46That is kind of hard to do any other way
  • 37:50if you don't have that.
  • 37:53If it's deployed as web-based software,
  • 37:55that makes it a little bit easier.
  • 37:56But if it's not, you know,
  • 37:57if it's something that's traditionally supposed
  • 37:59to be on a desktop,
  • 37:59then this is kind of the only way to do something like that.
  • 38:03And this was at an academic conference,
  • 38:06I think CNS that gets held.
  • 38:09So yeah, from all that today then
  • 38:15kind of wrapping up the part where I just,
  • 38:17we just talk at you and I hope those questions
  • 38:20that you guys have, what do we sort of talk about today?
  • 38:23Like how can some cloud-based data science tools
  • 38:26help enhance the ability to do biostatistics
  • 38:29and health informatics research?
  • 38:31I've been, you know, leaning on some examples
  • 38:32that are heavily neuroscience based,
  • 38:34but we kind of think that that's not the thing
  • 38:36that's particular to this, right?
  • 38:37It's still, you know, as I started at the beginning,
  • 38:40you know, doing some analysis, you know,
  • 38:42sharing the results of the commands
  • 38:45that we're using in the analysis
  • 38:47and then sharing the output of that analysis, right?
  • 38:48Like that's where we began.
  • 38:50I think that's common to every technique.
  • 38:51We're bringing some kind of science and math
  • 38:53to bear on some data, right?
  • 38:55So what we're finding is that, you know,
  • 38:57by using cloud-based platforms
  • 38:59really can help us facilitate collaborative research,
  • 39:02allowing colleagues to share data and work together.
  • 39:05You can help labs efficiently gain access
  • 39:08to additional data science support if that's desirable.
  • 39:10That they, you know, otherwise might struggle to get
  • 39:14or is just kind of unaffordable.
  • 39:15Doesn't make sense 'cause there's too much of a person.
  • 39:19And then finally in the last example, right,
  • 39:21you can facilitate, you know,
  • 39:23distance workshops that allow much more immediate
  • 39:26hands-on experience with certain software.
  • 39:29So with all that, I will thank you all for listening
  • 39:36to us for a full 40 minutes
  • 39:38and happy to take any questions that you have on this
  • 39:41or any other thing I can help directly.
  • 39:44Thank you very much.
  • 39:46<v ->Thank you so much.</v>
  • 39:50Does anybody have any questions for our presenters?
  • 39:57I'll start if there's no questions.
  • 40:01So data science as a service, growth industry.
  • 40:07People want jobs.
  • 40:10What's your take on the industry on that?
  • 40:13<v ->We are about 18 months into our exploration of the market.</v>
  • 40:22We have seen growth so far.
  • 40:25We think there's more to go.
  • 40:28I showed you those five labs,
  • 40:30I think in total maybe served certainly more than a dozen,
  • 40:35I wanna say maybe like 15 and like labs plus companies or so
  • 40:3815, 16, in those 18 months.
  • 40:43We had to figure out lots of other stuff along the way.
  • 40:45But we think there's a need, you know, like I mentioned
  • 40:52and folks that have the skillset to, you know,
  • 40:56provide that data science service
  • 40:58that are continually in demand.
  • 41:00So I'm gonna say yes, it's growing.
  • 41:04We're always wondering in industry how fast, you know,
  • 41:08that's always the question,
  • 41:10but it's definitely not shrinking.
  • 41:13<v Robert>Alright, that's an exciting option.</v>
  • 41:18<v Participant>Yeah just really quick,</v>
  • 41:20what happens with authorship?
  • 41:22If you work with the lab very closely on a project,
  • 41:26they come out with a really good publication.
  • 41:31How do you deal with that in this industry?
  • 41:36<v ->Yeah, great question. Thank you.</v>
  • 41:40So as a company,
  • 41:44we don't require to have our data scientists listed
  • 41:51as co-authors on papers.
  • 41:55I think from an ethical perspective
  • 42:02in the case where the contributions that the data scientist
  • 42:05has made are very significant
  • 42:09you know, sometimes PIs have asked the question to us,
  • 42:13you know, what sort of acknowledgement
  • 42:15would you like of the data scientist?
  • 42:18And if the PI feels that, say, you know,
  • 42:21someone who has a PhD who works with us
  • 42:23has done enough work that it merits authorship,
  • 42:27they're free to add that person.
  • 42:28We don't require that.
  • 42:30Otherwise, you know, an acknowledgement's nice always, right?
  • 42:33But also not required.
  • 42:37I think, you know, sometimes the nature
  • 42:40of the contribution really matters.
  • 42:42So, you know, as a company it's a little bit
  • 42:47like how much do you acknowledge
  • 42:49the vendor of your microscope, right?
  • 42:53You might say, okay, I did this on a Nikon microscope
  • 42:56or you know, but you might write that more
  • 42:58as a method section.
  • 42:59And then if like a technician came out
  • 43:00and like helped you calibrate it,
  • 43:02you're probably not gonna give
  • 43:03that person an authorship either.
  • 43:05But you might acknowledge them if they did extensive help
  • 43:07that like led to some novel process.
  • 43:10So on the whole, it's a case by case conversation
  • 43:15that scales based on the level of the contribution,
  • 43:17but it's not the first thing that we think of.
  • 43:19It's not like, "Hey, because we did anything for you,
  • 43:21please put us on a paper."
  • 43:23Definitely don't do it that way.
  • 43:24It's more the opposite, which is like, you know,
  • 43:27we're gonna do a thing for you.
  • 43:28Probably, you don't need to cite us.
  • 43:30But if it gets up to a certain point
  • 43:33and we kind of mutually agree that that's appropriate,
  • 43:35then we're happy to discuss that.
  • 43:41<v ->Thank you for sharing Stephen.</v>
  • 43:42So I have a quick question too.
  • 43:44So if you're running on data sets,
  • 43:47one cell may take really long time to run,
  • 43:50then how do you solve the concurrency issue?
  • 43:53Let's say there's multiple people collaborating online
  • 43:56that when the cell is running,
  • 44:00what if some other, another party just clicked stop
  • 44:04or doing something random?
  • 44:06How do you solve the issue that people are on the same page
  • 44:08when something takes really long time to run?
  • 44:13<v ->Yeah, great question.</v>
  • 44:14So a few ways,
  • 44:18one nice thing about a cloud workspace is that
  • 44:22we can expand the number of processors
  • 44:25and the amount of memory kind of
  • 44:28behind the scenes transparently.
  • 44:31So basically you can like log out of the workspace
  • 44:35and in five minutes log back into the workspace
  • 44:38and we've like doubled the processing speed
  • 44:40and like doubled the memory.
  • 44:42So we tend to keep our default instance
  • 44:45at like a reasonable like laptop,
  • 44:47like probably not a high end.
  • 44:49And then when we discover cases like what you're talking
  • 44:52about where like, yeah, no, that cell requires a lot
  • 44:56and we kind of know a little bit in advance,
  • 44:57like we're gonna wanna run that a lot, right?
  • 44:59We might do this, which was we might
  • 45:01like just beef it up, right?
  • 45:03And that's cool that we can do that.
  • 45:07And then the question becomes like,
  • 45:10does that need to run, you know, 24/7,
  • 45:12does it need to run every day,
  • 45:13every week, every month right?
  • 45:15We think a little bit about that
  • 45:16because then there's some additional costs on our side.
  • 45:18If you're gonna do it for like an afternoon,
  • 45:20it's like really not, it's not worth making any additional,
  • 45:24you know, requests of somebody.
  • 45:27But there's another part of your question I wanna get at
  • 45:28too, which is like maybe overriding each other, right?
  • 45:33So that can happen.
  • 45:34And that's a little bit like software specific.
  • 45:38So like in a Jupyter Notebook, you could,
  • 45:43if you don't coordinate a little bit with your lab member,
  • 45:45like overwrite something in one cell at one time, right?
  • 45:49The other person didn't notice.
  • 45:50So for that, we have some best practices, you know.
  • 45:54By far the most common, you know, example that we see is,
  • 45:59is like two or fewer people collaborating,
  • 46:01but if it were three or four,
  • 46:03we'd probably recommend that they do a best practice
  • 46:05of like, you know, while you're doing work that's separate
  • 46:08and you're not like talking to each other,
  • 46:10do work on separate copies of the thing, right?
  • 46:13And then come together in a meeting
  • 46:15and like put it back together, right?
  • 46:17Usually is the better practice if you're say,
  • 46:20working on a Jupyter Notebook,
  • 46:22and you know, communicate, you know,
  • 46:25using some other method like a meeting like this.
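The "work on separate copies" practice can be sketched as a small helper. This is an assumption about how one might do it, not a feature of the workspace itself; the function name, the file naming scheme, and the collaborator names are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def make_working_copies(notebook, collaborators):
    """Copy one shared notebook into a per-person file so edits cannot collide."""
    notebook = Path(notebook)
    copies = []
    for name in collaborators:
        target = notebook.with_name(f"{notebook.stem}_{name}{notebook.suffix}")
        if not target.exists():  # never clobber someone's in-progress copy
            shutil.copy2(notebook, target)
        copies.append(target)
    return copies


# Demonstration in a throwaway directory with a placeholder notebook file.
with tempfile.TemporaryDirectory() as workdir:
    shared = Path(workdir) / "analysis.ipynb"
    shared.write_text("{}")
    for path in make_working_copies(shared, ["phil", "marcus"]):
        print(path.name)
```

Each person then edits only their own copy, and the pieces are merged back into the shared notebook by hand during the check-in meeting.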
  • 46:28So yeah so those are the two aspects.
  • 46:30On the one side, if it's computation intensive,
  • 46:32we can make it bigger.
  • 46:33If it's actually about people overwriting each other,
  • 46:35we recommend some best practices
  • 46:37for communicating outside of the workspace.
  • 46:42<v ->Other questions?</v>
  • 46:47All right, I have one more question.
  • 46:50So like in the old days,
  • 46:53people would buy a nice computer for their lab or maybe a
  • 46:57couple of nice computers and like then everybody
  • 47:00would log in at that and it was a one-time cost, right?
  • 47:05And so how have you found, I don't know,
  • 47:09I mean, so it's a very different model for
  • 47:14both academia and industry, wherever that's trying
  • 47:18to transition from this one time cost
  • 47:21where now, you know, you might still be using this computer
  • 47:2410 years later for good and ill
  • 47:29versus sort of this continuous cloud-based thing.
  • 47:34I don't know,
  • 47:35do you have any words of wisdom on this transition?
  • 47:39Because it seems like, you know, you pay
  • 47:42for a cloud computer and if it's on constantly,
  • 47:46it eats up a lot of money.
  • 47:48<v ->Yeah, yeah.</v>
  • 47:49So really good question.
  • 47:53So I think and-
  • 47:54<v ->Lose control of your data also, which to some extent,</v>
  • 47:58like somebody else has your data.
  • 48:00<v ->In theory, yes.</v>
  • 48:02But you know, I think some of this is just like a journey
  • 48:06and a transition that, you know, scientists are making.
  • 48:09Those of us, like yourself,
  • 48:11who are more software engineer minded,
  • 48:13have been comfortable with the idea of say, you know,
  • 48:16like all of our company's data, for example,
  • 48:18is kind of in Google's clouds,
  • 48:21Google's workspace technically.
  • 48:22None of it is sitting under my desk, right?
  • 48:25But we've gotten a level of comfort about data ownership
  • 48:28based on essentially trust and agreements
  • 48:32and our understanding of how certain sections
  • 48:34of disk are like cordoned off, you know, for ourselves
  • 48:38and relying on some of those best practices.
  • 48:40But to get to the heart of your question,
  • 48:44I think the best metaphor is like
  • 48:45buying a house versus renting an apartment, right?
  • 48:48So, you know, going down to Apple
  • 48:51and picking up a laptop or Dell or whatever you wanna use,
  • 48:55right, is that's the buy model.
  • 48:56And we're super comfortable with that.
  • 48:58The cloud model is more the like renting the apartment.
  • 49:01And certainly people make the choice,
  • 49:03you know, not to rent sometimes
  • 49:05because it's like, doesn't work out economically, right?
  • 49:07It's like, "Hey, I'm throwing money away."
  • 49:09Sometimes people throw, right?
  • 49:11But what is the advantage of renting, right?
  • 49:13The advantage of renting is, you know,
  • 49:16if a thing breaks in your rented apartment,
  • 49:17it's not on you to go pay extra money to go fix it.
  • 49:20That's on the person who owns it.
  • 49:21Similarly, if something breaks with your cloud workspace,
  • 49:24you know, you call us and you're like,
  • 49:26"Hey, this thing didn't work,
  • 49:27please fix it, right?"
  • 49:29And then there's this scaling thing, right?
  • 49:31Which is like, if you go back to Apple and you're like,
  • 49:32"Actually can you add like double the CPU
  • 49:37and double the memory?"
  • 49:39They'll be like, yes, you can pay us for that,
  • 49:41but it's gonna take a while, right?
  • 49:43And it's not gonna happen flexibly and scalably.
  • 49:44So I think it fits into a different space, right?
  • 49:48Obviously these two come together,
  • 49:50I'm talking to you on a physical laptop that I own, right?
  • 49:52But I'm also using cloud instances to do things.
  • 49:56So I think it's like, it fits into this niche where like,
  • 50:00actually the most useful computer for this purpose,
  • 50:03this collaborative purpose
  • 50:05is a rented one, right rather than an owned one.
  • 50:08And you know, maybe that means when I'm not using it,
  • 50:11I'm not paying for it at all, basically, right?
  • 50:13Like, if I'm like paused on this collaboration,
  • 50:15then I'm like actually not paying for it at all,
  • 50:17but then I can bring it back in six months and start
  • 50:18paying for it again.
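The rent-versus-buy comparison comes down to simple arithmetic, sketched below. Every number is a made-up placeholder, not a real price from any vendor or cloud provider, and the sketch assumes billing stops completely while the workspace is paused, which ignores things like storage charges.

```python
def cloud_cost(hourly_rate, hours_per_month, active_months):
    """Total rental cost when the instance is billed only while it runs."""
    return hourly_rate * hours_per_month * active_months


def breakeven_hours(purchase_price, hourly_rate):
    """Hours of cloud use at which renting has cost as much as buying."""
    return purchase_price / hourly_rate


# All figures below are hypothetical placeholders.
PURCHASE_PRICE = 3000.0  # one-time cost of a lab workstation
HOURLY_RATE = 0.50       # rental rate for a comparable cloud instance

print(breakeven_hours(PURCHASE_PRICE, HOURLY_RATE))  # 6000.0
print(cloud_cost(HOURLY_RATE, 40, 6))                # 120.0
```

Under these assumptions, a collaboration that runs 40 hours a month for six months costs far less to rent than to buy, while a machine that is on constantly reaches the break-even point within the year.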
  • 50:20So this is what I hope that folks take away is like,
  • 50:22it opens up a lot of new possibilities.
  • 50:24And the ones that we've gotten
  • 50:26are certainly not the only ones.
  • 50:27There's just like lots more
  • 50:28that you can imagine or envision.
  • 50:32But, but yeah, it's a mindset change
  • 50:35and it's one that I think, you know,
  • 50:37requires some adapting, yeah.
  • 50:42<v ->All right. Thank you so much.</v>
  • 50:44<v ->I have a question for you guys</v>
  • 50:45if there's not another question for me.
  • 50:48<v ->There's a question on the screen.</v>
  • 50:51<v ->Sorry, I have a question.</v>
  • 50:54I think piggy-backing off of that question-
  • 50:58<v ->Hi hello. Hi Noelle.</v>
  • 51:00<v ->Actually Hi.</v>
  • 51:02I used to like physical like pieces of data
  • 51:08and like having physical hard drives.
  • 51:10So like what is the security for data that's on the cloud?
  • 51:16<v ->Yeah, so folks like,</v>
  • 51:24we ourselves build these cloud instances
  • 51:30on the back of three major providers,
  • 51:32whose names you'll recognize,
  • 51:33Amazon, Google, and Microsoft okay?
  • 51:37Those are the big three cloud providers
  • 51:40and they make a guarantee to us
  • 51:43and then we make a guarantee to our customers
  • 51:46about the data protection.
  • 51:47So it's kind of like a layer cake.
  • 51:49And the foundation of it begins with, do you trust Amazon?
  • 51:52Do you trust Google? Do you trust Microsoft?
  • 51:53Some people say yes, some people say no,
  • 51:56but fundamentally they are the ones that, you know,
  • 51:59build data centers, right where the physical aspect
  • 52:04of these computers actually live.
  • 52:05So, you know, this virtual computer,
  • 52:07maybe if you go and like,
  • 52:09"Hey, show me the hard drive where this lives."
  • 52:12You're gonna go out to like, I don't know,
  • 52:14Washington State near some power plant basically,
  • 52:18where it's very economical to set this up, right?
  • 52:21So they then guarantee like,
  • 52:25how do you know that that's safe, right?
  • 52:27Well they guarantee that they're following industry
  • 52:30standards to secure those facilities, to lock them down,
  • 52:35to like continually maintain and manage the networks
  • 52:41that are there to patch the servers
  • 52:44that they're using to keep ahead of any security faults.
  • 52:47So there's one layer of this
  • 52:49where we rely on these big providers to do their jobs.
  • 52:52And despite the last 15, 20 years of like hacks
  • 52:57that you've heard about whatnot that happened in industry,
  • 53:00these three providers so far have managed to avoid
  • 53:03being hacked in any major way.
  • 53:05Like you've not heard of like Amazon getting hacked,
  • 53:08Google getting hacked, Microsoft getting hacked.
  • 53:10If tomorrow Amazon gets hacked, then yeah,
  • 53:13we're all worried okay?
  • 53:14And then we probably would need to shift around.
  • 53:16But so there's a fundamental guarantee
  • 53:19that like all cloud kind of relies on
  • 53:21and it's like good to talk about it
  • 53:23because like we all have to kind of trust these,
  • 53:27you know, these large providers.
  • 53:29But they also invest,
  • 53:31I'd say millions or hundreds of millions of dollars
  • 53:34in computer security.
  • 53:35Like if you're in the field of computer security,
  • 53:38like, you know these guys because they are sort
  • 53:41of world leaders in this sort of thing.
  • 53:44Microsoft, you know, notably was involved in doing some
  • 53:48forensic analysis on like Russian hacking back in 2016.
  • 53:52Like they were some of the first people to notice
  • 53:55that a state actor like Russia was on the scene
  • 53:58doing the various things, taking over computers.
  • 54:00So generally the community of software engineers
  • 54:05that do cloud work know these things
  • 54:07and kind of rely on Google, Amazon, and Microsoft
  • 54:11to like make these investments in computer security.
  • 54:14And notably like, I don't go like set up my own data center
  • 54:18because I know that I would have to invest millions
  • 54:21of dollars in having an equivalently good computer security
  • 54:25team to like watch out for Russia,
  • 54:27who by the way also invests hundreds of millions of dollars
  • 54:30to try to hack these things.
  • 54:31So, the world of computer security is a problem.
  • 54:35So there's that level of trust, okay?
  • 54:37And then on top of that, you have to trust one more level,
  • 54:39which is the group that like sets up the workspace.
  • 54:41So you kinda have to trust, like if it's from us,
  • 54:43you have to kind of trust us that we're not screwing
  • 54:45something up on top of all of those protections
  • 54:48'cause it is possible to do that at the level of like,
  • 54:51you know, Jupyter Notebook that our logins are well used.
  • 54:55So we also invest in using industry standard
  • 54:59like login protocols, so that only the people that we say
  • 55:02can log in can log in, right?
  • 55:04There's a layer of software security there that, you know,
  • 55:07we have to be on top of patching at one level also.
  • 55:11So these are all the things that make that secure.
  • 55:13And the last thing would be like,
  • 55:15do you or don't you trust us to like not to,
  • 55:18to not go in and do something nefarious with your data
  • 55:21even though we're the only ones that can control it.
  • 55:23So you trust that nobody else can get into it,
  • 55:25but do you trust us?
  • 55:26And then that becomes,
  • 55:27yeah a question of like, you know,
  • 55:29going back and checking your references, you know,
  • 55:32talking to other PIs, making sure that something nefarious
  • 55:35hasn't happened, you know, there.
  • 55:37And you probably wanna gain some confidence on that.
  • 55:39But what we've found is that organizations
  • 55:42are getting more and more comfortable with that.
  • 55:43Dropbox is a publicly traded company,
  • 55:46lots of people put stuff on Dropbox.
  • 55:48When you put something on Dropbox,
  • 55:49you're essentially trusting Dropbox.
  • 55:51Dropbox is also built on one of these
  • 55:53three providers same way, right?
  • 55:55So it's that kind of idea
  • 55:57that takes some getting used to but you know,
  • 56:01becomes increasingly useful to do this kind of work on.
  • 56:05And we see large banks and large pharma companies
  • 56:07having taken their time to also adopt cloud
  • 56:10large financial institutions.
  • 56:13But over time there's been increasing comfort
  • 56:15as some of these security questions
  • 56:17have been, you know, asked and answered.
  • 56:20So bit of a long answer,
  • 56:22but thank you for the question 'cause it's important.
  • 56:27<v ->Alright, thanks so much.</v>
  • 56:28In the interest of time,
  • 56:29I think we're gonna have to stop it here, thanks again.
  • 56:32Really appreciate. (audio garbles)
  • 56:37<v ->Thank you guys. Thank you all for your time.</v>
  • 56:40<v ->Have a great day.</v>