You can also read a fuller explanation in my project report, but I won't upload it here. Drop me an email if you're that interested!
Why protein interaction networks?
If all of the functions of a cell are carried out by proteins, then intuitively, all of the information about what that cell can do and how it reacts to different situations should be encoded in these proteins and the way they interact with each other. Through various experiments, we can produce a network of all of the protein-protein interactions that we know about. You can then perform community detection on this network: an algorithm that splits the nodes into likely groups, based on how much the members of a group interact with each other compared to everything else.
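To make "interact with each other more than with everything else" a bit more concrete, here is a minimal sketch using modularity, one common way of scoring how good a split into communities is. This is an assumption for illustration only, not necessarily the method used in the project, and the six-node network and its node names are made up.

```python
from collections import defaultdict

def modularity(edges, partition):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    community_of = {node: i for i, group in enumerate(partition) for node in group}
    internal = defaultdict(int)  # edges with both ends inside community i
    degree = defaultdict(int)    # summed degree of the nodes in community i
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Fraction of edges inside each community, minus the fraction expected
    # if edges were placed at random while keeping node degrees fixed.
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Hypothetical toy network: two tight triangles joined by a single edge.
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"),
         ("C", "D")]

q_triangles = modularity(edges, [{"A", "B", "C"}, {"D", "E", "F"}])
q_one_group = modularity(edges, [{"A", "B", "C", "D", "E", "F"}])
q_scrambled = modularity(edges, [{"A", "D"}, {"B", "E"}, {"C", "F"}])

print(q_triangles, q_one_group, q_scrambled)
```

Splitting the toy network into its two triangles scores higher than lumping everything together, which in turn scores higher than a scrambled split. A community detection algorithm is, roughly, a search for the split with the best score of this kind.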
However, there are two problems with this approach. Firstly, the interaction network represents all of the interactions between proteins, but not all proteins are present in every cell. In fact, these interactions may not occur all the time in the same cell, or at the same strength. The network entirely misses the biological context. Secondly, the techniques that we use to discover interactions are not perfect, leaving us with approximately a 50% false positive rate for edges on the network... and a 50% false negative rate, too. Normal community detection is not robust to this kind of noise.
Gene expression data
The flip side of the coin is to analyse dynamic changes in gene expression in cells after applying some stimulus, like starving them of amino acids. What you end up with is a time course of the expression of all the genes, which you can compare between a perturbed and a control system. However, this only allows you to make binary comparisons, and some of the relationships between pairs of proteins will be weak. This approach is also vulnerable to false positives and negatives.
Taking the best of both worlds
In my project, I investigated ways to combine these approaches. I calculated the correlation between the expression of pairs of genes and compared pairs that interact with pairs that do not. I then performed community detection on the protein interaction network and compared the correlation of pairs of genes in the same community with pairs in different communities. What resulted was the figure below, which was encouraging!
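The first comparison can be sketched roughly as follows. I'm assuming Pearson correlation here (the text above only says "correlation"), and the gene names, expression values and interactions are invented toy data rather than anything from the project.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile has no defined correlation; treat as 0
    return cov / sqrt(var_x * var_y)

# Hypothetical expression time courses (one value per time point).
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [1.0, 3.0, 2.0, 4.0],
}

# Hypothetical edges of the protein interaction network.
interactions = {frozenset(("g1", "g2")), frozenset(("g1", "g4"))}

interacting, non_interacting = [], []
for a, b in combinations(sorted(expression), 2):
    r = pearson(expression[a], expression[b])
    if frozenset((a, b)) in interactions:
        interacting.append(r)
    else:
        non_interacting.append(r)

mean_interacting = sum(interacting) / len(interacting)
mean_non_interacting = sum(non_interacting) / len(non_interacting)
print(mean_interacting, mean_non_interacting)
```

The same loop works for the second comparison: swap the "is this pair an edge?" test for "are these two genes in the same community?".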
What about the function of clusters of proteins?
This brings us back to the whole point of this project. We used correlation to try and apply a score to different communities on the network, and looked at how these scores changed between different expression sets. For example, I compared a control set of expression data in yeast with a set where the yeast was suffering a heat shock. Through this, certain communities gained a higher score... which I hoped meant they had some function to do with heat shock. I ended up with a list of communities with scores that change quite a bit between various conditions. I used functional annotations by some kind-hearted experimentalists to try and validate these scores, as well as "borrowing" some code from someone else in the group.
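As a rough illustration of scoring communities and watching the scores move between conditions, here is a minimal sketch. The score used here, the mean pairwise Pearson correlation within a community, is my assumption for the example and not necessarily the score from the project; the community names, gene names and expression values are all made up.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile has no defined correlation; treat as 0
    return cov / sqrt(var_x * var_y)

def community_score(members, expression):
    """Assumed score: mean correlation over all gene pairs in the community."""
    pairs = list(combinations(sorted(members), 2))
    return sum(pearson(expression[a], expression[b]) for a, b in pairs) / len(pairs)

# Hypothetical communities found on the interaction network.
communities = {"c1": {"a", "b", "c"}, "c2": {"d", "e", "f"}}

# Hypothetical expression time courses under two conditions.
control = {
    "a": [1, 2, 3, 4], "b": [2, 1, 4, 3], "c": [4, 2, 3, 1],
    "d": [1, 2, 3, 4], "e": [2, 3, 4, 5], "f": [1, 3, 5, 7],
}
heat_shock = {
    "a": [1, 2, 3, 4], "b": [1, 2, 3, 5], "c": [2, 4, 6, 9],
    "d": [1, 2, 3, 4], "e": [2, 3, 4, 5], "f": [1, 3, 5, 7],
}

# Change in score per community: heat shock minus control.
delta = {
    name: community_score(members, heat_shock) - community_score(members, control)
    for name, members in communities.items()
}

# Communities ranked by how much their score changes between conditions.
ranked = sorted(delta, key=lambda name: abs(delta[name]), reverse=True)
print(ranked, delta)
```

In this toy data the genes in c1 only move together under heat shock, so c1 tops the ranking, while c2 behaves identically in both conditions and does not change at all.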
Can I steal your research?
No thanks, unless you're working with Charlotte Deane! I was going to put some more pretty plots here, but then realised I would perhaps be giving away a bit too much. However, said pretty plots are in my report, so drop me an email if you'd like a look!
All the code I produced for this project is kept in a repository here. If you'd like access, send a request or drop me an email.