Add RMSFResidue analysis class for per-residue RMSF computation #5176
base: develop
Hello there first time contributor! Welcome to the MDAnalysis community! We ask that all contributors abide by our Code of Conduct and that first time contributors introduce themselves on GitHub Discussions so we can get to know you. You can learn more about participating here. Please also add yourself to package/AUTHORS as part of this PR.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #5176 +/- ##
========================================
Coverage 92.72% 92.73%
========================================
Files 180 181 +1
Lines 22472 22504 +32
Branches 3188 3191 +3
========================================
+ Hits 20837 20868 +31
Misses 1177 1177
- Partials 458 459 +1
Hi! Whenever you get a moment, I’d appreciate a review.
orbeckst
left a comment
Can you please cite a handful of papers (~5) that require this kind of analysis?
Thanks for the question. Per-residue flexibility analysis (often reported as per-residue RMSF or equivalent fluctuation profiles) is widely used in MD studies to interpret protein dynamics and function. A few representative examples:
Karplus & McCammon (2002), Nature Structural Biology: foundational review establishing the importance of internal motions and residue-level flexibility in biomolecular simulations.
Hollingsworth & Dror (2018), Neuron: discusses analysis of MD trajectories using residue-wise flexibility profiles to relate dynamics to function.
Hospital et al. (2015), Advances and Applications in Bioinformatics and Chemistry: highlights ensemble-based analysis and residue-level dynamic properties derived from MD simulations.
Grant et al. (2006), Bioinformatics: the Bio3D toolkit explicitly performs per-residue fluctuation analyses, demonstrating this as a standard and useful abstraction.
These works illustrate that residue-level RMSF (or closely related metrics) is a common and meaningful analysis output. The goal of this PR is to make this standard analysis directly accessible within MDAnalysis, complementing the existing atom-level RMSF.
@Nitin-Prata please take no offense at my question, but the way your response is written makes me think that you are using an LLM. Can you please confirm whether you are using one?
@IAlibay I have performed a literature search independently and chose these papers based on my understanding of the use of per-residue RMSF in MD analysis. I do use tooling to help with wording the response, but the references and technical content are mine.
The first three papers you cite are review papers that broadly describe molecular dynamics but, as far as I can tell, offer no details about this specific per-residue RMSF method or its applications. Regarding Bio3D: whilst I am aware that Bio3D offers a groupby feature, the paper you link to does not describe such an approach either.
At this moment, before we proceed, I would ask you to try to provide specific example application cases.
IAlibay
left a comment
Blocking until more info is available.
Amadei et al., "Essential dynamics of proteins", Proteins (1993). In the Methods (page 3), the authors define the covariance matrix of atomic positions as the time average of squared deviations from the mean coordinates. This is precisely the fluctuation in atomic positions from which the RMSF is calculated: RMSF = sqrt(⟨(x − ⟨x⟩)²⟩). These calculated fluctuations are then interpreted at the level of specific residues in the Results (page 8): "Whereas the catalytic site residues Glu-35 and Asp-52 are rigid, residues involved in substrate binding, namely 59, 62, 63, 101, and 107, show extensive flexibility." This is a direct example of residue-level position fluctuation analysis derived from MD trajectories.
I am replying to a comment that is not present anymore to make very clear why I asked my original question and to assert why it is important: Maintaining MDAnalysis is difficult and very labor-intensive. Any piece of code we add makes future maintenance more difficult; this is called technical debt. Therefore, we have to carefully weigh advantages to our users against the technical debt that we are incurring. If a newly proposed feature is not something that our users would likely want to see, then we will not include it. We appreciate that you showed the initiative and started with code to discuss this feature. It is, however, more common that people first raise an issue for a desired feature where we can discuss the need for it. If we then come to the conclusion that we do not want the feature implemented, then nobody has spent time coding. With this PR we are at the stage where we are trying to assess whether this is a feature that we want to include, and we are asking you to convince us that there is enough interest (e.g., because it's a widely used method).
@orbeckst Thank you for clarifying; that puts everything in perspective. I see your point concerning technical debt and agree with prioritizing new features for which there is demand from users. To clarify my intent: RMSFResidue is not intended to offer another way of doing the analysis but rather a small convenience wrapper over a pattern that some users already seem to apply on top of atom-level RMSF. I think this is a topic that would have been better discussed in an issue or a discussion before code was written. If you would prefer that I first gather feedback from the community before returning to this topic, I am happy to follow whatever process the project prefers.
This PR adds a new analysis module RMSFResidue that computes Root Mean Square Fluctuations (RMSF) on a per-residue basis.
Motivation
MDAnalysis currently provides atom-level RMSF via RMSF. This PR adds a
small convenience analysis that aggregates RMSF values at the residue
level, following the existing RMSF API. This does not introduce a new
method, but provides a commonly requested way to summarize RMSF results
at the residue level.
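For orientation, the atom-level quantity being aggregated can be sketched in plain NumPy. This is a minimal illustration of the standard RMSF definition, sqrt(⟨|r − ⟨r⟩|²⟩), and not the code in this PR or in MDAnalysis; the function name `atom_rmsf` and the toy trajectory are hypothetical, and the coordinates are assumed to be already aligned to a common reference.

```python
import numpy as np


def atom_rmsf(positions):
    """Per-atom RMSF from an array of shape (n_frames, n_atoms, 3).

    Hypothetical helper for illustration only; assumes the frames
    have already been superimposed on a common reference.
    """
    positions = np.asarray(positions, dtype=float)
    mean_positions = positions.mean(axis=0)             # (n_atoms, 3)
    # squared distance of each atom from its mean position, per frame
    sq_dev = ((positions - mean_positions) ** 2).sum(axis=2)
    return np.sqrt(sq_dev.mean(axis=0))                 # (n_atoms,)


# Toy trajectory: 2 frames, 2 atoms.
traj = np.array([
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[2.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
])
# Atom 0 moves by +/-1 along x around its mean; atom 1 is static.
print(atom_rmsf(traj))  # values are 1.0 and 0.0
```

A per-residue summary then reduces these per-atom values over the atoms of each residue; which reduction to use (plain mean, mass-weighted mean, and so on) is a design choice.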
Implementation
RMSFResidue in MDAnalysis.analysis.rmsf_residue
_conclude()
results.residue_rmsf
Tests
Added:
testsuite/MDAnalysisTests/analysis/test_rmsf_residue.py
Ensures:
Notes
The implementation avoids groupby("residues") due to current API limitations, and instead manually groups atoms by resid.
Status
All tests pass locally.
📚 Documentation preview 📚: https://mdanalysis--5176.org.readthedocs.build/en/5176/
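The manual grouping by resid mentioned in the Notes could look roughly like the sketch below. It is an assumption-laden illustration rather than the PR's actual code: the function name `residue_mean_rmsf` is hypothetical, the per-residue value is assumed to be an unweighted mean of the atom RMSF values, and resids are assumed to identify residues uniquely (which may not hold across multiple chains or segments).

```python
import numpy as np


def residue_mean_rmsf(per_atom_rmsf, resids):
    """Average per-atom RMSF values over atoms sharing a resid.

    Hypothetical helper; assumes an unweighted mean and that each
    resid identifies exactly one residue.
    """
    per_atom_rmsf = np.asarray(per_atom_rmsf, dtype=float)
    resids = np.asarray(resids)
    # inverse[i] is the index of atom i's resid within unique_resids
    unique_resids, inverse = np.unique(resids, return_inverse=True)
    sums = np.bincount(inverse, weights=per_atom_rmsf)
    counts = np.bincount(inverse)
    return unique_resids, sums / counts


# Five atoms: two in residue 10, three in residue 11.
ids, values = residue_mean_rmsf(
    [1.0, 3.0, 2.0, 4.0, 6.0],
    [10, 10, 11, 11, 11],
)
print(ids.tolist(), values.tolist())  # [10, 11] [2.0, 4.0]
```

Using `np.unique` with `return_inverse=True` plus `np.bincount` avoids a Python-level loop over residues; grouping on a residue index instead of resid would be more robust when resids repeat across chains.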