Advancements in Molecular Dynamics: Introducing the mwSuMD Sampling Method

SuMD, or supervised molecular dynamics, is an adaptive sampling technique detailed by Deganutti and Moro (2017). The method aims to enhance the simulation of binding events between small molecules or peptides and proteins, as explored in works by Salmaso et al. (2017), Bower et al. (2018), Cuzzolin et al. (2016), and Sabbadin and Moro (2014). A significant advantage of SuMD is that it introduces no energetic bias: each simulation segment is ordinary, unbiased MD.
At its core, SuMD performs a series of short, unbiased molecular dynamics (MD) simulations. After each simulation, the distance between the centers of mass (or the geometrical centers) of the ligand and the predicted binding site is collected at regular time intervals and fitted to a linear function. If the slope of the fitted line is negative, indicating that the ligand is approaching the target, the next simulation starts from the last set of coordinates and velocities. Otherwise, the simulation is restarted with randomly reassigned atomic velocities.
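The supervision step described above can be sketched in a few lines of Python. This is a minimal illustration, not the SuMD code itself: the function names `fitted_slope` and `next_step_action` and the returned labels are hypothetical, and the distances are assumed to have been extracted from the short trajectory already.

```python
from typing import Sequence


def fitted_slope(times: Sequence[float], distances: Sequence[float]) -> float:
    """Least-squares slope of a straight line fitted to distance versus time."""
    n = len(times)
    if n < 2 or n != len(distances):
        raise ValueError("need at least two (time, distance) pairs of equal length")
    mean_t = sum(times) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, distances))
    return sxy / sxx


def next_step_action(times: Sequence[float], distances: Sequence[float]) -> str:
    """Decide how to start the next short simulation.

    A negative slope means the ligand is approaching the binding site, so the
    next run continues from the last coordinates and velocities. Otherwise the
    run is restarted with reassigned velocities. The labels are hypothetical.
    """
    if fitted_slope(times, distances) < 0:
        return "continue"
    return "reassign_velocities"
```

A flat or rising distance profile is treated the same way here: both count as no progress and trigger a restart.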
Building on SuMD, mwSuMD (multi-walker SuMD) aims to increase sampling from specific configurations. Instead of running a single short simulation at each step, it seeds a batch of parallel replicas, referred to as 'walkers'. Because one replica in each batch of walkers is always considered productive, mwSuMD gives greater control over the total wall-clock time used for a simulation. However, it's worth noting that mwSuMD performs best with one walker per GPU, so multiple GPUs may be needed to get the most out of the method. Modern multi-threaded GPUs can still run mwSuMD, albeit with reduced performance.
In the implementation of mwSuMD for ACEMD, several inputs are required: the initial coordinates of the system in a PDB file, the coordinates and atomic velocities from the equilibration stage of the simulation, the topology file of the system, and the necessary force field parameters. Users can choose to supervise one or two metrics of the simulated system during short simulations seeded in batches. In cases where a single metric is monitored, either the slope of the linear function that interpolates the metric values or a specific score can be used to determine whether to continue the mwSuMD simulation. For scenarios that involve two metrics, a distinct score is employed, allowing for more comprehensive monitoring. Commonly supervised metrics include distances between centroids, root-mean-square deviations (RMSDs), or the number of atomic contacts between two selections.
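To make the commonly supervised metrics concrete, here is a minimal pure-Python sketch of a centroid distance, an RMSD without superposition, and a contact count. All function names are hypothetical, the 4.0 distance cutoff for contacts is an assumed default, and coordinates are assumed to be plain (x, y, z) tuples. In practice these quantities would normally be computed with a trajectory analysis library.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def centroid(coords: Sequence[Point]) -> Point:
    """Geometric center of a selection of atoms."""
    n = len(coords)
    return (
        sum(c[0] for c in coords) / n,
        sum(c[1] for c in coords) / n,
        sum(c[2] for c in coords) / n,
    )


def centroid_distance(sel_a: Sequence[Point], sel_b: Sequence[Point]) -> float:
    """Distance between the geometric centers of two selections."""
    return math.dist(centroid(sel_a), centroid(sel_b))


def rmsd_no_fit(coords: Sequence[Point], reference: Sequence[Point]) -> float:
    """RMSD to a reference without superposition (assumes pre-aligned frames)."""
    if len(coords) != len(reference):
        raise ValueError("selections must contain the same number of atoms")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords, reference))
    return math.sqrt(squared / len(coords))


def count_contacts(sel_a: Sequence[Point], sel_b: Sequence[Point],
                   cutoff: float = 4.0) -> int:
    """Number of atom pairs (one atom from each selection) within the cutoff."""
    return sum(1 for a in sel_a for b in sel_b if math.dist(a, b) <= cutoff)
```

Distances and RMSDs are typically expected to decrease toward a target state, whereas contact counts are expected to increase, which matters when choosing which walker to continue.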
Choosing the appropriate metrics is crucial and often depends on the specific system and problem at hand. For instance, RMSD is particularly useful when the final state is known, while the distance metric is essential when the target state remains unidentified. The decision to either restart or continue mwSuMD after completing any short simulation is postponed until all walkers within a batch are collected. At this point, the best short simulation is selected and extended, seeding an equal number of walkers with the same duration as the previous step.
For each walker, the score for the supervision of a single metric, termed SMscore, is calculated as the square root of the product of the metric value in the last frame and the average metric value across the short simulation. When the monitored metric is expected to decrease (such as in scenarios involving binding or dimerization), the walker with the lowest SMscore is continued. Conversely, if the metric is expected to increase, the walker with the highest score is selected for continuation. This approach emphasizes the final state of each short simulation, as it serves as the starting point for the ensuing batch of simulations.
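The SMscore and the selection rule can be written compactly. The sketch below follows the definition given in the text (square root of the last-frame value times the mean over the short simulation). The function names, the dictionary layout of a batch, and the `metric_should_decrease` flag are assumptions made for illustration, and the metric is assumed to be non-negative, as distances, RMSDs, and contact counts are.

```python
import math
from typing import Dict, Sequence


def sm_score(metric_values: Sequence[float]) -> float:
    """SMscore of one walker: sqrt(last-frame value * mean over the walker)."""
    if not metric_values:
        raise ValueError("a walker needs at least one metric value")
    last = metric_values[-1]
    mean = sum(metric_values) / len(metric_values)
    return math.sqrt(last * mean)


def select_walker(batch: Dict[str, Sequence[float]],
                  metric_should_decrease: bool) -> str:
    """Pick the walker to extend once every walker in the batch has finished.

    The lowest score wins when the metric should decrease (e.g. a binding
    distance); the highest score wins when it should increase.
    """
    scores = {name: sm_score(values) for name, values in batch.items()}
    chooser = min if metric_should_decrease else max
    return chooser(scores, key=scores.get)


# Hypothetical batch: per-frame distances for three walkers.
batch = {
    "walker_1": [10.0, 9.0, 8.0],
    "walker_2": [10.0, 10.0, 10.0],
    "walker_3": [10.0, 7.0, 9.0],
}
best = select_walker(batch, metric_should_decrease=True)
```

Because the last-frame value enters the product directly, a walker that ends close to the target is favored over one that only visited it transiently.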
When two metrics are supervised, a second score, referred to as DMscore, is used. For the case in which both metrics are expected to increase during the mwSuMD simulation, DMscore is computed from the metric values in the last frame and the average values over all walkers in the batch. If either of the two metrics is instead expected to decrease, the corresponding component of the equation is multiplied by negative one so that the score remains positive. This scoring system is designed to ensure some degree of independence between the two supervised metrics. For optimal results, it is preferable that both metrics exhibit similar variations over time.
It's worth noting that, unlike in SuMD, when a walker is extended by seeding a new batch of short simulations and the remaining walkers are discarded, the atomic velocities are not reassigned. This allows simulations to be run over very short time windows (potentially just a few picoseconds) without the artifacts that typically arise from thermostat latency, which can take 10 to 20 picoseconds to settle when a simulation is restarted with reassigned atomic velocities.
The current implementation of mwSuMD is written for Python 3 and relies on several modules, including MDAnalysis and MDTraj, for the manipulation and analysis of molecular dynamics data.