Discovery of potent inhibitors of α-synuclein aggregation using structure-based iterative learning
Components of the machine learning method
The machine learning approach used here consists of three main components27: (1) the experimental data, which is a readout of the potency of the compounds in an aggregation assay, (2) the variational autoencoder required to represent the compounds as latent vectors, and (3) a model for training and prediction using these vectors and the assay readouts.
For component 1, we used a chemical kinetics assay9,28,29 that provided both the initial data for the model training and the data that were iteratively fed back into the model at each cycle of testing and prediction. This assay identifies the top compounds that inhibit the surface-catalyzed secondary nucleation step in the aggregation of αS. Secondary nucleation is enabled by adding a small amount of preformed fibrils to a monomeric mixture. Aggregation was tracked using the amyloid binding dye, thioflavin T (ThT).
For component 2, we used a junction tree variational autoencoder30, pretrained on a set of 250,000 molecules31 enabling accurate representation of a diverse population of molecular structures. Using this approach, SMILES strings were standardized using MolVS32 and converted into latent vector representations.
For component 3, we used a random forest regressor (RFR) with a Gaussian process regressor (GPR) fitted to the residuals33,34 of the RFR, with both regressors using the latent vectors as training features. The RFR provided the highest performance compared to other combinations of multilayer perceptrons (MLPs), GPRs and linear regressors (LRs) in terms of R2 score, mean absolute error and root mean square error. Performance and parameters are shown in Supplementary Fig. 1 and Supplementary Table 1, respectively. Combining the RFR and GPR provided only a marginal improvement in the metrics of the RFR alone, but crucially enabled leveraging of the associated uncertainty measure of the GPR when ranking molecules during acquirement prioritization27. Tuning the weighting applied to this uncertainty measure allowed a ranking based on both the predicted potency of the molecules and the uncertainty of that prediction. Component 3 was then trained on the 161 initial experimental data points (see below). The best molecules predicted by the model were then tested in the same assay and the results fed back into the model in an iterative fashion (~55–65 new molecules tested at each iteration). The molecules used at each stage of the project are illustrated in Supplementary Fig. 2, together with the structures of the most potent hits and leads at each stage. An overview of the pipeline is shown in Fig. 1.
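As an illustration of the stacking idea in component 3, the following minimal sketch fits a random forest and then a Gaussian process on the forest's residuals, so that the Gaussian process supplies an uncertainty estimate alongside the combined prediction. It assumes scikit-learn; the synthetic data, the 32-dimensional feature vectors, the kernel choice, the hyperparameters and the use of in-sample residuals are all illustrative assumptions, not the settings reported in Supplementary Table 1.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel


def fit_stacked(X, y, seed=0):
    """Fit a random forest, then a Gaussian process on the forest's residuals."""
    rfr = RandomForestRegressor(n_estimators=200, random_state=seed)
    rfr.fit(X, y)
    # Assumption: in-sample residuals are used as the GPR target.
    residuals = y - rfr.predict(X)
    gpr = GaussianProcessRegressor(
        kernel=RBF() + WhiteKernel(), normalize_y=True, random_state=seed
    )
    gpr.fit(X, residuals)
    return rfr, gpr


def predict_stacked(rfr, gpr, X):
    """Return the combined prediction and the GPR standard deviation."""
    resid_mean, resid_std = gpr.predict(X, return_std=True)
    return rfr.predict(X) + resid_mean, resid_std


# Synthetic stand-ins for latent vectors and normalized half-times.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(161, 32))
y_train = 1.0 + np.abs(X_train[:, 0]) + 0.1 * rng.normal(size=161)
X_pool = rng.normal(size=(200, 32))

rfr, gpr = fit_stacked(X_train, y_train)
mean, std = predict_stacked(rfr, gpr, X_pool)
```

The arrays `mean` and `std` are the two quantities that a ranking step would then combine when prioritizing molecules for testing.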
Initial set of small molecules
The initial set of molecules was identified via docking simulations to αS fibrils (Supplementary Information), followed by similarity searches around molecules that performed well in the chemical kinetics assay to identify further candidates23. The docking screening was carried out using the consensus strong binders predicted by AutoDock Vina35 and Openeye’s FRED36,37,38 software.
Two million molecules with optimal central nervous system multiparameter optimization (CNS MPO)39 properties were previously docked using AutoDock Vina to target the selected binding pocket23 (Supplementary Fig. 3). CNS MPO is an aggregated metric of molecular properties that predicts likelihood of a molecule passing the blood–brain barrier. In that study, the binding site encompassing residues His50–Lys58 and Thr72–Val77 was selected due to its propensity to form a pocket according to the Fpocket software37 (Supplementary Fig. 3a), and its mid to low solubility according to CamSol40 (Supplementary Fig. 3b). Additionally, His50 is predicted to be protonated below the pH value (5.8) at which αS secondary nucleation more readily occurs41, which may be important for initial interactions. To increase the confidence of the calculations, the top-scoring 100,000 small molecules were selected and docked against the same αS binding site, using FRED36. The top-scoring, common 10,000 compounds in both docking protocols were selected and clustered using Tanimoto clustering42 with a similarity cutoff of 0.75, leading to a list of 79 centroids (representative molecules from each cluster). The Tanimoto similarity is a metric that compares Morgan fingerprint43 representations (radius 2, nbits 2,048) of two different molecules. A value of 1 for the Tanimoto similarity implies complete two-dimensional homology between two structures, while values closer to 0 imply little to no structural similarity. Sixty-eight compounds were available of the 79 molecules identified in the in silico structure-based docking study. The first round of in vitro experiments was carried out with this set.
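The Tanimoto similarity and the idea of cutoff-based clustering can be sketched as follows. Fingerprints are represented here as toy sets of on-bit indices rather than real Morgan fingerprints, and the greedy leader-style clustering is an assumed stand-in, since the specific clustering algorithm is not described in this section.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def leader_cluster(fingerprints, cutoff=0.75):
    """Greedy clustering: a molecule joins the first centroid it matches at
    >= cutoff; otherwise it becomes the centroid of a new cluster."""
    centroids = []
    members = {}
    for idx, fp in enumerate(fingerprints):
        for c in centroids:
            if tanimoto(fp, fingerprints[c]) >= cutoff:
                members[c].append(idx)
                break
        else:
            centroids.append(idx)
            members[idx] = [idx]
    return members


# Toy fingerprints (hypothetical on-bit indices).
fps = [
    {1, 2, 3, 4},
    {1, 2, 3, 4, 5},
    {10, 11, 12},
    {10, 11, 12, 13},
    {1, 10, 20},
]
clusters = leader_cluster(fps)
```

In this toy case the keys of `clusters` play the role of the centroids, that is, one representative molecule per cluster.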
Subsequent experiments to test these predicted binders in aggregation assays identified four active compounds23, labeled molecules 48, 52, 68 and 69 and referred to as the ‘docking set’ (Fig. 1a). We then began the process of lead generation and optimization. Using the Tanimoto similarity metric between Morgan fingerprint representations (radius 2, nbits 2,048) of the molecules, two similarity searches were carried out on the ZINC15 database using these four structures as starting points (Fig. 1b). Different Tanimoto similarity thresholds were used to specify molecule subsets for testing: a similarity value >0.5 was used for closely related analogs, >0.4 for loosely related analogs and >0.3 for the library to screen from (‘evaluation set’). While this use of a structurally related screening library constrains the model’s ability to generalize, the lack of diversity in terms of potent molecules in the training set also makes it unlikely for the model to perform well in chemical space divergent from this region. We are thus carrying out an exploitation strategy here. We remove the need for a curated screening library in a parallel work by utilizing generative modeling and reinforcement learning44, allowing for both exploitation and exploration strategies.
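The threshold scheme can be made concrete with a short sketch that labels each library molecule by its best similarity to any of the four parents. The molecule names and similarity values are hypothetical, and the bands are made mutually exclusive here purely for illustration (in the text, the >0.3 evaluation set is the overall screening library).

```python
def assign_tier(similarities_to_parents):
    """Map a molecule's best Tanimoto similarity to any parent onto a subset label."""
    best = max(similarities_to_parents)
    if best > 0.5:
        return "close"
    if best > 0.4:
        return "loose"
    if best > 0.3:
        return "evaluation"
    return "excluded"


# Hypothetical similarities to molecules 48, 52, 68 and 69, in that order.
library = {
    "mol_a": [0.62, 0.20, 0.10, 0.15],
    "mol_b": [0.35, 0.45, 0.20, 0.10],
    "mol_c": [0.10, 0.20, 0.31, 0.05],
    "mol_d": [0.10, 0.10, 0.20, 0.30],
}
tiers = {name: assign_tier(sims) for name, sims in library.items()}
```

Note that the comparisons are strict, so a molecule with a best similarity of exactly 0.3 is excluded.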
A selection of closely related molecules (Tanimoto similarity >0.5) to the parent compounds (referred to as the ‘close similarity docking set’, Fig. 1b and Supplementary Fig. 2b) was tested in the aggregation assay. The potent molecule selection was made according to a cutoff corresponding to a normalized half-time of the aggregation (t1/2) of two times that of the negative control. The percentage of molecules passing this threshold was defined as the optimization rate. This yielded five new potent molecules from 25 new molecules (Supplementary Fig. 2b), one derived from molecule 48, three from molecule 52 and one from molecule 69. This step was then followed by a larger selection of compounds with a looser cutoff of structural similarity (Tanimoto similarity >0.4) to the parent compounds (referred to as the ‘loose similarity docking set’, Fig. 1b). Although new potent molecules featured among this set, the optimization rate was low (4%), and both molecules 48 and 52, which had initially appeared the most promising of the parent structures, yielded poor results. Of the 29 molecules related to molecule 48 in the loose similarity docking set, none was potent, while of the 24 molecules related to molecule 52, only two were potent. The functional range of molecules 48 and 52 appeared narrowly limited around the chemical space of the parent structures. Molecule 69 yielded one potent molecule from 16 molecules. Overall, the optimization rate from the loose similarity docking set was less than a quarter of that of the close similarity docking set and involved testing three times as many compounds.
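The potency cutoff and the optimization rate defined above amount to simple arithmetic, sketched below. The function names and the example half-times are hypothetical, and treating the twofold cutoff as inclusive is an assumption; the 5-of-25 figure is taken from the text.

```python
def is_potent(t_half, t_half_control, fold=2.0):
    """Potent if the half-time is at least `fold` times that of the negative control."""
    return t_half / t_half_control >= fold


def optimization_rate(n_potent, n_tested):
    """Percentage of tested molecules that pass the potency cutoff."""
    return 100.0 * n_potent / n_tested


# Close similarity docking set: five potent molecules out of 25 tested.
close_rate = optimization_rate(5, 25)

# Hypothetical half-times (hours) against a control half-time of 10 h.
half_times = [9.5, 12.0, 21.0, 40.0]
flags = [is_potent(t, 10.0) for t in half_times]
```

With these numbers the close similarity docking set has an optimization rate of 20%, five times the 4% reported for the loose similarity docking set.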
These results suggested that it would be challenging to further explore the chemical space using conventional structure–activity relationship techniques without considerable attrition, since the optimization rate worsened as the similarity constraint to the initial hits was loosened. To overcome this problem, the compounds resulting from these experiments were then used as input for a machine learning method for an iterative exploration of the chemical space (Fig. 1c). The similarity searches removed the most obvious targets of the machine learning approach, but also increased the size of the dataset available for training. The training set, however, remained small by typical machine learning standards, consisting of 161 molecules. Since training sets of this size are common in early-stage research, a further aim of this work was to demonstrate that machine learning can be used effectively even in such data-sparse scenarios.
Iterative application of the machine learning approach
One of the issues with applying machine learning to a data-sparse scenario is that predictions are likely to be overconfident. While this problem can be addressed to an extent by utilizing Gaussian processes, a complementary strategy is to restrict the search area to a region of chemical space that is more likely to yield successful results. To this end, a structural similarity search of the four hit molecules in the docking set was carried out on the ‘clean’ and ‘in stock’ subset of the ZINC15 database, comprising ~6 million molecules. Any molecule showing a Tanimoto similarity value of >0.3 to any of the four structures of interest was included. This low threshold for Tanimoto similarity was intended to narrow the search space without being overly restrictive of the available chemical landscape, yielding a dataset of ~9,000 compounds that composed the prospective ‘evaluation set’. The distribution of this evaluation set in terms of the predicted binding energies is shown in Supplementary Fig. 4a.
Different machine learning models were initially trialed against the docking scores calculated for the evaluation set as a test of the project feasibility, and these models were then tuned on the much smaller aggregation dataset. The best-performing setup, the RFR–GPR stacked model, was then trained on the whole aggregation dataset and used to predict the top set of molecules (see ‘Machine learning implementation’ section in Supplementary Information, and Supplementary Figs. 1, 5 and 6). For this work, the t1/2 for the light seeding assay was used as the metric of potency to be used in machine learning because of its robustness. For comparison, the amplification rate is more susceptible to small fluctuations in the slope of the aggregation fluorescence trace23 (Supplementary Fig. 7). Molecules that achieved a t1/2 twofold greater than that of the negative control under standard assay conditions (Methods) were classed as potent45. The algorithm was run repeatedly from different random starting states and those molecules that appeared in the top 100 ranked molecules more than 50% of the time (64 molecules) were chosen for purchase (first iteration). In this first iteration, there was an inherent bias toward the structure of molecule 69 in the dataset given the relative population sizes (Supplementary Fig. 2a), but with the caveat that many of these structures were only loosely related to the parent (Tanimoto similarity
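The consensus selection across repeated runs (molecules appearing in the top 100 more than 50% of the time) can be sketched as below. The ranking function is a hypothetical stand-in for retraining and ranking with a new random state, and the number of runs, the noise model and the pool size are illustrative assumptions.

```python
import random
from collections import Counter


def consensus_top_k(rank_fn, molecule_ids, n_runs=20, top_k=100, min_fraction=0.5):
    """Keep molecules that appear in the top_k of more than min_fraction of runs."""
    counts = Counter()
    for seed in range(n_runs):
        ranked = rank_fn(molecule_ids, seed)
        counts.update(ranked[:top_k])
    return sorted(m for m, c in counts.items() if c / n_runs > min_fraction)


def noisy_rank(molecule_ids, seed):
    """Hypothetical stand-in for one model run: the underlying score falls with
    the molecule index, and Gaussian noise mimics run-to-run variation."""
    rng = random.Random(seed)
    scores = {m: -m + rng.gauss(0.0, 5.0) for m in molecule_ids}
    return sorted(molecule_ids, key=scores.get, reverse=True)


selected = consensus_top_k(noisy_rank, list(range(1000)))
```

Molecules near the top-100 boundary enter the top 100 in only some runs, so the consensus filter keeps those that do so in a majority of runs and drops the rest.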
The dynamic range within the aggregation dataset in terms of potency was large, in that a majority of the molecules had no effect on aggregation, while initial docking hits exhibited relative t1/2 of up to four to five times that of the negative control (limited by the length of the experimental run) at 25 μM. Molecules then found via machine learning produced a relative t1/2 of ~4–5 at up to eightfold lower concentration (3.12 μM, 0.3:1 molecule:protein) than that carried out in the initial screening (25 μM, 2.5:1 molecule:protein). This compares favorably with previous molecular matter tested in a less aggressive seeded aggregation assay such as the flavone derivatives, apigenin, baicalein, scutellarein and morin, which achieved relative t1/2 of 1–2 at a stoichiometry of 0.5:1 molecule:protein9. Anle-138b12 is another example of a well-characterized small molecule inhibitor, which was also taken into clinical trials, whose relative t1/2 is 1.22 (Fig. 2) at a ratio of 2.5:1 molecule:protein in the assay used in this work, which is lower than any of the molecules discovered using the strategy employed here.
After the first iteration, the compound data were pooled together to extend the training set and a further two iterations were carried out with the updated model, adding the resultant data to the training set at each iteration. This was followed by a fourth and final iteration trained on low dose (3.12 μM) data of all the previously obtained molecules. Example kinetic traces for a molecule from the fourth iteration are shown in Fig. 2a. The molecules are labeled according to iteration number and lead identifier within that iteration. For example I4.05 is the fifth potent lead (05) within iteration 4 (I4). The dose-dependent potency in the aggregation assay was investigated (Fig. 2a and Supplementary Fig. 8) with all potent lead molecules exhibiting substoichiometric potency. For comparison, Anle-138b is also shown.
Figure 2b shows an approximate overall rate of aggregation at different concentrations of I4.05, Anle-138b and the parent molecule. This approximate rate was taken as 1/t1/2, and fitted to a Hill slope. A kinetic inhibitory constant (KIC50) was then derived. This is the concentration of molecule at which the t1/2 is increased by 50% with respect to the control, as defined previously45. The KIC50 values for the leads were in the range of 0.5–5 μM, which compare favorably with the parent of the lead molecules (molecule 69) and Anle-138b which have extrapolated KIC50 values of 18.2 μM and 36.4 μM, respectively. I4.05 had a KIC50 value of 0.52 µM with 95% confidence limits of 0.45 µM and 0.59 µM.
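To make the KIC50 definition concrete, the sketch below assumes a simple Hill-type form for the relative rate, 1/(1 + (c/K)^h). This functional form and the parameter values are assumptions for illustration, not the fit used for Fig. 2b. Because the approximate rate is taken as 1/t1/2, a 50% increase in t1/2 corresponds to a relative rate of 1/1.5, which under this form gives a closed-form KIC50.

```python
def relative_rate(conc, k, h):
    """Assumed Hill-type inhibition: aggregation rate relative to the control."""
    return 1.0 / (1.0 + (conc / k) ** h)


def kic50(k, h):
    """Concentration at which t1/2 is 1.5 times the control value.

    Setting 1 + (c/k)**h = 1.5 gives c = k * 0.5**(1/h).
    """
    return k * 0.5 ** (1.0 / h)


# Hypothetical fit parameters (k in micromolar, h dimensionless).
k_fit, h_fit = 1.0, 1.2
c50 = kic50(k_fit, h_fit)
t_half_ratio = 1.0 / relative_rate(c50, k_fit, h_fit)
```

Under this assumed form, KIC50 is the point where the relative half-time reaches 1.5, so it is lower than the midpoint parameter K of the Hill curve.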
The elongation rate was largely unaffected in the presence of molecules at any concentration (Fig. 2c). This was expected given the designed mechanism of action of the small molecules. It was also reassuring, since compounds that inhibit elongation may increase the population of oligomers45, which are considered the most damaging of the aggregate species in vivo7,8. Then, using the amplification and elongation rates derived from Fig. 2a,c, the oligomer population over time was calculated9 (Methods). These calculations are shown in Fig. 2d for I4.05 and Supplementary Fig. 8 for the rest of the leads. All potent leads demonstrated a dose-dependent delay and reduction of the oligomer peak. Across all metrics, I4.05 performed better than Anle-138b and the parent molecule at substoichiometric ratios, as do all of the leads obtained in previous iterations (Supplementary Figs. 8 and 9).
The aggregation data from the first three iterations are also shown in Fig. 3a. Of the 64 molecules from iteration 1, 8 were potent, representing an optimization rate of 12.5%; the second iteration showed a further increase, with 11 potent molecules, representing a 17.2% optimization rate; and the third iteration, with 12 potent molecules, had an optimization rate of 21.4%. These optimization rates represent an order of magnitude improvement over high-throughput screening hit rates46 and, remarkably, an overall 40% improvement over the combined similarity search optimization rates, which removed the most likely lead candidates. The potency of the machine learning leads was also higher on average than that of the molecules identified by the similarity searches (Supplementary Fig. 10a), without compromising the CNS-MPO scores (Supplementary Fig. 10b). The flow of molecules derived from each parent in terms of positives and negatives over the course of the project is illustrated in Fig. 3b. The accumulated training data from all stages of the project for all molecules in terms of half-time distribution is shown in Supplementary Fig. 4b,c.
Given that αS aggregation and toxicity has also been linked to membrane interactions7,47 a parallel investigation was carried out with a lipid-induced aggregation assay (Supplementary Fig. 11), which was used as a validation of the molecules rather than for machine learning optimization. The tested lead molecules also showed strong efficacy in this assay. A further test of these molecules in a spontaneous αS aggregation assay, without induction via pre-seeding or shaking, also exhibited strong potency48.
Analysis of the chemical space explored by machine learning
The chemical space explored by the machine learning approach was inspected via dimensionality reduction techniques, including principal component analysis, t-distributed stochastic neighbor embedding49 and uniform manifold approximation and projection (UMAP)50 (Methods) to investigate how the model was prioritizing molecules (Supplementary Fig. 12). The relative positioning of the training points and the parents within the chemical space is shown in Supplementary Fig. 13a. The stacked RFR–GPR model assigned low uncertainty to areas of the chemical space proximal to the observed data, and the corresponding acquirement priority mirrored this when trained on the aggregation data (Supplementary Fig. 13b–d). Supplementary Fig. 13 also illustrates how the uncertainty weighting could be altered during the ranking, depending on how conservative a prediction was required. A drawback to a high uncertainty penalty was that the model remained in the chemical space it was confident in, while a lower uncertainty penalty ensured reasonable confidence of potent lead acquirement while still exploring the chemical space.
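The effect of the uncertainty weighting on the ranking can be illustrated with a small sketch. The scoring rule used here (predicted potency minus a weighted standard deviation) is an assumed, commonly used form, and the molecule names and numbers are hypothetical; the exact acquisition function is not specified in this section.

```python
def rank_by_priority(predictions, uncertainty_weight):
    """Rank molecules by predicted potency penalized by predictive uncertainty.

    predictions maps a molecule id to (predicted potency, predictive std).
    """
    def priority(item):
        _, (mean, std) = item
        return mean - uncertainty_weight * std

    ranked = sorted(predictions.items(), key=priority, reverse=True)
    return [mol for mol, _ in ranked]


# Hypothetical predictions of relative t1/2 with uncertainties.
predictions = {
    "near_training_data": (2.5, 0.2),
    "far_from_training_data": (3.0, 1.5),
    "inactive_like": (1.0, 0.1),
}
conservative = rank_by_priority(predictions, uncertainty_weight=1.0)
exploratory = rank_by_priority(predictions, uncertainty_weight=0.1)
```

With the high penalty the confidently predicted molecule near the training data is ranked first, whereas with the low penalty the more uncertain but nominally more potent molecule moves to the top.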
The changes in similarity of the potent leads to the parent structures are shown in Supplementary Fig. 14. The similarity of the molecules to their parent structure dropped for all structures at successive stages of the investigation, reaching its lowest point at the iterations of the machine learning approach. The more potent leads mostly retained the central ring and benzene substituent of molecule 69 albeit with the addition of polar groups to the benzene ring, but featured alterations to the rest of the scaffold. For example, from iteration 1, I1.01 replaced the fused ring substructure of molecule 69 with a single substituted benzene ring, while I1.02 replaced it with a substituted furan ring, and subsequent iterations saw more complexity introduced. These changes were reflected in the Tanimoto similarity values, which were at the lower end of what was permitted in the evaluation set, 0.3 being the cutoff. It was evident from this result that parts of the substructure were important to retain for potency, which the model did effectively while also identifying alterations in the rest of the scaffold that enhanced the potency considerably beyond that of the parent.
The observation that component 3, the quantitative structure–activity relationship (QSAR) model, converges on the structures from two areas of the UMAP space related to structure 69 was encouraging. It suggested the model was learning useful information and not selecting at random. While we have not tested a random set of molecules due to prohibitive resource cost, we do note that, if a random selection of molecules were taken from the accumulated training data from all stages of the project, its optimization rate (11%) would be lower than that of iterations 1, 2 and 3 on average. Though performance improves with additional data, the QSAR performance in terms of R2 remains modest (Supplementary Fig. 1), but this is in part due to sparsity of training data. We would anticipate improvement if this approach could be implemented at medium scale with correspondingly more complex QSAR models, and we have an indication of this from trials of this model setup against the docking scores of the evaluation set, where performance in terms of R2 score is threefold higher for a slightly larger dataset (Supplementary Fig. 6).
Next, an investigation was carried out to identify what structural information the latent vectors were encoding. Variational autoencoders are generally not built to ensure that their latent space dimensions are human interpretable, making this a challenge. The decoding of a variational autoencoder is also not deterministic, preventing facile analysis of the feature space based on single perturbation approaches of the input features and observing changes to decoded structures. Instead, hierarchical clustering was carried out on the latent vectors, followed by SHAP51 (Shapley additive explanations) clustering for comparison (Supplementary Fig. 15). While the former differentiated groups based on large changes in any dimension, clustering based on SHAP dimensions ensured that clusters were created only on the basis of features relevant to the prediction problem at hand. Latent space dimensions that have a large range of values had a large effect on the latent space clustering, regardless of whether these dimensions were important predictors of molecular potency. Using SHAP values, on the other hand, meant that latent space dimensions that had little effect on the model prediction were mapped to values close to zero, and therefore had a much smaller influence on the clustering. This resulted in clusters which were relevant to the prediction task. This strategy was suggested by the authors of SHAP and was recently used in the context of identifying subgroups of coronavirus disease 2019 symptoms52.
Supplementary Fig. 15 shows two-dimensional UMAP representations of the tested molecules, with the latent vector clustering indicated by color and the SHAP clustering indicated by shape. From the UMAP representation, we note that the SHAP clustering identified clusters more effectively than the hierarchical clustering. The SHAP values for each feature show the importance of that feature in the interpretation of potency, and this in turn could be used to identify which substructures within the molecules are relevant for potency by observing the structures that recurred in each cluster. For example, Supplementary Fig. 15 shows the top dimensions of each SHAP cluster, revealing that dimension 24 at least partly encoded for the key substructure 3,5-pyrazolidinedione, which was present in every molecule in cluster α and a proportion of cluster β. This confirmed the hypothesis previously put forward30 that, in a junction tree variational autoencoder, the latent space encoding preserved the key features of each molecule. Molecules that were clustered together shared many molecular substructures in common.
Measurement of binding affinity
A series of validation experiments were carried out on the most potent leads from the machine learning iterations. We first tested the binding to fibrils using surface plasmon resonance (SPR; Methods) under different buffer conditions. The results for molecule I4.05 versus Anle-138b are shown in Fig. 4. The proposed mechanism of action is the binding of molecules to the fibrils thereby blocking nucleation sites for further aggregation. Support for this mechanism of action comes from the observations that the molecules function at substoichiometric ratios, discounting monomer interactions, and also show negligible effect on elongation. Covalent interactions can also be discounted, as no mass change is observed of the αS monomer by mass spectrometry. The large effect observed in an assay that isolates secondary nucleation as the dominant mechanism implies that the molecules are specifically affecting this step, and the substoichiometry implies that the molecules must be interacting with the fibrils that are present in nanomolar monomer equivalents at the start of the aggregation.
Proof of binding and evidence for this potential mechanism are shown by SPR in Fig. 4. Figure 4a shows a schematic representation of molecule binding to the binding pocket targeted during the initial docking simulation. Figure 4b shows SPR response curves for a concentration range between 0.3 nM and 1.1 μM of I4.05 binding to immobilized αS fibrils, while Fig. 4c shows the same experiment utilizing Anle-138b from 1.1 μM to 5 μM. The binding was tested under the conditions of the αS secondary nucleation assay (pH 4.8), and also at pH 8, allowing direct comparison to the secondary nucleation conditions of Aβ42, which were tested as a negative control in Fig. 4d. αS is highly charged at neutral pH and has an isoelectric point (pI) of 4.7 (ref. 53). It therefore requires a pH in this region to render the protein uncharged in order to aggregate on an experimentally accessible timescale under quiescent conditions, whereas Aβ42 is highly aggregation prone and requires higher pH to prevent it aggregating too rapidly45. At both pH values, I4.05 exhibited binding to αS fibrils, with kinetic fits giving KD values of 68 nM at the lower pH and 13 nM at the higher pH. The data for Anle-138b showed no response for pH 4.8, and so no KD could be obtained, while at pH 8 an approximate KD of 8.1 µM was obtained. It was evident that the two orders of magnitude improvement in KIC50 of I4.05 compared to Anle-138b was matched by a similar degree of improvement in terms of binding efficacy. Figure 4d shows that I4.05 has no effect on the seeded aggregation of Aβ42, nor does it bind effectively to Aβ42 fibrils, which suggests that this molecule is not a promiscuous aggregation inhibitor between different amyloidogenic proteins.
Inhibition of aggregation using brain-derived seeds
While this result was encouraging, with the recent determination of the pathological αS fibril structure54, it became clear that the recombinant in vitro fibril structure we had employed for computational and experimental work was different to that found in the brains of patients with PD. To test whether these molecules might work against patient-derived fibrils, these molecules were tested in a real-time quaking-induced conversion (RT-QuIC) seed amplification assay (Fig. 5) that employs brain samples from patients suffering with dementia with Lewy bodies (DLB). The dominant fibril structure identified in DLB was found to match the dominant structure observed in PD54.
The RT-QuIC assay was initially introduced as a diagnostic assay55,56, showing distinct aggregation curves in the presence of brain material derived from different pathologies57. In this case, we use it to test the ability of these molecules to slow the aggregation of αS induced by DLB brain material. As a negative control, samples from patients with a tauopathy (corticobasal degeneration, CBD) were also used; these did not induce αS aggregation because no αS seeds were present (Fig. 5a,b). No aggregation was observed in the CBD samples over the timescale observed except for Anle-138b, which accelerated aggregation under this condition. This unusual behavior may be due to Anle-138b’s reportedly low solubility12. The conditions are different to those initially screened, as this assay was carried out at pH 8 and utilized shaking to accelerate seeded aggregation. This is a more challenging paradigm for the molecules to function in as multiple aggregation processes occur in tandem41. In addition to secondary nucleation from the fibril surfaces, fragmentation of the fibrils induced via shaking results in more fibril ends for elongation, which in turn provides more fibril surface for secondary nucleation.
Despite these challenges, and the different fibril structure present, the lead molecules still function well in inhibiting aggregation, and still at substoichiometric ratios (Fig. 5c). There was a clear improvement for the leads over Anle-138b, which again appeared to accelerate aggregation, and the parent molecule, although the ranking of the leads in terms of efficacy is altered compared to the screening assay. To understand these results, we note that there is a similarity in the binding pockets in the structures 6CU7 (recombinant) and 8A9L (brain derived) (Supplementary Fig. 16). We currently do not know whether this similarity is serendipitous, but binding pockets with similar features can also be observed via cryogenic electron microscopy in the multiple system atrophy (MSA) type I and MSA type II fibril folds as well as the Lewy fold, with an unresolved species bound within the pocket54.
To account for differences in brain samples and also investigate potential efficacy against MSA-derived brain material, we tested a single concentration of the same selection of molecules against three neuropathologically confirmed MSA brain samples (Supplementary Fig. 17a,c) and two further DLB brain samples (Supplementary Fig. 17a,d). As a further negative control, a sample with no seed or brain material was tested, to determine the degree of spontaneous nucleation in the absence of an inducer (Supplementary Fig. 17b). Aggregation in this negative control was effectively inhibited by all the potent ML molecules, given that αS was likely to assume the 6CU7 polymorph in this condition, and not by Anle-138b, which accelerated aggregation. It should be noted that the CBD samples are the better negative control for RT-QuIC, as all brain samples contain tissue matrix components that may sequester αS and reduce its aggregation. The unseeded sample began aggregation at ~40–50 h, whereas CBD samples did not exhibit aggregation over a span of 80 h (Supplementary Fig. 17e). Fibrils present in DLB and MSA samples were able to counteract this effect. For the other DLB and MSA samples, broadly similar trends were observed to those shown in Fig. 5. The ML molecules did appear more efficacious against MSA samples (Supplementary Fig. 17c), perhaps because the MSA pocket more closely matches that of the targeted 6CU7 polymorph (four flanking lysines around a histidine residue) compared to the 8A9L polymorph found in PD and DLB (four flanking lysines around a tyrosine residue) as shown in Supplementary Fig. 16. The behavior of Anle-138b was variable as, where the ML-derived molecules inhibited aggregation to some extent across all examples, Anle-138b either had no effect (unseeded and MSA samples 1 and 2) or induced (CBD sample, MSA sample 3 and DLB sample 1) or mildly inhibited aggregation (DLB samples 2 and 3).
Oligomer quantification by microfluidic free-flow electrophoresis
Having observed that molecule I3.02 was the most broadly effective in the RT-QuIC assay, an investigation was carried out to directly measure the oligomeric species formed during the reaction. This was achieved using microfluidic free-flow electrophoresis (µFFE)58, a technique optimized using conditions similar to those used in the RT-QuIC assay, albeit at higher αS concentration (100 µM). The results of this are shown in Fig. 6. Aggregation time courses were tracked using AlexaFluor 488 labeled N122C αS rather than ThT. Figure 6 shows a schematic of the approach, where samples were extracted from an aggregation time course, centrifuged to remove insoluble aggregates, and finally submitted to µFFE. The degree of deflection and the photon count of each particle are proportional to the size and charge of the biomolecule. The former allows the separation of monomers from oligomers and the latter gives a measure of the number and size of the oligomers at a particular time point in the presence of different inhibitors. Oligomer electrophoretic mobility ($\mu_{\mathrm{o}}$) for an oligomer composed of $n_{\mathrm{m}}$ monomer units is proportional to oligomer charge ($q_{\mathrm{o}}$) and inversely proportional to oligomer hydrodynamic radius ($r_{\mathrm{o}}$) and so can be described by58
$${\mu }_{{\mathrm{o}}}\propto \frac{{q}_{{\mathrm{o}}}}{{r}_{{\mathrm{o}}}}\propto \frac{{{n}_{{\mathrm{m}}}}^{v}}{{r}_{{\mathrm{o}}}}$$
(1)
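where $v$ is a scaling exponent linking $q_{\mathrm{o}}$ with $n_{\mathrm{m}}$. Approximating the oligomers as spherical species, so that the oligomer radius scales with the cube root of the monomer number ($r_{\mathrm{o}} = r_{\mathrm{m}}{n_{\mathrm{m}}}^{1/3}$), yields58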
$${\mu }_{{\mathrm{o}}}\propto \frac{{{n}_{{\mathrm{m}}}}^{v}}{{r}_{{\mathrm{m}}}{{n}_{{\mathrm{m}}}}^{\frac{1}{3}}}=\frac{{{n}_{{\mathrm{m}}}}^{{v}^{* }}}{{r}_{{\mathrm{m}}}}$$
(2)
where the oligomer electrophoretic mobility is defined only in terms of the monomer number ($n_{\mathrm{m}}$) and monomer hydrodynamic radius ($r_{\mathrm{m}}$), and the scaling exponent $v^{*} = v - 1/3$. Samples were extracted at the t1/2 of the negative control (1% dimethyl sulfoxide (DMSO)) and the results are shown in Fig. 6. Anle-138b dosing resulted in a smaller population of large aggregates, as may be expected from the slight acceleration in the aggregation observed in the fluorescence values, while I3.02 reduced both the size and the number of oligomers present in comparison to the DMSO control. The ranking of these inhibitors was further validated in a subsequent study of oligomer levels using solid state nanopores combined with DNA nanostructure tagging59.