Data modelling (WP7)

Research Theme 7

Many different types of data will be collected in PAINSTORM to better understand the complex nature of neuropathic pain. This research theme brings together the information generated in the rest of the project and analyses it jointly to look for correlations, causes, and effects. Using statistics, we can look for the factors that best explain neuropathic pain. We will also focus on tools that are easy to use in a clinical setting.

Taking data generated through Work Packages 1 to 6, we will use sophisticated statistical and computational approaches to model and quantify associations among a wide range of factors linked to neuropathic pain, testing their strength and validity. We will adopt two distinct approaches:

Examining all available factors, with a view to understanding the pathophysiology of neuropathic pain;
Focusing on factors and techniques with the greatest clinical utility, with a view to applying the findings in real-world medical practice.

Ultimately, we will develop composite biomarker signatures for neuropathic pain. These will enable a greater understanding of the condition as well as individualised assessment, and therefore stratified treatment or prevention approaches.

Ranking clinical, biophysical, demographic, genetic and self-reported psychosocial measures

Exploratory hierarchical modelling will employ all available data to elucidate dependence and inter-dependence patterns among study variables. Features will be constructed by grouping original variables into larger mechanistic concepts (e.g. peripheral nerve fibre integrity, central processing). Hypothesis testing will determine which features differ significantly between patient subgroups. Model-agnostic feature selection, nested within cross-validation, will then determine the most powerful predictors in the context of predictive models.
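As an illustration of why feature selection is placed inside cross-validation, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration and not the project's actual pipeline: the synthetic data, the Welch-style t statistic used as a model-agnostic filter, the nearest-centroid classifier, and all function names. The key point is that features are ranked using the training folds only, so the held-out fold never influences which features are chosen.

```python
import statistics
import random

def make_data(n=120, n_features=6, seed=0):
    """Synthetic data (assumption): only feature 0 differs between two subgroups."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 2.0 * label  # the single informative feature
        X.append(row)
        y.append(label)
    return X, y

def rank_features(X, y):
    """Rank features by a Welch-style t statistic; no predictive model is involved."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / se)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)

def nearest_centroid_predict(X_train, y_train, X_test, features):
    """Assign each test row to the class whose centroid is closest on the chosen features."""
    centroids = {}
    for lab in set(y_train):
        rows = [row for row, l in zip(X_train, y_train) if l == lab]
        centroids[lab] = [statistics.mean(r[j] for r in rows) for j in features]
    preds = []
    for row in X_test:
        dist = {lab: sum((row[j] - c) ** 2 for j, c in zip(features, cen))
                for lab, cen in centroids.items()}
        preds.append(min(dist, key=dist.get))
    return preds

def cross_validate(X, y, k_folds=5, top_k=2):
    """k-fold CV in which feature selection is repeated within every training fold."""
    accuracies, selected = [], []
    for f in range(k_folds):
        test_idx = set(range(f, len(X), k_folds))
        X_tr = [r for i, r in enumerate(X) if i not in test_idx]
        y_tr = [l for i, l in enumerate(y) if i not in test_idx]
        X_te = [X[i] for i in sorted(test_idx)]
        y_te = [y[i] for i in sorted(test_idx)]
        features = rank_features(X_tr, y_tr)[:top_k]  # selection sees training data only
        preds = nearest_centroid_predict(X_tr, y_tr, X_te, features)
        accuracies.append(sum(p == t for p, t in zip(preds, y_te)) / len(y_te))
        selected.append(features)
    return statistics.mean(accuracies), selected

X, y = make_data()
mean_accuracy, selected_per_fold = cross_validate(X, y)
print("mean held-out accuracy:", round(mean_accuracy, 3))
print("features selected per fold:", selected_per_fold)
```

Ranking the features once on the full dataset before splitting would leak information from the held-out folds and tend to inflate the accuracy estimate, which is the problem that nesting the selection step avoids.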

Identifying patient subgroups sharing common traits to improve stratification

Based on high bivariate correlations, more complex multivariate factor and correspondence analyses will be investigated to uncover latent variables describing a common trait. After further validation, a structural equation model will be used to align the latent variables arising from each group into a single, unified model describing inputs from various sources, from stimulus perception along the nervous pathway to central processing. This methodology has been developed for immunology and we have started to apply it to multi-dimensional datasets relating to pain. In a separate analysis, we will use a fully data-driven approach, based on algorithmic clustering and machine learning, to identify interrelated factors and common patterns arising from Work Packages 1 to 6.
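To make the idea of algorithmic clustering concrete, the sketch below groups synthetic "patients" with a small k-means implementation in plain Python. It is an illustrative assumption only: the text does not say which clustering algorithm will be used, and the two-dimensional data, the farthest-point initialisation, and the function names are all hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centroids(points, k):
    """Farthest-point initialisation: deterministic, starts from the first point."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, n_iter=50):
    """Alternate between assigning points to centroids and recomputing centroids."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Synthetic example (assumption): two well-separated subgroups on two measures.
rng = random.Random(1)
group_a = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(30)]
group_b = [[rng.gauss(5, 0.5), rng.gauss(5, 0.5)] for _ in range(30)]
points = group_a + group_b

labels, centroids = kmeans(points, k=2)
print("cluster sizes:", [labels.count(c) for c in range(2)])
```

Real multi-dimensional pain datasets would need variables put on comparable scales and a principled way of choosing the number of clusters, neither of which this sketch attempts.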

A cross-validated predictive model for the patient subgroups

Finally, we will exploit the large, phenotypically harmonised datasets with longitudinal outcomes to build models that maximise predictive accuracy on a pseudo-independent validation set. Cross-validation will fine-tune the parameters of an array of machine learning classification algorithms. We will include a burden/benefit ratio for included variables, as identified with patient partners in Work Package 2. The developed model will be informed by, and cross-validated with, the directed acyclic graph models developed in Work Package 3.
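The tuning-then-validation workflow described above can be sketched as follows. This is a hedged illustration in plain Python, not the project's method: the k-nearest-neighbour classifier, the candidate values of k, the synthetic data, and the 150/50 split are all assumptions chosen to keep the example short. The tuning parameter is chosen by cross-validation on the development data alone, and the held-out validation set is used only once, at the end.

```python
import random

def make_data(n, seed):
    """Synthetic two-class data (assumption): class 1 is shifted on both measures."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        data.append(([rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0)], label))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (k is odd, so no ties)."""
    nearest = sorted(train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

def accuracy(train, test, k):
    """Fraction of test points classified correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

def tune_k(dev, candidates=(1, 5, 15), n_folds=5):
    """Choose k by cross-validation on the development data only."""
    cv_scores = {}
    for k in candidates:
        fold_scores = []
        for f in range(n_folds):
            test = dev[f::n_folds]
            train = [item for i, item in enumerate(dev) if i % n_folds != f]
            fold_scores.append(accuracy(train, test, k))
        cv_scores[k] = sum(fold_scores) / n_folds
    best_k = max(cv_scores, key=cv_scores.get)
    return best_k, cv_scores

data = make_data(200, seed=42)
random.Random(0).shuffle(data)
dev, validation = data[:150], data[150:]  # validation set is held back from tuning

best_k, cv_scores = tune_k(dev)
validation_accuracy = accuracy(dev, validation, best_k)
print("cross-validated scores:", cv_scores)
print("chosen k:", best_k, "validation accuracy:", round(validation_accuracy, 3))
```

With several candidate algorithms instead of one, the same loop would run over each algorithm and its parameter grid, still without touching the validation set until the final model is fixed.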

Outcomes and Results

This work is ongoing. Come back later!

External links

This paper introduces the approach used in DOLORisk to build a risk model that aims to predict the onset and the resolution of chronic neuropathic pain: Development and external validation of multivariable risk models to predict incident and resolved neuropathic pain: a DOLORisk Dundee study