A machine learning model was able to predict early COVID-19 infection from only three days of self-reported symptoms.
The timely detection of COVID-19 infections through PCR testing is vital to contain the spread of the virus. However, while PCR testing has become the most widely used analytical technique to detect the virus, the result is highly dependent on the timing of sample collection, the type of specimen and the quality of the sample. An alternative is to screen individuals on the basis of a combination of symptoms and then test only those whose symptoms are suggestive of infection. This approach was used in an Italian study of nearly 3000 subjects which, with the aid of a short diagnostic scale, correctly identified the symptoms associated with infection. The same methodology is used in the COVID-19 Symptom Study App, a longitudinal, self-reported study of the symptom profile of patients with COVID-19. Using machine learning, the study has developed models that identify the main symptoms of infection and their correlation with outcomes. Nevertheless, current models are not suited to the early detection of infection. This prompted the COVID-19 Symptom Study team to create a machine learning model that used self-reported symptoms from only the first three days to predict an individual’s likelihood of being COVID-19 positive.
The team compared three different models for analysing self-reported symptoms. The first was based on the NHS algorithm, which treats the presence of cough, fever or loss of smell between days 1 and 3 as potentially representative of COVID-19 infection. The second, a logistic regression model, is based on an algorithm which incorporates loss of smell, persistent cough, fatigue and skipped meals, and which has previously been validated and found to correlate well with COVID-19 infection. The third, a hierarchical Gaussian process model, used 18 self-reported symptoms combined with co-morbidities and demographic data. All three models were compared in terms of sensitivity, specificity and area under the receiver operating characteristic curve (AUC). They were developed on a training set of participants self-reporting symptoms between April and October 2020 and evaluated on a test set of participants self-reporting symptoms between October and November 2020.
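The simplest of the three approaches, the symptom rule attributed to the NHS algorithm, can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not the study's code: the symptom keys, the list-of-daily-reports data layout and the function name are assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical symptom keys; the study's actual variable names are not given here.
NHS_CORE_SYMPTOMS = ("cough", "fever", "loss_of_smell")


def nhs_rule_positive(daily_reports: List[Dict[str, bool]]) -> bool:
    """Flag a participant if any core symptom is reported on days 1 to 3.

    daily_reports[0] is assumed to be day 1, daily_reports[1] day 2, and so on.
    Reports from day 4 onwards are ignored.
    """
    first_three_days = daily_reports[:3]
    return any(
        report.get(symptom, False)
        for report in first_three_days
        for symptom in NHS_CORE_SYMPTOMS
    )


# Example: loss of smell first reported on day 2 triggers the rule.
participant = [{"fatigue": True}, {"loss_of_smell": True}, {}]
flagged = nhs_rule_positive(participant)
```

The logistic regression and hierarchical Gaussian process models differ from this rule in that they weight several symptoms (and, for the latter, co-morbidities and demographic data) rather than checking for the presence of any single one.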
Findings
Data were available from 182,991 participants in the training set and 15,049 in the test set, with a similar symptom distribution in both. The predictive power of the three models differed. Using three days of symptoms, the hierarchical Gaussian process model showed the highest predictive value (AUC = 0.80, 95% CI 0.80–0.81), compared with the logistic regression model (AUC = 0.74) and the NHS model (AUC = 0.67). For prediction of COVID-19 infection, the hierarchical Gaussian process model had a sensitivity of 73% and a specificity of 72%. Its sensitivity was higher than that of both the logistic regression model (sensitivity 59%, specificity 76%) and the NHS model (sensitivity 60%, specificity 75%), although its specificity was slightly lower. Interestingly, the key symptoms predictive of early COVID-19 were loss of smell, chest pain, persistent cough, abdominal pain, blisters on the feet, eye soreness and unusual pain.
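For readers less familiar with these metrics, the following Python sketch shows how sensitivity, specificity and AUC can be computed from model outputs and confirmed test results. It is a minimal illustration under stated assumptions: the function names and the toy data are hypothetical, and the AUC is computed with the simple pairwise definition (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one), which may not be how the authors computed it.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    predicted: Sequence[bool], actual: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary predictions."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # true positives
    fn = sum(1 for p, a in pairs if not p and a)      # false negatives
    tn = sum(1 for p, a in pairs if not p and not a)  # true negatives
    fp = sum(1 for p, a in pairs if p and not a)      # false positives
    return tp / (tp + fn), tn / (tn + fp)


def pairwise_auc(scores: Sequence[float], actual: Sequence[bool]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, a in zip(scores, actual) if a]
    neg = [s for s, a in zip(scores, actual) if not a]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration): 3 infected, 4 uninfected participants.
actual = [True, True, True, False, False, False, False]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
predicted = [s >= 0.5 for s in scores]  # 0.5 is an arbitrary example threshold

sens, spec = sensitivity_specificity(predicted, actual)
auc = pairwise_auc(scores, actual)
```

Sensitivity and specificity depend on the threshold chosen to turn a score into a positive or negative prediction, whereas AUC summarises performance across all thresholds, which is why a model can have the highest AUC without having the highest specificity.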
The authors concluded that the hierarchical Gaussian process model could predict the early signs of infection and could be used to prompt referral for testing and self-isolation when these symptoms were present.
Citation
Canas LS et al. Early detection of COVID-19 in the UK using self-reported symptoms: a large-scale, prospective, epidemiological surveillance study. Lancet Digit Health 2021