Kidney transplants at risk: Machine learning is full of pitfalls

O’Connell et al. recently published a study on predicting transplant risk from biopsy transcriptome data. The study collected 204 renal allograft biopsies after kidney transplantation. Microarrays were used to find genes that correlate with kidney damage 12 months after the biopsy was taken, with damage measured using the Chronic Allograft Damage Index. A machine learning model developed with penalized regression consisted of 13 genes that predicted the development of transplant complications. The study concluded that the model had superior predictive potential (area under the ROC curve, AUC = 0.967), better than the clinical data normally used to identify complications. However, criticism came quickly and concerned possible overfitting (when a machine learning model fits noise instead of true signal). The authors of the original study did not separate the test set from the training set when selecting genes for the model. This obviously “contaminates” the test set, which should be kept completely isolated until it is time to report the final performance metrics. The authors replied that they had taken steps to avoid overfitting (leave-one-out cross-validation), which gave gene sets similar to those of the original method. We are curious why the authors chose AUC as their performance metric if the aim was to predict a continuous variable. Would other metrics, such as root mean square error, have been more appropriate? In our experience, it is very easy to end up with overfitted models when biomolecular data are the predictors, so extreme care must be taken from the very start of model development.
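
To make the contamination concrete, here is a minimal, stdlib-only Python sketch of the classic selection-bias experiment. The “expression” data are pure noise, so any honest accuracy estimate should hover around 50%. Genes are ranked by the absolute difference in class means and samples are classified with a nearest-centroid rule. These are stand-ins chosen for brevity, not the penalized regression used by O’Connell et al., and the sample size, number of genes and number of selected genes are hypothetical. The only difference between the two runs is whether genes are selected once on all samples or again inside each leave-one-out fold.

```python
import random


def _mean(values):
    """Arithmetic mean of any iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def make_noise_data(n_samples, n_features, seed):
    """Pure-noise 'expression' matrix with balanced, shuffled labels: no real signal."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    rng.shuffle(y)
    return X, y


def top_k_features(X, y, rows, k):
    """Indices of the k features with the largest |difference in class means| over rows."""
    scores = []
    for j in range(len(X[0])):
        diff = _mean(X[i][j] for i in rows if y[i] == 1) - _mean(X[i][j] for i in rows if y[i] == 0)
        scores.append((abs(diff), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def predict_nearest_centroid(X, y, rows, features, x_new):
    """Label of the class whose centroid (over the selected features) is closest to x_new."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        members = [X[i] for i in rows if y[i] == label]
        centroid = [_mean(m[j] for m in members) for j in features]
        dist = sum((x_new[j] - c) ** 2 for j, c in zip(features, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, k, select_inside_fold):
    """Leave-one-out accuracy, selecting genes per fold or once on all samples."""
    everyone = list(range(len(X)))
    # Selecting on all samples lets each held-out sample influence its own gene set.
    leaked = None if select_inside_fold else top_k_features(X, y, everyone, k)
    correct = 0
    for held_out in everyone:
        train = [i for i in everyone if i != held_out]
        features = top_k_features(X, y, train, k) if select_inside_fold else leaked
        prediction = predict_nearest_centroid(X, y, train, features, X[held_out])
        correct += prediction == y[held_out]
    return correct / len(X)


X, y = make_noise_data(n_samples=40, n_features=500, seed=0)
leaky = loo_accuracy(X, y, k=10, select_inside_fold=False)
honest = loo_accuracy(X, y, k=10, select_inside_fold=True)
print(f"genes chosen on all samples: {leaky:.2f}; chosen inside each fold: {honest:.2f}")
```

Selecting genes on all samples typically produces a strikingly high leave-one-out accuracy on data that contain no signal at all, while selecting inside each fold falls back to chance level. The same reasoning applies to any supervised step, including penalized regression: it has to be repeated inside every fold.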

p53: Does the guardian of the genome also guard the heart?

p53, dubbed the “guardian of the genome”, is a protein (and transcription factor) discovered in 1979 and one of the most studied proteins (84,538 abstracts in PubMed mention it). It has received such massive attention because of its biological role as a tumor suppressor, so insights into p53 can be translated into knowledge about tumors. Mak et al. hypothesized that p53 plays an important role in protecting the heart, based on the observation that apoptosis of cardiomyocytes in end-stage human heart failure correlates with elevated p53 levels. The authors created genetically modified mice lacking the p53 gene and measured gene expression in cardiac tissue with microarrays. Mice without the p53 gene developed cardiac hypertrophy, and p53 was found to regulate mitochondrial biogenesis as well as energy metabolism. Limitations of the study include a somewhat outdated technique for measuring gene expression and the focus on perturbing a single component. We also could not easily find the sample size. Nevertheless, the study implicates p53 as an important component of a protein network in the heart.

Targeted sequencing in neurodevelopmental disorders: a case-control study

Neurodevelopmental disorders, for example intellectual disability, autism, and developmental delay, are heterogeneous conditions whose broader underlying genetics remain poorly understood (twin studies confirm a genetic component, and exome sequencing studies have captured extremely rare mutations). Stessman et al. published a comprehensive study describing targeted sequencing of 208 candidate genes in 11,730 cases and 2,867 controls. Half of the cases had a prior diagnosis of autism spectrum disorder, and the remainder had intellectual disability. Molecular inversion probes (a technique originating from early genotyping assays) were used to capture the 208 genes. In total, 91 genes showed an excess of de novo mutations, and the authors followed up with functional studies in Drosophila. A network-based analysis identified interacting genes associated with high-functioning autism. Three new candidate risk genes were identified: NAA15, KMT5B, and ASH1L.
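
The summary does not say how an excess of de novo mutations was tested. One common approach, assumed here and not necessarily the one Stessman et al. used, compares the observed number of de novo mutations in a gene with the number expected from a per-gene mutation rate, using a one-sided Poisson test. The function names, mutation rate and trio count in this Python sketch are hypothetical.

```python
import math


def poisson_pmf(i, lam):
    """P(X = i) for X ~ Poisson(lam), on the log scale so the factorial cannot overflow."""
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))


def poisson_upper_tail(observed, expected):
    """One-sided p-value P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    if expected <= 0.0:
        return 0.0  # nothing expected: any observed count lies beyond the model
    if observed <= expected:
        # The upper tail is large here, so take the complement of the short lower sum.
        return max(0.0, 1.0 - sum(poisson_pmf(i, expected) for i in range(observed)))
    # The upper tail is small: terms shrink monotonically because observed > expected.
    total, i = 0.0, observed
    while True:
        term = poisson_pmf(i, expected)
        total += term
        if term <= 1e-16 * total:
            break
        i += 1
    return min(1.0, total)


def de_novo_excess(observed, mutation_rate, n_probands):
    """Expected de novo count and Poisson p-value for one gene (hypothetical inputs).

    mutation_rate: per-haploid-genome rate for the mutation class in this gene
    n_probands: number of parent-child trios; each child carries two haploid genomes
    """
    expected = 2 * n_probands * mutation_rate
    return expected, poisson_upper_tail(observed, expected)


# Hypothetical numbers for a single gene, not taken from Stessman et al.
expected, p_value = de_novo_excess(observed=6, mutation_rate=2e-5, n_probands=5000)
print(f"expected {expected:.2f} de novo mutations, P(X >= 6) = {p_value:.2e}")
```

When many genes are screened at once, such per-gene p-values would also need a multiple-testing correction across all genes tested.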

Fish populations have undergone adaptation to fend off human-caused environmental pollution

Evolution is a slow process, so we can seldom directly observe genetic changes that have been selected for within our lifetime. “Killifish” is a common name for fish from several families that inhabit lakes, streams, and rivers around the world. Along the Atlantic coast, some killifish populations living in highly polluted habitats have rapidly adapted to these toxic environments. This is the case for Fundulus heteroclitus, and a new study published in the journal Science sequenced ~50 individuals from eight populations of F. heteroclitus. Two populations were sequenced to 7-fold coverage and the remaining populations to 0.6-fold coverage. The authors computed Tajima’s D in 5 kb genomic windows to identify regions associated with pollution tolerance. Interestingly, several regions showing signals of selection were shared among populations, which likely reflects convergent evolution. Some of these regions contain genes involved in the aryl hydrocarbon receptor signaling pathway. Notably, CYP1A, a member of the large and polymorphic cytochrome P450 family involved in the transformation of xenobiotics, lies within these regions.
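
Tajima’s D compares two estimates of genetic diversity: the mean number of pairwise differences (π) and Watterson’s estimator based on the number of segregating sites (S). A negative D indicates an excess of rare variants, which is consistent with a recent selective sweep (but also with population expansion). This Python sketch computes D in non-overlapping 5 kb windows. It assumes already-called haplotypes encoded as '0'/'1' strings, which is a simplification: at 0.6-fold coverage, genotypes would normally be handled through genotype likelihoods. It is not the authors’ pipeline, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def tajimas_d(allele_counts, n):
    """Tajima's D for one window.

    allele_counts: count k of the '1' allele at each site in the window
    n: number of sampled chromosomes (haplotypes)
    Returns None when D is undefined (no segregating sites, or n < 4).
    """
    seg = [k for k in allele_counts if 0 < k < n]
    s = len(seg)
    if s == 0 or n < 4:
        return None
    # pi: mean number of pairwise differences, accumulated site by site.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in seg)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # D: difference between pi and Watterson's S / a1, scaled by its standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def windowed_tajimas_d(positions, haplotypes, window_size=5000):
    """Tajima's D in non-overlapping windows of window_size base pairs.

    positions: base-pair position of each variant site
    haplotypes: one string per chromosome with '0' or '1' at each site
                (which allele is labelled '1' does not affect D)
    """
    n = len(haplotypes)
    counts = defaultdict(list)
    for site, pos in enumerate(positions):
        k = sum(1 for h in haplotypes if h[site] == "1")
        counts[pos // window_size].append(k)
    return {w * window_size: tajimas_d(c, n) for w, c in sorted(counts.items())}
```

Windows whose D is unusually low in tolerant populations, but not in sensitive ones, would be candidate regions under selection; a real analysis would compare the windowed values across populations rather than read single windows in isolation.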

Meta-analysis of blood pressure reveals global trends: blood pressure is increasing in low-income countries

High blood pressure (hypertension) is a major risk factor for many medical conditions, including cardiovascular disease and kidney disease. It forces the heart to work harder, which over time can weaken the heart muscle, make it pump less efficiently, and eventually lead to heart failure. The relationship between blood pressure and disease risk is log-linear: the logarithm of risk rises linearly with blood pressure. Blood pressure is affected by a multitude of external and internal factors, such as environment, nutrition, genetics, amount of fat tissue, alcohol use, smoking, physical activity, and stress. It has been suspected that global trends in blood pressure have changed over time. The NCD Risk Factor Collaboration (NCD-RisC), a worldwide network of health scientists, conducted an enormous study of blood pressure using data from 19.1 million individuals. They pooled systolic and diastolic blood pressure measurements from national, subnational, and community studies covering more than 200 countries. The authors used a Markov chain Monte Carlo (MCMC) algorithm to fit their statistical model, which accounted for measurement error, noise, and differences between measurement devices. The major conclusion is that the burden of high blood pressure has shifted from high-income countries in North America and Europe to low-income countries in Asia and sub-Saharan Africa. Blood pressure has largely decreased in high-income countries, a trend also confirmed by other studies such as the MONICA project.
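
To illustrate what “fitting a model with MCMC” means, here is a deliberately tiny Python sketch: a random-walk Metropolis sampler for one hypothetical country’s mean systolic blood pressure, estimated from three hypothetical survey means with different standard errors. As we understand it, the NCD-RisC model is a much richer Bayesian hierarchical model across countries, years, and ages; nothing here reproduces it, and all numbers and function names are made up for illustration.

```python
import math
import random


def log_posterior(mu, survey_means, survey_ses):
    """Log posterior of a country's mean systolic blood pressure, up to a constant.

    Assumes a flat prior on mu and that each survey mean is Normal(mu, se^2), where se
    lumps together sampling error and measurement noise (a deliberate simplification).
    """
    return -0.5 * sum(((m - mu) / s) ** 2 for m, s in zip(survey_means, survey_ses))


def metropolis(survey_means, survey_ses, n_iter=20000, burn_in=2000, step=1.0,
               start=120.0, seed=1):
    """Random-walk Metropolis sampler; returns posterior draws of mu after burn-in."""
    rng = random.Random(seed)
    mu = start
    current = log_posterior(mu, survey_means, survey_ses)
    draws = []
    for it in range(n_iter):
        proposal = mu + rng.gauss(0.0, step)  # symmetric proposal: no Hastings correction
        candidate = log_posterior(proposal, survey_means, survey_ses)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        if it >= burn_in:
            draws.append(mu)
    return draws


# Hypothetical survey means (mmHg) and standard errors for one country.
means = [131.2, 128.7, 133.0]
ses = [1.5, 2.0, 1.0]
draws = metropolis(means, ses)
posterior_mean = sum(draws) / len(draws)
print(f"posterior mean SBP ~ {posterior_mean:.1f} mmHg from {len(draws)} draws")
```

With a flat prior and a normal likelihood, the exact posterior mean is simply the inverse-variance weighted mean of the surveys, so this toy case can be checked by hand. MCMC becomes worthwhile when, as in the actual study, the model couples many countries, years, and sources of error so that no closed-form answer exists.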