
Early Target Assessment and Prioritization


As a proof of concept for downstream workflows, I use the previous tools to build a basic kinase target identification and prioritization pipeline, using kidney cancer as an example.


At a high level, the workflow can be represented as:

Natural language request → LLM-generated query → GraphDB aggregation → Exported tables → Downstream analysis

Obtaining Data


It initially relies on four pieces of information, each of which can be queried as follows:


Target harmonisation

Query:
Retrieve all kinase targets together with their UniProt and Ensembl gene mappings


Goal:
Retrieve all kinase targets together with their UniProt identifiers and corresponding Ensembl gene identifiers

This allows pharmacological targets from ChEMBL to be connected with expression values from TCGA (tumor) and GTEx (normal) kidney tissue.
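The harmonisation step is essentially a two-hop join: ChEMBL target to UniProt accession, then UniProt accession to Ensembl gene ID. A minimal sketch of that join, assuming the exported tables look like the hypothetical `chembl_targets` and `uniprot_to_ensembl` lists below (all identifiers are placeholders, and `harmonise` is an illustrative helper rather than part of the pipeline):

```python
from collections import defaultdict

# Hypothetical exported rows; identifiers are placeholders, not real records.
chembl_targets = [
    {"target_id": "CHEMBL_T1", "gene_symbol": "KINASE_A", "uniprot": "UNIPROT_A"},
    {"target_id": "CHEMBL_T2", "gene_symbol": "KINASE_B", "uniprot": "UNIPROT_B"},
    {"target_id": "CHEMBL_T3", "gene_symbol": "KINASE_C", "uniprot": "UNIPROT_C"},
]
uniprot_to_ensembl = [
    ("UNIPROT_A", "ENSG_A"),
    ("UNIPROT_B", "ENSG_B1"),
    ("UNIPROT_B", "ENSG_B2"),  # one accession can map to several gene IDs
]


def harmonise(targets, mappings):
    """Attach Ensembl gene IDs to each kinase target via its UniProt accession.

    Returns (mapped_rows, unmapped_target_ids)."""
    ensembl_by_uniprot = defaultdict(list)
    for accession, gene_id in mappings:
        ensembl_by_uniprot[accession].append(gene_id)

    mapped, unmapped = [], []
    for t in targets:
        gene_ids = ensembl_by_uniprot.get(t["uniprot"], [])
        if not gene_ids:
            unmapped.append(t["target_id"])  # keep track of targets that drop out
        for gene_id in gene_ids:
            mapped.append({**t, "ensembl": gene_id})  # one row per gene mapping
    return mapped, unmapped


mapped, unmapped = harmonise(chembl_targets, uniprot_to_ensembl)
print(len(mapped), unmapped)  # 3 ['CHEMBL_T3']
```

Keeping the unmapped targets visible makes it easy to check how many kinases are lost before the expression tables are joined.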


Tumor expression aggregation

Query:
Calculate the mean kidney tumor expression for each mapped kinase gene


Goal:
Calculate the mean expression for each mapped gene across kidney tumor samples

This step produces an expression summary that can later be compared against baseline expression in normal tissue.
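Outside the graph, the same aggregation amounts to a group-by over sample-level rows. The sketch below assumes a hypothetical long-format export of `(sample_id, gene_id, tpm)` tuples and takes the arithmetic mean of TPM per gene; whether the GraphDB query averages raw or transformed values is not stated here, so treat that as an assumption. The helper name and values are illustrative only.

```python
from collections import defaultdict


def mean_expression(rows, gene_ids):
    """Arithmetic mean TPM per gene across samples, restricted to mapped kinase genes.

    rows: iterable of (sample_id, gene_id, tpm) tuples in long format."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _sample, gene, tpm in rows:
        if gene in gene_ids:  # ignore genes that are not mapped kinases
            totals[gene] += tpm
            counts[gene] += 1
    return {gene: totals[gene] / counts[gene] for gene in counts}


# Hypothetical tumor samples (placeholder values, not TCGA data).
tumor_rows = [
    ("T01", "ENSG_A", 12.0), ("T02", "ENSG_A", 18.0),
    ("T01", "ENSG_B", 4.0), ("T02", "ENSG_B", 6.0),
    ("T01", "ENSG_X", 99.0),  # not a mapped kinase gene, ignored
]
tumor_mean = mean_expression(tumor_rows, {"ENSG_A", "ENSG_B"})
print(tumor_mean)  # {'ENSG_A': 15.0, 'ENSG_B': 5.0}
```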



Normal tissue aggregation

Query:
Calculate the mean GTEx kidney cortex expression for each mapped kinase gene


Goal:
Calculate the mean GTEx expression for the same genes across normal kidney cortex samples

This provides the baseline physiological expression needed to assess tumor specificity.
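Because the comparison only makes sense gene by gene, the tumor and normal summaries need to cover the same genes. A small sketch, assuming both summaries are plain `{gene_id: mean_tpm}` dictionaries (the `align_baseline` helper and all values are hypothetical), pairs them and reports genes that lack a GTEx baseline instead of dropping them silently:

```python
def align_baseline(tumor_mean, normal_mean):
    """Pair tumor and normal means for genes present in both summaries.

    Genes without a normal (GTEx) baseline are returned separately."""
    shared = sorted(tumor_mean.keys() & normal_mean.keys())
    paired = {g: (tumor_mean[g], normal_mean[g]) for g in shared}
    missing = sorted(tumor_mean.keys() - normal_mean.keys())
    return paired, missing


# Hypothetical kidney tumor and kidney cortex means (TPM).
tumor_mean = {"ENSG_A": 15.0, "ENSG_B": 5.0, "ENSG_C": 8.0}
normal_mean = {"ENSG_A": 3.0, "ENSG_B": 5.0}
paired, missing = align_baseline(tumor_mean, normal_mean)
print(paired)   # {'ENSG_A': (15.0, 3.0), 'ENSG_B': (5.0, 5.0)}
print(missing)  # ['ENSG_C']
```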



Compound coverage aggregation

Query:
Count the number of distinct compounds associated with each kinase target


Goal:
Count the number of distinct compounds associated with each kinase target in the ChEMBL kinase dataset

This serves as a simple proxy for how extensively a kinase has already been explored pharmacologically.
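Counting distinct compounds, rather than activity records, matters because the same compound can be tested against one target in several assays. A minimal sketch over hypothetical `(target_id, compound_id)` pairs; the helper name and identifiers are illustrative:

```python
from collections import defaultdict


def compound_coverage(activities):
    """Number of distinct compounds per target.

    activities: iterable of (target_id, compound_id) pairs. A set per target
    ensures each compound is counted once, however many records it has."""
    compounds = defaultdict(set)
    for target, compound in activities:
        compounds[target].add(compound)
    return {target: len(ids) for target, ids in compounds.items()}


# Hypothetical activity records; CPD_1 appears twice for CHEMBL_T1.
activities = [
    ("CHEMBL_T1", "CPD_1"), ("CHEMBL_T1", "CPD_1"), ("CHEMBL_T1", "CPD_2"),
    ("CHEMBL_T2", "CPD_3"),
]
print(compound_coverage(activities))  # {'CHEMBL_T1': 2, 'CHEMBL_T2': 1}
```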


Downstream Target Triage


Once retrieved, the tables were exported and combined locally.

At this stage, the goal is to establish a simple first-pass target triage workflow using three criteria:

  • Tumor expression — mean expression across kidney tumor samples
  • Tumor enrichment — relative expression in tumor tissue compared with normal kidney tissue
  • Compound coverage — number of distinct compounds associated with each kinase in ChEMBL

From these, the analysis asks a simple question:

Which kinases are sufficiently expressed in tumor tissue, enriched relative to normal kidney tissue, and supported by evidence of prior chemical targeting?
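Combining the exported tables locally can be sketched as a join on the Ensembl gene ID, with the ChEMBL target ID used to look up compound coverage. The `KinaseRecord` type, the `combine` helper, and the input values are hypothetical. The pseudocount is an added assumption (not from the source) that avoids division by zero and `log2(0)`, at the cost of pulling ratios for weakly expressed genes toward 1.

```python
import math
from dataclasses import dataclass


@dataclass
class KinaseRecord:
    symbol: str
    tumor_tpm: float
    normal_tpm: float
    n_compounds: int
    log2_enrichment: float


def combine(targets, tumor_mean, normal_mean, coverage, pseudocount=1.0):
    """Merge the exported tables into one record per kinase.

    targets: dicts with 'target_id', 'gene_symbol' and 'ensembl' keys.
    Kinases lacking either expression summary are skipped; a missing
    compound count is treated as zero. pseudocount is an assumption."""
    records = []
    for t in targets:
        gene = t["ensembl"]
        if gene not in tumor_mean or gene not in normal_mean:
            continue
        tumor, normal = tumor_mean[gene], normal_mean[gene]
        records.append(KinaseRecord(
            symbol=t["gene_symbol"],
            tumor_tpm=tumor,
            normal_tpm=normal,
            n_compounds=coverage.get(t["target_id"], 0),
            log2_enrichment=math.log2((tumor + pseudocount) / (normal + pseudocount)),
        ))
    return records


targets = [
    {"target_id": "CHEMBL_T1", "gene_symbol": "KINASE_A", "ensembl": "ENSG_A"},
    {"target_id": "CHEMBL_T2", "gene_symbol": "KINASE_B", "ensembl": "ENSG_B"},
]
records = combine(targets, {"ENSG_A": 15.0, "ENSG_B": 5.0},
                  {"ENSG_A": 3.0, "ENSG_B": 5.0}, {"CHEMBL_T1": 2})
for r in records:
    print(r.symbol, round(r.log2_enrichment, 2), r.n_compounds)
# KINASE_A 2.0 2
# KINASE_B 0.0 0
```

Storing enrichment on the log₂ scale turns a 2× enrichment threshold into a simple cut at 1.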

Multi-dimensional Target Landscape


To make this question operational, a simple candidate region (shaded green in the plot) was defined using two biological criteria:

  • Tumor expression ≥ 10 TPM
  • Tumor enrichment ≥ 2× tumor / normal
    (equivalent to log₂ fold-change ≥ 1)
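The candidate region reduces to two comparisons per kinase. A sketch with hypothetical kinases and values; writing the enrichment test as `tumor >= 2 * normal` is equivalent to log₂ fold-change ≥ 1 whenever the normal mean is positive, and it avoids dividing by zero. The helper name is illustrative.

```python
# Hypothetical (symbol, mean tumor TPM, mean normal TPM) rows; values are illustrative.
kinases = [
    ("KINASE_A", 15.0, 3.0),   # 15 TPM, 5x enriched -> candidate
    ("KINASE_B", 40.0, 30.0),  # well expressed, but only ~1.33x enriched
    ("KINASE_C", 6.0, 1.0),    # 6x enriched, but below 10 TPM
    ("KINASE_D", 10.0, 5.0),   # exactly on both thresholds -> candidate
]


def in_candidate_region(tumor_tpm, normal_tpm, min_tpm=10.0, min_fold=2.0):
    """Tumor expression >= min_tpm and tumor/normal >= min_fold.

    tumor >= min_fold * normal is the same test as
    log2(tumor / normal) >= log2(min_fold) for normal > 0, and a zero
    normal mean is treated as unbounded enrichment."""
    return tumor_tpm >= min_tpm and tumor_tpm >= min_fold * normal_tpm


candidates = [s for s, t, n in kinases if in_candidate_region(t, n)]
print(candidates)  # ['KINASE_A', 'KINASE_D']
```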

Larger bubbles indicate greater compound coverage in ChEMBL, suggesting that these kinases have been more extensively explored chemically. Smaller bubbles indicate less explored targets, which may reflect either greater novelty or greater uncertainty.

The second plot then compares the candidate kinases on tumor enrichment and compound count. Highlighted points correspond to kinases that are not clearly outperformed by any other candidate in both dimensions simultaneously (i.e. they are approximately Pareto-optimal) and therefore represent the strongest trade-offs between tumor specificity and prior chemical exploration:

| Kinase | Key features in the trade-off analysis |
|---|---|
| STK32B | Strongest tumor enrichment among the candidates, moderate compound coverage |
| NEK6 | Combines high tumor enrichment with substantial compound coverage |
| PRKACB | Balanced trade-off between tumor enrichment and prior chemical exploration |
| PRKCQ | High compound coverage, reasonable tumor enrichment |
| MAPK1 | Extensively explored chemically, with extremely high compound coverage |
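The highlighted kinases correspond to a Pareto front over two objectives. The sketch below assumes both tumor enrichment and compound count are treated as "higher is better", which is consistent with both STK32B (strongest enrichment) and MAPK1 (extremely high compound coverage) being highlighted. It uses strict dominance with no tolerance, so "not clearly outperformed" may have been applied more loosely in the plot. The helper name, points, and values are hypothetical.

```python
def pareto_front(points):
    """Return the names of points not dominated by any other point.

    points: list of (name, enrichment, n_compounds). Both axes are treated as
    'higher is better' (an assumption). A point is dominated if another point
    is >= on both axes and strictly > on at least one."""
    front = []
    for name, x, y in points:
        dominated = any(
            ox >= x and oy >= y and (ox > x or oy > y)
            for other, ox, oy in points
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


points = [
    ("KINASE_A", 6.0, 40),    # highest enrichment
    ("KINASE_B", 3.0, 900),   # highest compound count
    ("KINASE_C", 4.0, 300),   # trade-off between the two
    ("KINASE_D", 2.5, 250),   # beaten by KINASE_C on both axes
]
print(pareto_front(points))  # ['KINASE_A', 'KINASE_B', 'KINASE_C']
```

The quadratic scan is fine for a few dozen candidate kinases. Because dominance depends only on ordering, using log₂ enrichment instead of the raw ratio yields the same front.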

Interpretation Disclaimer


This proof of concept is intentionally simplified.

In a real target evaluation workflow, additional evidence would normally be incorporated, including mutation prevalence, pathway dependency, structural ligandability, toxicity considerations, and clinical biomarker associations.

The purpose here is not to claim definitive targets, but to demonstrate how a knowledge graph can support data extraction, cross-domain integration, and interpretable target triage by combining transcriptomic context with pharmacological metadata.