Early Target Assessment and Prioritization
As a proof of concept for downstream workflows, I use the tools described previously to build a basic kinase target identification and prioritization pipeline, with kidney cancer as the example.
At a high level, the workflow can be represented as:
Natural language request → LLM-generated query → GraphDB aggregation → Exported tables → Downstream analysis
Obtaining Data
The pipeline initially relies on four pieces of information, each of which can be queried as follows:
Target harmonisation
Query:
Retrieve all kinase targets together with their UniProt and Ensembl gene mappings
Goal:
Retrieve all kinase targets together with their UniProt identifiers and corresponding Ensembl gene identifiers
This makes it possible to connect pharmacological targets from ChEMBL with expression values from TCGA (tumor) and GTEx (normal) kidney tissues
Tumor expression aggregation
Query:
Calculate the mean kidney tumor expression for each mapped kinase gene
Goal:
Calculate the mean expression for each mapped gene across kidney tumor samples
This step generates an expression summary that can later be compared against the baseline expression in normal tissue
Normal tissue aggregation
Query:
Calculate the mean GTEx kidney cortex expression for each mapped kinase gene
Goal:
Calculate the mean GTEx expression for the same genes across normal kidney cortex samples
This provides the baseline physiological expression needed to assess tumor specificity
Compound coverage aggregation
Query:
Count the number of distinct compounds associated with each kinase target
Goal:
Count the number of distinct compounds associated with each kinase target in the ChEMBL kinase dataset
This serves as a simple proxy for how extensively a kinase has already been explored pharmacologically
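The three aggregations above run inside the graph database, but their semantics are easy to cross-check locally. The following is a minimal pandas sketch, not the generated graph queries themselves. It assumes long-format exports with hypothetical column names (`ensembl_gene_id`, `sample_id`, `tpm`, `uniprot_id`, `compound_id`); the toy rows use placeholder identifiers and are not real data.

```python
import pandas as pd


def mean_expression(expr: pd.DataFrame, value_name: str) -> pd.DataFrame:
    """Mean TPM per gene across samples (input: one row per gene-sample pair)."""
    means = expr.groupby("ensembl_gene_id", as_index=False)["tpm"].mean()
    return means.rename(columns={"tpm": value_name})


def compound_coverage(activities: pd.DataFrame) -> pd.DataFrame:
    """Number of distinct compounds per target; repeated measurements count once."""
    counts = activities.groupby("uniprot_id")["compound_id"].nunique()
    return counts.rename("n_compounds").reset_index()


# Toy rows with placeholder identifiers, for illustration only.
toy_expr = pd.DataFrame({
    "ensembl_gene_id": ["G1", "G1", "G2"],
    "sample_id": ["S1", "S2", "S1"],
    "tpm": [10.0, 20.0, 4.0],
})
toy_activities = pd.DataFrame({
    "uniprot_id": ["P1", "P1", "P1", "P2"],
    "compound_id": ["C1", "C1", "C2", "C3"],
})

tumor_means = mean_expression(toy_expr, "tumor_tpm")
coverage = compound_coverage(toy_activities)
```

Using `nunique` rather than a row count matters here: a compound measured several times against the same kinase should contribute only once to the coverage proxy.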
Downstream Target Triage
Once retrieved, the tables were exported and combined locally.
At this stage, the goal is to establish a simple first-pass target triage workflow using three criteria:
- Tumor expression — mean expression across kidney tumor samples
- Tumor enrichment — relative expression in tumor tissue compared with normal kidney tissue
- Compound coverage — number of distinct compounds associated with each kinase in ChEMBL
From these, the analysis asks a simple question:
Which kinases are sufficiently expressed in tumor tissue, enriched relative to normal kidney tissue, and supported by evidence of prior chemical targeting?
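Combining the exported tables locally could look like the sketch below. The column names (`kinase`, `uniprot_id`, `ensembl_gene_id`, `tumor_tpm`, `normal_tpm`, `n_compounds`) are assumptions about the export layout, and two choices are mine rather than the original analysis: kinases with no compound record get a count of 0, and a normal expression of 0 yields an undefined (NaN) enrichment instead of an infinite one. The toy rows are placeholders, not results.

```python
import numpy as np
import pandas as pd


def build_triage_table(mapping, tumor, normal, compounds):
    """Join the four exported tables into one row per kinase gene."""
    table = (
        mapping
        .merge(tumor, on="ensembl_gene_id", how="inner")    # needs tumor expression
        .merge(normal, on="ensembl_gene_id", how="inner")   # needs normal baseline
        .merge(compounds, on="uniprot_id", how="left")      # compounds are optional
    )
    # Assumption: a kinase absent from the compound table has zero coverage.
    table["n_compounds"] = table["n_compounds"].fillna(0).astype(int)
    # Tumor enrichment as a plain ratio; a zero baseline gives NaN, not infinity.
    table["enrichment"] = table["tumor_tpm"] / table["normal_tpm"].replace(0, np.nan)
    table["log2_fc"] = np.log2(table["enrichment"])
    return table


# Toy inputs with placeholder identifiers, for illustration only.
mapping = pd.DataFrame({
    "kinase": ["K_A", "K_B"],
    "uniprot_id": ["P1", "P2"],
    "ensembl_gene_id": ["G1", "G2"],
})
tumor = pd.DataFrame({"ensembl_gene_id": ["G1", "G2"], "tumor_tpm": [20.0, 5.0]})
normal = pd.DataFrame({"ensembl_gene_id": ["G1", "G2"], "normal_tpm": [5.0, 10.0]})
compounds = pd.DataFrame({"uniprot_id": ["P1"], "n_compounds": [12]})

triage = build_triage_table(mapping, tumor, normal, compounds)
```

The inner joins drop any kinase that lacks either expression value, since neither of the expression-based criteria can be evaluated for it.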
Multi-dimensional Target Landscape
To make the previous question operational, a simple candidate region (green shade) was defined using two biological criteria:
- Tumor expression ≥ 10 TPM
- Tumor enrichment ≥ 2× (tumor / normal ratio), equivalent to log₂ fold-change ≥ 1
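The two thresholds translate directly into a boolean filter. This is a minimal sketch assuming a table with hypothetical `tumor_tpm` and `normal_tpm` columns holding the mean expression values; the function name and the toy rows are illustrative only.

```python
import numpy as np
import pandas as pd


def flag_candidates(table, min_tumor_tpm=10.0, min_fold=2.0):
    """Mark rows inside the candidate region defined by the two thresholds."""
    out = table.copy()
    out["log2_fc"] = np.log2(out["tumor_tpm"] / out["normal_tpm"])
    out["candidate"] = (
        (out["tumor_tpm"] >= min_tumor_tpm)
        # A 2x ratio is the same cut-off as log2 fold-change >= 1.
        & (out["log2_fc"] >= np.log2(min_fold))
    )
    return out


# Toy rows with placeholder names, for illustration only.
toy = pd.DataFrame({
    "kinase": ["K1", "K2", "K3"],
    "tumor_tpm": [20.0, 8.0, 50.0],
    "normal_tpm": [10.0, 1.0, 40.0],
})
flagged = flag_candidates(toy)
```

In the toy table, K1 sits exactly on both thresholds and is kept, K2 is strongly enriched but expressed below 10 TPM, and K3 is highly expressed but not enriched.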
Larger bubbles indicate greater compound coverage in ChEMBL, suggesting that these kinases have been more extensively explored chemically. Smaller bubbles indicate less explored targets, which may reflect either greater novelty or greater uncertainty.
The second plot compares the candidate kinases by tumor enrichment and compound count. Highlighted points are kinases that no other candidate clearly outperforms in both dimensions at once; they represent the strongest trade-offs between tumor specificity and prior chemical exploration:
| Kinase | Key features in the trade-off analysis |
|---|---|
| STK32B | Strongest tumor enrichment among the candidates, moderate compound coverage |
| NEK6 | Combines high tumor enrichment with substantial compound coverage |
| PRKACB | Balanced trade-off between tumor enrichment and prior chemical exploration |
| PRKCQ | High compound coverage, reasonable tumor enrichment |
| MAPK1 | Chemically well-explored kinase with extremely high compound coverage |
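Selecting points that no other candidate outperforms in both dimensions is a Pareto-front computation. The sketch below uses the standard dominance definition (another point is at least as good in both dimensions and strictly better in one), which is an assumption about how the highlighted points were chosen. The column names `log2_fc` and `n_compounds` are hypothetical, and the toy values are placeholders rather than the values behind the table above.

```python
import numpy as np
import pandas as pd


def pareto_front(table, cols=("log2_fc", "n_compounds")):
    """Return rows not dominated by any other row (higher is better in each column)."""
    values = table[list(cols)].to_numpy(dtype=float)
    # ge[j, i]: row j is at least as good as row i in every column.
    ge = (values[:, None, :] >= values[None, :, :]).all(axis=2)
    # gt[j, i]: row j is strictly better than row i in at least one column.
    gt = (values[:, None, :] > values[None, :, :]).any(axis=2)
    dominated = (ge & gt).any(axis=0)
    return table.loc[~dominated]


# Toy candidates with placeholder names and values, for illustration only.
toy = pd.DataFrame({
    "kinase": ["K1", "K2", "K3", "K4"],
    "log2_fc": [3.0, 2.0, 1.5, 2.5],
    "n_compounds": [10, 500, 100, 200],
})
front = pareto_front(toy)
```

Here K3 is dropped because K2 and K4 each beat it on both enrichment and compound count, while K1, K2, and K4 each remain the best available choice for some weighting of the two criteria.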
Interpretation Disclaimer
This proof of concept is intentionally simplified.
In a real target evaluation workflow, additional evidence would normally be incorporated, such as mutation prevalence, pathway dependency, structural ligandability, toxicity considerations, and clinical biomarker associations.
The purpose here is not to claim definitive targets, but to demonstrate how a knowledge graph can support data extraction, cross-domain integration, and interpretable target triage by combining transcriptomic context with pharmacological metadata.