Early Target Assessment | Genomics Portfolio

Early Target Assessment and Prioritization

As a proof of concept for downstream workflows, I use the previous tools to establish a basic kinase target identification and prioritization pipeline, using kidney cancer as an example.

At a high level, the workflow can be represented as:

Natural language request → LLM-generated query → GraphDB aggregation → Exported tables → Downstream analysis

Obtaining Data

The pipeline initially relies on four pieces of information, each of which can be queried as follows:

Target harmonisation

Query:
Retrieve all kinase targets together with their UniProt and Ensembl gene mappings

Goal:
Retrieve all kinase targets together with their UniProt identifiers and corresponding Ensembl gene identifiers

This makes it possible to connect pharmacological targets from ChEMBL with expression values from TCGA (tumor) and GTEx (normal) kidney tissues

Tumor expression aggregation

Query:
Calculate the mean kidney tumor expression for each mapped kinase gene

Goal:
Calculate the mean expression for each mapped gene across kidney tumor samples

This step generates an expression summary that can later be compared against the normal-tissue baseline expression

Normal tissue aggregation

Query:
Calculate the mean GTEx kidney cortex expression for each mapped kinase gene

Goal:
Calculate the mean GTEx expression for the same genes across normal kidney cortex samples

This provides the baseline physiological expression needed to assess tumor specificity

Compound coverage aggregation

Query:
Count the number of distinct compounds associated with each kinase target

Goal:
Count the number of distinct compounds associated with each kinase target in the ChEMBL kinase dataset

This serves as a simple proxy for how extensively a kinase has already been explored pharmacologically
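
As a rough illustration of what this aggregation computes, the sketch below counts distinct compounds per target from (target, compound) pairs. It is not the graph query itself; the record layout, the identifiers, and the function name `count_distinct_compounds` are hypothetical and chosen only for the example.

```python
from collections import defaultdict

# Hypothetical (target, compound) records; identifiers are illustrative only.
records = [
    ("KINASE_A", "CMPD_1"),
    ("KINASE_A", "CMPD_2"),
    ("KINASE_A", "CMPD_1"),  # repeated pair, e.g. the same compound in a second assay
    ("KINASE_B", "CMPD_3"),
]


def count_distinct_compounds(pairs):
    """Return {target: number of distinct compounds} from (target, compound) pairs."""
    compounds_by_target = defaultdict(set)
    for target, compound in pairs:
        # A set keeps each compound once per target, so repeats are not double-counted.
        compounds_by_target[target].add(compound)
    return {target: len(compounds) for target, compounds in compounds_by_target.items()}


compound_counts = count_distinct_compounds(records)
```

Counting distinct compounds rather than raw records matters here because a single compound may appear in several activity measurements for the same kinase.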

Downstream Target Triage

Once retrieved, the tables were exported and combined locally.
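
A minimal sketch of how the four exported tables might be combined locally is shown below. It assumes each export has already been loaded into a plain dictionary; the key names, the example values, and the choice to treat a missing compound count as zero are assumptions of this sketch, not details taken from the actual exports.

```python
def combine_tables(mapping, tumor_mean, normal_mean, compound_counts):
    """Join the four exports into one row per kinase.

    mapping:         {target_id: ensembl_gene_id}
    tumor_mean:      {ensembl_gene_id: mean tumor expression in TPM}
    normal_mean:     {ensembl_gene_id: mean normal expression in TPM}
    compound_counts: {target_id: number of distinct compounds}
    """
    rows = []
    for target, gene in mapping.items():
        # Keep only kinases that have both a tumor and a normal expression value.
        if gene not in tumor_mean or gene not in normal_mean:
            continue
        rows.append({
            "target": target,
            "gene": gene,
            "tumor_tpm": tumor_mean[gene],
            "normal_tpm": normal_mean[gene],
            # Assumption: a target without any recorded compound counts as 0.
            "n_compounds": compound_counts.get(target, 0),
        })
    return rows


# Hypothetical example inputs (not real data).
mapping = {"TARGET_1": "GENE_1", "TARGET_2": "GENE_2", "TARGET_3": "GENE_3"}
tumor_mean = {"GENE_1": 24.0, "GENE_2": 8.0}
normal_mean = {"GENE_1": 6.0, "GENE_2": 8.0, "GENE_3": 1.0}
compound_counts = {"TARGET_1": 120}

rows = combine_tables(mapping, tumor_mean, normal_mean, compound_counts)
```

In this example `TARGET_3` is dropped because its gene has no tumor expression value, which mirrors an inner join on the gene identifier.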

At this stage, the goal is to establish a simple first-pass target triage workflow using three criteria:

Tumor expression — mean expression across kidney tumor samples
Tumor enrichment — relative expression in tumor tissue compared with normal kidney tissue
Compound coverage — number of distinct compounds associated with each kinase in ChEMBL

From these, the analysis asks a simple question:

Which kinases are sufficiently expressed in tumor tissue, enriched relative to normal kidney tissue, and supported by evidence of prior chemical targeting?

Multi-dimensional Target Landscape

To make the previous question operational, a simple candidate region (shaded green in the first plot) was defined using two biological criteria:

Tumor expression ≥ 10 TPM
Tumor enrichment ≥ 2× tumor / normal
(equivalent to log₂ fold-change ≥ 1)
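
The two criteria can be expressed as a small filter. The sketch below is one possible reading of them: the thresholds come from the text, while the function names and the handling of zero expression values are assumptions made for illustration.

```python
import math

MIN_TUMOR_TPM = 10.0  # tumor expression threshold from the text
MIN_LOG2_FC = 1.0     # log2(2) = 1, i.e. at least 2x tumor / normal


def log2_fold_change(tumor_tpm, normal_tpm):
    """Return log2(tumor / normal).

    Assumption of this sketch: zero tumor expression maps to -inf and
    zero normal expression (with nonzero tumor expression) maps to +inf.
    """
    if tumor_tpm <= 0:
        return -math.inf
    if normal_tpm <= 0:
        return math.inf
    return math.log2(tumor_tpm / normal_tpm)


def is_candidate(tumor_tpm, normal_tpm):
    """True when a kinase meets both the expression and the enrichment criteria."""
    return (
        tumor_tpm >= MIN_TUMOR_TPM
        and log2_fold_change(tumor_tpm, normal_tpm) >= MIN_LOG2_FC
    )
```

For example, a kinase at 20 TPM in tumor and 10 TPM in normal tissue sits exactly on the enrichment boundary (2×, log₂ fold-change of 1) and is included, whereas one at 40 TPM versus 30 TPM is well expressed but not sufficiently enriched.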

Larger bubbles indicate greater compound coverage in ChEMBL, suggesting that these kinases have been more extensively explored chemically. Smaller bubbles indicate less explored targets, which may reflect either greater novelty or greater uncertainty.

The second plot compares the candidate kinases in terms of tumor enrichment and compound count. Highlighted points correspond to kinases that are not clearly outperformed by another candidate in both dimensions simultaneously, and so represent the strongest trade-offs between tumor specificity and prior chemical exploration:

Kinase	Key features in the trade-off analysis
STK32B	Strongest tumor enrichment among the candidates, moderate compound coverage
NEK6	Combines high tumor enrichment with substantial compound coverage
PRKACB	Balanced trade-off between tumor enrichment and prior chemical exploration
PRKCQ	High compound coverage, reasonable tumor enrichment
MAPK1	Extensively explored kinase chemically, extremely high compound coverage
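
The selection of non-outperformed kinases can be sketched as a Pareto-front computation over the two dimensions. The code below uses the standard dominance rule (another candidate is at least as good in both dimensions and strictly better in one), which is an assumption about how the highlighted points were chosen; the names and values in the example are hypothetical, not the real kinase data.

```python
def pareto_front(candidates):
    """Return the names of candidates not dominated in (enrichment, n_compounds).

    candidates: list of (name, enrichment, n_compounds) tuples, where higher
    is treated as better in both dimensions.
    """
    front = []
    for name, enrichment, n_compounds in candidates:
        dominated = any(
            other_enr >= enrichment
            and other_n >= n_compounds
            and (other_enr > enrichment or other_n > n_compounds)
            for _, other_enr, other_n in candidates
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical candidates: (name, log2 fold-change, distinct compound count).
example = [
    ("K1", 3.0, 50),
    ("K2", 2.0, 400),
    ("K3", 1.5, 100),   # dominated by K2 in both dimensions
    ("K4", 1.2, 2000),
]

front = pareto_front(example)
```

Here `K3` is excluded because `K2` has both higher enrichment and more compounds, while `K1`, `K2`, and `K4` each remain the best available option for some balance of the two criteria.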

Interpretation Disclaimer

This proof of concept is intentionally simplified.

In a real target evaluation workflow, additional evidence would normally be incorporated, such as mutation prevalence, pathway dependency, structural ligandability, toxicity considerations, and clinical biomarker associations.

The purpose here is not to claim definitive targets, but to demonstrate how a knowledge graph can support data extraction, cross-domain integration, and interpretable target triage by combining transcriptomic context with pharmacological metadata.
