
LLM-driven querying


I subsequently deployed a Streamlit app, the “Natural-Language → SPARQL Workbench”, and wired it to Gemini 2.5 Flash.

A single prompt (the SYSTEM message) teaches the LLM the exact shape of the graph and a handful of performance rules, so it can translate free-text questions into runnable SPARQL.
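The translation step can be sketched in Python as follows. The post does not show the app code, so the model call is hidden behind a hypothetical `call_llm(system, user)` hook. Chat models often wrap SPARQL in a Markdown fence, so the sketch strips one if present before the query is passed on.

```python
import re
from typing import Callable

# Abbreviated here: the full SYSTEM prompt (graphs, tissue IRIs, patterns, rules)
# would go in this string.
SYSTEM_PROMPT = "You are a SPARQL generator for a GraphDB endpoint. ..."

FENCE = "`" * 3  # a Markdown code fence marker
# Captures the body of a reply fenced as sparql (or with no language tag).
_FENCED = re.compile(re.escape(FENCE) + r"(?:sparql)?\s*(.*?)" + re.escape(FENCE),
                     re.DOTALL | re.IGNORECASE)


def nl_to_sparql(question: str, call_llm: Callable[[str, str], str]) -> str:
    """Translate a free-text question into a SPARQL string.

    call_llm(system, user) is a hypothetical hook around the Gemini API call:
    it receives the SYSTEM prompt and the user's question and returns text.
    """
    reply = call_llm(SYSTEM_PROMPT, question)
    match = _FENCED.search(reply)
    # Use the fenced body if the model added one, otherwise the whole reply.
    return (match.group(1) if match else reply).strip()


def fake_llm(system: str, user: str) -> str:
    """Stand-in for the model, returning a fenced answer."""
    return FENCE + "sparql\nSELECT ?g WHERE { ?g ?p ?o } LIMIT 5\n" + FENCE


print(nl_to_sparql("any question", fake_llm))  # SELECT ?g WHERE { ?g ?p ?o } LIMIT 5
```

In the app, `call_llm` could wrap the `google-genai` SDK, for example `client.models.generate_content(model="gemini-2.5-flash", contents=user, config=types.GenerateContentConfig(system_instruction=system))` and return `response.text`; that wiring is an assumption, not taken from the post.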

Stack at a glance


  • IaaS: Google Cloud VM
  • Triple store: GraphDB
  • LLM: Gemini 2.5 Flash (API)
  • UI: Streamlit
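How the UI and the triple store connect can be sketched with the Python standard library. The Streamlit app sends the generated query to GraphDB using the SPARQL 1.1 Protocol and flattens the JSON results into rows for display. The host, port and repository name (`bio`) below are placeholders, not the post's actual deployment.

```python
import json
import urllib.request

# Hypothetical endpoint: GraphDB exposes each repository at /repositories/<id>
# (port 7200 by default); "localhost" and "bio" are placeholders.
GRAPHDB_ENDPOINT = "http://localhost:7200/repositories/bio"


def build_request(sparql: str, endpoint: str = GRAPHDB_ENDPOINT) -> urllib.request.Request:
    """Wrap a query as a SPARQL 1.1 Protocol POST with the query in the body."""
    return urllib.request.Request(
        endpoint,
        data=sparql.encode("utf-8"),
        headers={
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def rows_from_json(payload: str) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into plain dicts of variable -> value."""
    doc = json.loads(payload)
    return [
        {var: binding[var]["value"] for var in binding}
        for binding in doc["results"]["bindings"]
    ]


def run_query(sparql: str) -> list[dict[str, str]]:
    """Send the query and return rows; needs a reachable GraphDB instance."""
    with urllib.request.urlopen(build_request(sparql), timeout=60) as resp:
        return rows_from_json(resp.read().decode("utf-8"))
```

Only `run_query` touches the network; in Streamlit, its rows could be handed to `st.dataframe` for display.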

SYSTEM prompt (excerpt)


You are a SPARQL generator for a GraphDB endpoint.

1 NAMED GRAPHS
<http://bio.gtex> – GTEx normal tissue (heart, kidney, lung, breast) …
<http://bio.tcga> – TCGA tumor expression …
<http://bio.kinase> – ChEMBL kinase slice …
<http://bio.uniprot2ensembl> – UniProt ↔ Ensembl mapping …

3 TISSUE IRIs
Heart UBERON_0000948 | Kidney UBERON_0002113 | Lung UBERON_0002048 | Breast UBERON_0000310

5 EXPRESSION / SAMPLE PATTERN
?expr sio:has_value ?v ; sio:isAbout ?gene ; sio:isPartOf ?sample .
?sample sio:isAbout <UBERON tissue IRI> .

8 STYLE & PERFORMANCE RULES
1. Start with the tissue filter.
2. Use VALUES/FILTER IN for multi-tissue.
3. Avoid DISTINCT unless asked; always LIMIT.
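The named graphs, tissue IRIs, expression pattern and the three rules in this excerpt are concrete enough to reproduce as a small query builder, for example as a deterministic baseline to compare the model's output against. This is a sketch under assumptions: the excerpt never shows how `sio:` is declared, so a hypothetical placeholder IRI is used, and the UBERON local names are expanded with the standard OBO PURL namespace, which may differ from the IRIs actually loaded in GraphDB.

```python
# Local names from section 3 of the prompt.
TISSUES = {
    "heart": "UBERON_0000948",
    "kidney": "UBERON_0002113",
    "lung": "UBERON_0002048",
    "breast": "UBERON_0000310",
}
# Assumption: standard OBO PURLs; the excerpt shows only the local names.
OBO_NS = "http://purl.obolibrary.org/obo/"
# Hypothetical placeholder: the real sio: prefix IRI is not in the excerpt.
SIO_NS = "http://example.org/sio/"


def expression_query(tissues: list[str], graph: str = "http://bio.gtex", limit: int = 100) -> str:
    """Build an expression query following the prompt's pattern and rules 1-3."""
    if limit <= 0:
        raise ValueError("rule 3 requires a positive LIMIT")
    # Rule 2: a single VALUES block covers any number of tissues.
    iris = " ".join(f"<{OBO_NS}{TISSUES[t.lower()]}>" for t in tissues)
    # Rule 1: the tissue filter comes first. Rule 3: no DISTINCT, always LIMIT.
    return f"""PREFIX sio: <{SIO_NS}>
SELECT ?tissue ?gene ?v WHERE {{
  GRAPH <{graph}> {{
    VALUES ?tissue {{ {iris} }}
    ?sample sio:isAbout ?tissue .
    ?expr sio:has_value ?v ;
          sio:isAbout ?gene ;
          sio:isPartOf ?sample .
  }}
}}
LIMIT {limit}"""


print(expression_query(["heart", "lung"], limit=20))
```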

Examples


Example 1: 'What are the top 10 most expressed genes in heart?'
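The query the model returned for this question is not reproduced in the text. One query that fits the prompt's pattern and rules is sketched below, assuming "most expressed" means the highest mean value across heart samples in the GTEx graph and that `?v` is a numeric literal. The `sio:` prefix IRI is a hypothetical placeholder and the UBERON IRI assumes the standard OBO namespace. A rough textual check of rule 3 is included.

```python
import re

# Hypothetical placeholder for sio: (not shown in the excerpt); the UBERON IRI
# assumes the OBO PURL namespace. AVG assumes ?v is a numeric literal.
EXAMPLE_1 = """PREFIX sio: <http://example.org/sio/>
SELECT ?gene (AVG(?v) AS ?meanExpr) WHERE {
  GRAPH <http://bio.gtex> {
    ?sample sio:isAbout <http://purl.obolibrary.org/obo/UBERON_0000948> .
    ?expr sio:has_value ?v ;
          sio:isAbout ?gene ;
          sio:isPartOf ?sample .
  }
}
GROUP BY ?gene
ORDER BY DESC(?meanExpr)
LIMIT 10"""


def follows_rule_3(sparql: str, distinct_requested: bool = False) -> bool:
    """Rough textual check of rule 3: a LIMIT is present, DISTINCT only if asked."""
    has_limit = re.search(r"\bLIMIT\s+\d+", sparql, re.IGNORECASE) is not None
    has_distinct = re.search(r"\bDISTINCT\b", sparql, re.IGNORECASE) is not None
    return has_limit and (distinct_requested or not has_distinct)


print(follows_rule_3(EXAMPLE_1))  # True
```

Grouping by `?gene` yields one row per gene without resorting to `DISTINCT`.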

Example 2: 'Give me 30 compounds (IC50 < 100 nM) whose targets are expressed in kidney tumor'
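Answering this question means joining three named graphs: tumor expression in `<http://bio.tcga>`, the UniProt ↔ Ensembl mapping in `<http://bio.uniprot2ensembl>`, and ChEMBL activities in `<http://bio.kinase>`. The excerpt elides the kinase and mapping schemas, so every `ex:` predicate below is hypothetical, as are the placeholder prefix IRIs and the assumption that TCGA samples use the same UBERON tissue link as the GTEx pattern. The sketch only illustrates the shape of such a cross-graph query.

```python
# Hypothetical namespaces: the post does not show these PREFIX declarations.
SIO_NS = "http://example.org/sio/"
EX_NS = "http://example.org/hypothetical/"  # stands in for the elided schemas
OBO_NS = "http://purl.obolibrary.org/obo/"  # assumed UBERON namespace


def potent_compounds_query(tissue: str = "UBERON_0002113",
                           ic50_max_nm: float = 100,
                           limit: int = 30) -> str:
    """Sketch: compounds with IC50 below a threshold whose protein targets map
    to genes expressed in tumor samples of one tissue (kidney by default)."""
    return f"""PREFIX sio: <{SIO_NS}>
PREFIX ex: <{EX_NS}>
SELECT ?compound (MIN(?ic50) AS ?bestIC50) WHERE {{
  GRAPH <http://bio.tcga> {{
    ?sample sio:isAbout <{OBO_NS}{tissue}> .
    ?expr sio:isPartOf ?sample ;
          sio:isAbout ?gene .
  }}
  GRAPH <http://bio.uniprot2ensembl> {{
    ?protein ex:mapsToGene ?gene .
  }}
  GRAPH <http://bio.kinase> {{
    ?activity ex:compound ?compound ;
              ex:target ?protein ;
              ex:ic50_nM ?ic50 .
    FILTER(?ic50 < {ic50_max_nm})
  }}
}}
GROUP BY ?compound
ORDER BY ?bestIC50
LIMIT {limit}"""


print(potent_compounds_query())
```

The tumor-tissue pattern is matched first, in line with rule 1, and grouping by `?compound` returns up to 30 distinct compounds without `DISTINCT`, in keeping with rule 3.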