Data Scientist, Knowledge Graphs
Company: Mithrl
Location: San Francisco
Posted on: April 2, 2026
Job Description:
ABOUT MITHRL
We imagine a world where new medicines reach patients in months, not
years, and where scientific breakthroughs happen at the speed of
thought. Mithrl is building the world’s first commercially available
AI Co-Scientist. It is a discovery engine that transforms messy
biological data into insights in minutes. Scientists ask questions in
natural language, and Mithrl responds with real analysis, novel
targets, hypotheses, and patent-ready reports.

Our traction speaks for itself:
- 12X year-over-year revenue growth
- Trusted by leading biotechs and big pharma across three continents
- Driving real breakthroughs from target discovery to patient outcomes

ABOUT THE ROLE
We are hiring
a Data Scientist, Knowledge Graphs to build and scale the
biological knowledge layer that powers the Mithrl AI Co-Scientist.
This role focuses on ingesting and harmonizing the world’s most
important biological data sources and curating the relationships
that allow our system to reason across pathways, targets, diseases,
compounds, and multimodal datasets. You will ingest data from
public consortia and well-maintained, peer-reviewed sources and
unify them into a coherent, versioned knowledge graph. You will
identify new node types, define relationship schemas, harmonize
variable IDs, and ensure metadata remains consistent across all
integrated sources. You will also build automated curation
pipelines that expand and refine the knowledge graph using both
data-driven methods and domain logic. Beyond ingestion and
curation, you will create the tools and frameworks that allow users
to interact with the knowledge graph and even build their own
custom graphs based on the results they generate inside Mithrl.
Your work will form the foundation for pathway reasoning, target
scoring, evidence aggregation, and multimodal interpretation inside
the AI Co-Scientist.

WHAT YOU WILL DO
- Ingest, harmonize, and version high-value public biological datasets such as CellxGene, Gemma, ARCHS4, ENCODE, GTEx, and TCGA
- Ingest well-maintained, peer-reviewed knowledgebases including OpenTargets, HPA, and similar resources
- Build automated pipelines to curate and expand relationships inside the knowledge graph
- Define and evolve schemas for node types, relationships, metadata rules, and ontology alignment
- Harmonize variable IDs and metadata fields across all imported sources to create a unified knowledge layer
- Build and maintain versioning, change tracking, and provenance systems for all data and relationships
- Develop the framework that allows users to build custom knowledge graphs from the analyses they run inside Mithrl
- Build features that allow users to explore, query, and interact with their graphs
- Work closely with ML engineers, bioinformatics teams, and discovery application teams to ensure the knowledge graph supports downstream reasoning and analysis
- Validate the correctness, completeness, and integrity of the knowledge graph
across releases

WHAT YOU BRING

Required Qualifications
- Strong experience in data science, bioinformatics, computational biology, or a related field
- Experience working with biological knowledgebases, public datasets, or ontology-driven systems
- Familiarity with graph data structures, relationship modeling, and knowledge graph concepts
- Experience harmonizing heterogeneous biological datasets and mapping variable IDs across sources
- Proficiency in Python and scientific computing libraries
- Ability to build ingestion pipelines for structured or semi-structured biological data
- Strong understanding of metadata standards, biological ontologies, and domain logic
- Ability to translate complex biological information into structured, machine-readable representations
- Excellent communication skills and comfort collaborating across engineering and scientific teams

Nice to Have
- Experience with graph databases or graph query languages
- Experience with KG curation, link prediction, relationship extraction, or graph-based ML
- Familiarity with multimodal data integration
- Previous work on biological or chemical knowledge graphs
- Experience with public consortia such as ENCODE, GTEx, TCGA, or ChEMBL
- Prior experience in a tech bio startup or scientific software
environment

WHAT YOU WILL LOVE AT MITHRL
- You will build the core knowledge layer that the AI Co-Scientist uses to reason about biology
- Team: Join a tight-knit, talent-dense team of engineers, scientists, and builders
- Culture: We value consistency, clarity, and hard work. We solve hard problems through focused daily execution
- Speed: We ship fast (2x/week) and improve continuously based on real user feedback
- Location: Beautiful SF office with a high-energy, in-person culture
- Benefits: Comprehensive PPO health coverage through Anthem (medical, dental, and vision)
- 401(k) with top-tier plans

We encourage you to apply even if you do not believe
you meet every single qualification. Not all strong candidates will
meet every single qualification as listed. Research shows that
people who identify as being from underrepresented groups are more
prone to experiencing imposter syndrome and doubting the strength
of their candidacy, so we urge you not to exclude yourself
prematurely and to submit an application if you're interested in
this work. We think AI systems like the ones we're building have
enormous social and ethical implications. We think this makes
representation even more important, and we strive to include a
range of diverse perspectives on our team.