Senior Backend Engineer (Data Infrastructure)
Full-Time · Senior · Washington, DC / Remote (US)
Mission
You'll architect and scale the data pipelines that feed our entire intelligence platform, transforming raw legal records from government sources and commercial databases into the forensic foundation that powers our agents and knowledge graph. In short: you'll build the data infrastructure that makes litigation intelligence engineering possible.
Ready to Apply?
- Remote-friendly team based in the Washington, DC metro area
- Competitive salary, equity, and performance bonus
- Comprehensive health coverage and flexible time off
- Backed by elite investors and trusted by AmLaw 100 firms
Apply for this Role
Ex Parte is an equal opportunity employer.
What You'll Build
Lakehouse Architecture
- Design and implement Bronze/Silver/Gold data layers in Databricks and own pipeline reliability (see the sketch after this list)
- Build high-volume ingestion from USPTO, PACER, and other sources with strong audit trails
- Implement Delta Lake for ACID transactions, time travel, and version history
- Create streaming pipelines for real-time PACER docket updates
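To give a flavor of the layering, here is a minimal Bronze-to-Silver sketch in PySpark with Delta Lake. The paths, table names (`bronze.pacer_dockets`, `silver.pacer_dockets`), and columns are hypothetical placeholders, not our actual schema, and it assumes the Silver table already exists.

```python
# Minimal Bronze -> Silver sketch for PACER docket ingestion on Databricks.
# Paths, table names, and columns are hypothetical; assumes Delta Lake is
# available and that silver.pacer_dockets already exists.
from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Bronze: land raw JSON exactly as received, plus ingestion metadata for the audit trail.
raw = (
    spark.read.json("/mnt/landing/pacer/dockets/")
    .withColumn("_ingested_at", F.current_timestamp())
    .withColumn("_source_file", F.input_file_name())
)
raw.write.format("delta").mode("append").saveAsTable("bronze.pacer_dockets")

# Silver: conform and deduplicate; MERGE keeps re-runs idempotent.
updates = (
    spark.table("bronze.pacer_dockets")
    .filter(F.col("case_number").isNotNull())
    .select(
        "case_number",
        "court_id",
        F.to_date("date_filed").alias("date_filed"),
        F.trim("docket_text").alias("docket_text"),
        "_ingested_at",
    )
    .dropDuplicates(["case_number", "court_id", "docket_text"])
)

silver = DeltaTable.forName(spark, "silver.pacer_dockets")
(
    silver.alias("t")
    .merge(
        updates.alias("s"),
        "t.case_number = s.case_number AND t.court_id = s.court_id AND t.docket_text = s.docket_text",
    )
    .whenNotMatchedInsertAll()
    .execute()
)
```

The same pattern extends to streaming: the Bronze append becomes a Structured Streaming sink, and the Silver MERGE runs inside a foreachBatch handler.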
Entity Resolution & Linking
- Unify patents, cases, and parties across heterogeneous data sources
- Deduplicate entities with fuzzy matching (e.g., "Apple Inc." vs. "Apple Computer, Inc."; see the sketch after this list)
- Maintain version history for all entity merges and splits
- Build audit trails for every data transformation (legal compliance requirement)
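To illustrate the fuzzy-matching piece, here is a toy name-normalization sketch in plain Python. The suffix list and threshold are illustrative only; real resolution also weighs aliases, addresses, and filing history before merging.

```python
# Toy fuzzy-matching sketch for deduplicating party names; the suffix list and
# threshold are illustrative, not production values.
import re
from difflib import SequenceMatcher

CORPORATE_SUFFIXES = r"\b(inc|incorporated|corp|corporation|co|company|llc|ltd|lp)\b\.?"

def normalize(name: str) -> str:
    """Lowercase, strip punctuation and common corporate suffixes."""
    name = name.lower()
    name = re.sub(CORPORATE_SUFFIXES, "", name)
    name = re.sub(r"[^\w\s]", " ", name)
    return " ".join(name.split())

def similarity(a: str, b: str) -> float:
    """Similarity ratio between normalized names, in [0, 1]."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def is_same_entity(a: str, b: str, threshold: float = 0.9) -> bool:
    return similarity(a, b) >= threshold

# "Apple Inc." vs. "Apple Computer, Inc." still differ after normalization
# ("apple" vs. "apple computer"), so string similarity is only a first-pass
# signal; a real resolver would bring in aliases and filing history.
print(similarity("Apple Inc.", "Apple, Inc."))               # 1.0
print(is_same_entity("Apple Inc.", "Apple Computer, Inc."))  # False at 0.9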
Document Processing
- OCR correction and text extraction for scanned historical documents
- PDF parsing for inconsistent USPTO formats (6+ different API versions)
- Structure detection (identify claims vs. specification vs. prosecution history; see the sketch after this list)
- Text extraction from court dockets (PACER's inconsistent HTML across 94 district courts)
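As a sketch of structure detection, the snippet below splits extracted patent text into rough sections by heading patterns. The regexes are assumptions for illustration; real USPTO layouts vary far more widely than this.

```python
# Illustrative structure detection for patent text: assign lines to sections
# based on heading patterns. The patterns are assumptions, not a full rule set.
import re

SECTION_PATTERNS = {
    "claims": re.compile(r"^\s*(what is claimed is|we claim|i claim|claims?)\s*[:.]?\s*$", re.I),
    "specification": re.compile(r"^\s*(detailed description|description of the invention)\b", re.I),
    "abstract": re.compile(r"^\s*abstract(\s+of\s+the\s+disclosure)?\s*$", re.I),
}

def split_sections(text: str) -> dict[str, str]:
    """Assign each line to the most recently seen section heading."""
    sections: dict[str, list[str]] = {"front_matter": []}
    current = "front_matter"
    for line in text.splitlines():
        for name, pattern in SECTION_PATTERNS.items():
            if pattern.match(line):
                current = name
                sections.setdefault(current, [])
                break
        else:
            sections.setdefault(current, []).append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}
```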
Data Quality & Monitoring
- Validation systems that detect missing, corrupted, or anomalous data (see the sketch after this list)
- Alerting when pipelines fail or fall behind SLA
- Automated testing for all data transformations using Databricks workflows
- Work closely with graph, AI, and product teams to expose clean, well-documented datasets
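Here is a hypothetical data-quality check for the Silver docket table sketched earlier; the table name, columns, and thresholds are placeholders. In practice a check like this would run as a scheduled Databricks workflow task and page on failure.

```python
# Hypothetical freshness/completeness check; table name, column names, and
# thresholds are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

def validate_pacer_dockets(max_null_rate: float = 0.01, max_staleness_hours: int = 6) -> None:
    df = spark.table("silver.pacer_dockets")

    total = df.count()
    if total == 0:
        raise ValueError("silver.pacer_dockets is empty")

    # Completeness: key fields should almost never be null.
    null_rate = df.filter(F.col("case_number").isNull()).count() / total
    if null_rate > max_null_rate:
        raise ValueError(f"case_number null rate {null_rate:.2%} exceeds {max_null_rate:.2%}")

    # Freshness: at least one record ingested within the SLA window.
    fresh = df.filter(
        F.col("_ingested_at") >= F.current_timestamp() - F.expr(f"INTERVAL {max_staleness_hours} HOURS")
    ).count()
    if fresh == 0:
        raise ValueError(f"no records ingested in the last {max_staleness_hours} hours")

# Raising lets the surrounding workflow mark the task as failed and trigger alerting.
validate_pacer_dockets()
```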
Requirements
- 6+ years building data infrastructure in production
- Deep expertise with Databricks, Spark, and Delta Lake
- Strong Python and SQL skills for pipeline engineering
- Experience with Azure Data Factory or similar orchestration tools (Databricks Workflows, Airflow, etc.)
- Track record of building systems that handle millions of records reliably
Bonus
- Experience with graph databases (Neo4j) or vector search systems
- Prior work with government APIs (USPTO, PACER, SEC, etc.)
- Understanding of time-travel/immutability patterns for regulated industries (see the example after this list)
- DevOps skills (Docker, Kubernetes, Terraform)
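For reference, "time travel" here means querying a Delta table as it existed at an earlier version or timestamp, which is what makes immutable audit trails practical. The table name, version number, and timestamp below are illustrative.

```python
# Querying an earlier state of a Delta table for audit purposes; the table
# name, version, and timestamp are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Every write creates a new table version; earlier versions remain queryable.
v12 = spark.sql("SELECT * FROM silver.patent_entities VERSION AS OF 12")
jan = spark.sql("SELECT * FROM silver.patent_entities TIMESTAMP AS OF '2024-01-15'")

# DESCRIBE HISTORY shows who changed what and when, i.e. the audit trail.
spark.sql("DESCRIBE HISTORY silver.patent_entities").show(truncate=False)
```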
Compensation
- Base salary: $160,000 - $220,000 (based on experience)
- Stock options
- Performance bonus eligibility