Senior Backend Engineer (Data Infrastructure)
Full-Time · Senior · Washington, DC / Remote (US)
Mission
You'll architect and scale the data pipelines that feed our entire intelligence platform, transforming raw legal records from government sources and commercial databases into the forensic foundation that powers our agents and knowledge graph. In short: you'll build the data infrastructure that makes litigation intelligence engineering possible.
Ready to Apply?
- Remote-friendly team based in the Washington, DC metro area
- Competitive salary, equity, and performance bonus
- Comprehensive health coverage and flexible time off
- Backed by elite investors and trusted by AmLaw 100 firms
Apply for this Role
Ex Parte is an equal opportunity employer.
What You'll Build
Lakehouse Architecture
- Design and implement Bronze/Silver/Gold data layers in Databricks and own pipeline reliability (see the sketch after this list)
- Build high-volume ingestion from USPTO, PACER, and other sources with strong audit trails
- Implement Delta Lake for ACID transactions, time travel, and version history
- Create streaming pipelines for real-time PACER docket updates
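To give a flavor of the layering, here is a minimal Bronze-to-Silver sketch in PySpark with Delta Lake. The paths, table names (`bronze.pacer_dockets`, `silver.pacer_dockets`), and columns are hypothetical placeholders, not our actual schema, and it assumes the Silver table already exists.

```python
# Minimal Bronze -> Silver sketch for PACER docket ingestion on Databricks.
# Paths, table names, and columns are hypothetical; assumes Delta Lake is
# available and that silver.pacer_dockets already exists.
from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Bronze: land raw JSON exactly as received, plus ingestion metadata for the audit trail.
raw = (
    spark.read.json("/mnt/landing/pacer/dockets/")
    .withColumn("_ingested_at", F.current_timestamp())
    .withColumn("_source_file", F.input_file_name())
)
raw.write.format("delta").mode("append").saveAsTable("bronze.pacer_dockets")

# Silver: conform and deduplicate; MERGE keeps re-runs idempotent.
updates = (
    spark.table("bronze.pacer_dockets")
    .filter(F.col("case_number").isNotNull())
    .select(
        "case_number",
        "court_id",
        F.to_date("date_filed").alias("date_filed"),
        F.trim("docket_text").alias("docket_text"),
        "_ingested_at",
    )
    .dropDuplicates(["case_number", "court_id", "docket_text"])
)

silver = DeltaTable.forName(spark, "silver.pacer_dockets")
(
    silver.alias("t")
    .merge(
        updates.alias("s"),
        "t.case_number = s.case_number AND t.court_id = s.court_id AND t.docket_text = s.docket_text",
    )
    .whenNotMatchedInsertAll()
    .execute()
)
```

The same pattern extends to streaming: the Bronze append becomes a Structured Streaming sink, and the Silver MERGE runs inside a foreachBatch handler.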
Entity Resolution & Linking
- Unify patents, cases, and parties across heterogeneous data sources
- Deduplicate entities with fuzzy matching (e.g., "Apple Inc." vs. "Apple Computer, Inc."; see the sketch after this list)
- Maintain version history for all entity merges and splits
- Build audit trails for every data transformation (legal compliance requirement)
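To illustrate the fuzzy-matching piece, here is a toy name-normalization sketch in plain Python. The suffix list and threshold are illustrative only; real resolution also weighs aliases, addresses, and filing history before merging.

```python
# Toy fuzzy-matching sketch for deduplicating party names; the suffix list and
# threshold are illustrative, not production values.
import re
from difflib import SequenceMatcher

CORPORATE_SUFFIXES = r"\b(inc|incorporated|corp|corporation|co|company|llc|ltd|lp)\b\.?"

def normalize(name: str) -> str:
    """Lowercase, strip punctuation and common corporate suffixes."""
    name = name.lower()
    name = re.sub(CORPORATE_SUFFIXES, "", name)
    name = re.sub(r"[^\w\s]", " ", name)
    return " ".join(name.split())

def similarity(a: str, b: str) -> float:
    """Similarity ratio between normalized names, in [0, 1]."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def is_same_entity(a: str, b: str, threshold: float = 0.9) -> bool:
    return similarity(a, b) >= threshold

# "Apple Inc." vs. "Apple Computer, Inc." still differ after normalization
# ("apple" vs. "apple computer"), so string similarity is only a first-pass
# signal; a real resolver would bring in aliases and filing history.
print(similarity("Apple Inc.", "Apple, Inc."))               # 1.0
print(is_same_entity("Apple Inc.", "Apple Computer, Inc."))  # False at 0.9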
Document Processing
- OCR correction and text extraction for scanned historical documents
- PDF parsing for inconsistent USPTO formats (6+ different API versions)
- Structure detection (identify claims vs. specification vs. prosecution history; see the sketch after this list)
- Text extraction from court dockets (PACER's inconsistent HTML across 94 district courts)
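As a sketch of structure detection, the snippet below splits extracted patent text into rough sections by heading patterns. The regexes are assumptions for illustration; real USPTO layouts vary far more widely than this.

```python
# Illustrative structure detection for patent text: assign lines to sections
# based on heading patterns. The patterns are assumptions, not a full rule set.
import re

SECTION_PATTERNS = {
    "claims": re.compile(r"^\s*(what is claimed is|we claim|i claim|claims?)\s*[:.]?\s*$", re.I),
    "specification": re.compile(r"^\s*(detailed description|description of the invention)\b", re.I),
    "abstract": re.compile(r"^\s*abstract(\s+of\s+the\s+disclosure)?\s*$", re.I),
}

def split_sections(text: str) -> dict[str, str]:
    """Assign each line to the most recently seen section heading."""
    sections: dict[str, list[str]] = {"front_matter": []}
    current = "front_matter"
    for line in text.splitlines():
        for name, pattern in SECTION_PATTERNS.items():
            if pattern.match(line):
                current = name
                sections.setdefault(current, [])
                break
        else:
            sections.setdefault(current, []).append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}
```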
Data Quality & Monitoring
- Validation systems that detect missing, corrupted, or anomalous data (see the sketch after this list)
- Alerting when pipelines fail or fall behind SLA
- Automated testing for all data transformations using Databricks workflows
- Work closely with graph, AI, and product teams to expose clean, well-documented datasets
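Here is a hypothetical data-quality check for the Silver docket table sketched earlier; the table name, columns, and thresholds are placeholders. In practice a check like this would run as a scheduled Databricks workflow task and page on failure.

```python
# Hypothetical freshness/completeness check; table name, column names, and
# thresholds are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

def validate_pacer_dockets(max_null_rate: float = 0.01, max_staleness_hours: int = 6) -> None:
    df = spark.table("silver.pacer_dockets")

    total = df.count()
    if total == 0:
        raise ValueError("silver.pacer_dockets is empty")

    # Completeness: key fields should almost never be null.
    null_rate = df.filter(F.col("case_number").isNull()).count() / total
    if null_rate > max_null_rate:
        raise ValueError(f"case_number null rate {null_rate:.2%} exceeds {max_null_rate:.2%}")

    # Freshness: at least one record ingested within the SLA window.
    fresh = df.filter(
        F.col("_ingested_at") >= F.current_timestamp() - F.expr(f"INTERVAL {max_staleness_hours} HOURS")
    ).count()
    if fresh == 0:
        raise ValueError(f"no records ingested in the last {max_staleness_hours} hours")

# Raising lets the surrounding workflow mark the task as failed and trigger alerting.
validate_pacer_dockets()
```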
Requirements
- 6+ years building data infrastructure in production
- Deep expertise with Databricks, Spark, and Delta Lake
- Strong Python and SQL skills for pipeline engineering
- Experience with Azure Data Factory or similar orchestration tools (Databricks Workflows, Airflow, etc.)
- Track record of building systems that handle millions of records reliably
Bonus
- Experience with graph databases (Neo4j) or vector search systems
- Prior work with government APIs (USPTO, PACER, SEC, etc.)
- Understanding of time-travel/immutability patterns for regulated industries (see the example after this list)
- DevOps skills (Docker, Kubernetes, Terraform)
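For reference, "time travel" here means querying a Delta table as it existed at an earlier version or timestamp, which is what makes immutable audit trails practical. The table name, version number, and timestamp below are illustrative.

```python
# Querying an earlier state of a Delta table for audit purposes; the table
# name, version, and timestamp are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Every write creates a new table version; earlier versions remain queryable.
v12 = spark.sql("SELECT * FROM silver.patent_entities VERSION AS OF 12")
jan = spark.sql("SELECT * FROM silver.patent_entities TIMESTAMP AS OF '2024-01-15'")

# DESCRIBE HISTORY shows who changed what and when, i.e. the audit trail.
spark.sql("DESCRIBE HISTORY silver.patent_entities").show(truncate=False)
```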
Compensation
- Base salary: $160,000 - $220,000 (based on experience)
- Stock options
- Performance bonus eligibility