Overview
We are seeking a skilled Data Architect with strong expertise in AWS technologies (EMR, SageMaker) and Python (FastAPI) to lead the design and implementation of the data architecture for our client's platform. The role involves defining data models, building ingestion pipelines, applying AI-driven entity resolution, and managing scalable, cost-effective infrastructure aligned with cloud best practices.
Client:
Our client is a leading legal recruiting company building a data-driven platform for lawyers and law firms. The platform brings together news and analytics, real-time deal and case tracking from multiple sources, firm and lawyer profiles with cross-linked insights, rankings, and more, all in one place.
Project Overview:
- Define entities, relationships, and persistent IDs; enforce the Fact schema with confidence scores, timestamps, validation status, and source metadata.
- Blueprint ingestion workflows from law firm site feeds; normalize data, extract entities, classify content, and route low-confidence items for review.
- Develop a hybrid entity-resolution approach that combines deterministic rules with LLM-assisted matching; configure thresholds for auto-accept, manual review, or rejection.
- Specify Ops Portal checkpoints, data queues, and SLAs, and design a corrections and version-history model.
- Plan a phased rollout of data sources, covering ingestion, processing, storage, replication, and management via the CMS.
- Align architecture with AWS and Postgres baselines; design for scalability, appropriate storage tiers, and cost-effective compute and queuing solutions.
- Utilize AWS services such as EMR for big data processing and SageMaker for AI/ML workflows.
- Develop robust backend APIs using Python FastAPI for data services and platform integrations.
Nice to Have:
- Experience within legal tech or recruiting data domains.
- Familiarity with Content Management Systems (CMS) for managing data sources.
- Knowledge of data privacy, security regulations, and compliance standards.
Requirements:
- Proven experience as a Data Architect or Senior Data Engineer working extensively with AWS services, especially EMR and SageMaker.
- Strong proficiency in Python development, preferably with FastAPI or similar modern frameworks.
- Deep understanding of data modeling principles, entity resolution, and schema design for complex data systems.
- Hands-on experience designing and managing scalable data pipelines, workflows, and AI-driven data processing.
- Familiarity with relational databases such as PostgreSQL.
- Strong knowledge of cloud infrastructure cost optimization and performance tuning.
- Excellent problem-solving skills and ability to work in a collaborative, agile environment.
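
For candidates who want a concrete picture of the Fact schema and the confidence thresholds mentioned above, the following is a minimal Python sketch. It is illustrative only and not the client's actual design: the field names, the `Decision` enum, the `route` function, and the 0.90 / 0.50 cut-offs are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class Decision(Enum):
    """Possible outcomes for a candidate match (hypothetical names)."""
    AUTO_ACCEPT = "auto_accept"
    MANUAL_REVIEW = "manual_review"
    REJECT = "reject"


@dataclass(frozen=True)
class Fact:
    """One stored fact with the metadata the posting lists (assumed field names)."""
    entity_id: str           # persistent ID of the lawyer or firm
    claim: str               # the extracted statement
    confidence: float        # match or extraction confidence in [0, 1]
    observed_at: datetime    # timestamp of ingestion
    validation_status: str   # e.g. "pending", "validated"
    source: str              # source metadata, e.g. a feed URL


def route(confidence: float,
          accept_at: float = 0.90,
          reject_below: float = 0.50) -> Decision:
    """Map a confidence score to a decision using two assumed thresholds."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if reject_below > accept_at:
        raise ValueError("reject_below must not exceed accept_at")
    if confidence >= accept_at:
        return Decision.AUTO_ACCEPT
    if confidence < reject_below:
        return Decision.REJECT
    return Decision.MANUAL_REVIEW


fact = Fact(
    entity_id="lawyer-0001",
    claim="joined Example LLP as partner",
    confidence=0.72,
    observed_at=datetime.now(timezone.utc),
    validation_status="pending",
    source="https://example.com/news",
)
decision = route(fact.confidence)  # falls between the thresholds: manual review
```

In practice the thresholds would be tuned per source and entity type, and items routed to manual review would land in the Ops Portal queue.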