Overview

We are looking for an experienced Lead Data Engineer to drive the design, development, and scaling of robust data pipelines that handle large volumes of legal and financial data collected via scrapers. In this role, you will lead a team of data engineers, collaborate closely with the AI/ML, DevOps, Front‑end, and Back‑end teams, and ensure the efficient, high‑quality data workflows critical to the platform.

Client:
Our client is a leading legal recruiting company building a data‑driven platform designed specifically for lawyers and law firms. The platform consolidates everything in one place: news and analytics, real‑time deal and case tracking from multiple sources, enriched firm and lawyer profiles with cross‑linked insights, rankings, and more.

Project Overview:
The platform aggregates data from hundreds of public sources, including law firm websites, deal announcements, legal databases, and media publications, creating a unified ecosystem of structured and interconnected legal data. It combines AI‑driven enrichment, automated data processing, and scalable infrastructure to ensure comprehensive and reliable coverage of the legal market.

Responsibilities:
  • Lead and mentor a team of data engineers, providing technical guidance, code reviews, and career development support.
  • Design and implement data ingestion pipelines to collect and process structured and unstructured data from multiple online sources (web scraping, APIs, feeds, etc.).
  • Develop and optimize ETL/ELT workflows using Python and SQL.
  • Build and orchestrate scalable data workflows using AWS services such as Batch and S3.
  • Develop and deploy internal data APIs and utilities supporting platform data access and manipulation.
  • Implement robust text extraction and parsing logic to handle diverse data formats.
  • Ensure data quality through validation, deduplication, normalization, and lineage tracking across Raw, Curated, and Enriched data layers.
  • Containerize and orchestrate data workloads using Docker and native AWS solutions.
  • Collaborate closely with AI, Back‑end, and Front‑end teams to ensure efficient data integration and flow.
Required Qualifications:
  • Proven experience leading and mentoring a data engineering team.
  • Strong experience with AWS services (Batch, S3, Step Functions, SQS).
  • Experience in Python for data processing and backend development.
  • Hands‑on experience with relational databases (PostgreSQL) and strong SQL skills.
  • Experience with Master Data Management (MDM) and data quality practices.
  • Practical experience with Docker and containerized development workflows.
  • Experience with web scraping, text extraction, and other data ingestion techniques.
  • Solid understanding of cloud‑based data pipelines and the AWS ecosystem.
  • Strong analytical mindset, good communication skills, and ability to collaborate across cross‑functional teams.
Nice To Have:
  • Hands‑on experience with Apache Spark and SQL for distributed data processing.
  • Experience with Amazon EMR or SageMaker.