Data Architect
Job Description
Key Responsibilities & Scope of Work
A. Architecture Assessment & Strategic Roadmap
● Evaluate the current data engineering framework end-to-end: medallion architecture layering, naming conventions, ingestion patterns, processing logic, security controls, and data quality mechanisms.
● Benchmark the current state against industry best practices and produce a prioritized improvement roadmap with clear effort-vs-impact trade-offs.
B. Data Estate Governance
● Build and maintain a comprehensive inventory of the data estate, cataloging all source systems (onboarded and prospective) and the subject areas each covers (ingested and not yet ingested).
● Establish this inventory as a living artifact that informs onboarding decisions, coverage analysis, and platform planning.
C. Standards Definition & Enforcement
● Design, integrate, or refactor naming conventions for schemas, tables, views, orchestration jobs, and pipelines, along with the migration approach for transitioning to new standards where needed.
● Define standardized ingestion and processing patterns spanning the full medallion architecture, including sub-layering strategy, format standardization (Parquet, Avro, Delta), secure PII ingestion, data normalization, technical data quality tracking, row- and column-level access controls, late-arriving dimension management, and data export workflows.
● Establish clear pattern selection criteria so engineers know which approach to apply for a given source type or use case.
● Define and operationalize the exception management process for handling justified deviations from established standards.
D. Hands-On Implementation
● Build production-grade boilerplate code for each standardized pattern using the existing GCP toolchain (BigQuery, Cloud SQL, Cloud Composer, Dataflow, Dataproc, Cloud Storage, Pub/Sub, and related services).
● Ensure templates are modular, well-documented, and immediately adoptable by the engineering team.
E. CI/CD & Developer Experience
● Support the integration of data engineering pipelines with the CI/CD solution, aligning with the broader CI/CD modernization initiative's timeline and tooling decisions.
● Contribute to developer experience improvements that reduce friction in pipeline development, testing, and deployment.
F. Knowledge Transfer & Enablement
● Author the "Source Onboarding Playbook": a repeatable, step-by-step guide for bringing new data sources into the platform, covering initial assessment, pattern selection, naming convention application, quality gates, access control setup, and production release.
● Mentor and upskill data engineers on the new standards, patterns, and tooling through documentation, walkthroughs, and hands-on pairing.
Resource Requirements (What We're Looking For)
Must-Have
● Substantial progressive experience in data engineering, data architecture, or analytics platform development, with a significant portion spent in hands-on, code-level roles — not purely advisory or managerial positions.
● Deep, demonstrable expertise in designing and operating large-scale analytical solutions (data warehouses, data lakes, lakehouses) serving enterprise-grade workloads.
● Strong hands-on proficiency with GCP data services: BigQuery, Cloud SQL (Federated Query), Cloud Composer (Airflow), Dataflow (Apache Beam), Dataproc (Spark), Cloud Storage, and Pub/Sub.
● Proven track record of implementing medallion architecture (Bronze/Silver/Gold) or equivalent layered data platform patterns at scale.
● Experience defining and enforcing data engineering standards, naming conventions, and governance frameworks across multiple teams and workstreams.
● Experience with dbt, Apache Iceberg, Delta Lake, or similar transformation and open table format technologies.
● Practical experience with PII handling, data masking, tokenization, and implementing row- and column-level security in cloud data platforms.
● Strong background in CI/CD for data pipelines (Terraform, Cloud Build, GitHub Actions, dbt, or equivalent).
● A track record of building reusable templates, frameworks, and boilerplate code that engineering teams actually adopt and rely on.
● Solid understanding of data quality frameworks, data contracts, and pipeline observability.
Nice-to-Have
● Experience in the logistics industry or adjacent supply chain-intensive sectors, with exposure to high-volume transactional data, shipment tracking, fleet management, or warehouse and distribution analytics.
● Familiarity with data cataloging and metadata management tools (Dataplex, Purview, Alation, or equivalent).
● GCP Professional Data Engineer certification or equivalent.
# | Deliverable | Description
1 | Current State Assessment & Gap Analysis | A comprehensive evaluation of the existing data engineering framework, medallion architecture layering, and naming conventions, benchmarked against industry best practices with a prioritized improvement roadmap.
2 | Data Estate Inventory | A complete catalog of source systems (onboarded and not) and subject areas (ingested and not), serving as the single source of truth for coverage and onboarding decisions.
3 | Naming Convention Standards & Migration Plan | Integrated and standardized naming conventions for schemas, tables, views, jobs, and pipelines, with a defined migration approach for transitioning existing assets where applicable.
4 | Standardized Ingestion & Processing Patterns | Documented and codified patterns covering medallion sub-layering, format standards, secure PII ingestion, normalization, data quality tracking, access controls, late-arriving dimensions, and data export, each with clear application criteria.
5 | Exception Management Process | A formal, operationalized process for requesting, reviewing, approving, and documenting deviations from data engineering standards.
6 | GCP Boilerplate Implementation | Production-ready, modular boilerplate code for each standardized pattern, built on the existing GCP toolchain and ready for team adoption.
7 | CI/CD Integration Support | Active contribution to integrating data engineering pipelines with the CI/CD solution, aligned with the modernization initiative's timeline.
8 | Source Onboarding Playbook | A step-by-step, repeatable playbook for onboarding new data sources, from initial assessment through production deployment, including pattern selection, quality gates, and access control setup.