Data Architect

JOB DESCRIPTION

Key Responsibilities & Scope of Work

A. Architecture Assessment & Strategic Roadmap

● Evaluate the current data engineering framework end-to-end: medallion architecture layering, naming conventions, ingestion patterns, processing logic, security controls, and data quality mechanisms.

● Benchmark the current state against industry best practices and produce a prioritized improvement roadmap with clear effort-vs-impact trade-offs.

B. Data Estate Governance

● Build and maintain a comprehensive inventory of the data estate, cataloging all source systems (onboarded and prospective) and the subject areas each covers (ingested and not yet ingested).

● Establish this inventory as a living artifact that informs onboarding decisions, coverage analysis, and platform planning.

C. Standards Definition & Enforcement

● Design, integrate, or refactor naming conventions for schemas, tables, views, orchestration jobs, and pipelines, along with the migration approach for transitioning to new standards where needed.

● Define standardized ingestion and processing patterns spanning the full medallion architecture, including sub-layering strategy, format standardization (Parquet, Avro, Delta), secure PII ingestion, data normalization, technical data quality tracking, row- and column-level access controls, late-arriving dimension management, and data export workflows.

● Establish clear pattern selection criteria so engineers know which approach to apply for a given source type or use case.

● Define and operationalize the exception management process for handling justified deviations from established standards.

D. Hands-On Implementation

● Build production-grade boilerplate code for each standardized pattern using the existing GCP toolchain (BigQuery, Cloud SQL, Cloud Composer, Dataflow, Dataproc, Cloud Storage, Pub/Sub, and related services).

● Ensure templates are modular, well-documented, and immediately adoptable by the engineering team.

E. CI/CD & Developer Experience

● Support the integration of data engineering pipelines with the CI/CD solution, aligning with the broader CI/CD modernization initiative's timeline and tooling decisions.

● Contribute to developer experience improvements that reduce friction in pipeline development, testing, and deployment.

F. Knowledge Transfer & Enablement

● Author the "Source Onboarding Playbook": a repeatable, step-by-step guide for bringing new data sources into the platform, covering initial assessment, pattern selection, naming convention application, quality gates, access control setup, and production release.

● Mentor and upskill data engineers on the new standards, patterns, and tooling through documentation, walkthroughs, and hands-on pairing.

Resource Requirements (What We're Looking For)

Must-Have

● Substantial progressive experience in data engineering, data architecture, or analytics platform development, with a significant portion spent in hands-on, code-level roles — not purely advisory or managerial positions.

● Deep, demonstrable expertise in designing and operating large-scale analytical solutions (data warehouses, data lakes, lakehouses) serving enterprise-grade workloads.

● Strong hands-on proficiency with GCP data services: BigQuery, Cloud SQL (federated queries), Cloud Composer (Airflow), Dataflow (Apache Beam), Dataproc (Spark), Cloud Storage, and Pub/Sub.

● Proven track record of implementing medallion architecture (Bronze/Silver/Gold) or equivalent layered data platform patterns at scale.

● Experience defining and enforcing data engineering standards, naming conventions, and governance frameworks across multiple teams and workstreams.

● Experience with dbt, Apache Iceberg, Delta Lake, or similar transformation and open table format technologies.

● Practical experience with PII handling, data masking, tokenization, and implementing row- and column-level security in cloud data platforms.

● Strong background in CI/CD for data pipelines (Terraform, Cloud Build, GitHub Actions, dbt, or equivalent).

● A track record of building reusable templates, frameworks, and boilerplate code that engineering teams actually adopt and rely on.

● Solid understanding of data quality frameworks, data contracts, and pipeline observability.

Nice-to-Have

● Experience in the logistics industry or adjacent supply chain-intensive sectors, with exposure to high-volume transactional data, shipment tracking, fleet management, or warehouse and distribution analytics.

● Familiarity with data cataloging and metadata management tools (Dataplex, Purview, Alation, or equivalent).

● GCP Professional Data Engineer certification or equivalent.

Deliverables

1. Current State Assessment & Gap Analysis: A comprehensive evaluation of the existing data engineering framework, medallion architecture layering, and naming conventions, benchmarked against industry best practices, with a prioritized improvement roadmap.

2. Data Estate Inventory: A complete catalog of source systems (onboarded and not) and subject areas (ingested and not), serving as the single source of truth for coverage and onboarding decisions.

3. Naming Convention Standards & Migration Plan: Integrated and standardized naming conventions for schemas, tables, views, jobs, and pipelines, with a defined migration approach for transitioning existing assets where applicable.

4. Standardized Ingestion & Processing Patterns: Documented and codified patterns covering medallion sub-layering, format standards, secure PII ingestion, normalization, data quality tracking, access controls, late-arriving dimensions, and data export, each with clear application criteria.

5. Exception Management Process: A formal, operationalized process for requesting, reviewing, approving, and documenting deviations from data engineering standards.

6. GCP Boilerplate Implementation: Production-ready, modular boilerplate code for each standardized pattern, built on the existing GCP toolchain and ready for team adoption.

7. CI/CD Integration Support: Active contribution to integrating data engineering pipelines with the CI/CD solution, aligned with the modernization initiative's timeline.

8. Source Onboarding Playbook: A step-by-step, repeatable playbook for onboarding new data sources, from initial assessment through production deployment, including pattern selection, quality gates, and access control setup.

Job Description

About Avensys Consulting Pte. Ltd.