Job Description Overview
  • Skill: Databricks, AWS, SQL, Python, PySpark, Data Pipelines, API Development, Cloud Technologies, Data Integration, Troubleshooting
  • Location: Remote
  • Experience: 6 years

We are seeking a talented Data Engineer with experience in Databricks, AWS, and Python to join our growing data team. In this role, you will be responsible for designing, developing, and maintaining scalable data pipelines and workflows that efficiently manage and process large datasets. You will work with cutting-edge tools like PySpark and AWS services to ensure seamless data integration, transformation, and storage. If you are passionate about working with cloud technologies, optimizing data workflows, and driving innovation in data engineering, this is an excellent opportunity for you.


Required Skills & Qualifications:

  • Databricks: Proficient in building and managing data pipelines on the Databricks platform using PySpark and Python for scalable data processing.
  • AWS: Hands-on experience with AWS services such as S3, Lambda, Glue, and Redshift for data storage, processing, and integration.
  • SQL: Strong SQL skills for writing complex queries to extract, transform, and analyze large datasets efficiently.
  • Python & PySpark: Advanced coding skills in Python and PySpark to build and optimize data pipelines for large-scale data processing and transformation (a brief illustrative sketch follows this list).
  • API Development: Basic experience developing and integrating APIs to enhance data accessibility and integration across systems.
  • Data Workflow Optimization: Understanding of best practices for optimizing data pipelines for performance and scalability.
  • Problem Solving: Strong troubleshooting skills to quickly identify and resolve data pipeline issues and ensure smooth operation.
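
To illustrate the Databricks and PySpark expertise described above, here is a minimal sketch of the kind of batch ETL job this role involves. It assumes a Databricks (or comparable Spark) environment; the S3 path, column names, and target table are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

# Minimal illustrative batch ETL job; all paths, columns, and table names are
# hypothetical placeholders, not references to an existing system.
spark = SparkSession.builder.appName("orders-daily-etl").getOrCreate()

# Extract: read raw JSON events from S3 (placeholder path).
raw = spark.read.json("s3://example-bucket/raw/orders/2024-01-01/")

# Transform: basic filtering, typing, and aggregation.
orders = (
    raw.filter(F.col("status") == "completed")
       .withColumn("order_date", F.to_date("created_at"))
       .groupBy("order_date", "customer_id")
       .agg(F.sum("amount").alias("daily_total"))
)

# Load: write a partitioned Delta table (Delta is the default table format on Databricks).
(orders.write
       .format("delta")
       .mode("overwrite")
       .partitionBy("order_date")
       .saveAsTable("analytics.daily_order_totals"))
```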

Key Responsibilities:

As a Data Engineer, you will be responsible for:

  1. Data Pipeline Development:
    • Design, develop, and maintain robust data pipelines using Python and PySpark on Databricks to process and transform large datasets efficiently.
    • Implement data workflows that handle extraction, transformation, and loading (ETL) of data from source systems into target storage.
  2. Data Workflow Optimization:
    • Work with AWS services such as S3, Glue, and Redshift to manage and optimize data storage and processing workflows.
    • Continuously monitor and improve the performance and scalability of data pipelines to handle large-scale datasets.
  3. SQL Querying & Data Analysis:
    • Write and optimize complex SQL queries for data extraction, transformation, and analysis (a brief illustrative sketch follows this list).
    • Ensure data quality and integrity through validation checks and troubleshooting of data pipeline issues.
  4. API Integration:
    • Collaborate with cross-functional teams to develop and integrate APIs that make data more accessible across systems.
    • Support seamless data exchange between platforms through these API integrations.
  5. Troubleshooting & Issue Resolution:
    • Troubleshoot data pipeline issues to keep data processing and storage operations running smoothly.
    • Resolve data quality issues and improve performance through debugging, testing, and implementing fixes.
  6. Collaboration & Continuous Improvement:
    • Work closely with other data engineers, data scientists, and business teams to deliver high-quality, actionable data insights.
    • Contribute to improving data engineering practices and workflows, driving continuous improvement across the team.
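
To illustrate the SQL querying and data-quality responsibilities above (items 3 and 5), here is a brief sketch of a query plus simple validation checks in PySpark. The table names, columns, and checks are illustrative assumptions only.

```python
from pyspark.sql import SparkSession

# Illustrative sketch: SQL extraction with a window function, followed by
# simple data-quality checks. Table and column names are hypothetical.
spark = SparkSession.builder.appName("daily-revenue-check").getOrCreate()

# Extract and transform with Spark SQL, ranking customers by daily revenue.
daily_revenue = spark.sql("""
    SELECT order_date,
           customer_id,
           SUM(amount) AS revenue,
           RANK() OVER (PARTITION BY order_date
                        ORDER BY SUM(amount) DESC) AS revenue_rank
    FROM analytics.orders
    GROUP BY order_date, customer_id
""")

# Basic validation checks before publishing downstream.
null_keys = daily_revenue.filter("order_date IS NULL OR customer_id IS NULL").count()
negative_rows = daily_revenue.filter("revenue < 0").count()

if null_keys or negative_rows:
    # A production pipeline might quarantine bad records or raise an alert instead.
    raise ValueError(
        f"Data quality check failed: {null_keys} null keys, {negative_rows} negative revenue rows"
    )

daily_revenue.write.mode("overwrite").saveAsTable("analytics.daily_customer_revenue")
```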