Job Description Overview
  • Skills: Scala, AWS, EMR, S3, Redshift, DocumentDB, MongoDB, Apache Spark, Spark Streaming, NoSQL, Vector Databases, data pipeline architecture
  • Location: Pune
  • Experience: 6 years

We are looking for a Data Engineer with 5+ years of hands-on experience building data pipelines and working with big data technologies. You will be responsible for designing, implementing, and optimizing data solutions using Scala, AWS cloud services, and related technologies such as Apache Spark and MongoDB. Your expertise will be crucial in building scalable, efficient systems that enable the company to make data-driven decisions.

Key Responsibilities:

  • Build and maintain data pipelines and ETL processes using Scala and Apache Spark.
  • Design and implement solutions leveraging AWS services such as EMR, S3, Redshift, and DocumentDB/MongoDB.
  • Use Spark Streaming and batch processing to handle large volumes of real-time and historical data.
  • Collaborate with data scientists, analysts, and other engineering teams to ensure data flow consistency and quality.
  • Optimize and troubleshoot existing data pipelines for performance, scalability, and reliability.
  • Implement data models and database solutions using Vector Databases and other NoSQL/SQL technologies.
  • Automate routine tasks and ensure seamless deployment and monitoring of data pipelines.
  • Provide technical mentorship and support to junior data engineers.

Required Qualifications:

  • 5+ years of hands-on experience in data engineering, specifically working with large-scale data processing systems.
  • Strong proficiency in Scala for building data pipelines and applications.
  • Extensive experience with AWS cloud services (EMR, S3, Redshift, DocumentDB, etc.).
  • Experience with Apache Spark (both batch and streaming).
  • Hands-on experience with NoSQL databases such as MongoDB, DocumentDB, and Vector Databases.
  • Solid understanding of data modeling, pipeline architecture, and performance optimization.
  • Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes) is a plus.
  • Strong problem-solving, troubleshooting, and communication skills.
  • Experience with CI/CD pipelines and version control systems (e.g., Git).