Big Data Engineer

[REMOTE]

27 Jul 2021

Hello!

Hope you're doing well. I'll get right to it: I'm looking to hire a Sr. Big Data Developer who has experience working with Spark, Scala, Databricks, SQL, and Azure (nice to have). This is a project with our partner in IN. We already have someone working on this team and are looking to add one more person to it. If this role isn't up your alley, I have plenty of other roles with our clients/vendors that I can review with you.

If you're in the market, please feel free to reach out and I'll go over the details with you. Lastly, I'd also like to mention that, depending on your immigration status, our agency sponsors individuals who require H-1B visas and green cards. We cannot work through any layers and will hire the candidate directly on our payroll. Looking forward to hearing back from you.

Job Title: Sr. Big Data Engineer

Location: San Francisco, CA (open to remote)

Duration: 6 months (likely to extend; we have multiple consultants on this team who have been there 2+ years)

Interview: 2 rounds (1st round: 1-hour technical video interview; 2nd round: 30-minute personality/fit call)

I have 8 openings in San Francisco, CA. These roles are open to remote candidates, but they must work PST hours. We have direct access to the hiring managers, with quick turnarounds on interviews. Candidates who pass the first interview will get the job.

Top Skills

Spark

Scala

SQL

Databricks

Azure (nice to have)

Job Description

Looking for a strong Big Data Engineer with Spark, Scala, SQL, and Azure experience.

The Architecture and Platform organizations are looking for an experienced Big Data Engineer to build analytics and ML platforms that collect, store, process, and analyze huge data sets spread across the organization. The platform will provide frameworks for quickly rolling out new data analyses for data-driven products and microservices.

The platform will also enable machine/deep learning infrastructure that operationalizes data science models for broad consumption. You'll partner with Product Managers and Data Scientists end to end to understand customer requirements, design prototypes, and bring ideas into production. You'll be developing real products, so you need to be an expert in design, coding, and scripting. You'll write high-quality code that is consistent with our standards, create new standards as necessary, and demonstrate correctness with pragmatic automated tests. You'll review other engineers' work to improve quality and engineering practices, and participate in continuing-education programs to grow your skills. You'll serve as a member of an Agile engineering team and participate in the team's workflow.

Ideally you have 5-8 years of experience as a Software Engineer, with experience building distributed, scalable, and reliable data pipelines that ingest and process data at scale, in both batch and real time. Strong knowledge of programming languages/tools including Java, Scala, Spark, SQL, Hive, and Elasticsearch is expected. Familiarity with most tools in the Hadoop ecosystem is needed, but we're mainly looking for Spark and Scala (Java if not Scala). You should have experience with streaming technologies such as Spark Streaming, Flink, or Apache Beam; experience with Kafka is a plus. Working experience with NoSQL databases such as Cassandra, HBase, MongoDB, and/or Couchbase is also expected. Prior Machine Learning or Deep Learning knowledge is a plus (this can be learned on the job).

You will be working on the Marketing and Supply Chain side on a personalization initiative, building data feeds to and from 3rd-party vendors that handle the analytics, marketing, and operations for email and catalog campaigns. Eventually the work will move into Machine Learning, in areas such as product recommendations on the site.

The team works in Spark and Scala, ingesting transaction and clickstream data to derive associations and product recommendations. You'll work on both batch-processing and real-time streaming projects. On the batch side, you'll create Spark jobs on Azure, using Azure tools for some of the scheduling and workflow management; the team is currently migrating from Teradata to Microsoft Azure. Overall, you'll be building a new data platform with Spark, building out a pipeline that pulls data from transactional systems and processes it in Spark, with the framework written in Scala (or Java).
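As a rough, purely illustrative sketch of that kind of batch job (the storage paths, schemas, and column names below are hypothetical, not taken from the actual project), the Spark/Scala code might look something like this:

    import org.apache.spark.sql.{SparkSession, functions => F}

    object AssociationsBatchJob {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("associations-batch")
          .getOrCreate()

        // Hypothetical inputs: transactions from a transactional system and
        // click events from clickstream capture, both landed in Azure storage.
        val transactions = spark.read.parquet("abfss://lake@example.dfs.core.windows.net/transactions")
        val clicks       = spark.read.json("abfss://lake@example.dfs.core.windows.net/clickstream")

        // Count how often a viewed product co-occurs with a purchased product
        // for the same customer -- a crude product-association signal.
        val associations = clicks
          .join(transactions, Seq("customer_id"))
          .groupBy("viewed_product_id", "purchased_product_id")
          .agg(F.count(F.lit(1)).as("co_occurrences"))

        associations.write
          .mode("overwrite")
          .parquet("abfss://lake@example.dfs.core.windows.net/associations")

        spark.stop()
      }
    }

In practice such a job would be scheduled with the Azure workflow tooling mentioned above; this only shows the shape of the Spark code.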

Key topics for this role:

1) Basic transformations like filter, map, and groupBy, and actions like count, using the DataFrame API (see the sketch after this list)
2) Iterating over Scala collections
3) Spark parallelism: data ingestion from an external RDBMS, local transformations
4) Data warehousing: dimensions, facts, when to do a full load vs. an incremental load, etc.
5) Basic software engineering principles
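To make those topics concrete, here is a minimal, self-contained Spark/Scala sketch touching on points 1-4. Everything in it (the sample data, the JDBC connection details, the watermark value, and the paths) is hypothetical:

    import org.apache.spark.sql.{SparkSession, functions => F}

    object TopicsSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("topics-sketch").getOrCreate()
        import spark.implicits._

        // (1) DataFrame API: filter/map/groupBy are lazy transformations;
        //     count() is an action that actually triggers execution.
        val orders = Seq(
          ("o1", "electronics", 120.0),
          ("o2", "apparel",      35.0),
          ("o3", "electronics",  80.0)
        ).toDF("order_id", "category", "amount")

        val revenue = orders
          .filter($"amount" > 50)
          .groupBy("category")
          .agg(F.sum("amount").as("revenue"))
        println(revenue.count()) // action

        // (2) Iterating over an ordinary Scala collection, after collecting
        //     a small result back to the driver.
        val categories: Array[String] = revenue.select("category").as[String].collect()
        categories.foreach(c => println(s"category: $c"))

        // (3) Parallel ingestion from an external RDBMS: Spark splits the read
        //     into numPartitions parallel JDBC queries over partitionColumn.
        val txns = spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://db-host:5432/sales") // hypothetical
          .option("dbtable", "transactions")
          .option("partitionColumn", "id")
          .option("lowerBound", "1")
          .option("upperBound", "1000000")
          .option("numPartitions", "8")
          .load()

        // (4) Incremental load into a fact table: pull only rows past the last
        //     watermark and append, instead of doing a full reload.
        val lastWatermark = "2021-07-01" // hypothetical; normally tracked in metadata
        txns.filter($"updated_at" > F.lit(lastWatermark))
          .write.mode("append")
          .parquet("/warehouse/fact_transactions") // hypothetical path

        spark.stop()
      }
    }

On point (4), the usual design choice is a full reload for small dimension tables and watermark-based incremental appends for large fact tables.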

Regards,

Rakesh Kumar

Mid-Senior Level

Full Time


