27 Feb 2024
Job Brief:
Steppingblocks brings big data analytics to higher education with rich data and interactive visualizations. We enable students to make data-driven, efficient decisions regarding their education and career journeys. We also help university administrators understand outcomes for their graduates so they can adapt curricula to in-demand skills, engage with employers and alumni, and report to relevant stakeholders.
We are looking for an experienced Senior Data Engineer to join our data team. You will build, improve, and maintain the Python-based ETL data pipeline and analytics infrastructure that powers our products and business decisions.
This is an opportunity to work with cutting-edge technologies and large volumes of data to solve complex problems. You will transform and enrich unstructured (text) data and tune functions for speed and accuracy at scale. You will work on socio-economic and firmographic data and collaborate with both Product and Business stakeholders to prioritize new features and fixes.
This role requires the ability to quickly learn new technologies and techniques. Strong communication skills are also vital for working with diverse teams across the organization. The role offers great potential for growth into team leadership positions.
Responsibilities:
Collect, integrate, and organize raw data from disparate sources into structured formats
Design, develop and optimize scalable ETL data pipelines and workflows
Build custom algorithms and data analysis processes to generate business insights
Work closely with data scientists, analysts and business teams to identify and fulfill new analytics feature requests
Monitor and enhance data quality, reliability, and performance
Architect new data collection procedures and data stores/databases
Contribute to the evolution of the overall data infrastructure and roadmap
Qualifications:
5-10+ years of experience building and optimizing data pipelines, ETL processes, and data sets
Expert knowledge of Python, including Pandas, NumPy, and other common data manipulation libraries
Experience with big data tools like Spark, Elasticsearch, Snowflake, etc.
High attention to detail
Excellent written and verbal communication skills
Knowledge of software engineering best practices including testing, documentation, and code reviews
Comfortable working with business stakeholders as well as software engineers
Preferred Qualifications:
BSc/MSc in Computer Science, Statistics, Mathematics, or another quantitative field
Experience with Dask or distributed computation frameworks
Knowledge of advanced statistical methods and machine learning techniques
Background in economics, social sciences or business analytics
Web scraping skills
Strong mathematical skills and statistical background
Knowledge of other programming languages and performance tools (C, Cython, Numba, Rust)
Experience with NoSQL and RDBMS databases
Familiarity with Graph Databases and algorithms
Familiarity with Data Governance principles
Perks:
Unlimited PTO
Medical, Vision and Dental benefits
Development stipend
401(k)
Mid-Senior Level
Full Time
[REMOTE]