Job Title: Data Engineer
Location: Lagos
About Interswitch Group:
Interswitch is an Africa-focused integrated digital payments and commerce company that facilitates the electronic circulation of money as well as the exchange of value between individuals and organisations on a timely and consistent basis. We started operations in 2002 as a transaction switching and electronic payments processing company, and have progressively evolved into an integrated payment services company, building and managing payment infrastructure as well as delivering innovative payment products and transactional services throughout the African continent. At Interswitch, we offer unique career opportunities for individuals capable of playing key roles and adding value in an innovative and fun environment.
Job Purpose:
- To build reliable data integration solutions and to clean, transform, and analyze large volumes of big data from various systems using Spark and other ETL tools, providing ready-to-use datasets to data scientists and data analysts while ensuring data quality and integrity.
- Collaborate with stakeholders to design scalable and efficient data solutions that enable informed decision-making and ensure compliance with data governance requirements.
Responsibilities:
- Develop and implement efficient data ingestion pipelines to acquire and extract large volumes of structured and unstructured data. Ensure data integrity and quality during the ingestion process.
- Integrate various data sources and formats into a unified data ecosystem.
- Design and execute data processing workflows to clean, transform, and enrich raw data.
- Develop scalable data processing algorithms and techniques to handle big data volumes efficiently.
- Optimize data processing pipelines for performance and reliability.
- Document data engineering processes, workflows, and system architectures for future reference and knowledge transfer.
- Prepare technical documentation, including data dictionaries, data lineage, and system specifications.
- Create and maintain documentation related to data governance, compliance, and security protocols.
- Create and maintain data storage architectures that cater to the specific needs of big data applications. Implement robust data management strategies, including data partitioning, indexing, and compression techniques.
- Ensure data security, privacy, and compliance with relevant regulations.
- Collaborate with data scientists and analysts to understand their requirements and translate them into scalable data models.
- Apply data visualization techniques to communicate insights effectively.
- Collaborate with cross-functional teams, including data scientists, analysts, and software engineers, to understand their data requirements and provide technical support.
- Communicate complex technical concepts and findings to non-technical stakeholders in a clear and concise manner.
- Participate in knowledge-sharing activities and contribute to the continuous improvement of data engineering practices.
- Identify and implement strategies to enhance the performance and efficiency of big data applications and systems. Conduct performance tuning, load testing, and capacity planning to meet scalability and throughput requirements.
- Monitor system performance and troubleshoot issues related to data processing, storage, and retrieval.
- Establish and enforce data governance policies, standards, and best practices.
- Ensure compliance with data regulations, such as GDPR or HIPAA, by implementing appropriate data protection measures.
- Conduct data audits and implement data quality controls to maintain data accuracy and consistency.
Education:
General Education
- BSc in Computer Science/Engineering or related field.
- Evidence of strong industry/sector participation and relevant professional certifications such as:
- Azure Data Engineer Associate
- Databricks Certified Data Engineer Associate
- Databricks Certified Data Engineer Professional
- Amazon Web Services (AWS) Certified Data Analytics – Specialty
- Cloudera Data Platform Generalist Certification
- Data Science Council of America (DASCA) Associate Big Data Engineer
- Data Science Council of America (DASCA) Senior Big Data Engineer
- Google Professional Data Engineer
- IBM Certified Solution Architect – Cloud Pak for Data v4.x
- IBM Certified Solution Architect – Data Warehouse V1
Experience:
General Experience
- At least 3 years’ experience developing, deploying, and managing robust ETL/ELT data solutions, preferably in a reputable financial institution or FinTech company.
Behavioural Competencies:
- Have strong analytical thinking skills to understand complex data requirements, identify patterns and trends in data, and design efficient and scalable data solutions. Be able to break down complex problems into manageable components and develop logical and effective solutions. Be able to analyze data-related issues, troubleshoot problems, and implement appropriate resolutions.
- Approach challenges with a proactive mindset and come up with innovative and practical solutions.
- Have a keen eye for detail, ensuring data accuracy, quality, and integrity through thorough data validation and verification processes. Pay attention to performance optimization and data security measures.
- Be able to work effectively in a team environment, communicate and collaborate with team members (data scientists, analysts, software engineers, and other stakeholders), and contribute your expertise to achieve common goals.
- Possess good communication skills to effectively communicate technical concepts and requirements to both technical and non-technical stakeholders.
- Be able to articulate your ideas, document your work, and create clear and concise technical documentation for future reference.
- Adhere to ethical guidelines and maintain a high level of professionalism.
- Prioritize data privacy, security, and compliance with relevant regulations.
- Demonstrate integrity, honesty, and accountability in your work.
- Stay updated with the latest trends and advancements in data engineering technologies, tools, and best practices. Actively seek opportunities for professional development and self-improvement.
- Anticipate potential issues, design robust and scalable data architectures, and implement monitoring and alerting systems to detect and address issues proactively.
Skills:
- Proficiency in working with various Big Data technologies is essential. This includes Apache Hadoop, Apache Spark, Apache Kafka, Apache Hive, Apache Pig, and other related frameworks. Understand the architecture, components, and ecosystem of these technologies to design and implement robust data processing pipelines.
- Expertise in data processing and Extract, Transform, Load (ETL) techniques.
- Must be skilled in designing and implementing efficient data pipelines to extract data from various sources, transform it into a suitable format, and load it into target systems or data warehouses. Proficiency in PySpark and experience with Azure Databricks are required (an illustrative PySpark batch sketch appears after this list).
- Proficiency in implementing CI/CD practices for data pipelines, including version control, automated testing, and deployment processes, using tools such as Git, Jenkins, or similar platforms, to ensure smooth and reliable pipeline deployment, faster development cycles, improved code quality, and efficient release management.
- Proficiency in the Python programming language and SQL to manipulate and transform data, build data pipelines, and automate processes. Should understand data profiling, data cleansing, and data validation techniques to ensure the accuracy, completeness, and consistency of the data. Knowledge of data privacy and compliance regulations is also important.
- Deep understanding of distributed systems and parallel computing concepts.
- Should be familiar with concepts like data partitioning, parallel processing, fault tolerance, and cluster management.
- Should be familiar with different data storage technologies and NoSQL databases like Apache HBase, Apache Cassandra, MongoDB, or Amazon DynamoDB. Should understand the trade-offs between different storage options and select the appropriate one based on the use case and requirements.
- Should be able to design efficient data schemas that optimize data storage, retrieval, and processing. Should understand concepts such as entity-relationship modeling, dimensional modeling, and schema evolution.
- Should have experience in real-time data processing techniques. Be familiar with stream processing frameworks like Apache Flink, Apache Kafka Streams, or Apache Storm. Understand concepts like event-driven architectures, message queues, and real-time analytics for building real-time data pipelines (see the streaming sketch after this list).
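For illustration only, the short sketch below shows what a minimal PySpark batch ETL job of the kind described above might look like. It is not part of the role requirements, and the source/target paths, column names, and quality rules are assumptions invented for the example.

# Minimal PySpark ETL sketch: extract raw transaction records, apply basic
# cleansing and validation, and load the result as a partitioned dataset.
# Paths and column names below are illustrative assumptions, not a real schema.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("transactions-etl").getOrCreate()

# Extract: read raw JSON landed by an upstream ingestion job (hypothetical path)
raw = spark.read.json("/mnt/raw/transactions/")

# Transform: drop duplicates and records failing basic quality checks,
# normalise types, and derive a date column to partition the output by
clean = (
    raw
    .dropDuplicates(["transaction_id"])
    .filter(F.col("amount").isNotNull() & (F.col("amount") > 0))
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    .withColumn("txn_date", F.to_date("created_at"))
)

# Load: write a ready-to-use dataset, partitioned by date for efficient pruning
(
    clean.write
    .mode("overwrite")
    .partitionBy("txn_date")
    .parquet("/mnt/curated/transactions/")
)

Partitioning the output by date is one common way to apply the partitioning and storage-management practices referred to in the responsibilities above.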
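Along the same lines, here is a minimal real-time sketch using Spark Structured Streaming with a Kafka source. The broker address, topic name, and event schema are illustrative assumptions, and the spark-sql-kafka connector package is assumed to be available on the cluster.

# Minimal Structured Streaming sketch: consume payment events from a Kafka
# topic and maintain running per-merchant aggregates in near real time.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("payments-stream").getOrCreate()

# Illustrative event schema for the JSON payload carried in the Kafka messages
event_schema = StructType([
    StructField("merchant_id", StringType()),
    StructField("amount", DoubleType()),
])

# Read the topic as an unbounded stream and parse the JSON value column
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "payments")                    # hypothetical topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Aggregate per merchant and emit running totals to the console sink
query = (
    events.groupBy("merchant_id")
    .agg(F.count("*").alias("txn_count"), F.sum("amount").alias("total_amount"))
    .writeStream
    .outputMode("complete")
    .format("console")
    .start()
)

query.awaitTermination()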