Big Data Platform Engineer / Big Data Site Reliability Engineer (SRE) - Job Requirements
Position Overview
We are seeking a Big Data Platform Engineer to support our enterprise customers' big data infrastructure deployments and operations. This role calls for a hands-on technical professional who can work both remotely and on-site at customer locations, and who can provide weekend emergency support when critical issues arise.
Key Responsibilities
- Customer Site Support: Provide ongoing technical support for customer big data clusters and installations
- Installation & Upgrades: Lead and support installation/upgrade projects both remotely (back office support) and on-premises at customer sites
- Travel Requirements: Willingness to travel to customer sites as needed for installations, upgrades, and critical issue resolution
- Emergency Response: Available for weekend/after-hours support during customer site crises
- Installation Code Development: Develop, maintain, and fix installation automation scripts and tools (a representative sketch follows this list)
- CI/CD Pipeline Management: Build and maintain continuous integration/deployment pipelines
- Monitoring Solutions: Develop and enhance monitoring tools for big data infrastructure
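To give a concrete sense of the installation tooling in scope, below is a minimal sketch of an installation-step runner with retries and logging. The step scripts (install_hdfs.sh, install_kafka.sh) are hypothetical placeholders; real installations are driven by Ansible playbooks and site-specific configuration.

```python
#!/usr/bin/env python3
"""Minimal sketch of an installation-step runner with retries and logging."""
import logging
import subprocess
import sys
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("installer")

# Hypothetical ordered installation steps for a customer site.
STEPS = [
    ["bash", "install_hdfs.sh"],
    ["bash", "install_kafka.sh"],
]

def run_step(cmd, retries=2, delay=30):
    """Run one installation step, retrying on failure."""
    for attempt in range(1, retries + 2):
        log.info("running %s (attempt %d)", " ".join(cmd), attempt)
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            log.info("step succeeded")
            return True
        log.warning("step failed (rc=%d): %s", result.returncode, result.stderr.strip())
        if attempt <= retries:
            time.sleep(delay)
    return False

def main():
    for cmd in STEPS:
        if not run_step(cmd):
            log.error("aborting installation after failed step: %s", " ".join(cmd))
            sys.exit(1)
    log.info("all installation steps completed")

if __name__ == "__main__":
    main()
```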
Required Technical Skills
Core Infrastructure & Automation
- Ansible: Advanced proficiency in playbook development and infrastructure automation (see the playbook dry-run sketch after this list)
- Bash: Strong shell scripting capabilities for system administration and deployment
- Python: Solid programming skills for automation tools and utilities development
- Red Hat Enterprise Linux (RHEL): Deep knowledge of the RHEL distribution, system administration, and package management
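As an example of how Ansible, Bash, and Python come together in this role, here is a minimal sketch that dry-runs a playbook against a customer inventory. The playbook name and inventory path are hypothetical placeholders.

```python
#!/usr/bin/env python3
"""Minimal sketch: dry-run an Ansible playbook against a customer inventory."""
import subprocess
import sys

PLAYBOOK = "site.yml"                          # hypothetical playbook name
INVENTORY = "inventories/customer/hosts.ini"   # hypothetical inventory path

def dry_run():
    """Run ansible-playbook in check + diff mode and return its exit code."""
    cmd = [
        "ansible-playbook",
        "-i", INVENTORY,
        PLAYBOOK,
        "--check",   # report what would change without changing anything
        "--diff",    # show file/template diffs
    ]
    return subprocess.run(cmd).returncode

if __name__ == "__main__":
    sys.exit(dry_run())
```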
CI/CD & Virtualization
- Jenkins: Experience building and maintaining CI/CD pipelines (see the build-status sketch after this list)
- VMware vSphere: Virtual infrastructure management and deployment
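On the Jenkins side, a typical small utility checks a job's last build result over the Jenkins JSON API. The sketch below is illustrative only; the server URL, job name, and credentials are placeholders.

```python
#!/usr/bin/env python3
"""Minimal sketch: poll a Jenkins job's last build result over the JSON API."""
import requests

JENKINS_URL = "https://jenkins.example.com"   # placeholder server URL
JOB = "bigdata-installer"                     # placeholder job name
AUTH = ("svc-user", "api-token")              # placeholder credentials

def last_build_result(job):
    """Return the result string (SUCCESS, FAILURE, ...) of the job's last build."""
    url = f"{JENKINS_URL}/job/{job}/lastBuild/api/json"
    resp = requests.get(url, auth=AUTH, timeout=10)
    resp.raise_for_status()
    return resp.json().get("result")

if __name__ == "__main__":
    print(f"{JOB}: {last_build_result(JOB)}")
```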
Big Data Ecosystem (Required)
- Apache Spark: Cluster configuration, tuning, and troubleshooting
- YARN: Resource management and cluster administration
- HDFS: Distributed file system management and optimization
- Apache Kafka: Streaming platform deployment and maintenance
- Apache ZooKeeper: Coordination service configuration and management (a combined health-probe sketch for these services follows this list)
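A representative monitoring task for these services is a basic health probe. The sketch below assumes the standard Hadoop and Kafka CLI tools are on the PATH and that ZooKeeper's "ruok" four-letter-word command is enabled; hosts and ports are placeholders.

```python
#!/usr/bin/env python3
"""Minimal sketch of a cluster health probe for HDFS, YARN, Kafka, and ZooKeeper."""
import socket
import subprocess

CHECKS = [
    # (label, standard CLI status command)
    ("HDFS",  ["hdfs", "dfsadmin", "-report"]),
    ("YARN",  ["yarn", "node", "-list"]),
    ("Kafka", ["kafka-topics.sh", "--bootstrap-server", "localhost:9092", "--list"]),
]

def cli_ok(cmd):
    """Return True if the CLI status command exits cleanly."""
    return subprocess.run(cmd, capture_output=True).returncode == 0

def zookeeper_ok(host="localhost", port=2181):
    """Send ZooKeeper's 'ruok' command and expect 'imok' back."""
    try:
        with socket.create_connection((host, port), timeout=5) as s:
            s.sendall(b"ruok")
            return s.recv(16) == b"imok"
    except OSError:
        return False

if __name__ == "__main__":
    for label, cmd in CHECKS:
        print(f"{label}: {'OK' if cli_ok(cmd) else 'DOWN'}")
    print(f"ZooKeeper: {'OK' if zookeeper_ok() else 'DOWN'}")
```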
Preferred (Bonus) Skills
- Kubernetes: Container orchestration and cloud-native deployments across vanilla Kubernetes, OpenShift, and RKE distributions (a node-readiness sketch follows this list)
- Harbor: Container registry management
- Longhorn: Distributed block storage for Kubernetes workloads
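For Kubernetes-based deployments (vanilla, OpenShift, or RKE), a comparable check reports node readiness. The sketch below assumes kubectl is installed and configured for the target cluster.

```python
#!/usr/bin/env python3
"""Minimal sketch: report node readiness on a Kubernetes cluster via kubectl."""
import json
import subprocess

def node_readiness():
    """Return a {node_name: True/False} map based on each node's Ready condition."""
    out = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    status = {}
    for node in json.loads(out)["items"]:
        name = node["metadata"]["name"]
        ready = any(
            c["type"] == "Ready" and c["status"] == "True"
            for c in node["status"]["conditions"]
        )
        status[name] = ready
    return status

if __name__ == "__main__":
    for name, ready in node_readiness().items():
        print(f"{name}: {'Ready' if ready else 'NotReady'}")
```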
Work Environment & Requirements
- Hybrid Role: Mix of remote work and customer site visits
- Travel: Must be willing and able to travel to customer sites domestically and potentially internationally
- Availability: Flexible schedule including weekend emergency support capability
- Customer-Facing: Strong communication skills for direct customer interaction during installations and support
Ideal Candidate Profile
- 3-5 years of experience with big data infrastructure and automation
- Proven track record in customer-facing technical roles
- Strong problem-solving skills under pressure
- Self-motivated with ability to work independently at customer sites