Your Hundertserver mission:
As a Site Reliability Engineer (SRE) at Hundertserver, you are responsible for the stable, high-performing, and secure operation of modern cloud platforms. Through automation, monitoring, SLAs, and incident response, you ensure that our systems not only run but continuously improve. You work closely with customers and with development and infrastructure teams, bring clarity to complex operational issues, and create sustainable solutions – hands-on, pragmatic, and with a high degree of ownership.
The Main Tasks:
Key Responsibilities
Availability & Stability
• Ensuring platform availability according to defined SLOs / SLAs
• Analyzing and resolving incidents & performance issues (including on-call duties)
• Building and maintaining robust alerting, logging, and monitoring setups
• Root cause analysis & implementation of preventive measures
Automation & Infrastructure
• Automating provisioning, scaling, and maintenance (IaC with Terraform, Ansible, etc.)
• Operating and enhancing Kubernetes environments (cloud & on-prem)
• Developing and maintaining self-healing and auto-scaling mechanisms
• Creating and maintaining runbooks & playbooks
Monitoring, Observability & Performance
• End-to-end monitoring with tools like Prometheus, Grafana, Loki, ELK
• Defining and managing SLIs and SLOs for data-driven platform control
• Performing performance analyses (workloads, traffic, databases) and ongoing optimization
• Setting up & maintaining distributed tracing and logging systems
Security & Operational Hygiene
• Implementing and enforcing security standards (least privilege, TLS, secrets management)
• Regular health checks, updates, and patching
• Ensuring availability through established backup & disaster recovery processes
Collaboration & Consulting
• Close collaboration with development, support, and platform teams
• Consulting customers on operating models, platform metrics & architectural decisions
• Training internal teams on topics such as monitoring, SRE basics & troubleshooting
You are a good fit for our team if:
What You Should Bring
Technical Profile
• Linux expertise (Debian, Ubuntu, RHEL)
• Deep knowledge of Kubernetes – clusters, ingress, operators, Helm, etc.
• Experience with cloud platforms (AWS, Azure, GCP)
• Strong expertise in monitoring stacks (Prometheus, Grafana, Loki, ELK)
• Proficiency in Infrastructure-as-Code (Terraform, Ansible, Puppet)
• Scripting and automation skills (Bash, Python, Go)
• Familiarity with logging, tracing & incident management processes
Soft Skills & Working Style
• Proactive troubleshooting & a strong sense of quality
• Structured, analytical thinking – solution-oriented and pragmatic
• Excellent communication skills (with customers, developers, and operations)
• Focus on sustainability & automation rather than firefighting
• Willingness to participate in on-call rotations (standby, SLA windows)
Nice to Have
• Certifications such as CKA / CKS / AWS DevOps or equivalent
• Experience with GitOps, ArgoCD, or Policy-as-Code
• Knowledge of FinOps / cost optimization in cloud platforms
What we offer:
What You Can Expect at Hundertserver
• Real development – in technology, methodology & culture
• Modern platforms & tools – with room for your own ideas
• Ownership & trust – we work in partnership, not through hierarchy
• Flexible working hours & a remote-first culture
• Hands-on mentality & direct customer impact
About us
ONEHUNDRED / Hundertserver is a cloud service provider that doesn’t just support digital transformation but actively shapes it. Based in the heart of Berlin and trusted by clients such as Gründerszene, Edelman, and Prognos, we develop innovative, secure, and sovereign cloud solutions for a connected future.
Our team lives and breathes technology, thrives on challenges, and is always pushing the boundaries of what cloud can do. With over 20 years of experience, deep open-source expertise, and a strong focus on data sovereignty, efficiency, and quality, we guide organizations on their journey into the multi-cloud world.
What defines us? Integrity, team spirit, a passion for learning, and the courage to break new ground. We’re open, agile, and driven by progress – and we’re looking for people who share that mindset.
Join our team and help shape the future of cloud with us.