
Tenzin Wangdhen

Cloud and Reliability Engineering Lead
San Francisco, California (CA) | linkedin.com/in/tenzinw | tenzinwangdhen.com | tenzinwangdhen@gmail.com | 7732033994
Dedicated software engineer and leader with expertise in infrastructure, big data, and system reliability.

Work Experience

Iterable

Senior Staff Software Engineer - User Data Infrastructure | Nov 2022 - Current

At Iterable, a marketing automation platform serving companies such as DoorDash, Strava, and Box, I spearhead efforts to scale and automate infrastructure using Terraform, Kubernetes, and the Elastic Stack.

Currently, as the tech lead and product owner for both the User Data at Rest and User Data Infrastructure teams, I'm involved in:

  • Designing and implementing the Elasticsearch Broker service, set to enhance the interface between our application and Elasticsearch clusters with dynamic rate-limiting capabilities.
  • Overseeing the migration of 2 petabytes across 22 Elasticsearch clusters from Elasticsearch 5 on EC2 to Elasticsearch 8 on Kubernetes. The migration is projected to yield 10x faster search, 15% faster data ingestion, and 20% cost savings on an $8M annual budget.
  • Collaborating closely with product managers to define the roadmap and quarterly objectives for Data at Rest, and identifying key future projects within the Architectural Support group.

Iterable

Staff Software Engineer - User Data Infrastructure | Apr 2020 - Oct 2022

Started as an "embedded" SRE for the User Data Infrastructure (UDI) team, handling a notable volume of incidents, and later became a core member of the team.

  • Developed "Krobs", a Kubernetes Jobs framework that runs long-running tasks on Kubernetes instead of on engineers' laptops, improving both security and automation.
  • Implemented self-hosted CockroachDB clusters on EKS for high-scale tasks, such as UUID lookups and Journeys.
  • Established the UDI infrastructure team, a group of Kubernetes and cloud infrastructure experts serving the needs of the data platform.
  • Streamlined various UDI team processes, including rolling restarts of Elasticsearch, index reshards, snapshot restores, and cluster creation.
  • Spearheaded the design and rollout of an in-house database authentication service, enabling engineers to securely modify and reset authentication for Elasticsearch and CockroachDB through an API.

Iterable

Staff Site Reliability Engineer | Nov 2019 - Apr 2020
  • Established an automated CI/CD pipeline using Harness, achieving a threefold increase in deployment speed and slashing downtime risk by 85%.
  • Devised and set up an nginx ingress service, integrating a load balancer with nginx hosts between the API DNS and backend hosts.

Iterable

Senior Site Reliability Engineer | Nov 2018 - Nov 2019

Founding member of a three-person SRE team at Iterable, which eventually grew to over 10 members.

  • Engineered an event stream pipeline, facilitating the monitoring of all application events as structured logs, handling 2TB of data daily.
  • Initiated the Blameless postmortem approach, resulting in a notable 43% decrease in critical incidents.
  • Collaborated with application teams to integrate Datadog APM, enhancing visibility into challenging deployments and services.
  • Served as incident commander, coordinating engineering teams to resolve incidents quickly and leading postmortem discussions.

Iterable

Platform Engineer | Nov 2017 - Nov 2018

At Iterable, co-founded the infrastructure team. Primary mission: transition from manually provisioned infrastructure to a streamlined, code-driven approach using Terraform and Puppet.

  • Orchestrated the automation of Elasticsearch cluster provisioning through Terraform and Puppet. This was used as a proof of concept for migrating the rest of our infrastructure to Terraform.
  • Designed Kibana dashboards and alerts, adopted by over 100 team members organization-wide.
  • Conceived and managed a 200TB Elasticsearch cluster dedicated to logging and monitoring, capable of ingesting upwards of 50K documents every second.

Iterable

Customer Success Engineer | Nov 2016 - Nov 2017

One of the first customer success engineers, supporting frontline support representatives with complex technical issues.

  • Provided technical integrations and support for the company's large enterprise customers.
  • Managed and optimized the data ingestion pipeline built on S3 and Logstash.
  • Built internal tools to help scale the CS team, and set up Datadog monitoring and alerting to catch production issues.
  • Developed an internal Slackbot, automating routine database queries, catering to support reps without direct database access.

Wiser

Technical Account Manager / Solutions Engineer | Aug 2015 - Sep 2016

Led technical integrations with enterprise clients such as PetSmart, Walmart, and Luxottica. Helped manage the internal admin tool, and built internal tools including a Slackbot for cross-database queries and a React app for picking scraper templates.

  • Designed and developed a responsive GUI for picking scraper templates (React.js, Node.js)
  • Designed and developed a Slackbot for cross-database queries, allowing non-technical team members to access data quickly (MySQL, PostgreSQL, DynamoDB)
  • Created Node.js modules to automate the scraper build and QA process, significantly increasing TAM team productivity
  • Contributed to and maintained the internal admin tool for Python job management (Ruby on Rails)

Epic Systems

Project Manager | Sep 2014 - Aug 2015

Epic Systems is the nation's largest electronic medical records company. The majority of its clients are large hospitals. Managed the $3M implementation of lab information systems (LIS) at Santa Clara Valley Medical Center (SCVMC).

  • Managed six analysts, providing technical support and leading daily one-on-one meetings and scrum sessions
  • Led client meetings to review scope, third-party vendor integration, and project planning
  • Worked with developers to customize Epic integration with a third-party blood bank system

Education

University of Wisconsin - Madison

Sep 2010 - May 2014
  • Alan C. Filley Integra Mutual Scholarship in Entrepreneurship Award
  • Badger Innovations (president): responsible for finding clients, recruiting students, and managing web development projects
  • Software Training for Students trainer: taught HTML/CSS, JavaScript, Photoshop, Illustrator
  • Garage Physics (EEG Quadcopter project)

Skills

  • Elasticsearch
  • Terraform
  • Kubernetes
  • Kibana
  • Logstash
  • JavaScript
  • Node.js
  • Python
  • AWS
  • SaltStack
  • Kafka / MSK
  • CockroachDB