Site Reliability Engineer Intern
Talos Trading
New York, New York, United States
June 2024 - August 2024

  • Configured a Cloud Composer instance on GCP with Terraform and implemented order reconciliation DAGs on BigQuery that drive actionable business-level alerts; analyzed costs to avoid a potential $1.5M annual increase
  • Optimized system performance and cost by building pipelines that project hardware specifications from current usage metrics to meet target utilization, achieving a 50% performance boost and $100K in monthly savings
  • Mapped VM connections in GCP with TypeScript to visualize complex market data flows and integrated real-time, per-VM Datadog metrics using Flask, improving system health visibility and speeding up initial troubleshooting by 30%
  • Automated the market data failover process and added post-deployment validation with Datadog alerting to the Octopus Deploy pipeline, improving efficiency and reliability and strengthening security by minimizing direct access
  • Added GitHub Actions checks that validate commit messages on pull requests, enabling traceability between commits and Jira tickets
  • Added automated YAML/JSON validators using Git submodules to enforce code quality, leading to faster code reviews
  • Debugged trading platform issues by analyzing logs on Linux and querying the Postgres database