Site Reliability Engineer Intern
Talos Trading
New York, New York, United States
June 2024 - Aug 2024
- Configured a Cloud Composer instance on GCP using Terraform and implemented order reconciliation DAGs using BigQuery, enabling actionable business-level alerts, and analyzed costs to avoid a potential $1.5M annual increase
- Optimized system performance and cost by developing pipelines to project hardware specifications from current metrics, meeting target usage and achieving a 50% performance boost and $100K in monthly savings
- Mapped VM connections in GCP using TypeScript to visualize complex market data flows and integrated real-time Datadog metrics for each VM using Flask, enhancing system health visibility and accelerating initial troubleshooting by 30%
- Automated the market data failover process and integrated post-deployment validation into the Octopus Deploy pipeline with Datadog alerting, increasing efficiency and improving system reliability and security by minimizing direct access
- Integrated commit message checks on pull requests with GitHub Actions, enabling traceability from Jira tickets
- Added automatic YAML/JSON validators using submodules to ensure code quality, leading to faster code reviews
- Analyzed logs in Linux and used the Postgres database to debug trading platform issues, demonstrating cross-functional expertise