Behind the Scenes of Building an HFT Engine in C++: Lessons from Production

Notes from building a High-Frequency Trading engine from scratch: why we chose C++, how we reached 12ms latency, risk control, and the real pitfalls we hit in deployment

12 June 2569 B.E. (2026) · 4 min read


One of the most challenging projects we have built is an HFT (High-Frequency Trading) engine in C++: a system that has to process 2.4 million market ticks per day and respond within 12ms at p99.

This article shares the lessons we learned building it for real.

Why C++ and not Python

When the project started, the client immediately asked, “Can Python do this?” Our short answer: “It can, but it won't meet the latency target.”

Python's latency problems

GIL (Global Interpreter Lock): threads cannot execute Python bytecode truly in parallel
Garbage collector pauses are unpredictable
Interpreter overhead of roughly 30–100ns per operation

In HFT, a single millisecond can be the difference between profit and loss. For this workload:

Python: latency 80–200ms (p99)
C++: latency 8–15ms (p99)
Rust: latency 9–16ms (p99), competitive, but with a smaller ecosystem of trading libraries

The key point: what kills Python for this kind of work is not average latency but tail latency (p99).

Architecture
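A small illustration of why the tail matters more than the mean. The helpers `percentile` (nearest-rank definition) and `mean` are hypothetical, and the sample values are made up for the example, not measurements from this system:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
// Takes the vector by value because nth_element reorders it.
double percentile(std::vector<double> samples, double p) {
  if (samples.empty()) return 0.0;
  std::size_t rank = static_cast<std::size_t>(std::ceil(p / 100.0 * samples.size()));
  if (rank == 0) rank = 1;
  if (rank > samples.size()) rank = samples.size();
  std::nth_element(samples.begin(), samples.begin() + (rank - 1), samples.end());
  return samples[rank - 1];
}

double mean(const std::vector<double>& samples) {
  if (samples.empty()) return 0.0;
  return std::accumulate(samples.begin(), samples.end(), 0.0) / samples.size();
}
```

With 98 samples at 10ms and 2 at 200ms, the mean is 13.8ms, which looks healthy, while p99 is 200ms.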

[Exchange WebSocket] 
   ↓ market data feed
[Ring Buffer (lock-free, SPSC)]
   ↓
[Strategy Engine]
   ├─ Signal Generator (technical indicators)
   ├─ Risk Manager (position, drawdown, kill switch)
   └─ Order Manager (smart order routing)
   ↓
[FIX/REST API to broker]
   ↓ confirmation
[Database (PostgreSQL) — async batch write]

Key rule: no allocations and no system calls on the hot path.

Techniques that make it fast

1. Lock-free ring buffer

We use an SPSC (single producer, single consumer) ring buffer instead of a general-purpose queue: no mutex is needed, so there are fewer context switches.

template<typename T, size_t N>
class SPSCRing {
  alignas(64) std::atomic<size_t> head_{0};
  alignas(64) std::atomic<size_t> tail_{0};
  T buf_[N];
public:
  bool push(const T& item);
  bool pop(T& out);
};

alignas(64) matters: it keeps head_ and tail_ on separate 64-byte cache lines, so the producer and consumer cores don't false-share a line.
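The class above only declares `push` and `pop`. A minimal sketch of how they could be implemented, assuming exactly one producer thread and one consumer thread; the acquire/release pairing is what makes the data written into a slot visible to the other side without a mutex:

```cpp
#include <atomic>
#include <cstddef>

// Single-producer/single-consumer ring buffer (sketch).
// One slot is always left empty to tell "full" from "empty", so capacity is N - 1.
template <typename T, std::size_t N>
class SPSCRing {
  static_assert(N >= 2, "need at least one usable slot");
  alignas(64) std::atomic<std::size_t> head_{0};  // next index to read; written only by the consumer
  alignas(64) std::atomic<std::size_t> tail_{0};  // next index to write; written only by the producer
  alignas(64) T buf_[N];

 public:
  // Producer thread only. Returns false instead of blocking when the ring is full.
  bool push(const T& item) {
    const std::size_t t = tail_.load(std::memory_order_relaxed);
    const std::size_t next = (t + 1) % N;
    if (next == head_.load(std::memory_order_acquire)) return false;  // full
    buf_[t] = item;
    tail_.store(next, std::memory_order_release);  // publish the slot to the consumer
    return true;
  }

  // Consumer thread only. Returns false when the ring is empty.
  bool pop(T& out) {
    const std::size_t h = head_.load(std::memory_order_relaxed);
    if (h == tail_.load(std::memory_order_acquire)) return false;  // empty
    out = buf_[h];
    head_.store((h + 1) % N, std::memory_order_release);  // hand the slot back to the producer
    return true;
  }
};
```

Making N a power of two and replacing `%` with a bit mask is a common further tweak.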

2. Pre-allocated memory pool

No new/malloc on the hot path: the pool is created at startup.

class OrderPool {
  std::array<Order, 10000> pool_;
  std::atomic<size_t> next_{0};
public:
  // Round-robin reuse: assumes an order is finished before its slot comes around again
  Order* acquire() { return &pool_[next_.fetch_add(1) % pool_.size()]; }
};
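The pool above hands out slots round-robin and never takes them back, so it relies on every order being finished before its slot is reused. Where that can't be guaranteed, one alternative (not the approach described in the post) is a free-list pool; `FreeListPool` and the `Order` fields here are hypothetical:

```cpp
#include <array>
#include <cstddef>

// Hypothetical Order type for illustration only.
struct Order {
  long id = 0;
  double price = 0.0;
  long qty = 0;
};

// Fixed-capacity object pool with a free list, built once at startup (sketch).
// Not thread-safe: intended to be owned by a single hot-path thread.
template <typename T, std::size_t N>
class FreeListPool {
  std::array<T, N> slots_{};
  std::array<T*, N> free_{};  // stack of currently free slots
  std::size_t free_count_ = 0;

 public:
  FreeListPool() {
    for (auto& s : slots_) free_[free_count_++] = &s;
  }
  // Returns nullptr when exhausted instead of silently reusing a live object.
  T* acquire() { return free_count_ ? free_[--free_count_] : nullptr; }
  // Caller must only return pointers obtained from acquire(), each exactly once.
  void release(T* p) { free_[free_count_++] = p; }
  std::size_t available() const { return free_count_; }
};
```

At 10,000 orders the pool is large, so in practice it would live in static storage or be allocated once at startup rather than on a thread's stack.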

3. CPU affinity

Pin critical threads to dedicated cores, and disable hyperthreading on those cores.

cpu_set_t cpuset;
CPU_ZERO(&cpuset);
CPU_SET(2, &cpuset);  // bind to core 2
pthread_setaffinity_np(thread, sizeof(cpuset), &cpuset);  // thread: pthread_t of the thread to pin
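The snippet assumes a `thread` handle from elsewhere. A self-contained sketch for Linux/glibc that pins the calling thread instead; `pin_current_thread_to_core` and `first_allowed_core` are hypothetical helpers, and which core to use is a deployment decision:

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // pthread_setaffinity_np and the CPU_* macros are GNU extensions
#endif
#include <pthread.h>
#include <sched.h>

// Pin the calling thread to one CPU (Linux/glibc only). Returns 0 on success,
// otherwise the error number reported by pthread_setaffinity_np.
int pin_current_thread_to_core(int core) {
  cpu_set_t cpuset;
  CPU_ZERO(&cpuset);
  CPU_SET(core, &cpuset);
  return pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
}

// Lowest CPU index the calling thread is currently allowed to run on, or -1.
int first_allowed_core() {
  cpu_set_t cpuset;
  CPU_ZERO(&cpuset);
  if (pthread_getaffinity_np(pthread_self(), sizeof(cpuset), &cpuset) != 0) return -1;
  for (int i = 0; i < CPU_SETSIZE; ++i)
    if (CPU_ISSET(i, &cpuset)) return i;
  return -1;
}
```

Disabling hyperthreading happens outside the program, for example in the BIOS or by taking the sibling logical CPU offline through `/sys/devices/system/cpu/cpuN/online`.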

4. Avoid system calls in hot path

No gettimeofday(): read the timestamp counter with rdtsc via __rdtsc()
No printf: log through an async ring buffer that another thread drains to a file
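A sketch of both rules together, assuming x86 (other CPUs fall back to `std::chrono::steady_clock` so it still builds): the hot path stamps each message with the TSC and copies it into a fixed-size SPSC ring, and a background thread does the slow `fprintf`. `AsyncLogger`, `LogRecord` and `now_ticks` are hypothetical names. TSC values are cycles, not nanoseconds; converting them needs the TSC frequency, and they are only comparable across cores on CPUs with an invariant TSC:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <thread>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // __rdtsc
#else
#include <chrono>
#endif

// Raw timestamp: TSC cycles on x86, steady_clock ticks elsewhere.
inline std::uint64_t now_ticks() {
#if defined(__x86_64__) || defined(__i386__)
  return __rdtsc();
#else
  return static_cast<std::uint64_t>(std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Fixed-size record: logging copies bytes, no heap allocation.
struct LogRecord {
  std::uint64_t ticks;
  char msg[56];
};

class AsyncLogger {
  static constexpr std::size_t N = 1024;          // ring holds up to N - 1 records
  alignas(64) std::atomic<std::size_t> head_{0};  // read index, written by the writer thread
  alignas(64) std::atomic<std::size_t> tail_{0};  // write index, written by the hot path
  LogRecord ring_[N];
  std::atomic<bool> running_{true};
  std::FILE* out_;
  std::thread writer_;  // declared last so every member above is initialized before it starts

  // Writer thread: print every published record, then hand the slot back.
  void drain() {
    std::size_t h = head_.load(std::memory_order_relaxed);
    while (h != tail_.load(std::memory_order_acquire)) {
      std::fprintf(out_, "%llu %s\n", static_cast<unsigned long long>(ring_[h].ticks), ring_[h].msg);
      h = (h + 1) % N;
      head_.store(h, std::memory_order_release);
    }
  }

 public:
  explicit AsyncLogger(std::FILE* out)
      : out_(out), writer_([this] {
          while (running_.load(std::memory_order_acquire)) {
            drain();
            std::this_thread::yield();  // busy-polls; a real logger might sleep or back off
          }
          drain();  // write records logged before stop()
          std::fflush(out_);
        }) {}

  ~AsyncLogger() { stop(); }

  // Hot path (single producer only): copy the message into the ring; drop it if full.
  bool log(const char* msg) {
    const std::size_t t = tail_.load(std::memory_order_relaxed);
    const std::size_t next = (t + 1) % N;
    if (next == head_.load(std::memory_order_acquire)) return false;  // full: never block
    ring_[t].ticks = now_ticks();
    std::strncpy(ring_[t].msg, msg, sizeof(ring_[t].msg) - 1);
    ring_[t].msg[sizeof(ring_[t].msg) - 1] = '\0';
    tail_.store(next, std::memory_order_release);
    return true;
  }

  // Stop the writer after it has written everything logged so far.
  void stop() {
    running_.store(false, std::memory_order_release);
    if (writer_.joinable()) writer_.join();
  }
};
```

When the ring is full, `log` drops the message rather than stalling the hot path; whether to drop or count drops is a policy choice.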

5. Compile flags

-O3 -march=native -flto -fno-exceptions -fno-rtti

-fno-exceptions drops exception-handling machinery we don't need; errors are returned through a result type instead.
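With exceptions disabled, errors travel as return values. A minimal result type, sketched with hypothetical names (`Result`, `OrderError`, `check_qty`); on a C++23 toolchain `std::expected` serves the same purpose:

```cpp
#include <cstdint>

// Error codes for the order path (hypothetical, for illustration).
enum class OrderError : std::uint8_t { None, InvalidQty, PriceOutOfBand };

// Minimal result type: a value plus an error code, no exceptions involved.
template <typename T>
struct Result {
  T value{};
  OrderError error = OrderError::None;

  bool ok() const { return error == OrderError::None; }
  static Result success(T v) { return Result{v, OrderError::None}; }
  static Result failure(OrderError e) { return Result{T{}, e}; }
};

// Example: validate an order quantity without throwing.
inline Result<long> check_qty(long qty) {
  if (qty <= 0) return Result<long>::failure(OrderError::InvalidQty);
  return Result<long>::success(qty);
}
```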

Risk control matters most

Speed is worthless if the system blows up. A risk layer checks every order before it is sent:

Position limit

Never hold a position larger than the configured limit
Never hold highly correlated pairs together (e.g. BTC + ETH)
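A sketch of the pre-trade position check, assuming symbols are mapped to small integer ids (so the check allocates nothing) and that correlated symbols such as BTC and ETH are placed in the same group. The post says correlated pairs are not held together; this sketch expresses that as a cap on combined gross exposure per group, which is one possible interpretation. `RiskConfig`, `PositionChecker` and the limit values are hypothetical:

```cpp
#include <array>
#include <cstdlib>

constexpr int kSymbols = 4;  // hypothetical universe size
constexpr int kGroups = 2;   // hypothetical number of correlation groups

struct RiskConfig {
  std::array<long, kSymbols> max_pos{};   // max |position| per symbol
  std::array<int, kSymbols> group{};      // correlation group of each symbol
  std::array<long, kGroups> max_group{};  // max combined |position| per group
};

class PositionChecker {
  RiskConfig cfg_;
  std::array<long, kSymbols> pos_{};  // current signed position per symbol

 public:
  explicit PositionChecker(const RiskConfig& cfg) : cfg_(cfg) {}

  // True if applying `delta` to `sym` keeps both the symbol and its group within limits.
  bool allows(int sym, long delta) const {
    const long next = pos_[sym] + delta;
    if (std::labs(next) > cfg_.max_pos[sym]) return false;
    long group_exposure = 0;  // gross exposure summed over the symbol's group
    for (int s = 0; s < kSymbols; ++s) {
      if (cfg_.group[s] != cfg_.group[sym]) continue;
      group_exposure += std::labs(s == sym ? next : pos_[s]);
    }
    return group_exposure <= cfg_.max_group[cfg_.group[sym]];
  }

  void apply(int sym, long delta) { pos_[sym] += delta; }
};
```

Positions here are in contract units; a real check would likely weight them by price or notional.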

Drawdown kill switch

If losses exceed X% of equity: close every position and stop trading
Pause trading for 1 hour if latency jumps above 30ms (a possible network problem)
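A sketch of the kill switch as a small state machine. X is left as a parameter because the post doesn't give it, drawdown is measured from peak equity (an assumption), and the 30ms / 1 hour values are passed in; `KillSwitch` is a hypothetical name:

```cpp
#include <cstdint>

class KillSwitch {
  double max_drawdown_;      // fraction, e.g. 0.05 for 5%
  double latency_limit_ms_;  // 30 ms in the post
  std::int64_t pause_ns_;    // 1 hour in the post
  double peak_equity_;       // assumed positive
  bool halted_ = false;      // latched until a human resets the system
  std::int64_t paused_until_ns_ = 0;

 public:
  KillSwitch(double max_dd, double latency_limit_ms, std::int64_t pause_ns, double start_equity)
      : max_drawdown_(max_dd), latency_limit_ms_(latency_limit_ms),
        pause_ns_(pause_ns), peak_equity_(start_equity) {}

  // Feed the latest equity; returns true once trading must stop and positions be closed.
  bool on_equity(double equity) {
    if (equity > peak_equity_) peak_equity_ = equity;
    if ((peak_equity_ - equity) / peak_equity_ > max_drawdown_) halted_ = true;
    return halted_;
  }

  // Feed a latency sample; a spike pauses new orders for pause_ns_.
  void on_latency(double ms, std::int64_t now_ns) {
    if (ms > latency_limit_ms_) paused_until_ns_ = now_ns + pause_ns_;
  }

  bool can_trade(std::int64_t now_ns) const { return !halted_ && now_ns >= paused_until_ns_; }
};
```

A drawdown trip latches permanently, while a latency spike only pauses new orders until the timer expires.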

Sanity check

Order price more than 5% away from mid: reject (guards against fat-finger errors)
Orders per second above a threshold: throttle
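The two sanity checks could look like this. The 5% band comes from the post; the throttle is sketched as a token bucket, which is one common design, and the rate and burst values are placeholders:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Fat-finger check: reject orders priced more than max_dev (0.05 = 5%) away from mid.
inline bool price_within_band(double order_px, double mid, double max_dev = 0.05) {
  return mid > 0.0 && std::fabs(order_px - mid) / mid <= max_dev;
}

// Order-rate throttle as a token bucket: refills at max_per_sec, allows bursts up to `burst`.
class TokenBucket {
  double capacity_;
  double rate_per_ns_;
  double tokens_;
  std::int64_t last_ns_;

 public:
  TokenBucket(double max_per_sec, double burst, std::int64_t now_ns)
      : capacity_(burst), rate_per_ns_(max_per_sec / 1e9), tokens_(burst), last_ns_(now_ns) {}

  // Returns true and consumes a token if an order may be sent at now_ns.
  bool try_acquire(std::int64_t now_ns) {
    tokens_ = std::min(capacity_, tokens_ + (now_ns - last_ns_) * rate_per_ns_);
    last_ns_ = now_ns;
    if (tokens_ < 1.0) return false;
    tokens_ -= 1.0;
    return true;
  }
};
```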

Audit log

Every decision is stored in a replayable binary log, which is essential for debugging after unusual market events
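One way to make the audit log replayable is to append fixed-size, trivially copyable records and read them back in order. The record layout and decision codes below are hypothetical:

```cpp
#include <cstdint>
#include <cstdio>
#include <type_traits>
#include <vector>

// Fixed-size audit record (hypothetical layout, 32 bytes with no implicit padding).
struct AuditRecord {
  std::uint64_t ts_ns;      // decision timestamp
  std::uint32_t symbol_id;
  std::uint8_t decision;    // illustrative codes: 0 = no-op, 1 = send, 2 = reject
  std::uint8_t reason;      // which risk check fired, if any
  std::uint16_t pad = 0;    // explicit padding keeps the layout deterministic
  double price;
  std::int64_t qty;
};
static_assert(std::is_trivially_copyable<AuditRecord>::value, "must be raw-copyable");

inline bool append(std::FILE* f, const AuditRecord& r) {
  return std::fwrite(&r, sizeof r, 1, f) == 1;
}

// Replay: read every record back in the order it was written.
inline std::vector<AuditRecord> read_all(std::FILE* f) {
  std::vector<AuditRecord> out;
  AuditRecord r{};
  std::rewind(f);
  while (std::fread(&r, sizeof r, 1, f) == 1) out.push_back(r);
  return out;
}
```

Raw struct files are only portable between builds with the same layout and endianness, so replay should use the same binary or a versioned format.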

Backtest pipeline

Before deploying a strategy live, backtest it on at least 2 years of historical data, and watch out for:

Survivorship bias

If the dataset only contains tokens that still exist, results look good, but they ignore the tokens that were delisted

Look-ahead bias

Never use information from the future (for example, the close price of the current, still-open candle) to make a decision
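A sketch of a look-ahead-free signal: the position held during bar i is decided only from closes of bars before i. The moving-average rule is purely illustrative and not a strategy from the post:

```cpp
#include <cstddef>
#include <vector>

// Position held during bar i (1 = long, 0 = flat), decided only from closes of bars
// 0..i-1, i.e. what was known when bar i opened.
std::vector<int> positions_without_lookahead(const std::vector<double>& closes,
                                             std::size_t window) {
  std::vector<int> pos(closes.size(), 0);
  double sum = 0.0;  // at the top of iteration i: sum of closes[i - window .. i - 1]
  for (std::size_t i = 0; i < closes.size(); ++i) {
    if (window > 0 && i >= window) {
      const double sma = sum / static_cast<double>(window);
      pos[i] = closes[i - 1] > sma ? 1 : 0;  // last completed close vs its SMA
    }
    // Bar i's close becomes usable only from bar i + 1 onward.
    sum += closes[i];
    if (i >= window) sum -= closes[i - window];
  }
  return pos;
}
```

On closes 10, 11, 12, 9, 8 with a window of 2, the positions are 0, 0, 1, 1, 0: the drop to 9 in bar 3 only affects the position from bar 4 onward.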

Realistic slippage

Assuming fills at the mid price makes results look better than they are; in reality you must allow for the spread plus market impact

Transaction cost

Include fees and funding rates: a strategy that is profitable in a backtest that ignores fees can lose money live
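A simple conservative cost model for fills and round-trip PnL. The spread, impact and fee figures (in basis points) are placeholders to calibrate per venue, and funding is passed in as a total paid over the holding period:

```cpp
// Cost assumptions in basis points (placeholders, not values from the post).
struct CostModel {
  double half_spread_bps;  // pay half the spread relative to mid
  double impact_bps;       // extra adverse move caused by our own order
  double fee_bps;          // fee per leg, on notional
};

// Fill price for a buy (side = +1) or sell (side = -1) when the backtest only knows mid.
inline double fill_price(double mid, int side, const CostModel& c) {
  return mid * (1.0 + side * (c.half_spread_bps + c.impact_bps) / 1e4);
}

// Net PnL of a long round trip: buy qty at entry_mid, sell at exit_mid, minus costs.
inline double round_trip_pnl(double entry_mid, double exit_mid, double qty,
                             const CostModel& c, double funding_paid = 0.0) {
  const double buy = fill_price(entry_mid, +1, c);
  const double sell = fill_price(exit_mid, -1, c);
  const double fees = (buy + sell) * qty * c.fee_bps / 1e4;  // fee on both legs' notional
  return (sell - buy) * qty - fees - funding_paid;
}
```

With a 10 bps gross move, a 1 bp half-spread, 1 bp of impact and a 4 bp fee per leg, the round trip loses money even though the mid price moved in the trade's favor.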

Deployment

The system runs on servers in the same region as each exchange to minimize network latency:

Binance → Tokyo
OKX → Hong Kong
Bybit → Singapore

We use bare-metal servers, not shared VPSs: predictable performance matters more than price

Monitoring

Latency histogram (P50, P99, P99.9) ทุก hot operation
Position + PnL real-time dashboard
Telegram alerts when:
- PnL drawdown > threshold
- Order rejection rate > 1%
- Latency spike
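Percentiles such as P99.9 can be tracked on the hot path with a fixed-bucket histogram, so recording a sample is one array increment. A sketch with assumed 100µs buckets up to 100ms; established libraries such as HdrHistogram do this with finer resolution over a wider range:

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

class LatencyHistogram {
  static constexpr std::uint64_t kBucketUs = 100;   // bucket width (assumption)
  static constexpr std::size_t kBuckets = 1000;     // covers 0 .. 100 ms
  std::array<std::uint64_t, kBuckets + 1> counts_{};  // last slot = overflow
  std::uint64_t total_ = 0;

 public:
  // Hot path: one increment, no allocation.
  void record_us(std::uint64_t us) {
    std::size_t b = static_cast<std::size_t>(us / kBucketUs);
    if (b > kBuckets) b = kBuckets;
    ++counts_[b];
    ++total_;
  }

  // Upper edge (µs) of the bucket holding quantile q in (0, 1], e.g. 0.99 for P99.
  // Samples in the overflow bucket are reported as the histogram's maximum range.
  std::uint64_t quantile_us(double q) const {
    if (total_ == 0) return 0;
    const double target = q * static_cast<double>(total_);
    std::uint64_t seen = 0;
    for (std::size_t b = 0; b <= kBuckets; ++b) {
      seen += counts_[b];
      if (static_cast<double>(seen) >= target)
        return std::min<std::uint64_t>((b + 1) * kBucketUs, kBuckets * kBucketUs);
    }
    return kBuckets * kBucketUs;
  }
};
```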

Mistakes we ran into

1. Using `std::string` on the hot path

String allocations made p99 spike; we switched to fixed-size buffers
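A fixed-size buffer replacing `std::string` might look like this. `FixedString` is a hypothetical helper; the design choice here is to truncate input that doesn't fit rather than allocate:

```cpp
#include <cstddef>
#include <cstring>
#include <string_view>

// Fixed-capacity string stored inline: never touches the heap.
template <std::size_t Cap>
class FixedString {
  static_assert(Cap >= 1, "need room for the terminator");
  char data_[Cap] = {};
  std::size_t len_ = 0;

 public:
  FixedString() = default;
  explicit FixedString(std::string_view s) { assign(s); }

  // Copies at most Cap - 1 bytes and keeps the buffer NUL-terminated.
  void assign(std::string_view s) {
    len_ = s.size() < Cap - 1 ? s.size() : Cap - 1;
    if (len_ > 0) std::memcpy(data_, s.data(), len_);
    data_[len_] = '\0';
  }
  std::string_view view() const { return std::string_view(data_, len_); }
  const char* c_str() const { return data_; }
  std::size_t size() const { return len_; }
};
```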

2. Lock contention that appeared at scale

With a single strategy it was invisible, but with 5 strategies running at once, the lock on shared state made latency jump 3x. We solved it by sharding state per strategy.
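A sketch of per-strategy sharding: each strategy owns one cache-line-aligned state struct and is the only thread that writes it, so the hot path needs no lock. The fields and names are hypothetical, and how limits across all strategies are aggregated is not covered here:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// alignas(64) gives each strategy its own cache line, so shards don't false-share.
struct alignas(64) StrategyState {
  std::int64_t position = 0;
  std::uint64_t orders_sent = 0;
};

template <std::size_t N>
struct ShardedState {
  std::array<StrategyState, N> shards{};
  StrategyState& for_strategy(std::size_t id) { return shards[id]; }
};

// Illustration: N strategy threads, each touching only its own shard, then joined.
template <std::size_t N>
void run_strategies(ShardedState<N>& st, int ticks_per_strategy) {
  std::vector<std::thread> threads;
  for (std::size_t id = 0; id < N; ++id)
    threads.emplace_back([&st, id, ticks_per_strategy] {
      StrategyState& s = st.for_strategy(id);  // exclusive to this thread: no lock
      for (int i = 0; i < ticks_per_strategy; ++i) {
        ++s.position;
        ++s.orders_sent;
      }
    });
  for (auto& t : threads) t.join();
}
```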

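3. Hidden allocation in `std::function`

A lambda that captures a large variable no longer fits in `std::function`'s small internal buffer (on typical implementations), so it heap-allocates and causes GC-like latency spikes. We replaced it with a function pointer plus a context struct.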
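A sketch of that replacement: a plain function pointer plus a context pointer, with the large state owned by the caller instead of copied into a capture. `TickCallback`, `SignalContext` and `on_tick` are hypothetical:

```cpp
#include <cstdint>

// Callback = function pointer + untyped context: no type erasure, no heap allocation.
struct TickCallback {
  void (*fn)(void* ctx, double price) = nullptr;
  void* ctx = nullptr;
  void operator()(double price) const { fn(ctx, price); }
};

// State that would otherwise be captured by value in a lambda; owned by the caller.
struct SignalContext {
  double last_price = 0.0;
  std::uint64_t ticks = 0;
};

inline void on_tick(void* ctx, double price) {
  auto* c = static_cast<SignalContext*>(ctx);
  c->last_price = price;
  ++c->ticks;
}
```

A capture-less lambda converts to a function pointer, so `fn` can also be written inline as a lambda.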

4. Not planning for market data gaps

A 100ms WebSocket disconnect meant missing 5–6 ticks; you need reconnect logic plus state recovery
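A sketch of the two pieces, under an assumption the post doesn't state: that the feed carries a per-message sequence number incrementing by 1 (field names and semantics vary by exchange). A gap triggers snapshot-based state recovery; reconnects use capped exponential backoff, with the 100ms base and 5s cap as placeholders:

```cpp
#include <algorithm>
#include <cstdint>

class GapDetector {
  std::uint64_t expected_ = 0;
  bool started_ = false;

 public:
  enum class Status { Ok, Gap, Stale };

  Status on_message(std::uint64_t seq) {
    if (!started_) { started_ = true; expected_ = seq + 1; return Status::Ok; }
    if (seq < expected_) return Status::Stale;  // duplicate or old message: ignore
    const bool gap = seq > expected_;
    expected_ = seq + 1;
    return gap ? Status::Gap : Status::Ok;      // Gap: fetch a snapshot and rebuild state
  }
  void reset() { started_ = false; }  // after reconnecting, before replaying the snapshot
};

// Reconnect delay: 100 ms, 200 ms, 400 ms, ... capped at cap_ms.
inline std::uint64_t backoff_ms(unsigned attempt, std::uint64_t base_ms = 100,
                                std::uint64_t cap_ms = 5000) {
  const unsigned shift = std::min(attempt, 20u);  // keep the shift well inside 64 bits
  return std::min<std::uint64_t>(base_ms << shift, cap_ms);
}
```

Production reconnect logic would usually add random jitter to the delay.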

5. Clock skew

We used NTP sync but still saw 5ms of drift per hour, so we switched to PTP (Precision Time Protocol) for the infrastructure

Results

After the production deployment:

Latency p99: 12ms (target 15ms)
Throughput: 2.4M ticks/day (4000+ orders/day)
Risk per trade: 0.8%
Sharpe ratio: 2.1 (first 3 months)
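The post doesn't say how the Sharpe ratio was computed. For reference, a sketch of the usual annualized definition from per-period returns, with the annualization factor as an explicit assumption (for example 365 for daily returns on a market that trades every day):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Annualized Sharpe ratio: mean excess return / sample std dev * sqrt(periods per year).
double sharpe_ratio(const std::vector<double>& returns, double periods_per_year,
                    double risk_free_per_period = 0.0) {
  const std::size_t n = returns.size();
  if (n < 2) return 0.0;
  double mean = 0.0;
  for (double r : returns) mean += r - risk_free_per_period;
  mean /= static_cast<double>(n);
  double var = 0.0;
  for (double r : returns) {
    const double d = (r - risk_free_per_period) - mean;
    var += d * d;
  }
  var /= static_cast<double>(n - 1);  // sample variance
  if (var == 0.0) return 0.0;
  return mean / std::sqrt(var) * std::sqrt(periods_per_year);
}
```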

The system has run continuously for 8 months with no downtime outside scheduled maintenance

Lessons learned

Premature optimization is the root of all evil, but in HFT optimization is a requirement
Risk management matters more than strategy: a profitable strategy that blows up is dead, while a mediocre but safe strategy survives for a long time
Test with realistic data: synthetic data does not reflect market microstructure
Monitor everything: latency, error rate, queue depth, system call counts
Have a kill switch, and test it to make sure it actually works

Why we're sharing this

Not to sell HFT development, but to show that QooCor can handle both simple and hard projects.

If your business has a problem that needs extreme performance (real-time, low-latency, high-throughput), let's talk. Free consultation at /#contact, or see a hardware-side real-time case in IoT Smart Farm.

See the Auto Trade C++ portfolio entry at /#work
