651 Commits

Author SHA1 Message Date
xfy
445401c40f perf(accesslog): add sample_rate for access log to reduce CPU and allocations
Add configurable access log sampling:
- 0.0-1.0 range; defaults to 1.0 (record all) for backward compatibility
- Uses lock-free atomic counter for deterministic sampling
- Non-2xx responses always logged regardless of sample rate

Benchmark results (combined format, /dev/null):
  Full logging:    ~2245 ns/op, 1987 B/op, 17 allocs/op
  10% sampling:    ~1593 ns/op, 1633 B/op,  6 allocs/op
  Improvement:     -29% latency, -65% allocations/op

This addresses the top application-layer CPU hotspot identified
in the v0.4.0 profile (LogAccess at 16.36% cumulative CPU).
2026-06-11 13:53:41 +08:00
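
The sampling rule described in 445401c40f (a lock-free counter, non-2xx responses always logged) might look roughly like the sketch below. The `Sampler` type, `NewSampler`, and `ShouldLog` are hypothetical names chosen for illustration, and the "one in N" rounding is an assumption; the commit does not show lolly's actual code.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Sampler decides whether a request is written to the access log.
// Hypothetical type for illustration; not lolly's actual implementation.
type Sampler struct {
	every   uint64        // log one 2xx request out of every `every`; 0 logs none
	counter atomic.Uint64 // shared request counter, updated without locks
}

// NewSampler converts a rate in [0.0, 1.0] into a "one in N" interval.
// Rates outside the range are clamped (an assumption of this sketch).
func NewSampler(rate float64) *Sampler {
	switch {
	case rate >= 1.0:
		return &Sampler{every: 1}
	case rate <= 0.0:
		return &Sampler{every: 0}
	default:
		return &Sampler{every: uint64(1.0/rate + 0.5)}
	}
}

// ShouldLog reports whether a response with the given status is logged.
// Non-2xx responses bypass sampling entirely.
func (s *Sampler) ShouldLog(status int) bool {
	if status < 200 || status >= 300 {
		return true
	}
	if s.every == 0 {
		return false
	}
	// Deterministic: every N-th 2xx request is logged.
	return s.counter.Add(1)%s.every == 0
}

func main() {
	s := NewSampler(0.1)
	logged := 0
	for i := 0; i < 1000; i++ {
		if s.ShouldLog(200) {
			logged++
		}
	}
	fmt.Printf("logged %d of 1000\n", logged) // prints "logged 100 of 1000"
	fmt.Println(s.ShouldLog(502))             // prints "true"
}
```

A counter gives an exact 1-in-N ratio without a random number generator, which is one plausible reading of "deterministic sampling" in the commit message.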
xfy
88bb7bf267 docs(benchmark): add v0.4.0 performance analysis report
Key findings from CPU/allocs/heap profiling:
- LogAccess consumes 16.36% cumulative CPU (top app-layer hotspot)
- os.statNolog dominates 74.95% of allocations (static file path checks)
- net.IP.String + net.JoinHostPort account for 9.34% allocations
- bufio.NewReader/Writer hold 54.6% of heap memory

Includes detailed optimization priorities and next steps.
2026-06-11 13:49:57 +08:00
xfy
58e095a35b feat(pprof): add /debug/pprof/allocs endpoint for allocation profiling
- Add writeAllocsProfile() helper in pprof_impl.go
- Register /allocs route in PprofHandler.ServeHTTP
- Add handleAllocs() method with proper streaming response
- Update index page to list the new allocs profile link

This aligns lolly's pprof endpoints with net/http/pprof and enables
allocation hotspot analysis during performance benchmarking.
2026-06-11 13:47:47 +08:00
xfy
ebeb258c58 docs(benchmark): add v0.4.0 baseline summary and update gitignore
- Collect baseline benchmark summary across all core modules
- Save key results to benchmarks/v0.4.0/summary.txt
- Update .gitignore to track benchmark summaries/reports
- Include performance optimization design docs and plan
2026-06-11 13:43:28 +08:00
xfy
bc57e5b656 chore(benchmark): establish benchmark directory structure 2026-06-10 14:11:19 +08:00
xfy
afbbc3a951 chore: release v0.4.1 v0.4.1 2026-06-10 13:54:26 +08:00
xfy
66ea93e3c1 fix(stream): reset stopCh after Stop for restartability 2026-06-10 13:48:07 +08:00
xfy
7204432ca0 fix(stream): correct upstream selection and add graceful shutdown
- Fix handleConnection to use addr parameter for direct upstream map
  lookup instead of always selecting the first upstream
- Add Server.Stop() for graceful shutdown with listener closing, UDP
  server cleanup, health checker termination, and goroutine joining
- Add shutdownStream() to App and call it in SIGTERM/SIGQUIT/SIGUSR2
  signal handlers to prevent goroutine and port leaks on shutdown
2026-06-10 13:45:35 +08:00
xfy
f12ffd180f chore: release v0.4.0
- Update CHANGELOG.md for v0.4.0
- Update Makefile FALLBACK_VERSION to 0.4.0
- Fix lint warnings (godoc comments, goconst)
- Clean up code formatting
v0.4.0
2026-06-09 15:59:36 +08:00
xfy
503daf65d3 perf(loadbalance): add benchmarks for Least Time and Sticky
- Benchmark Select and Record operations
- Concurrent benchmark for realistic load testing
- Baseline performance:
  - LeastTime.Select: ~33ns/op, 0 allocs
  - LeastTime.Record: ~5.6ns/op, 0 allocs
  - StickySession.Select: ~205ns/op (with cookie lookup)
2026-06-08 18:21:03 +08:00
xfy
ef871f1d39 test(loadbalance): add integration tests for Least Time and Sticky
- Verify Least Time picks faster target consistently
- Verify Sticky fallback when target becomes unhealthy
- Test cookie encoding and session persistence
2026-06-08 18:19:20 +08:00
xfy
e5885ce888 fix(proxy): correct response time recording for Least Time
- Record headerTime when header is received
- Record lastByteTime when response is complete
- Use correct timing calculations (headerReceived/connectEnd/responseEnd)
2026-06-08 18:17:08 +08:00
xfy
72f189bba8 feat(proxy): integrate Least Time and Sticky balancers
- Add least_time and sticky to createBalancerByName
- Implement response time recording for Least Time
- Support StickySession in target selector with request context
- StickySession auto-starts when created
2026-06-08 18:11:47 +08:00
xfy
3b6b70a491 fix(config): validate least_time default_time is not negative 2026-06-08 18:03:52 +08:00
xfy
cb1f86298e fix: add missing test coverage for Task 4 config integration
- Add validation tests for least_time and sticky configs
- Add algorithm tests for least_time and sticky
- Add SameSite validation in validateProxy
2026-06-08 18:01:21 +08:00
xfy
88a2c1fc1b feat(config): add Least Time and Sticky configuration support
- Add least_time and sticky to valid algorithms list
- Add LeastTimeConfig and StickyConfig structures
- Update default config generation with new options
- Add configuration validation for new fields
2026-06-08 17:57:06 +08:00
xfy
a73da4e14a fix(sticky): recreate stopCh on Start to support restart 2026-06-08 17:52:20 +08:00
xfy
0a5443f6cf fix(sticky): guard against double Stop, nil fallback, and multiple Start calls
- Add sync.Once to prevent double close of stopCh in Stop()
- Add nil fallback guard in NewStickySession (defaults to RoundRobin)
- Add atomic.Bool to make Start() idempotent
- Add tests for double Stop() and nil fallback scenarios
2026-06-08 17:47:37 +08:00
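
The guards listed in 0a5443f6cf (sync.Once around the close of stopCh, atomic.Bool for an idempotent Start) can be sketched as follows. `Cleaner` and its fields are hypothetical names, not the StickySession type itself, and the loop body is a placeholder.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Cleaner is a hypothetical background worker used only for illustration.
type Cleaner struct {
	started  atomic.Bool
	stopOnce sync.Once
	stopCh   chan struct{}
	done     chan struct{}
}

func NewCleaner() *Cleaner {
	return &Cleaner{stopCh: make(chan struct{}), done: make(chan struct{})}
}

// Start launches the background loop. Only the first call has an effect;
// later calls return false without starting a second goroutine.
func (c *Cleaner) Start(interval time.Duration) bool {
	if !c.started.CompareAndSwap(false, true) {
		return false
	}
	go func() {
		defer close(c.done)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				// Placeholder: expired sessions would be removed here.
			case <-c.stopCh:
				return
			}
		}
	}()
	return true
}

// Stop may be called any number of times; stopCh is closed exactly once.
func (c *Cleaner) Stop() {
	c.stopOnce.Do(func() { close(c.stopCh) })
}

func main() {
	c := NewCleaner()
	fmt.Println(c.Start(time.Second)) // prints "true"
	fmt.Println(c.Start(time.Second)) // prints "false"
	c.Stop()
	c.Stop() // second call is a no-op instead of a panic
	<-c.done
	fmt.Println("stopped")
}
```

This sketch is one-shot. Restart support, as added later in a73da4e14a, would additionally need to recreate stopCh and reset the guards on Start.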
xfy
360fd0da9d fix(sticky): check cookie expiration in Select method
- Fix Select to check if cookie is expired before routing
- Add TestStickySession_ExpiredCookie test
- Expired cookies now trigger fallback + new cookie set
2026-06-08 17:40:54 +08:00
xfy
66752a47f0 fix(sticky): fix cookie format, shard keying, and tests
- Encode cookie as base64(target_url + "|" + timestamp) per spec
- Use cookie value (not targetURL) for shard key and session map keys
- Add missing sticky.Start() calls in tests
- Fix time precision in cookie encode/decode tests
2026-06-08 17:36:41 +08:00
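
A minimal sketch of the cookie format from 66752a47f0, together with the expiry check from 360fd0da9d. Several details are assumptions because the log does not state them: the URL-safe unpadded base64 variant, the timestamp being Unix seconds, and the function names `encodeCookie` and `decodeCookie`.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strconv"
	"strings"
	"time"
)

var errExpired = errors.New("sticky cookie expired")

// encodeCookie builds base64(target_url + "|" + timestamp).
// The timestamp unit (Unix seconds) is an assumption of this sketch.
func encodeCookie(targetURL string, createdUnix int64) string {
	raw := targetURL + "|" + strconv.FormatInt(createdUnix, 10)
	return base64.RawURLEncoding.EncodeToString([]byte(raw))
}

// decodeCookie returns the target URL, or an error if the value is
// malformed or older than ttlSeconds.
func decodeCookie(value string, nowUnix, ttlSeconds int64) (string, error) {
	raw, err := base64.RawURLEncoding.DecodeString(value)
	if err != nil {
		return "", err
	}
	s := string(raw)
	// Split on the last separator so a "|" inside the URL is preserved.
	i := strings.LastIndexByte(s, '|')
	if i < 0 {
		return "", errors.New("malformed sticky cookie")
	}
	created, err := strconv.ParseInt(s[i+1:], 10, 64)
	if err != nil {
		return "", err
	}
	if nowUnix-created > ttlSeconds {
		return "", errExpired
	}
	return s[:i], nil
}

func main() {
	now := time.Now().Unix()
	v := encodeCookie("http://10.0.0.1:8080", now)
	target, err := decodeCookie(v, now+60, 300)
	fmt.Println(target, err)
	_, err = decodeCookie(v, now+3600, 300)
	fmt.Println(err)
}
```

On an expired or malformed cookie the caller would fall back to the configured balancer and set a fresh cookie, as the commit messages describe.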
xfy
f69a11ea05 feat(loadbalance): implement Session Sticky balancer
- Add 256-shard lock map for concurrent session routing
- Cookie-based session persistence with base64 encoding
- TTL expiration with background cleanup goroutine
- Support Secure, HttpOnly, SameSite cookie attributes
- Fallback to configured balancer when session target unavailable
2026-06-08 17:30:06 +08:00
xfy
fa95b2a76e feat(loadbalance): implement Least Time balancer
- Add atomic EWMA Stats field to Target
- Implement LeastTime balancer with header_time and last_byte metrics
- Support Select and SelectExcluding with zero-lock design
- Add ResponseTimeRecorder interface for proxy integration
2026-06-08 17:21:20 +08:00
xfy
c6bb75cffe feat(loadbalance): add atomic EWMA statistics core
- Zero-lock atomic EWMA implementation using fixed-point arithmetic
- Supports header_time and last_byte_time tracking
- Concurrent-safe with CAS retry loop
2026-06-08 17:13:08 +08:00
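
The lock-free EWMA in c6bb75cffe could be approximated as below: integer arithmetic on nanoseconds with a compare-and-swap retry loop. The smoothing factor of 1/8, the use of 0 as "no sample yet", and the names `EWMA`, `Record`, and `Value` are assumptions for this sketch.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// ewmaShift sets the smoothing factor alpha = 1/2^ewmaShift (1/8 here).
// The value is an assumption; the commit does not state it.
const ewmaShift = 3

// EWMA keeps an exponentially weighted moving average in nanoseconds.
type EWMA struct {
	value atomic.Int64 // 0 means "no sample recorded yet"
}

// Record folds one observation into the average without taking a lock.
func (e *EWMA) Record(d time.Duration) {
	sample := int64(d)
	for {
		old := e.value.Load()
		next := sample // the first sample seeds the average
		if old != 0 {
			// next = old + alpha*(sample-old), using a shift instead of a float multiply.
			next = old + (sample-old)>>ewmaShift
		}
		// Retry if another goroutine updated the value in between.
		if e.value.CompareAndSwap(old, next) {
			return
		}
	}
}

// Value returns the current average.
func (e *EWMA) Value() time.Duration {
	return time.Duration(e.value.Load())
}

func main() {
	var e EWMA
	e.Record(80 * time.Millisecond)
	e.Record(160 * time.Millisecond)
	fmt.Println(e.Value()) // prints "90ms"
}
```

A balancer in the style of LeastTime would then read `Value()` for each target and pick the smallest.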
xfy
a04dadbe16 feat(examples): add FreeBSD deployment examples 2026-06-05 17:21:26 +08:00
xfy
c847f6036d chore: release v0.3.0 v0.3.0 2026-06-05 14:24:39 +08:00
xfy
85ae7747b8 fix(integration): remove calls to removed proxy.Start/Stop methods 2026-06-05 14:24:34 +08:00
xfy
989a572467 docs(skills): add release workflow skill 2026-06-05 14:02:20 +08:00
xfy
93c0c151d0 fix(lua): wait for SchedulerLoop exit before closing LState; lock cleanupResources 2026-06-05 13:48:04 +08:00
xfy
4789265ca8 fix: add synchronization for concurrent access in server/app/http3/stream 2026-06-05 12:31:41 +08:00
xfy
5e3196c37e fix: resolve race conditions in handler sendfile and lua cosocket tests 2026-06-05 12:31:39 +08:00
xfy
f73a761632 fix(server): protect accessLogMiddleware and accessControl from concurrent writes 2026-06-05 11:49:19 +08:00
xfy
76257a7859 fix(lua): add schedulerMu to protect scheduler LState and callback queue 2026-06-05 11:38:52 +08:00
xfy
2be04f3fb9 fix(lua): add mutex protection for TCPSocket.currentOp in async methods 2026-06-05 11:35:20 +08:00
xfy
170e0f1942 feat(Makefile): add freebsd and openbsd build targets 2026-06-05 10:17:58 +08:00
xfy
0db14c239c chore(Makefile): optimize targets and fix inconsistencies
- Auto-detect VERSION from git tags with fallback
- Extract mkdir as order-only prerequisite to eliminate duplication
- Add PERF_GCFLAGS/PERF_ASMFLAGS to cross-platform builds and install
- Merge bench-regression into bench-check, unify file naming
- Fix bench scope and sampling consistency (internal/ only, -run=^$)
- Fix test-cover scope to avoid un-tagged integration/e2e code
- Fix deprecated go get -u ./... to go get -u
- Add clean-mod target, clean benchmark artifacts in clean
- Remove phantom build-prod/build-perf from help
- Split docker long line for readability
- Add .PHONY declarations for all targets
2026-06-05 10:15:21 +08:00
xfy
d82afa3233 Update readme 2026-06-04 13:29:45 +08:00
xfy
8757f0d5cb chore: add docs/plans/ to .gitignore 2026-06-04 11:31:44 +08:00
xfy
31faf77fcc style: add doc comments for exported hash and utils functions
Fix revive lint warnings for FNV64a, FNV64aBytes, BytesContainsFold.
2026-06-04 11:17:08 +08:00
xfy
2be6b67d0b fix(server): release MatchResult back to pool after use
Add matcher.ReleaseMatchResult(result) in the base handler to prevent
sync.Pool object leak. Every Match() call acquires from pool but the
caller never returned objects, causing unbounded pool growth.
2026-06-04 11:14:32 +08:00
xfy
10f16bfda9 test(ssl): update extractPEMBlock tests for DER output
Verify returned bytes are parseable by x509.ParseCertificate instead
of checking raw PEM text markers.
2026-06-04 11:14:23 +08:00
xfy
434ac0b114 fix(ssl): use encoding/pem for DER extraction in extractPEMBlock
Replace manual PEM text scanning with pem.Decode(). Returns proper
DER-encoded bytes instead of raw PEM text, fixing potential TLS
handshake failures with certificate chains.

Remove unused findMarker and matchMarker helpers.
2026-06-04 11:14:13 +08:00
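
For reference, DER extraction with pem.Decode, as 434ac0b114 describes, generally follows the pattern below. `extractDERBlocks` is a hypothetical helper that collects every CERTIFICATE block; the actual signature and behavior of extractPEMBlock are not shown in this log.

```go
package main

import (
	"encoding/pem"
	"fmt"
)

// extractDERBlocks returns the DER bytes of each CERTIFICATE block in data,
// in order. Blocks of other types are skipped.
func extractDERBlocks(data []byte) [][]byte {
	var out [][]byte
	for {
		var block *pem.Block
		block, data = pem.Decode(data)
		if block == nil {
			return out // no further PEM blocks
		}
		if block.Type == "CERTIFICATE" {
			out = append(out, block.Bytes)
		}
	}
}

func main() {
	// Placeholder payloads, not real certificates.
	chain := append(
		pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: []byte{1, 2, 3}}),
		pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: []byte{4, 5, 6}})...,
	)
	for i, der := range extractDERBlocks(chain) {
		fmt.Printf("block %d: %d DER bytes\n", i, len(der))
	}
}
```

With real certificates, each returned slice can be passed to x509.ParseCertificate, which is what the updated tests in 10f16bfda9 verify.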
xfy
197d0d2344 perf(security): reduce GeoIP lookups and deduplicate trusted proxy check
- Check(): single GeoIP LookupCountry call, result reused for both
  deny and allow checks. Removed goto label for structured flow.
- getClientIP(): single trusted proxy CIDR scan gates both
  X-Forwarded-For and X-Real-IP processing.
2026-06-04 11:09:29 +08:00
xfy
e535b9062c perf(gzip_static): pre-build extension set, use BytesContainsFold
- Pre-build extSet map for O(1) extension lookup instead of linear scan
- Replace bytes.ToLower allocation in supportsEncoding with
  utils.BytesContainsFold for case-insensitive encoding detection
2026-06-04 11:09:20 +08:00
xfy
e5fa9fe9de perf(compression): pre-compute MIME type byte slices for isCompressible
Add typesBytes and typesWildcardPrefix fields to Middleware, built once
at construction. isCompressible now uses pre-converted byte slices
instead of allocating []byte(t) per comparison per request.
2026-06-04 11:09:08 +08:00
xfy
bd97c05d0d test(matcher): update test callers for []byte Match/FindLongestPrefix 2026-06-04 11:06:09 +08:00
xfy
1eeab88c98 perf(server): pass ctx.Path() directly to Match, eliminate string alloc
Removes the string(ctx.Path()) conversion that caused one heap
allocation per request in the routing hot path.
2026-06-04 11:06:00 +08:00
xfy
aef0d8357b perf(matcher): change Match/FindLongestPrefix to accept []byte
Accept []byte directly instead of string, allowing callers to pass
fasthttp's ctx.Path() without string conversion. Internally uses
bytes.HasPrefix instead of strings.HasPrefix in radix tree search.
2026-06-04 11:05:49 +08:00
xfy
0a53622351 perf(proxy): pre-build cacheIgnoreSet, single-pass cache key, pool UpstreamTiming
Three optimizations in the proxy cache hot path:

- Pre-build cacheIgnoreSet map once at Proxy creation instead of
  per-response. Eliminates map allocation + linear scan per cached
  response.

- Compute cache key once per request via computeCacheKey() closure.
  Previously buildCacheKeyHash was called up to 5 times per request;
  now computed on first access and reused.

- Pool UpstreamTiming objects with sync.Pool. Eliminates one heap
  allocation per proxied request.
2026-06-04 10:57:38 +08:00
xfy
02775de641 perf(proxy): eliminate string allocations in isWebSocketRequest
Replace string(connection)/strings.EqualFold/strings.ToLower with
bytes.EqualFold and utils.BytesContainsFold. Removes 2-4 heap
allocations per proxied request.
2026-06-04 10:48:57 +08:00
xfy
613c5f8ff0 perf(utils): add BytesContainsFold for zero-allocation case-insensitive search
Reports whether a byte slice contains a subslice, case-insensitively,
without allocating (unlike bytes.Contains(bytes.ToLower(b), sub)).
2026-06-04 10:48:51 +08:00
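
A zero-allocation, case-insensitive search in the spirit of BytesContainsFold (613c5f8ff0) might be written as below. This is a sketch under the assumption of ASCII-only case folding, with the hypothetical name `containsFold`; the real utils.BytesContainsFold may differ.

```go
package main

import "fmt"

// lower maps ASCII upper-case letters to lower case and leaves other bytes unchanged.
func lower(c byte) byte {
	if c >= 'A' && c <= 'Z' {
		return c + ('a' - 'A')
	}
	return c
}

// containsFold reports whether sub occurs in b, ignoring ASCII case.
// It compares in place and never allocates.
func containsFold(b, sub []byte) bool {
	if len(sub) == 0 {
		return true
	}
	for i := 0; i+len(sub) <= len(b); i++ {
		j := 0
		for j < len(sub) && lower(b[i+j]) == lower(sub[j]) {
			j++
		}
		if j == len(sub) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(containsFold([]byte("keep-alive, Upgrade"), []byte("upgrade"))) // prints "true"
	fmt.Println(containsFold([]byte("deflate"), []byte("gzip")))                // prints "false"
}
```

Header values such as Connection or Accept-Encoding are short, so the simple quadratic scan is usually adequate here.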