- Router now enables FileInfoCache with 2s TTL for all static handlers
to reduce redundant os.Stat calls.
- FileInfoCache supports negative caching: missing files are cached
with a shorter TTL to avoid repeated stat on non-existent paths.
- Fix missing fileCache lookup for directory index files (index.html).
Previously handleStandard/handleTryFiles skipped fileCache when
serving index files, causing os.ReadFile on every request even
with file_cache configured.
- Extract tryServeFromFileCache() helper to unify cache hit logic
across file and index-file serving paths.
Verified with wrk (200 connections, 20s) against static /index.html:
- Throughput: 140k -> 242k req/sec (+73%)
- alloc_space: 2.6 GB -> 4.6 MB (-99.8%)
Add logging.access.sample_rate config (0.0-1.0) for deterministic
request sampling. 5xx errors are always logged; 2xx/3xx/4xx follow
the configured rate. Uses atomic.Uint64 counter for lock-free,
zero-allocation sampling decisions.
Includes test updates to verify:
- sample_rate=1.0 logs all requests
- sample_rate=0.0 logs only 5xx
- 5xx are always logged regardless of rate
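One way to get deterministic, lock-free sampling from an atomic counter is sketched below. The sampler type and shouldLog method are hypothetical names, and the floor-based selection rule is an assumption; the actual implementation may pick requests differently.

```go
package main

import (
	"math"
	"sync/atomic"
)

// sampler is a hypothetical stand-in for the access-log sampling state.
type sampler struct {
	rate    float64 // 0.0 to 1.0
	counter atomic.Uint64
}

func newSampler(rate float64) *sampler {
	return &sampler{rate: math.Max(0, math.Min(1, rate))}
}

// shouldLog reports whether a request with the given status is logged.
// 5xx responses bypass sampling; everything else is selected evenly:
// request n is logged when floor(n*rate) advances past floor((n-1)*rate).
func (s *sampler) shouldLog(status int) bool {
	if status >= 500 {
		return true
	}
	if s.rate >= 1 {
		return true
	}
	if s.rate <= 0 {
		return false
	}
	n := s.counter.Add(1)
	return math.Floor(float64(n)*s.rate) > math.Floor(float64(n-1)*s.rate)
}
```

Because the decision depends only on the counter value, the same request sequence always yields the same logged subset, and no lock or allocation is needed on the hot path.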
Production static file serving now uses FileInfoCache by default
with a 2-second TTL (enabled in router.go), which cuts os.Stat
syscalls for missing files and repeated paths.
Changes:
- Add negative cache support to FileInfoCache (caches 'not found' results)
- Introduce statWithCache() helper in StaticHandler for uniform caching
- Make FileInfoCache TTL configurable via SetTTL()
- Default cacheTTL=0 disables caching in NewStaticHandler (for test compatibility)
- router.go enables fileInfoCache with 2s TTL for all static handlers
Benchmark (repeated 404s):
No cache: ~2651 ns/op, 2225 B/op, 15 allocs/op
With cache: ~1505 ns/op, 1905 B/op, 12 allocs/op
Improvement: -43% latency, -14% bytes/op, -20% allocs/op
This addresses the dominant allocation source in v0.4.0 profile
(os.statNolog at 74.95% of allocations).
- Add writeAllocsProfile() helper in pprof_impl.go
- Register /allocs route in PprofHandler.ServeHTTP
- Add handleAllocs() method with proper streaming response
- Update index page to list the new allocs profile link
This aligns lolly's pprof endpoints with net/http/pprof and enables
allocation hotspot analysis during performance benchmarking.
- Collect baseline benchmark summary across all core modules
- Save key results to benchmarks/v0.4.0/summary.txt
- Update .gitignore to track benchmark summaries/reports
- Include performance optimization design docs and plan
- Fix handleConnection to use addr parameter for direct upstream map
lookup instead of always selecting the first upstream
- Add Server.Stop() for graceful shutdown with listener closing, UDP
server cleanup, health checker termination, and goroutine joining
- Add shutdownStream() to App and call it in SIGTERM/SIGQUIT/SIGUSR2
signal handlers to prevent goroutine and port leaks on shutdown
- Verify Least Time picks faster target consistently
- Verify Sticky fallback when target becomes unhealthy
- Test cookie encoding and session persistence
- Record headerTime when header is received
- Record lastByteTime when response is complete
- Use correct timing calculations (headerReceived/connectEnd/responseEnd)
- Add least_time and sticky to createBalancerByName
- Implement response time recording for Least Time
- Support StickySession in target selector with request context
- StickySession auto-starts when created
- Add least_time and sticky to valid algorithms list
- Add LeastTimeConfig and StickyConfig structures
- Update default config generation with new options
- Add configuration validation for new fields
- Add sync.Once to prevent double close of stopCh in Stop()
- Add nil fallback guard in NewStickySession (defaults to RoundRobin)
- Add atomic.Bool to make Start() idempotent
- Add tests for double Stop() and nil fallback scenarios
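The idempotent Start()/Stop() pattern described above can be sketched as follows. The lifecycle type is hypothetical and the background goroutine is a placeholder; only the sync.Once and atomic.Bool usage mirrors the commit.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// lifecycle is a hypothetical holder for the start/stop state.
type lifecycle struct {
	started  atomic.Bool
	stopOnce sync.Once
	stopCh   chan struct{}
	wg       sync.WaitGroup
}

func newLifecycle() *lifecycle {
	return &lifecycle{stopCh: make(chan struct{})}
}

// Start launches the background goroutine at most once and reports
// whether this call was the one that started it.
func (l *lifecycle) Start() bool {
	if !l.started.CompareAndSwap(false, true) {
		return false // already started
	}
	l.wg.Add(1)
	go func() {
		defer l.wg.Done()
		<-l.stopCh // placeholder for periodic cleanup work
	}()
	return true
}

// Stop closes stopCh exactly once, so a second call cannot panic with
// "close of closed channel", then waits for the goroutine to exit.
func (l *lifecycle) Stop() {
	l.stopOnce.Do(func() { close(l.stopCh) })
	l.wg.Wait()
}
```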
- Fix Select to check if cookie is expired before routing
- Add TestStickySession_ExpiredCookie test
- Expired cookies now trigger fallback + new cookie set
- Encode cookie as base64(target_url + "|" + timestamp) per spec
- Use cookie value (not targetURL) for shard key and session map keys
- Add missing sticky.Start() calls in tests
- Fix time precision in cookie encode/decode tests
- Add atomic EWMA Stats field to Target
- Implement LeastTime balancer with header_time and last_byte metrics
- Support Select and SelectExcluding with zero-lock design
- Add ResponseTimeRecorder interface for proxy integration
- Zero-lock atomic EWMA implementation using fixed-point arithmetic
- Supports header_time and last_byte_time tracking
- Concurrent-safe with CAS retry loop
- Auto-detect VERSION from git tags with fallback
- Extract mkdir as order-only prerequisite to eliminate duplication
- Add PERF_GCFLAGS/PERF_ASMFLAGS to cross-platform builds and install
- Merge bench-regression into bench-check, unify file naming
- Fix bench scope and sampling consistency (internal/ only, -run=^$)
- Fix test-cover scope to avoid un-tagged integration/e2e code
- Replace deprecated 'go get -u ./...' with 'go get -u'
- Add clean-mod target, clean benchmark artifacts in clean
- Remove phantom build-prod/build-perf from help
- Split docker long line for readability
- Add .PHONY declarations for all targets
Add matcher.ReleaseMatchResult(result) in the base handler to prevent
a sync.Pool object leak. Every Match() call acquires a result from the
pool, but the caller never returned it, so pooled results were never
reused and every call allocated a new one.
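The acquire/release discipline looks roughly like this. The matchResult type and helper names are hypothetical; only the pairing of Get with Put reflects the fix.

```go
package main

import "sync"

// matchResult is a hypothetical pooled result object.
type matchResult struct {
	Params []string
}

var resultPool = sync.Pool{
	New: func() any { return &matchResult{} },
}

func acquireResult() *matchResult {
	return resultPool.Get().(*matchResult)
}

// releaseResult resets the object and returns it to the pool so the
// next acquire can reuse its backing storage.
func releaseResult(r *matchResult) {
	if r == nil {
		return
	}
	r.Params = r.Params[:0]
	resultPool.Put(r)
}

// handle shows the caller side: every acquire is paired with a release.
func handle(path string) int {
	r := acquireResult()
	defer releaseResult(r)
	r.Params = append(r.Params, path)
	return len(r.Params)
}
```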
Replace manual PEM text scanning with pem.Decode(). Returns proper
DER-encoded bytes instead of raw PEM text, fixing potential TLS
handshake failures with certificate chains.
Remove unused findMarker and matchMarker helpers.
- Check(): single GeoIP LookupCountry call, result reused for both
deny and allow checks. Removed goto label for structured flow.
- getClientIP(): single trusted proxy CIDR scan gates both
X-Forwarded-For and X-Real-IP processing.
- Pre-build extSet map for O(1) extension lookup instead of linear scan
- Replace bytes.ToLower allocation in supportsEncoding with
utils.BytesContainsFold for case-insensitive encoding detection
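Both optimisations can be sketched in a few lines. The names buildExtSet, hasExt, and containsFold are hypothetical; in particular, containsFold only illustrates the idea behind utils.BytesContainsFold and is not its implementation.

```go
package main

import (
	"bytes"
	"strings"
)

// buildExtSet turns the configured extension list into a set once, at
// construction time, so per-request lookups are a single map access.
func buildExtSet(exts []string) map[string]struct{} {
	set := make(map[string]struct{}, len(exts))
	for _, e := range exts {
		set[strings.ToLower(e)] = struct{}{}
	}
	return set
}

// hasExt expects ext to be lowercase already.
func hasExt(set map[string]struct{}, ext string) bool {
	_, ok := set[ext]
	return ok
}

// containsFold reports whether sub occurs in s, ignoring case, without
// allocating a lowercased copy of s.
func containsFold(s, sub []byte) bool {
	if len(sub) == 0 {
		return true
	}
	for i := 0; i+len(sub) <= len(s); i++ {
		if bytes.EqualFold(s[i:i+len(sub)], sub) {
			return true
		}
	}
	return false
}
```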
Add typesBytes and typesWildcardPrefix fields to Middleware, built once
at construction. isCompressible now uses pre-converted byte slices
instead of allocating []byte(t) per comparison per request.
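The pre-conversion idea can be sketched as below. The typeMatcher type is hypothetical; its exact and prefixes fields play the roles of typesBytes and typesWildcardPrefix, and the handling of parameters such as "; charset=utf-8" is an assumption.

```go
package main

import (
	"bytes"
	"strings"
)

// typeMatcher holds content types converted to []byte once, at
// construction time.
type typeMatcher struct {
	exact    [][]byte // e.g. "application/json"
	prefixes [][]byte // "text/*" is stored as "text/"
}

func newTypeMatcher(types []string) *typeMatcher {
	m := &typeMatcher{}
	for _, t := range types {
		t = strings.ToLower(strings.TrimSpace(t))
		if strings.HasSuffix(t, "/*") {
			m.prefixes = append(m.prefixes, []byte(strings.TrimSuffix(t, "*")))
		} else {
			m.exact = append(m.exact, []byte(t))
		}
	}
	return m
}

// isCompressible compares against the pre-built slices, so the hot
// path performs no string-to-[]byte conversions.
func (m *typeMatcher) isCompressible(contentType []byte) bool {
	if i := bytes.IndexByte(contentType, ';'); i >= 0 {
		contentType = contentType[:i] // drop parameters
	}
	contentType = bytes.TrimSpace(contentType)
	for _, t := range m.exact {
		if bytes.EqualFold(contentType, t) {
			return true
		}
	}
	for _, p := range m.prefixes {
		if len(contentType) >= len(p) && bytes.EqualFold(contentType[:len(p)], p) {
			return true
		}
	}
	return false
}
```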
Accept []byte directly instead of string, allowing callers to pass
fasthttp's ctx.Path() without string conversion. Internally uses
bytes.HasPrefix instead of strings.HasPrefix in radix tree search.