From ebeb258c58b7e6bb50c13461233d426d11a519f9 Mon Sep 17 00:00:00 2001
From: xfy <i@rua.plus>
Date: Thu, 11 Jun 2026 13:43:28 +0800
Subject: [PATCH] docs(benchmark): add v0.4.0 baseline summary and update
 gitignore

- Collect baseline benchmark results across all core modules
- Save key results to benchmarks/v0.4.0/summary.txt
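- Update .gitignore to track benchmark summaries/reports and the
  specs/plans under docs/superpowers/
- Include design specs and plans for redundancy removal, load-balancing
  enhancement, and performance optimization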
---
 .gitignore                                    |   10 +-
 benchmarks/v0.4.0/summary.txt                 |  335 ++++
 .../2026-06-03-eliminate-code-redundancy.md   |  929 ++++++++++
 .../plans/2026-06-03-redundancy-removal.md    |  791 ++++++++
 .../2026-06-04-performance-optimization.md    |  820 +++++++++
 .../2026-06-08-loadbalance-enhancement.md     | 1620 +++++++++++++++++
 ...026-06-10-performance-optimization-plan.md | 1235 +++++++++++++
 ...-06-03-eliminate-code-redundancy-design.md |  213 +++
 ...26-06-08-loadbalance-enhancement-design.md |  389 ++++
 ...6-06-10-performance-optimization-design.md |  261 +++
 10 files changed, 6601 insertions(+), 2 deletions(-)
 create mode 100644 benchmarks/v0.4.0/summary.txt
 create mode 100644 docs/superpowers/plans/2026-06-03-eliminate-code-redundancy.md
 create mode 100644 docs/superpowers/plans/2026-06-03-redundancy-removal.md
 create mode 100644 docs/superpowers/plans/2026-06-04-performance-optimization.md
 create mode 100644 docs/superpowers/plans/2026-06-08-loadbalance-enhancement.md
 create mode 100644 docs/superpowers/plans/2026-06-10-performance-optimization-plan.md
 create mode 100644 docs/superpowers/specs/2026-06-03-eliminate-code-redundancy-design.md
 create mode 100644 docs/superpowers/specs/2026-06-08-loadbalance-enhancement-design.md
 create mode 100644 docs/superpowers/specs/2026-06-10-performance-optimization-design.md
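
Note on the benchmarks/ rules in the .gitignore hunk: Git does not
re-include a file whose parent directory is itself excluded, so while
`benchmarks/*/` stays in place the `!benchmarks/*/summary.txt` and
`!benchmarks/*/REPORT.md` negations are unlikely to take effect without
`git add -f`. A minimal sketch of one alternative follows, assuming the
intent is to ignore everything inside each version directory except those
two files. It is an illustration only and is not what this patch applies.

```gitignore
# Ignore the contents of each version directory rather than the directory
# itself, so the negations below can re-include individual files.
benchmarks/*/*
!benchmarks/*/summary.txt
!benchmarks/*/REPORT.md
```

Either variant can be checked locally with
`git check-ignore -v benchmarks/v0.4.0/summary.txt`, which reports the
pattern that decides the path.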

diff --git a/.gitignore b/.gitignore
index 2c620dc..c2be9d4 100644
--- a/.gitignore
+++ b/.gitignore
@@ -51,9 +51,12 @@ logs/
 tmp/
 temp/
 
-# Benchmark results
+# Benchmark results: ignore raw data, keep per-version summaries and reports
 benchmarks/*/
 !benchmarks/.gitkeep
+benchmarks/**/*.txt
+!benchmarks/*/summary.txt
+!benchmarks/*/REPORT.md
 
 # oh-my-claudecode state directory
 .omc/
@@ -77,5 +80,8 @@ main
 .crush
 
 # Planning and specification documents (agent-generated)
-docs/superpowers/
+# Keep generated specs/plans checked in for traceability
+docs/superpowers/*/
+!docs/superpowers/specs/
+!docs/superpowers/plans/
 docs/plans/
diff --git a/benchmarks/v0.4.0/summary.txt b/benchmarks/v0.4.0/summary.txt
new file mode 100644
index 0000000..474d9b2
--- /dev/null
+++ b/benchmarks/v0.4.0/summary.txt
@@ -0,0 +1,335 @@
+=== cache.txt ===
+BenchmarkFileCacheGet/Size100-10                         	21736779	        52.62 ns/op	      16 B/op	       1 allocs/op
+BenchmarkFileCacheGet/Size1000-10                        	22091924	        50.41 ns/op	      16 B/op	       1 allocs/op
+BenchmarkFileCacheGet/Size10000-10                       	24389118	        44.63 ns/op	      21 B/op	       1 allocs/op
+BenchmarkFileCacheSet/Size100-10                         	 2482664	       513.3 ns/op	     120 B/op	       5 allocs/op
+BenchmarkFileCacheSet/Size1000-10                        	 2469159	       482.6 ns/op	     120 B/op	       5 allocs/op
+BenchmarkFileCacheSet/Size10000-10                       	 2264976	       713.2 ns/op	     120 B/op	       5 allocs/op
+BenchmarkFileCacheSet_Pooled/Size100-10                  	 2663748	       449.6 ns/op	     120 B/op	       5 allocs/op
+BenchmarkFileCacheSet_Pooled/Size1000-10                 	 1215387	       895.9 ns/op	     120 B/op	       5 allocs/op
+BenchmarkFileCacheSet_Pooled/Size10000-10                	 1000000	      3025 ns/op	     120 B/op	       5 allocs/op
+BenchmarkFileCacheSetNoEviction-10                       	 2499050	       912.4 ns/op	     112 B/op	       5 allocs/op
+BenchmarkFileCacheConcurrent/Size100-10                  	 8061500	       139.6 ns/op	      26 B/op	       1 allocs/op
+BenchmarkFileCacheConcurrent/Size1000-10                 	 5312031	       222.8 ns/op	      31 B/op	       2 allocs/op
+BenchmarkFileCacheConcurrent/Size10000-10                	 5357617	       227.2 ns/op	      31 B/op	       2 allocs/op
+BenchmarkFileCacheGetOnly-10                             	15445342	        78.82 ns/op	      29 B/op	       1 allocs/op
+BenchmarkFileCacheSizeEviction-10                        	 1275866	       942.1 ns/op	    1121 B/op	       5 allocs/op
+BenchmarkFileCacheLRUTouch-10                            	10095694	       115.6 ns/op	      16 B/op	       1 allocs/op
+BenchmarkProxyCacheGet-10                                	29701370	       125.0 ns/op	      13 B/op	       1 allocs/op
+BenchmarkProxyCacheSet-10                                	  717621	      1754 ns/op	     251 B/op	       3 allocs/op
+BenchmarkProxyCacheConcurrent-10                         	 4811952	       276.0 ns/op	      69 B/op	       2 allocs/op
+BenchmarkFileCacheSetAllocation_New-10                   	 2149519	       522.3 ns/op	      97 B/op	       4 allocs/op
+BenchmarkFileCacheSetAllocation_Update-10                	 3871534	       348.9 ns/op	      45 B/op	       2 allocs/op
+BenchmarkFileCacheSetAllocation_Eviction-10              	 2248743	       552.0 ns/op	      96 B/op	       4 allocs/op
+BenchmarkFileCacheSetAllocation_EvictionWithPool-10      	 2310462	       515.7 ns/op	      96 B/op	       4 allocs/op
+BenchmarkFileCacheSetAllocation_MemoryLimit-10           	 2186145	       563.2 ns/op	      96 B/op	       4 allocs/op
+BenchmarkFileCacheSetAllocation_Concurrent-10            	 1934901	       654.7 ns/op	      88 B/op	       3 allocs/op
+BenchmarkFileCacheSetAllocation_ConcurrentEviction-10    	 2139834	       609.0 ns/op	      96 B/op	       3 allocs/op
+BenchmarkFileCacheEntryPool_GetPut-10                    	85020030	        12.46 ns/op	       0 B/op	       0 allocs/op
+BenchmarkFileCacheLRUList_PushFront-10                   	 6249896	       206.8 ns/op	     232 B/op	       4 allocs/op
+PASS
+ok  	rua.plus/lolly/internal/cache	45.363s
+
+=== handler.txt ===
+BenchmarkGenerateAutoIndex_HTML-10       	    3960	    267700 ns/op	   87857 B/op	     836 allocs/op
+BenchmarkStaticFileLookup-10             	   90567	     13565 ns/op	    5050 B/op	      33 allocs/op
+BenchmarkStaticFileCacheHit-10           	   90382	     14694 ns/op	    5109 B/op	      35 allocs/op
+BenchmarkStaticFileCacheMiss_1KB-10      	   89732	     13796 ns/op	    5042 B/op	      33 allocs/op
+BenchmarkStaticFileCacheMiss_10KB-10     	   63474	     19300 ns/op	   23990 B/op	      33 allocs/op
+BenchmarkStaticTryFiles-10               	   70876	     16249 ns/op	    4970 B/op	      51 allocs/op
+BenchmarkStaticIndex-10                  	   93171	     12437 ns/op	    3577 B/op	      33 allocs/op
+BenchmarkStaticNestedFile-10             	   84880	     14478 ns/op	   13679 B/op	      33 allocs/op
+BenchmarkStaticFileNotFound-10           	  492165	      2527 ns/op	    2225 B/op	      15 allocs/op
+BenchmarkStaticWithCacheParallel-10      	   56725	     19007 ns/op	   11592 B/op	      34 allocs/op
+BenchmarkStaticFileLookupWithAlias-10    	  101319	     11805 ns/op	    5090 B/op	      34 allocs/op
+PASS
+ok  	rua.plus/lolly/internal/handler	13.431s
+
+=== http.txt ===
+BenchmarkAdapterConversion-10                          	 1512100	       824.2 ns/op	     256 B/op	      10 allocs/op
+BenchmarkAdapterWithBody-10                            	  226002	      5080 ns/op	    6928 B/op	      30 allocs/op
+BenchmarkServerCreation-10                             	 4511774	       265.1 ns/op	     416 B/op	       5 allocs/op
+BenchmarkHTTP2ServerStart-10                           	 4156614	       245.0 ns/op	     416 B/op	       5 allocs/op
+BenchmarkHTTP2FrameEncoding/SettingsFrame-10           	55689206	        18.42 ns/op	       0 B/op	       0 allocs/op
+BenchmarkHTTP2FrameEncoding/DataFrame-10               	33784801	        37.93 ns/op	       0 B/op	       0 allocs/op
+BenchmarkHTTP2FrameEncoding/DataFrame_Small-10         	65797820	        17.68 ns/op	       0 B/op	       0 allocs/op
+BenchmarkHTTP2FrameEncoding/DataFrame_Large-10         	 2523252	       492.1 ns/op	       0 B/op	       0 allocs/op
+BenchmarkHTTP2FrameEncoding/PingFrame-10               	94094857	        13.64 ns/op	       0 B/op	       0 allocs/op
+BenchmarkHTTP2FrameEncoding/RSTStreamFrame-10          	100000000	        11.41 ns/op	       0 B/op	       0 allocs/op
+BenchmarkHTTP2FrameEncoding/WindowUpdateFrame-10       	100000000	        12.14 ns/op	       0 B/op	       0 allocs/op
+BenchmarkHTTP2FrameEncoding/GoAwayFrame-10             	66041313	        18.02 ns/op	       0 B/op	       0 allocs/op
+BenchmarkHTTP2HeadersEncoding/CommonHeaders-10         	 2735127	       438.0 ns/op	       0 B/op	       0 allocs/op
+BenchmarkHTTP2HeadersEncoding/CommonHeaders_Parallel-10         	  786241	      1505 ns/op	    1992 B/op	      28 allocs/op
+BenchmarkHTTP2HeadersEncoding/AuthHeaders-10                    	 1694182	       711.2 ns/op	       0 B/op	       0 allocs/op
+BenchmarkHTTP2HeadersEncoding/BodyHeaders-10                    	 3570954	       389.7 ns/op	       0 B/op	       0 allocs/op
+BenchmarkHTTP2HeadersEncoding/RepeatedHeaders-10                	 2209392	       514.4 ns/op	       0 B/op	       0 allocs/op
+BenchmarkHTTP2StreamCreate-10                                   	  221667	      4558 ns/op	    6731 B/op	      29 allocs/op
+BenchmarkHTTP2ConcurrentStreams-10                              	  328290	      4151 ns/op	    6742 B/op	      31 allocs/op
+BenchmarkHTTP2RequestRoundTrip-10                               	  638324	      1630 ns/op	     343 B/op	      12 allocs/op
+BenchmarkHTTP2RequestRoundTrip_WithBody-10                      	  232758	      4992 ns/op	    7391 B/op	      33 allocs/op
+BenchmarkHTTP2RequestRoundTrip_WithBody_Parallel-10             	  232129	      4407 ns/op	    7288 B/op	      32 allocs/op
+BenchmarkHTTP2AdapterWithHPACKHeaders-10                        	  245524	      4732 ns/op	    6780 B/op	      31 allocs/op
+PASS
+ok  	rua.plus/lolly/internal/http2	27.192s
+BenchmarkAdapterWrap-10                        	 2033492	       565.5 ns/op	     520 B/op	       7 allocs/op
+BenchmarkAdapterConvertRequest-10              	 1272426	       937.9 ns/op	     164 B/op	       6 allocs/op
+BenchmarkAdapterConvertRequestBody_1KB-10      	  200964	      5239 ns/op	    3338 B/op	      13 allocs/op
+BenchmarkAdapterConvertRequestBody_10KB-10     	   78788	     13594 ns/op	   34841 B/op	      20 allocs/op
+BenchmarkAdapterConvertRequestBody_100KB-10    	   34011	     34759 ns/op	  213538 B/op	      10 allocs/op
+
+=== loadbalance.txt ===
+BenchmarkRoundRobinSelect/3targets-10        	75176475	        15.67 ns/op	       0 B/op	       0 allocs/op
+BenchmarkRoundRobinSelect/50targets-10       	46881991	        25.77 ns/op	       0 B/op	       0 allocs/op
+BenchmarkRoundRobinSelect/200targets-10      	13730298	        89.31 ns/op	       0 B/op	       0 allocs/op
+BenchmarkWeightedRoundRobin/3targets_equal-10         	74647123	        15.59 ns/op	       0 B/op	       0 allocs/op
+BenchmarkWeightedRoundRobin/3targets_weighted-10      	68335051	        15.72 ns/op	       0 B/op	       0 allocs/op
+BenchmarkWeightedRoundRobin/50targets_equal-10        	35494826	        32.86 ns/op	       0 B/op	       0 allocs/op
+BenchmarkWeightedRoundRobin/50targets_weighted-10     	33776556	        34.54 ns/op	       0 B/op	       0 allocs/op
+BenchmarkWeightedRoundRobin/200targets_equal-10       	10033557	       118.6 ns/op	       0 B/op	       0 allocs/op
+BenchmarkConsistentHashSelect/10targets_50vnodes-10   	37505451	        27.27 ns/op	       0 B/op	       0 allocs/op
+BenchmarkConsistentHashSelect/10targets_150vnodes-10  	44527291	        26.94 ns/op	       0 B/op	       0 allocs/op
+BenchmarkConsistentHashSelect/10targets_200vnodes-10  	46628412	        26.60 ns/op	       0 B/op	       0 allocs/op
+BenchmarkConsistentHashSelect/50targets_150vnodes-10  	43033684	        26.59 ns/op	       0 B/op	       0 allocs/op
+BenchmarkConsistentHashSelect/100targets_150vnodes-10 	46417550	        26.51 ns/op	       0 B/op	       0 allocs/op
+BenchmarkConsistentHashRebuild/10targets_150vnodes-10 	    8913	    119725 ns/op	  114009 B/op	      35 allocs/op
+BenchmarkConsistentHashRebuild/50targets_150vnodes-10 	    1285	    905655 ns/op	  828420 B/op	     108 allocs/op
+BenchmarkConsistentHashRebuild/100targets_150vnodes-10         	     606	   1945285 ns/op	 1623333 B/op	     210 allocs/op
+BenchmarkConsistentHashSelectExcluding/50targets_150vnodes_exclude5-10         	 1000000	      1091 ns/op	       0 B/op	       0 allocs/op
+BenchmarkConsistentHashSelectExcluding/50targets_150vnodes_exclude10-10        	 1000000	      1174 ns/op	       0 B/op	       0 allocs/op
+BenchmarkConsistentHashSelectExcluding/100targets_150vnodes_exclude5-10        	  596529	      2061 ns/op	       0 B/op	       0 allocs/op
+BenchmarkLeastConnSelect/3targets-10                                           	1000000000	         0.3424 ns/op	       0 B/op	       0 allocs/op
+BenchmarkLeastConnSelect/50targets-10                                          	245764088	         4.778 ns/op	       0 B/op	       0 allocs/op
+BenchmarkLeastConnSelect/200targets-10                                         	64952187	        17.68 ns/op	       0 B/op	       0 allocs/op
+BenchmarkIPHashSelect/3targets-10                                              	253943542	         4.684 ns/op	       0 B/op	       0 allocs/op
+BenchmarkIPHashSelect/50targets-10                                             	48979803	        24.41 ns/op	       0 B/op	       0 allocs/op
+BenchmarkIPHashSelect/200targets-10                                            	12602810	        87.73 ns/op	       0 B/op	       0 allocs/op
+BenchmarkAllBalancers/RoundRobin-10                                            	 6389318	       187.1 ns/op	       0 B/op	       0 allocs/op
+BenchmarkAllBalancers/WeightedRoundRobin-10                                    	 5199241	       234.9 ns/op	       0 B/op	       0 allocs/op
+BenchmarkAllBalancers/LeastConnections-10                                      	35844194	        31.77 ns/op	       0 B/op	       0 allocs/op
+BenchmarkAllBalancers/IPHash-10                                                	 6075333	       190.8 ns/op	       0 B/op	       0 allocs/op
+BenchmarkAllBalancers/ConsistentHash-10                                        	41145982	        28.54 ns/op	       0 B/op	       0 allocs/op
+
+=== logging.txt ===
+
+=== lua.txt ===
+BenchmarkCoroutineCreation-10                	 1080924	      1199 ns/op	     272 B/op	       4 allocs/op
+BenchmarkLuaContextPool-10                   	13166972	        82.07 ns/op	       0 B/op	       0 allocs/op
+BenchmarkBytecodeCompilation-10              	 1000000	      1060 ns/op	     360 B/op	       5 allocs/op
+BenchmarkSharedDictSetGet-10                 	21471429	        52.44 ns/op	       0 B/op	       0 allocs/op
+BenchmarkTimerCallbackThroughput-10          	  450582	      2337 ns/op	     509 B/op	       6 allocs/op
+BenchmarkTimerCallbackWithLuaExecution-10    	   20617	     56030 ns/op	   53561 B/op	     120 allocs/op
+BenchmarkUpvalueDetection-10                 	   30464	     36669 ns/op	   54112 B/op	     149 allocs/op
+BenchmarkTimerGracefulShutdown-10            	     148	   7389734 ns/op	12962100 B/op	   47107 allocs/op
+BenchmarkLuaContextPoolReuse-10              	24460232	        56.96 ns/op	       0 B/op	       0 allocs/op
+BenchmarkLuaCoroutinePoolThroughput-10       	 2039503	       520.7 ns/op	     272 B/op	       4 allocs/op
+BenchmarkLuaTablePool/NewTable_NoPool-10     	  864272	      2798 ns/op	    3368 B/op	      16 allocs/op
+BenchmarkLuaTablePool/SharedDict_AsPool-10   	 3215400	       403.8 ns/op	     128 B/op	       3 allocs/op
+BenchmarkLuaMiddlewareOverhead-10            	   10000	    121057 ns/op	   84627 B/op	     351 allocs/op
+BenchmarkLuaMiddlewareMultiPhase-10          	    6144	    256399 ns/op	  167706 B/op	     700 allocs/op
+BenchmarkLuaMiddlewareNgxExit-10             	   10000	    135602 ns/op	   86886 B/op	     393 allocs/op
+BenchmarkCosocket_Connect-10                 	    1041	   1095333 ns/op	    6442 B/op	      43 allocs/op
+BenchmarkCosocket_SendReceive-10             	   24416	     49961 ns/op	    1040 B/op	       2 allocs/op
+PASS
+ok  	rua.plus/lolly/internal/lua	23.109s
+
+=== matcher.txt ===
+BenchmarkRadixTreeFindLongestPrefix-10            	19755723	        60.87 ns/op	       0 B/op	       0 allocs/op
+BenchmarkRadixTreeFindLongestPrefixParallel-10    	122318263	        10.27 ns/op	       0 B/op	       0 allocs/op
+PASS
+ok  	rua.plus/lolly/internal/matcher	3.460s
+
+=== middleware.txt ===
+PASS
+ok  	rua.plus/lolly/internal/middleware	0.005s
+BenchmarkAccessLogProcess-10            	  458827	      2197 ns/op	    1987 B/op	      17 allocs/op
+BenchmarkAccessLogProcessParallel-10    	  365294	      3255 ns/op	    1959 B/op	      16 allocs/op
+PASS
+ok  	rua.plus/lolly/internal/middleware/accesslog	2.244s
+BenchmarkBodyLimitProcess-10         	 1000000	      1210 ns/op	    1768 B/op	      11 allocs/op
+BenchmarkBodyLimitGetLimit-10        	17057452	        77.30 ns/op	       0 B/op	       0 allocs/op
+BenchmarkBodyLimitPathMatching-10    	 7554831	       162.0 ns/op	       0 B/op	       0 allocs/op
+BenchmarkParseSize-10                	29615168	        40.98 ns/op	       0 B/op	       0 allocs/op
+PASS
+ok  	rua.plus/lolly/internal/middleware/bodylimit	4.980s
+BenchmarkGzipCompress_1KB-10                   	   55242	     20921 ns/op	     900 B/op	       4 allocs/op
+BenchmarkGzipCompress_10KB-10                  	   41601	     28889 ns/op	     906 B/op	       4 allocs/op
+BenchmarkGzipCompress_100KB-10                 	   10000	    119901 ns/op	    2012 B/op	       5 allocs/op
+BenchmarkBrotliCompress_1KB-10                 	   33718	     35480 ns/op	     403 B/op	       2 allocs/op
+BenchmarkBrotliCompress_10KB-10                	   25119	     46113 ns/op	     433 B/op	       2 allocs/op
+BenchmarkCompressionPool-10                    	   50222	     21297 ns/op	     901 B/op	       4 allocs/op
+BenchmarkCompressionMiddleware-10              	   35152	     33261 ns/op	   12016 B/op	      17 allocs/op
+BenchmarkCompressionMiddlewareNoCompress-10    	  421274	      3153 ns/op	   10324 B/op	       6 allocs/op
+BenchmarkIsCompressible-10                     	19387118	        54.13 ns/op	       0 B/op	       0 allocs/op
+BenchmarkCompressionLevelComparison/Level1-10  	   57207	     21935 ns/op	     894 B/op	       4 allocs/op
+BenchmarkCompressionLevelComparison/Level6-10  	   35198	     32387 ns/op	     911 B/op	       4 allocs/op
+BenchmarkCompressionLevelComparison/Level9-10  	   16784	     72023 ns/op	     948 B/op	       4 allocs/op
+BenchmarkCompressionMiddlewareParallel-10      	  170314	      6881 ns/op	   12700 B/op	      17 allocs/op
+BenchmarkGzipPool_GetPut-10                    	  118927	     10414 ns/op	      22 B/op	       1 allocs/op
+BenchmarkGzipWriter_New-10                     	    3289	    504494 ns/op	  814744 B/op	      21 allocs/op
+BenchmarkGzipWriter_Pool-10                    	   59080	     20177 ns/op	     898 B/op	       4 allocs/op
+BenchmarkCompressionMiddleware_Pool-10         	   31626	     39955 ns/op	   14310 B/op	      18 allocs/op
+BenchmarkGzipCompress_Sizes/100B-10            	  141144	      7318 ns/op	     247 B/op	       3 allocs/op
+
+=== proxy.txt ===
+BenchmarkCacheKeyHashValue_ZeroAlloc-10       	11774799	        85.10 ns/op	       0 B/op	       0 allocs/op
+BenchmarkCacheKeyHash_WithAlloc-10            	 5119413	       285.8 ns/op	      48 B/op	       1 allocs/op
+BenchmarkCacheKeyHash_Compare/ZeroAlloc-10    	12423754	        92.28 ns/op	       0 B/op	       0 allocs/op
+BenchmarkCacheKeyHash_Compare/WithAlloc-10    	 6716290	       171.4 ns/op	      32 B/op	       1 allocs/op
+BenchmarkConnectionPool_Normal-10             	       1	3100765506 ns/op	   10000 B/op	      96 allocs/op
+BenchmarkConnectionPool_HighConcurrency-10    	       2	1550554418 ns/op	   11152 B/op	      86 allocs/op
+BenchmarkConnectionPool_SmallBody-10          	       1	3000232015 ns/op	   71792 B/op	      81 allocs/op
+BenchmarkConnectionPool_LargeBody-10          	       2	1948867432 ns/op	    9616 B/op	      76 allocs/op
+BenchmarkConnectionPool_MultiTarget-10        	       1	1200480850 ns/op	   85392 B/op	     158 allocs/op
+BenchmarkHostClient_AcquireRelease-10         	       1	3000973314 ns/op	    8944 B/op	      61 allocs/op
+BenchmarkProxyForward/concurrency1-10         	       2	1500263114 ns/op	   41692 B/op	      85 allocs/op
+BenchmarkProxyForward/concurrency10-10        	       2	1500346107 ns/op	   11280 B/op	      82 allocs/op
+BenchmarkProxyForward/concurrency100-10       	       2	1500509108 ns/op	   41660 B/op	      85 allocs/op
+BenchmarkProxyForwardSmallRequest-10          	       2	1500492839 ns/op	   11344 B/op	      82 allocs/op
+BenchmarkProxyForwardLargeRequest-10          	       2	1500835596 ns/op	   46780 B/op	      97 allocs/op
+BenchmarkProxyForwardMultipleTargets-10       	       2	1500471841 ns/op	    7704 B/op	      72 allocs/op
+BenchmarkProxyHostClient-10                   	       2	1981847248 ns/op	   37060 B/op	      40 allocs/op
+BenchmarkProxyHostClientParallel-10           	       2	1500465370 ns/op	    4112 B/op	      42 allocs/op
+BenchmarkProxyWithMockBackend-10              	   96135	     12150 ns/op	    3065 B/op	      42 allocs/op
+BenchmarkProxyLoadBalancerSelection/round_robin_3-10         	21344373	        59.75 ns/op	      16 B/op	       1 allocs/op
+BenchmarkProxyLoadBalancerSelection/round_robin_50-10        	13515140	        86.74 ns/op	      16 B/op	       1 allocs/op
+BenchmarkProxyLoadBalancerSelection/weighted_round_robin_3-10         	18620368	        61.38 ns/op	      16 B/op	       1 allocs/op
+BenchmarkProxyLoadBalancerSelection/least_conn_3-10                   	20915076	        56.56 ns/op	      16 B/op	       1 allocs/op
+BenchmarkProxyLoadBalancerSelection/ip_hash_3-10                      	12006486	        94.68 ns/op	      48 B/op	       3 allocs/op
+BenchmarkProxyHeaderProcessing-10                                     	  386919	      2660 ns/op	    2930 B/op	      35 allocs/op
+BenchmarkBuildCacheKeyHash/buildCacheKeyHash_with_string-10           	17636347	        63.94 ns/op	      24 B/op	       1 allocs/op
+BenchmarkBuildCacheKeyHash/buildCacheKeyHashValue_direct-10           	38463036	        31.97 ns/op	       0 B/op	       0 allocs/op
+BenchmarkProxyObjectPoolGetRelease/UpstreamTiming_Pooled-10           	   29078	     40040 ns/op	       0 B/op	       0 allocs/op
+BenchmarkProxyObjectPoolGetRelease/VariableContext_Pooled-10          	15730250	        76.39 ns/op	       8 B/op	       1 allocs/op
+BenchmarkProxyResponsePoolParallel-10                                 	       1	3000834054 ns/op	   79184 B/op	     133 allocs/op
+
+=== resolver.txt ===
+BenchmarkDNSResolverLookupWithCache-10    	 6284577	       236.4 ns/op	      48 B/op	       1 allocs/op
+BenchmarkDNSResolverConcurrent-10         	 6265792	       206.6 ns/op	      48 B/op	       1 allocs/op
+BenchmarkDNSResolverCacheExpiry-10        	 2145366	       548.2 ns/op	     144 B/op	       3 allocs/op
+BenchmarkDNSResolverCacheWriteLock-10     	 7186472	       167.5 ns/op	      32 B/op	       2 allocs/op
+BenchmarkDNSResolverMixedWorkload-10      	 3976573	       322.6 ns/op	      64 B/op	       2 allocs/op
+BenchmarkDNSCacheEntryRLock-10            	100000000	        22.01 ns/op	       0 B/op	       0 allocs/op
+BenchmarkDNSCacheEntryRWLock-10           	 5163608	       241.5 ns/op	     175 B/op	       5 allocs/op
+PASS
+ok  	rua.plus/lolly/internal/resolver	11.167s
+
+=== server.txt ===
+BenchmarkMiddlewareNewChainApply-10                 	 7164360	       154.6 ns/op	      48 B/op	       3 allocs/op
+BenchmarkMiddlewareProcessChain-10                  	1000000000	         1.098 ns/op	       0 B/op	       0 allocs/op
+BenchmarkMiddlewareChainExecution-10                	182316565	         6.727 ns/op	       0 B/op	       0 allocs/op
+BenchmarkMiddlewareChainExecutionWithResponse-10    	 1052726	      1024 ns/op	    1568 B/op	       3 allocs/op
+BenchmarkMiddlewareEmptyChain-10                    	40100878	       498.1 ns/op	      12 B/op	       0 allocs/op
+BenchmarkMiddlewareSingleMiddleware-10              	88622024	        23.34 ns/op	      10 B/op	       0 allocs/op
+BenchmarkGoroutinePoolSubmit-10                     	70090189	        17.06 ns/op	       0 B/op	       0 allocs/op
+BenchmarkGoroutinePoolParallel-10                   	45083467	        36.31 ns/op	       0 B/op	       0 allocs/op
+BenchmarkGoroutinePoolSubmit_BlockingPath-10        	133075401	        10.14 ns/op	       0 B/op	       0 allocs/op
+BenchmarkGoroutinePoolQueueFull-10                  	126751026	        13.89 ns/op	       0 B/op	       0 allocs/op
+BenchmarkGoroutinePoolWorkerRecycle-10              	      15	  71130566 ns/op	   17697 B/op	     220 allocs/op
+BenchmarkGoroutinePoolSubmitWithWork/Workers10-10   	 5490272	       223.2 ns/op	       0 B/op	       0 allocs/op
+BenchmarkGoroutinePoolSubmitWithWork/Workers100-10  	 5219361	       225.5 ns/op	       0 B/op	       0 allocs/op
+BenchmarkGoroutinePoolSubmitWithWork/Workers1000-10 	 2774235	       462.6 ns/op	       0 B/op	       0 allocs/op
+BenchmarkGoroutinePoolMinWorkers/WithMinWorkers-10  	85318759	        16.51 ns/op	       0 B/op	       0 allocs/op
+BenchmarkGoroutinePoolMinWorkers/NoMinWorkers-10    	81247957	        17.23 ns/op	       0 B/op	       0 allocs/op
+BenchmarkGoroutinePoolObjectPool/PoolTask_Submit-10 	60380559	        16.60 ns/op	       0 B/op	       0 allocs/op
+BenchmarkGoroutinePoolObjectPool/PoolTask_Reuse_NoClosure-10         	62161117	        16.80 ns/op	       0 B/op	       0 allocs/op
+BenchmarkPoolMemoryReuse/WithPool_GetPut-10                          	93926037	        12.02 ns/op	       0 B/op	       0 allocs/op
+BenchmarkPoolMemoryReuse/WithoutPool_Alloc-10                        	10268364	       119.9 ns/op	     256 B/op	       1 allocs/op
+PASS
+ok  	rua.plus/lolly/internal/server	45.979s
+
+=== stream.txt ===
+BenchmarkStreamFilterHealthy/3_healthy-10           	51941268	        24.31 ns/op	       0 B/op	       0 allocs/op
+BenchmarkStreamFilterHealthy/10_healthy_80-10       	52304758	        37.76 ns/op	       0 B/op	       0 allocs/op
+BenchmarkStreamFilterHealthy/50_healthy_50-10       	51739732	        39.39 ns/op	       0 B/op	       0 allocs/op
+BenchmarkStreamFilterHealthy/100_healthy_80-10      	49306483	        41.64 ns/op	       0 B/op	       0 allocs/op
+BenchmarkStreamFilterHealthyPreallocated-10         	100000000	        10.45 ns/op	       0 B/op	       0 allocs/op
+BenchmarkUDPSessionAllocations/no_pool_65k-10       	   33656	     37885 ns/op	   65536 B/op	       1 allocs/op
+BenchmarkUDPSessionAllocations/sync_pool_65k-10     	10833096	        93.02 ns/op	      24 B/op	       1 allocs/op
+BenchmarkUDPSessionAllocations/no_pool_16k-10       	  269760	      4646 ns/op	   16384 B/op	       1 allocs/op
+BenchmarkUDPSessionAllocations/sync_pool_16k-10     	31780728	        37.38 ns/op	      24 B/op	       1 allocs/op
+BenchmarkUDPSessionGetOrCreate-10                   	15100776	        70.40 ns/op	      32 B/op	       3 allocs/op
+BenchmarkUDPSessionGetOnly-10                       	19103670	        66.96 ns/op	      32 B/op	       3 allocs/op
+BenchmarkStreamBalancerSelect/round_robin_3-10      	47924217	        24.54 ns/op	       0 B/op	       0 allocs/op
+BenchmarkStreamBalancerSelect/round_robin_10-10     	56421152	        22.36 ns/op	       0 B/op	       0 allocs/op
+BenchmarkStreamBalancerSelect/round_robin_50-10     	47163234	        22.13 ns/op	       0 B/op	       0 allocs/op
+BenchmarkStreamBalancerSelect/weighted_round_robin_3-10         	37397344	        32.71 ns/op	       0 B/op	       0 allocs/op
+BenchmarkStreamBalancerSelect/weighted_round_robin_10-10        	40612486	        29.41 ns/op	       0 B/op	       0 allocs/op
+BenchmarkStreamBalancerSelect/least_conn_3-10                   	1000000000	         0.7223 ns/op	       0 B/op	       0 allocs/op
+BenchmarkStreamBalancerSelect/least_conn_10-10                  	826034833	         1.427 ns/op	       0 B/op	       0 allocs/op
+BenchmarkStreamBalancerSelect/ip_hash_3-10                      	70855179	        22.03 ns/op	      16 B/op	       1 allocs/op
+BenchmarkStreamBalancerSelect/ip_hash_10-10                     	93524262	        18.33 ns/op	      16 B/op	       1 allocs/op
+BenchmarkStreamRoundRobinWithUnhealthy/3_1_unhealthy-10         	68404255	        15.29 ns/op	       0 B/op	       0 allocs/op
+BenchmarkStreamRoundRobinWithUnhealthy/10_3_unhealthy-10        	54019622	        22.22 ns/op	       0 B/op	       0 allocs/op
+BenchmarkStreamRoundRobinWithUnhealthy/50_20_unhealthy-10       	22136433	        55.78 ns/op	       0 B/op	       0 allocs/op
+BenchmarkStreamLeastConnWithVaryingConns/uniform-10             	426217518	         2.781 ns/op	       0 B/op	       0 allocs/op
+BenchmarkStreamLeastConnWithVaryingConns/varying-10             	434543780	         2.777 ns/op	       0 B/op	       0 allocs/op
+BenchmarkStreamLeastConnWithVaryingConns/extreme-10             	411333520	         2.789 ns/op	       0 B/op	       0 allocs/op
+BenchmarkStreamWeightedRoundRobinDistribution/equal-10          	64135672	        18.20 ns/op	       0 B/op	       0 allocs/op
+BenchmarkStreamWeightedRoundRobinDistribution/linear-10         	66114645	        19.27 ns/op	       0 B/op	       0 allocs/op
+BenchmarkStreamWeightedRoundRobinDistribution/heavy-10          	56737513	        19.44 ns/op	       0 B/op	       0 allocs/op
+BenchmarkStreamWeightedRoundRobinDistribution/exponential-10    	58088670	        21.10 ns/op	       0 B/op	       0 allocs/op
+
+=== summary.txt ===
+BenchmarkFileCacheGet/Size100-10                         	21736779	        52.62 ns/op	      16 B/op	       1 allocs/op
+BenchmarkFileCacheGet/Size1000-10                        	22091924	        50.41 ns/op	      16 B/op	       1 allocs/op
+BenchmarkFileCacheGet/Size10000-10                       	24389118	        44.63 ns/op	      21 B/op	       1 allocs/op
+BenchmarkFileCacheSet/Size100-10                         	 2482664	       513.3 ns/op	     120 B/op	       5 allocs/op
+BenchmarkFileCacheSet/Size1000-10                        	 2469159	       482.6 ns/op	     120 B/op	       5 allocs/op
+BenchmarkFileCacheSet/Size10000-10                       	 2264976	       713.2 ns/op	     120 B/op	       5 allocs/op
+BenchmarkFileCacheSet_Pooled/Size100-10                  	 2663748	       449.6 ns/op	     120 B/op	       5 allocs/op
+BenchmarkFileCacheSet_Pooled/Size1000-10                 	 1215387	       895.9 ns/op	     120 B/op	       5 allocs/op
+BenchmarkFileCacheSet_Pooled/Size10000-10                	 1000000	      3025 ns/op	     120 B/op	       5 allocs/op
+BenchmarkFileCacheSetNoEviction-10                       	 2499050	       912.4 ns/op	     112 B/op	       5 allocs/op
+BenchmarkFileCacheConcurrent/Size100-10                  	 8061500	       139.6 ns/op	      26 B/op	       1 allocs/op
+BenchmarkFileCacheConcurrent/Size1000-10                 	 5312031	       222.8 ns/op	      31 B/op	       2 allocs/op
+BenchmarkFileCacheConcurrent/Size10000-10                	 5357617	       227.2 ns/op	      31 B/op	       2 allocs/op
+BenchmarkFileCacheGetOnly-10                             	15445342	        78.82 ns/op	      29 B/op	       1 allocs/op
+BenchmarkFileCacheSizeEviction-10                        	 1275866	       942.1 ns/op	    1121 B/op	       5 allocs/op
+BenchmarkFileCacheLRUTouch-10                            	10095694	       115.6 ns/op	      16 B/op	       1 allocs/op
+BenchmarkProxyCacheGet-10                                	29701370	       125.0 ns/op	      13 B/op	       1 allocs/op
+BenchmarkProxyCacheSet-10                                	  717621	      1754 ns/op	     251 B/op	       3 allocs/op
+BenchmarkProxyCacheConcurrent-10                         	 4811952	       276.0 ns/op	      69 B/op	       2 allocs/op
+BenchmarkFileCacheSetAllocation_New-10                   	 2149519	       522.3 ns/op	      97 B/op	       4 allocs/op
+BenchmarkFileCacheSetAllocation_Update-10                	 3871534	       348.9 ns/op	      45 B/op	       2 allocs/op
+BenchmarkFileCacheSetAllocation_Eviction-10              	 2248743	       552.0 ns/op	      96 B/op	       4 allocs/op
+BenchmarkFileCacheSetAllocation_EvictionWithPool-10      	 2310462	       515.7 ns/op	      96 B/op	       4 allocs/op
+BenchmarkFileCacheSetAllocation_MemoryLimit-10           	 2186145	       563.2 ns/op	      96 B/op	       4 allocs/op
+BenchmarkFileCacheSetAllocation_Concurrent-10            	 1934901	       654.7 ns/op	      88 B/op	       3 allocs/op
+BenchmarkFileCacheSetAllocation_ConcurrentEviction-10    	 2139834	       609.0 ns/op	      96 B/op	       3 allocs/op
+BenchmarkFileCacheEntryPool_GetPut-10                    	85020030	        12.46 ns/op	       0 B/op	       0 allocs/op
+BenchmarkFileCacheLRUList_PushFront-10                   	 6249896	       206.8 ns/op	     232 B/op	       4 allocs/op
+PASS
+ok  	rua.plus/lolly/internal/cache	45.363s
+
+=== utils.txt ===
+BenchmarkExtractClientIP/X-Forwarded-For_single_IP-10         	15637682	        76.06 ns/op	      32 B/op	       2 allocs/op
+BenchmarkExtractClientIP/X-Forwarded-For_multiple_IPs-10      	11151398	       107.4 ns/op	      96 B/op	       2 allocs/op
+BenchmarkExtractClientIP/X-Real-IP_only-10                    	16888720	        71.44 ns/op	      16 B/op	       1 allocs/op
+BenchmarkExtractClientIP/RemoteAddr_fallback-10               	14076492	        85.15 ns/op	       8 B/op	       1 allocs/op
+BenchmarkExtractClientIPNet/X-Forwarded-For_single_IP-10      	 9092592	       133.8 ns/op	      48 B/op	       3 allocs/op
+BenchmarkExtractClientIPNet/X-Real-IP_only-10                 	 9696522	       125.2 ns/op	      32 B/op	       2 allocs/op
+BenchmarkExtractClientIPNet/RemoteAddr_fallback-10            	22064487	        52.76 ns/op	       0 B/op	       0 allocs/op
+BenchmarkStripPort/IPv4_with_port-10                          	298226187	         3.951 ns/op	       0 B/op	       0 allocs/op
+BenchmarkStripPort/IPv6_with_port-10                          	286273158	         4.211 ns/op	       0 B/op	       0 allocs/op
+BenchmarkStripPort/no_port-10                                 	252196581	         4.841 ns/op	       0 B/op	       0 allocs/op
+BenchmarkStripPort/empty_string-10                            	1000000000	         0.4960 ns/op	       0 B/op	       0 allocs/op
+PASS
+ok  	rua.plus/lolly/internal/netutil	12.508s
+BenchmarkLoadCACertPool-10                           	   74826	     15530 ns/op	    6448 B/op	      54 allocs/op
+BenchmarkGenerateTicketKey-10                        	11567258	       101.1 ns/op	      32 B/op	       1 allocs/op
+BenchmarkSessionTicketManager_GetKeys-10             	10594251	       116.0 ns/op	     176 B/op	       4 allocs/op
+BenchmarkSessionTicketManager_RotateKey-10           	 8896942	       135.0 ns/op	      80 B/op	       1 allocs/op
+BenchmarkTLSHandshake-10                             	    1578	    758396 ns/op	  117043 B/op	     844 allocs/op
+BenchmarkTLSHandshake_TLS13Only-10                   	    1525	    752984 ns/op	  116542 B/op	     839 allocs/op
+BenchmarkTLSCertificateLoad-10                       	   27265	     44040 ns/op	    8637 B/op	     121 allocs/op
+BenchmarkTLSCertificateLoad_InMemory-10              	   44395	     27435 ns/op	    6796 B/op	     111 allocs/op
+BenchmarkTLSCertificateLoad_Parallel-10              	   73044	     16299 ns/op	    8681 B/op	     121 allocs/op
+BenchmarkTLSRenegotiation-10                         	    1742	    651865 ns/op	   41879 B/op	     442 allocs/op
+BenchmarkOCSPStapling-10                             	49240950	        23.65 ns/op	       0 B/op	       0 allocs/op
+BenchmarkOCSPStapling_Miss-10                        	49655992	        23.88 ns/op	       0 B/op	       0 allocs/op
+BenchmarkSessionTicketManager_ApplyToTLSConfig-10    	  948468	      1204 ns/op	     928 B/op	       7 allocs/op
+BenchmarkCipherSuiteParsing-10                       	13597928	        86.37 ns/op	      16 B/op	       1 allocs/op
+BenchmarkTLSVersionsParsing-10                       	235120939	         5.090 ns/op	       0 B/op	       0 allocs/op
+PASS
+ok  	rua.plus/lolly/internal/ssl	18.173s
+
diff --git a/docs/superpowers/plans/2026-06-03-eliminate-code-redundancy.md b/docs/superpowers/plans/2026-06-03-eliminate-code-redundancy.md
new file mode 100644
index 0000000..260409a
--- /dev/null
+++ b/docs/superpowers/plans/2026-06-03-eliminate-code-redundancy.md
@@ -0,0 +1,929 @@
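+# Code Redundancy Elimination Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Eliminate code redundancy in the lolly project: delete 8 pieces of dead code, refactor 2 duplicated patterns in source files, and extract test helpers to reduce 184 duplicated config literals.
+
+**Architecture:** Three phases. Phase 1 deletes unused dead code (zero risk); Phase 2 extracts helpers for route registration and DEBUG logging (low-risk refactoring); Phase 3 creates a test helper package and migrates duplicated code to it (incremental replacement).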
+
+**Tech Stack:** Go 1.22+, golangci-lint, dupl/unused linters
+
+---
+
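+## File Structure
+
+**Create:**
+- `internal/testutil/proxy.go` - test helpers (ProxyConfig and Target construction)
+- `internal/testutil/proxy_test.go` - tests for the helpers
+
+**Modify:**
+- `internal/config/validate.go` - delete the `validateStatic()` function
+- `internal/config/validate_test.go` - delete the `TestValidateStatic` test
+- `internal/http2/server.go` - delete `connectionPool.get()` and `connectionPool.count()`
+- `internal/middleware/bodylimit/bodylimit.go` - delete the `formatSize()` function
+- `internal/middleware/bodylimit/bodylimit_test.go` - delete the `TestFormatSize` test
+- `internal/middleware/security/headers.go` - delete 3 security header preset functions
+- `internal/middleware/security/headers_test.go` - delete the 3 corresponding tests
+- `internal/ssl/ocsp.go` - delete the `extractCertificates()` function
+- `internal/ssl/ocsp_test.go` - delete the 2 corresponding tests
+- `internal/server/router.go` - extract the `registerRoute` helper
+- `internal/proxy/proxy.go` - extract the `proxyDebugLog` helper
+- `internal/proxy/proxy_test.go`, `internal/integration/proxy_integration_test.go`, `internal/server/*_test.go` - migrate to the `internal/testutil` helpers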
+
+---
+
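+## Phase 1: Dead Code Removal
+
+### Task 1: Delete the `validateStatic` function and its test
+
+**Files:**
+- Modify: `internal/config/validate.go:475-484`
+- Modify: `internal/config/validate_test.go:752-809`
+
+- [ ] **Step 1: Delete the `validateStatic` function**
+
+Delete lines 475-484 of `internal/config/validate.go`. The 10-line range matches the function itself, so also remove the doc comment directly above it (included below for context):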
+
+```go
+// validateStatic 验证静态文件配置。
+//
+// 参数：
+//   - s: 静态文件配置对象
+//
+// 返回值：
+//   - error: 验证失败时返回错误信息，成功返回 nil
+func validateStatic(s *StaticConfig) error {
+	// 静态文件根目录非空时验证路径有效性
+	if s.Root != "" {
+		// 路径安全检查：不允许包含 ".."
+		if err := ValidatePathTraversal(s.Root, "根目录路径"); err != nil {
+			return err
+		}
+	}
+	return nil
+}
+```
+
+- [ ] **Step 2: Delete the corresponding unit test**
+
+Delete the `TestValidateStatic` function at lines 752-809 of `internal/config/validate_test.go`:
+
+```go
+func TestValidateStatic(t *testing.T) {
+	t.Parallel()
+	// TestValidateStatic 测试静态文件配置验证。
+	tests := []struct {
+		name    string
+		errMsg  string
+		config  StaticConfig
+		wantErr bool
+	}{
+		{
+			name:    "空配置有效",
+			config:  StaticConfig{},
+			wantErr: false,
+		},
+		{
+			name: "有效根目录",
+			config: StaticConfig{
+				Root: "/var/www/html",
+			},
+			wantErr: false,
+		},
+		{
+			name: "根目录含..路径遍历",
+			config: StaticConfig{
+				Root: "/var/www/../etc",
+			},
+			wantErr: true,
+			errMsg:  "根目录路径不能包含 '..'",
+		},
+		{
+			name: "根目录含多个..",
+			config: StaticConfig{
+				Root: "/var/../www/../html",
+			},
+			wantErr: true,
+			errMsg:  "根目录路径不能包含 '..'",
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			err := validateStatic(&tt.config)
+			if tt.wantErr {
+				if err == nil {
+					t.Errorf("validateStatic() 期望返回错误，但返回 nil")
+					return
+				}
+				if tt.errMsg != "" && !strings.Contains(err.Error(), tt.errMsg) {
+					t.Errorf("validateStatic() 错误消息不匹配，期望包含 %q，实际 %q", tt.errMsg, err.Error())
+				}
+			} else {
+				if err != nil {
+					t.Errorf("validateStatic() 期望返回 nil，但返回错误: %v", err)
+				}
+			}
+		})
+	}
+}
+```
+
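+- [ ] **Step 3: Run the tests and confirm they pass**
+
+Run: `go test ./internal/config/... -run TestValidateStatic -v`
+Expected: `testing: warning: no tests to run` (the test has been deleted)
+
+Run: `go test ./internal/config/... -v`
+Expected: PASS. If the build fails because an import such as `strings` is no longer used in `validate_test.go`, remove that import.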
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add internal/config/validate.go internal/config/validate_test.go
+git commit -m "refactor: remove unused validateStatic function and its test"
+```
+
+---
+
+### Task 2: Delete the unused `connectionPool` methods
+
+**Files:**
+- Modify: `internal/http2/server.go:575-587`
+
+- [ ] **Step 1: Delete the `get` and `count` methods**
+
+Delete lines 575-587 of `internal/http2/server.go`:
+
+```go
+// get 获取连接。
+func (p *connectionPool) get(key string) []net.Conn {
+	p.mu.RLock()
+	defer p.mu.RUnlock()
+	return p.conns[key]
+}
+
+// count 获取连接数。
+func (p *connectionPool) count(key string) int {
+	p.mu.RLock()
+	defer p.mu.RUnlock()
+	return len(p.conns[key])
+}
+```
+
+- [ ] **Step 2: Run the tests and confirm they pass**
+
+Run: `go test ./internal/http2/... -v`
+Expected: PASS
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add internal/http2/server.go
+git commit -m "refactor: remove unused connectionPool.get and connectionPool.count methods"
+```
+
+---
+
+### Task 3: Delete the `bodylimit.formatSize` function and its test
+
+**Files:**
+- Modify: `internal/middleware/bodylimit/bodylimit.go:279-305`
+- Modify: `internal/middleware/bodylimit/bodylimit_test.go:36-72`
+
+- [ ] **Step 1: Delete the `formatSize` function**
+
+Delete lines 279-305 of `internal/middleware/bodylimit/bodylimit.go`:
+
+```go
+// formatSize 将字节数格式化为人类可读的字符串。
+//
+// 根据大小自动选择合适的单位（b、kb、mb、gb）。
+//
+// 参数：
+//   - size: 字节数
+//
+// 返回值：
+//   - string: 格式化后的字符串，如 "1.00mb"、"10.00kb"
+func formatSize(size int64) string {
+	const (
+		KB = 1024
+		MB = 1024 * KB
+		GB = 1024 * MB
+	)
+
+	switch {
+	case size >= GB:
+		return fmt.Sprintf("%.2fgb", float64(size)/GB)
+	case size >= MB:
+		return fmt.Sprintf("%.2fmb", float64(size)/MB)
+	case size >= KB:
+		return fmt.Sprintf("%.2fkb", float64(size)/KB)
+	default:
+		return fmt.Sprintf("%db", size)
+	}
+}
+```
+
+- [ ] **Step 2: Delete the corresponding unit test**
+
+Delete the `TestFormatSize` function at lines 36-72 of `internal/middleware/bodylimit/bodylimit_test.go`:
+
+```go
+func TestFormatSize(t *testing.T) {
+	tests := []struct {
+		input    int64
+		expected string
+	}{
+		{512, "512b"},
+		{1024, "1.00kb"},
+		{1024 * 1024, "1.00mb"},
+		{1024 * 1024 * 1024, "1.00gb"},
+		{1536, "1.50kb"},
+	}
+
+	for _, tt := range tests {
+		t.Run(formatSize(tt.input), func(t *testing.T) {
+			got := formatSize(tt.input)
+			if got != tt.expected {
+				t.Errorf("formatSize(%d) = %s, want %s", tt.input, got, tt.expected)
+			}
+		})
+	}
+}
+```
+
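+- [ ] **Step 3: Run the tests and confirm they pass**
+
+Run: `go test ./internal/middleware/bodylimit/... -v`
+Expected: PASS. If `fmt` is no longer referenced in `bodylimit.go` after the deletion, remove the import so the package still compiles.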
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add internal/middleware/bodylimit/bodylimit.go internal/middleware/bodylimit/bodylimit_test.go
+git commit -m "refactor: remove unused bodylimit.formatSize function and test"
+```
+
+---
+
+### Task 4: Delete the unused security header functions and their tests
+
+**Files:**
+- Modify: `internal/middleware/security/headers.go:291-331`
+- Modify: `internal/middleware/security/headers_test.go:184-215`
+
+- [ ] **Step 1: Delete the 3 security header functions**
+
+Delete lines 291-331 of `internal/middleware/security/headers.go`:
+
+```go
+// defaultSecurityHeaders 返回安全的安全头默认配置。
+//
+// 返回值：
+//   - *config.SecurityHeaders: 包含安全默认值的配置对象
+func defaultSecurityHeaders() *config.SecurityHeaders {
+	return &config.SecurityHeaders{
+		XFrameOptions:       "DENY",
+		XContentTypeOptions: "nosniff",
+		ReferrerPolicy:      "strict-origin-when-cross-origin",
+	}
+}
+
+// strictSecurityHeaders 返回严格模式的安全头配置。
+//
+// 适用于高安全要求的应用场景，包含严格的 CSP 和权限策略。
+//
+// 返回值：
+//   - *config.SecurityHeaders: 包含严格安全值的配置对象
+func strictSecurityHeaders() *config.SecurityHeaders {
+	return &config.SecurityHeaders{
+		XFrameOptions:         "DENY",
+		XContentTypeOptions:   "nosniff",
+		ContentSecurityPolicy: "default-src 'self'; script-src 'self'; style-src 'self'; img-src 'self'; font-src 'self'; connect-src 'self'; frame-ancestors 'none'",
+		ReferrerPolicy:        "no-referrer",
+		PermissionsPolicy:     "accelerometer=(), camera=(), geolocation=(), gyroscope=(), magnetometer=(), microphone=(), payment=(), usb=()",
+	}
+}
+
+// developmentSecurityHeaders 返回开发环境使用的宽松安全头配置。
+//
+// 警告：请勿在生产环境使用此配置，安全性较低。
+//
+// 返回值：
+//   - *config.SecurityHeaders: 包含宽松安全值的配置对象
+func developmentSecurityHeaders() *config.SecurityHeaders {
+	return &config.SecurityHeaders{
+		XFrameOptions:       "SAMEORIGIN",
+		XContentTypeOptions: "nosniff",
+		ReferrerPolicy:      "strict-origin-when-cross-origin",
+	}
+}
+```
+
+- [ ] **Step 2: Delete the corresponding unit tests**
+
+Delete lines 184-215 of `internal/middleware/security/headers_test.go`:
+
+```go
+func TestDefaultSecurityHeaders(t *testing.T) {
+	cfg := defaultSecurityHeaders()
+
+	if cfg.XFrameOptions != "DENY" {
+		t.Errorf("Expected default X-Frame-Options 'DENY', got %s", cfg.XFrameOptions)
+	}
+	if cfg.XContentTypeOptions != "nosniff" {
+		t.Errorf("Expected default X-Content-Type-Options 'nosniff', got %s", cfg.XContentTypeOptions)
+	}
+}
+
+func TestStrictSecurityHeaders(t *testing.T) {
+	cfg := strictSecurityHeaders()
+
+	if cfg.XFrameOptions != "DENY" {
+		t.Errorf("Expected X-Frame-Options 'DENY', got %s", cfg.XFrameOptions)
+	}
+	if cfg.ReferrerPolicy != "no-referrer" {
+		t.Errorf("Expected Referrer-Policy 'no-referrer', got %s", cfg.ReferrerPolicy)
+	}
+	if cfg.ContentSecurityPolicy == "" {
+		t.Error("Expected non-empty CSP for strict config")
+	}
+}
+
+func TestDevelopmentSecurityHeaders(t *testing.T) {
+	cfg := developmentSecurityHeaders()
+
+	if cfg.XFrameOptions != "SAMEORIGIN" {
+		t.Errorf("Expected X-Frame-Options 'SAMEORIGIN' for dev, got %s", cfg.XFrameOptions)
+	}
+}
+```
+
+- [ ] **Step 3: Run the tests and confirm they pass**
+
+Run: `go test ./internal/middleware/security/... -v`
+Expected: PASS
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add internal/middleware/security/headers.go internal/middleware/security/headers_test.go
+git commit -m "refactor: remove unused security header preset functions and tests"
+```
+
+---
+
+### Task 5: Delete the `extractCertificates` function and its tests
+
+**Files:**
+- Modify: `internal/ssl/ocsp.go:482-514`
+- Modify: `internal/ssl/ocsp_test.go:311-335`
+
+- [ ] **Step 1: Delete the `extractCertificates` function**
+
+Delete lines 482-514 of `internal/ssl/ocsp.go`:
+
+```go
+// extractCertificates 解析 PEM 数据并返回证书列表。
+//
+// 参数：
+//   - pemData: PEM 编码的证书数据
+//
+// 返回值：
+//   - []*x509.Certificate: 解析后的证书列表
+//   - error: 解析失败时返回错误
+func extractCertificates(pemData []byte) ([]*x509.Certificate, error) {
+	var certs []*x509.Certificate
+	rest := pemData
+
+	for {
+		block, remaining := pem.Decode(rest)
+		if block == nil {
+			break
+		}
+		if block.Type == "CERTIFICATE" {
+			cert, err := x509.ParseCertificate(block.Bytes)
+			if err != nil {
+				return nil, fmt.Errorf("failed to parse certificate: %w", err)
+			}
+			certs = append(certs, cert)
+		}
+		rest = remaining
+	}
+
+	if len(certs) == 0 {
+		return nil, errors.New("no certificates found in PEM data")
+	}
+
+	return certs, nil
+}
+```
+
+- [ ] **Step 2: Delete the corresponding unit tests**
+
+Delete lines 311-335 of `internal/ssl/ocsp_test.go`:
+
+```go
+func TestExtractCertificates(t *testing.T) {
+	// Create valid PEM data
+	certPEM, _ := generateTestCertWithOCSP(t, nil)
+
+	certs, err := extractCertificates(certPEM)
+	if err != nil {
+		t.Fatalf("extractCertificates() failed: %v", err)
+	}
+
+	if len(certs) == 0 {
+		t.Error("Expected at least one certificate")
+	}
+}
+
+func TestExtractCertificatesInvalidPEM(t *testing.T) {
+	invalidPEM := []byte("not valid pem data")
+
+	certs, err := extractCertificates(invalidPEM)
+	if err == nil {
+		t.Error("Expected error for invalid PEM data")
+	}
+	if certs != nil {
+		t.Error("Expected nil certs for invalid PEM data")
+	}
+}
+```
+
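+- [ ] **Step 3: Run the tests and confirm they pass**
+
+Run: `go test ./internal/ssl/... -v`
+Expected: PASS. If imports such as `encoding/pem` are no longer referenced in `ocsp.go` after the deletion, remove them so the package still compiles.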
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add internal/ssl/ocsp.go internal/ssl/ocsp_test.go
+git commit -m "refactor: remove unused extractCertificates function and tests"
+```
+
+---
+
+## Phase 2: Refactoring Duplicated Patterns in Source Files
+
+### Task 6: Extract a route registration helper
+
+**Files:**
+- Modify: `internal/server/router.go:84-124`, `internal/server/router.go:190-220`, and `internal/server/router.go:390-420`
+
+- [ ] **Step 1: Add the `registerRoute` helper**
+
+Add the following before the `configureProxyRoutes` function in `internal/server/router.go`:
+
+```go
+// registerRoute registers a route with the location engine according to its location type.
+func (s *Server) registerRoute(
+	locType string,
+	path string,
+	handler fasthttp.RequestHandler,
+	internal bool,
+	source string,
+) error {
+	var err error
+	switch locType {
+	case matcher.LocationTypeExact:
+		err = s.locationEngine.AddExact(path, handler, internal)
+	case matcher.LocationTypePrefixPriority:
+		err = s.locationEngine.AddPrefixPriority(path, handler, internal)
+	case matcher.LocationTypeRegex:
+		err = s.locationEngine.AddRegex(path, handler, false, internal)
+	case matcher.LocationTypeRegexCaseless:
+		err = s.locationEngine.AddRegex(path, handler, true, internal)
+	case matcher.LocationTypeNamed:
+		err = s.locationEngine.AddNamed(path, handler)
+	default:
+		err = s.locationEngine.AddPrefix(path, handler, internal)
+	}
+	if err != nil {
+		return s.handleRegistrationError(source, path, err)
+	}
+	return nil
+}
+```
+
+- [ ] **Step 2: Refactor proxy route registration**
+
+Replace the switch statement at lines 84-124 of `internal/server/router.go` with:
+
+```go
+		switch locType {
+		case matcher.LocationTypeNamed:
+			// Named locations are registered as "@name", never as internal,
+			// and only when a name is configured.
+			if proxyCfg.LocationName != "" {
+				if err := s.registerRoute(locType, "@"+proxyCfg.LocationName, p.ServeHTTP, false, "proxy"); err != nil {
+					return err
+				}
+			}
+		default:
+			// registerRoute dispatches on locType itself, so every other
+			// location type (exact, prefix, regex, ...) shares one call.
+			if err := s.registerRoute(locType, proxyCfg.Path, p.ServeHTTP, proxyCfg.Internal, "proxy"); err != nil {
+				return err
+			}
+		}
+```
+
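+- [ ] **Step 3: Refactor static route registration**
+
+Replace the similar code at lines 190-220 of `internal/server/router.go` with a `registerRoute` call.
+
+- [ ] **Step 4: Refactor lua route registration**
+
+Replace the similar code at lines 390-420 of `internal/server/router.go` with a `registerRoute` call.
+
+- [ ] **Step 5: Run the tests and confirm they pass**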
+
+Run: `go test ./internal/server/... -v`
+Expected: PASS
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add internal/server/router.go
+git commit -m "refactor: extract registerRoute helper to reduce repetition"
+```
+
+---
+
+### Task 7: Extract a DEBUG logging helper
+
+**Files:**
+- Modify: `internal/proxy/proxy.go:470-476` and similar locations
+
+- [ ] **Step 1: Add the `proxyDebugLog` helper**
+
+Add the following before the `ServeHTTP` method in `internal/proxy/proxy.go`:
+
+```go
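+// proxyDebugLog emits a proxy log entry at DEBUG level.
+//
+// kv holds alternating string keys and values. Only string, int and bool
+// values are logged; a pair with any other value type, a non-string key, or
+// a trailing key without a value is skipped. Arguments are still evaluated
+// by the caller when DEBUG is disabled.
+func proxyDebugLog(msg string, kv ...interface{}) {
+	event := logging.Debug()
+	if !event.Enabled() {
+		return
+	}
+	for i := 0; i < len(kv)-1; i += 2 {
+		key, ok := kv[i].(string)
+		if !ok {
+			continue
+		}
+		switch v := kv[i+1].(type) {
+		case string:
+			event = event.Str(key, v)
+		case int:
+			event = event.Int(key, v)
+		case bool:
+			event = event.Bool(key, v)
+		}
+	}
+	event.Msg(msg)
+}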
+```
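+
+The helper above silently skips any value that is not a `string`, `int` or `bool`, as well as a trailing key without a value. The following is a minimal, self-contained sketch of the same key/value walk. It assumes a hypothetical `collectFields` function that renders pairs as strings instead of calling the project's `logging` package, and it adds a `default` branch so that other value types are kept rather than dropped:
+
+```go
+package main
+
+import "fmt"
+
+// collectFields is a hypothetical stand-in for the loop in proxyDebugLog.
+// It walks alternating key/value arguments and renders each pair as
+// "key=value". Values of other types are formatted with %v instead of
+// being skipped.
+func collectFields(kv ...interface{}) []string {
+	fields := make([]string, 0, len(kv)/2)
+	for i := 0; i+1 < len(kv); i += 2 {
+		key, ok := kv[i].(string)
+		if !ok {
+			continue // non-string key: skip the whole pair
+		}
+		switch v := kv[i+1].(type) {
+		case string:
+			fields = append(fields, key+"="+v)
+		case int:
+			fields = append(fields, fmt.Sprintf("%s=%d", key, v))
+		case bool:
+			fields = append(fields, fmt.Sprintf("%s=%t", key, v))
+		default:
+			fields = append(fields, fmt.Sprintf("%s=%v", key, v))
+		}
+	}
+	return fields
+}
+
+func main() {
+	// "dangling" has no value, so it is ignored; int64(15) hits the default branch.
+	fmt.Println(collectFields("path", "/api", "status", 200, "elapsed", int64(15), "dangling"))
+}
+```
+
+Whether a catch-all branch is worth adding to the real helper depends on which field types the call sites actually pass.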
+
+- [ ] **Step 2: Replace the first DEBUG log statement**
+
+Replace lines 470-476:
+```go
+	if logging.Debug().Enabled() {
+		logging.Debug().
+			Str("path", b2s(ctx.Path())).
+			Str("host", b2s(ctx.Host())).
+			Str("method", b2s(ctx.Method())).
+			Msg("[PROXY] 收到请求")
+	}
+```
+with:
+```go
+	proxyDebugLog("[PROXY] 收到请求",
+		"path", b2s(ctx.Path()),
+		"host", b2s(ctx.Host()),
+		"method", b2s(ctx.Method()),
+	)
+```
+
+- [ ] **Step 3: Replace the remaining 4 DEBUG log statements**
+
+Apply the pattern from Step 2 to the DEBUG log statements at lines 536-540, 555-559, 627-631 and 715-719.
+
+- [ ] **Step 4: Run the tests and confirm they pass**
+
+Run: `go test ./internal/proxy/... -v`
+Expected: PASS
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add internal/proxy/proxy.go
+git commit -m "refactor: extract proxyDebugLog helper for repeated debug logging"
+```
+
+---
+
+## Phase 3: Test Helpers
+
+### Task 8: Create the test helper package
+
+**Files:**
+- Create: `internal/testutil/proxy.go`
+- Create: `internal/testutil/proxy_test.go`
+
+- [ ] **Step 1: Create the test helper file**
+
+Create `internal/testutil/proxy.go`:
+
+```go
+package testutil
+
+import (
+	"time"
+
+	"rua.plus/lolly/internal/config"
+	"rua.plus/lolly/internal/loadbalance"
+)
+
+// NewTestProxyConfig creates a proxy config for tests.
+//
+// Parameters:
+//   - path: proxy path
+//   - targetURLs: backend target URLs
+//
+// Returns:
+//   - *config.ProxyConfig: the populated proxy config
+func NewTestProxyConfig(path string, targetURLs ...string) *config.ProxyConfig {
+	cfg := &config.ProxyConfig{
+		Path:        path,
+		LoadBalance: "round_robin",
+		Timeout: config.ProxyTimeout{
+			Connect: 5 * time.Second,
+			Read:    30 * time.Second,
+			Write:   30 * time.Second,
+		},
+	}
+
+	if len(targetURLs) > 0 {
+		cfg.Targets = make([]config.ProxyTargetConfig, len(targetURLs))
+		for i, url := range targetURLs {
+			cfg.Targets[i] = config.ProxyTargetConfig{URL: url}
+		}
+	}
+
+	return cfg
+}
+
+// NewTestProxyConfigWithCache creates a test proxy config with caching enabled.
+func NewTestProxyConfigWithCache(path string, maxAge time.Duration, targetURLs ...string) *config.ProxyConfig {
+	cfg := NewTestProxyConfig(path, targetURLs...)
+	cfg.Cache = config.ProxyCacheConfig{
+		Enabled: true,
+		MaxAge:  maxAge,
+	}
+	return cfg
+}
+
+// NewTestTarget creates a proxy target for tests.
+//
+// Parameters:
+//   - url: target URL
+//
+// Returns:
+//   - *loadbalance.Target: the test target
+func NewTestTarget(url string) *loadbalance.Target {
+	return &loadbalance.Target{URL: url}
+}
+
+// NewTestTargets creates one test target per URL.
+func NewTestTargets(urls ...string) []*loadbalance.Target {
+	targets := make([]*loadbalance.Target, len(urls))
+	for i, url := range urls {
+		targets[i] = NewTestTarget(url)
+	}
+	return targets
+}
+
+// NewTestHealthyTarget creates a test target that is already marked healthy.
+//
+// Parameters:
+//   - url: target URL
+//
+// Returns:
+//   - *loadbalance.Target: the test target, marked healthy
+func NewTestHealthyTarget(url string) *loadbalance.Target {
+	t := NewTestTarget(url)
+	t.Healthy.Store(true)
+	return t
+}
+
+// NewTestHealthyTargets creates one healthy test target per URL.
+func NewTestHealthyTargets(urls ...string) []*loadbalance.Target {
+	targets := make([]*loadbalance.Target, len(urls))
+	for i, url := range urls {
+		targets[i] = NewTestHealthyTarget(url)
+	}
+	return targets
+}
+```
+
+- [ ] **Step 2: Write tests for the helpers**
+
+Create `internal/testutil/proxy_test.go`:
+
+```go
+package testutil
+
+import (
+	"testing"
+	"time"
+)
+
+func TestNewTestProxyConfig(t *testing.T) {
+	cfg := NewTestProxyConfig("/api", "http://localhost:8080")
+
+	if cfg.Path != "/api" {
+		t.Errorf("expected path /api, got %s", cfg.Path)
+	}
+	if len(cfg.Targets) != 1 {
+		t.Errorf("expected 1 target, got %d", len(cfg.Targets))
+	}
+	if cfg.Timeout.Connect != 5*time.Second {
+		t.Errorf("expected 5s connect timeout, got %v", cfg.Timeout.Connect)
+	}
+}
+
+func TestNewTestHealthyTarget(t *testing.T) {
+	target := NewTestHealthyTarget("http://localhost:8080")
+
+	if target.URL != "http://localhost:8080" {
+		t.Errorf("expected URL http://localhost:8080, got %s", target.URL)
+	}
+	if !target.Healthy.Load() {
+		t.Error("expected target to be healthy")
+	}
+}
+
+func TestNewTestHealthyTargets(t *testing.T) {
+	targets := NewTestHealthyTargets("http://localhost:8080", "http://localhost:8081")
+
+	if len(targets) != 2 {
+		t.Errorf("expected 2 targets, got %d", len(targets))
+	}
+	for i, target := range targets {
+		if !target.Healthy.Load() {
+			t.Errorf("expected target %d to be healthy", i)
+		}
+	}
+}
+```
+
+- [ ] **Step 3: Run the tests and confirm they pass**
+
+Run: `go test ./internal/testutil/... -v`
+Expected: PASS
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add internal/testutil/
+git commit -m "feat: add testutil package for proxy config helpers"
+```
+
+---
+
+### Task 9: Migrate the proxy tests to the helpers
+
+**Files:**
+- Modify: `internal/proxy/proxy_test.go`
+- Modify: `internal/integration/proxy_integration_test.go`
+
+- [ ] **Step 1: Update the imports in `internal/proxy/proxy_test.go`**
+
+Add the import:
+```go
+import (
+	"rua.plus/lolly/internal/testutil"
+)
+```
+
+- [ ] **Step 2: Replace the duplicated ProxyConfig construction**
+
+Replace the repeated pattern in the tests as follows:
+```go
+// Before:
+cfg := &config.ProxyConfig{
+    Path:        "/api",
+    LoadBalance: "round_robin",
+    Timeout: config.ProxyTimeout{
+        Connect: 5 * time.Second,
+        Read:    30 * time.Second,
+        Write:   30 * time.Second,
+    },
+}
+
+// After:
+cfg := testutil.NewTestProxyConfig("/api")
+```
+
+- [ ] **Step 3: Replace the duplicated Target construction**
+
+Replace:
+```go
+targets := []*loadbalance.Target{{URL: "http://localhost:8080"}}
+targets[0].Healthy.Store(true)
+```
+with:
+```go
+targets := testutil.NewTestHealthyTargets("http://localhost:8080")
+```
+
+- [ ] **Step 4: Run the tests and confirm they pass**
+
+Run: `go test ./internal/proxy/... ./internal/integration/... -v`
+Expected: PASS
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add internal/proxy/proxy_test.go internal/integration/proxy_integration_test.go
+git commit -m "refactor: use testutil helpers in proxy tests"
+```
+
+---
+
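+### Task 10: Migrate the server tests to the helpers
+
+**Files:**
+- Modify: `internal/server/*_test.go`
+
+- [ ] **Step 1: Replace the duplicated code across the server tests**
+
+Using the same pattern as Task 9, replace the duplicated ProxyConfig and Target construction in every test file under `internal/server/`.
+
+- [ ] **Step 2: Run the tests and confirm they pass**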
+
+Run: `go test ./internal/server/... -v`
+Expected: PASS
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add internal/server/
+git commit -m "refactor: use testutil helpers in server tests"
+```
+
+---
+
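+## Acceptance Checks
+
+### Task 11: Final verification
+
+- [ ] **Step 1: Run the unused linter**
+
+Run: `golangci-lint run --enable=unused ./...`
+Expected: no unused errors
+
+- [ ] **Step 2: Run the dupl linter**
+
+Run: `golangci-lint run --enable=dupl ./...`
+Expected: no dupl errors in source files (test files are allowed)
+
+- [ ] **Step 3: Run the full test suite**
+
+Run: `go test ./...`
+Expected: all PASS
+
+- [ ] **Step 4: Measure the change in line count**
+
+Run: `git diff --stat <base-commit>..HEAD`, where `<base-commit>` is the commit the plan started from (a bare `git diff --stat` prints nothing once every task has been committed)
+Expected: a net reduction of more than 200 lines
+
+- [ ] **Step 5: Final commit (only if uncommitted changes remain)**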
+
+```bash
+git commit -m "chore: eliminate code redundancy - dead code removal, pattern extraction, test helpers"
+```
+
+---
+
+## Self-Review Checklist
+
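+1. **Spec coverage**: all 3 phases have detailed tasks ✓
+2. **Placeholder scan**: no TBD, TODO, or vague descriptions ✓
+3. **Type consistency**: the `registerRoute` and `proxyDebugLog` signatures match their call sites ✓
+4. **File paths**: all paths are repository-relative and match the codebase ✓
+5. **Commands**: every test step has an explicit command and expected output ✓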
diff --git a/docs/superpowers/plans/2026-06-03-redundancy-removal.md b/docs/superpowers/plans/2026-06-03-redundancy-removal.md
new file mode 100644
index 0000000..080abab
--- /dev/null
+++ b/docs/superpowers/plans/2026-06-03-redundancy-removal.md
@@ -0,0 +1,791 @@
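+# Lolly Code Redundancy Optimization Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Systematically eliminate redundant code in the Lolly codebase, including dead code, duplicated implementations, over-engineering, and duplicated tests, to improve maintainability and code quality.
+
+**Architecture:** A phased, incremental refactoring strategy. Each phase is independently deliverable, so it can be rolled back at any time. Dead code is handled first (zero risk, high payoff), then duplicated implementations (low risk, medium payoff), and finally architecture-level duplication (medium risk, long-term benefit).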
+
+**Tech Stack:** Go 1.24, fasthttp, staticcheck, go vet
+
+---
+
+## File Structure Map
+
+### Files to delete or clean up
+- `internal/middleware/limitrate/limitrate.go` - main file of the dead package
+- `internal/middleware/limitrate/writer.go` - helper file of the dead package
+- `internal/middleware/limitrate/limitrate_test.go` - test file of the dead package
+- `internal/stream/ssl.go` - dead code (all fields unused)
+- `internal/stream/ssl_test.go` - tests for the dead code
+- `internal/variable/pool.go` - dead code (all fields unused)
+- `TestExtractHostFromURL` in `internal/proxy/proxy_coverage_extra_test.go` - the function under test is about to be deleted
+
+### Files to modify (grouped by module)
+
+**Phase 1 - Dead code cleanup:**
+- `internal/mimeutil/detect.go:154` - add the defaultMIME fallback
+- `internal/app/app_test.go:448` - delete the unused `customSig`
+- `internal/app/testutil.go:17` - delete the unused `setupTestLogger`
+- `internal/http3/server_test.go:138` - delete the unused `generateTestCertificate`
+- `internal/proxy/proxy_dns_test.go:91` - delete the unused method
+- `internal/server/testutil.go:15` - delete the unused constant
+- `internal/server/upgrade_test.go:291` - delete the unused `containsString`
+- `internal/server/pool_bench_test.go:305` - delete the unused `id` field
+- `internal/stream/stream_test.go:24` - delete the unused `generateTestCertificate`
+
+**Phase 2 - Duplicate implementation removal:**
+- `internal/proxy/proxy.go:362,1003-1018` - delete `extractHostFromURL` and use `netutil.ParseTargetURL` instead
+- `internal/proxy/header_modifier.go:33` - switch to `netutil.ParseTargetURL`
+- `internal/handler/static.go:628,832-836` - delete the `generateETag` wrapper and call `utils.GenerateETag` directly
+- `internal/cache/file_cache.go:47,181` - delete the `generateETag` wrapper and call `utils.GenerateETag` directly
+- `internal/utils/httperror.go:67-86` - simplify `CheckIPAccess` by reusing `IPInAllowList`
+
+**Phase 3 - Router and server logic simplification:**
+- `internal/server/router.go:118-145,217-234,402-423` - eliminate the redundant switch blocks
+- `internal/server/server.go:454-868` - extract the functions shared by the three startup modes
+
+**Phase 4 - Load balancing unification (optional):**
+- `internal/stream/stream.go:61-285` - reuse the algorithm implementations from `internal/loadbalance`
+
+---
+
+## Task Breakdown
+
+### Phase 1: Dead Code Cleanup (P0)
+
+---
+
+#### Task 1.1: Delete the dead `limitrate` package
+
+**Files:**
+- Delete: `internal/middleware/limitrate/limitrate.go`
+- Delete: `internal/middleware/limitrate/writer.go`
+- Delete: `internal/middleware/limitrate/limitrate_test.go`
+
+- [ ] **Step 1: Confirm the package is not referenced**
+
+```bash
+grep -r "limitrate" --include="*.go" /home/xfy/Developer/lolly/internal/
+```
+
+Expected: matches only inside the `internal/middleware/limitrate/` directory, with no external references.
+
+- [ ] **Step 2: Delete the whole directory**
+
+```bash
+rm -rf /home/xfy/Developer/lolly/internal/middleware/limitrate/
+```
+
+- [ ] **Step 3: Verify the build passes**
+
+```bash
+cd /home/xfy/Developer/lolly && go build ./...
+```
+
+Expected: no errors; the build succeeds.
+
+- [ ] **Step 4: Run the tests of the affected packages**
+
+```bash
+cd /home/xfy/Developer/lolly && go test ./internal/middleware/...
+```
+
+Expected: all pass.
+
+- [ ] **Step 5: Commit**
+
+```bash
+cd /home/xfy/Developer/lolly && git add -A && git commit -m "refactor: remove dead code package internal/middleware/limitrate"
+```
+
+---
+
+#### Task 1.2: Delete the dead code in stream/ssl.go
+
+**Files:**
+- Delete: `internal/stream/ssl.go`
+- Delete: `internal/stream/ssl_test.go`
+
+- [ ] **Step 1: Confirm the fields in ssl.go are unused**
+
+```bash
+grep -r "SSLManager\|ProxySSLManager" --include="*.go" /home/xfy/Developer/lolly/internal/
+```
+
+Expected: only the definitions in `internal/stream/ssl.go` itself, with no other references.
+
+- [ ] **Step 2: Delete the files**
+
+```bash
+rm /home/xfy/Developer/lolly/internal/stream/ssl.go
+rm /home/xfy/Developer/lolly/internal/stream/ssl_test.go
+```
+
+- [ ] **Step 3: Verify the build and tests**
+
+```bash
+cd /home/xfy/Developer/lolly && go build ./internal/stream/... && go test ./internal/stream/...
+```
+
+Expected: the build and all tests pass.
+
+- [ ] **Step 4: Commit**
+
+```bash
+cd /home/xfy/Developer/lolly && git add -A && git commit -m "refactor: remove unused stream SSL dead code"
+```
+
+---
+
+#### Task 1.3: Delete the dead code in variable/pool.go
+
+**Files:**
+- Delete: `internal/variable/pool.go`
+
+- [ ] **Step 1: Confirm the variables in pool.go are unused**
+
+```bash
+grep -r "PoolStats\|gets\.\|puts\.\|newCount\.\|active\." --include="*.go" /home/xfy/Developer/lolly/internal/
+```
+
+Expected: no references other than the definitions in `pool.go` itself.
+
+- [ ] **Step 2: Delete the file**
+
+```bash
+rm /home/xfy/Developer/lolly/internal/variable/pool.go
+```
+
+- [ ] **Step 3: Verify the build and tests**
+
+```bash
+cd /home/xfy/Developer/lolly && go build ./internal/variable/... && go test ./internal/variable/...
+```
+
+Expected: the build and all tests pass.
+
+- [ ] **Step 4: Commit**
+
+```bash
+cd /home/xfy/Developer/lolly && git add -A && git commit -m "refactor: remove unused variable pool statistics dead code"
+```
+
+---
+
+#### Task 1.4: Fix the unused defaultMIME in mimeutil
+
+**Files:**
+- Modify: `internal/mimeutil/detect.go:154`
+
+- [ ] **Step 1: Read the current DetectContentType implementation**
+
+Read: `internal/mimeutil/detect.go:95-155`
+
+Current behavior: when `mime.TypeByExtension` returns an empty string, the function caches and returns that empty string and never uses `defaultMIME`.
+
+- [ ] **Step 2: Add the defaultMIME fallback at the end of DetectContentType**
+
+```go
+// Add at line 154 of internal/mimeutil/detect.go (just before return mimeType):
+
+	if mimeType == "" {
+		defaultMutex.RLock()
+		mimeType = defaultMIME
+		defaultMutex.RUnlock()
+	}
+
+	return mimeType
+```
+
+After the change, lines 149-158 should read in full:
+
+```go
+	// Insert the new entry
+	entry := &mimeCacheEntry{ext: ext, mimeType: mimeType}
+	entry.element = mimeLRU.PushFront(entry)
+	mimeCache[ext] = entry
+
+	if mimeType == "" {
+		defaultMutex.RLock()
+		mimeType = defaultMIME
+		defaultMutex.RUnlock()
+	}
+
+	return mimeType
+```
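+
+The fallback rule itself can be tried in isolation. The following is a minimal sketch, assuming a hypothetical helper `detectWithDefault` that is not part of the project; it leaves out the LRU cache and the `defaultMutex` locking and only shows the "empty result means use the default" step on top of the standard library's `mime.TypeByExtension`.
+
+```go
+package main
+
+import (
+	"fmt"
+	"mime"
+)
+
+// detectWithDefault is a hypothetical helper, not the project's
+// DetectContentType. It returns the MIME type registered for ext,
+// or def when the standard library knows no type for that extension.
+func detectWithDefault(ext, def string) string {
+	if t := mime.TypeByExtension(ext); t != "" {
+		return t
+	}
+	return def
+}
+
+func main() {
+	const def = "application/octet-stream"
+	fmt.Println(detectWithDefault(".html", def))          // a registered text/html type
+	fmt.Println(detectWithDefault(".lolly-unknown", def)) // falls back to def
+}
+```
+
+As in the plan's version, the fallback is applied on the way out, so a later change of the default would still take effect for extensions whose empty lookup result is already cached.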
+
+- [ ] **Step 3: Verify the build and tests**
+
+```bash
+cd /home/xfy/Developer/lolly && go build ./internal/mimeutil/... && go test ./internal/mimeutil/...
+```
+
+Expected: the build and all tests pass.
+
+- [ ] **Step 4: Commit**
+
+```bash
+cd /home/xfy/Developer/lolly && git add -A && git commit -m "fix: use defaultMIME fallback in DetectContentType"
+```
+
+---
+
+#### Task 1.5: Clean up other dead code found by static analysis
+
+**Files:**
+- Modify: `internal/app/app_test.go`: remove the unused `customSig`
+- Modify: `internal/app/testutil.go`: remove the unused `setupTestLogger`
+- Modify: `internal/http3/server_test.go`: remove the unused `generateTestCertificate`
+- Modify: `internal/proxy/proxy_dns_test.go`: remove the unused methods
+- Modify: `internal/server/testutil.go`: remove the unused `testListenAddr`
+- Modify: `internal/server/upgrade_test.go`: remove the unused `containsString`
+- Modify: `internal/server/pool_bench_test.go`: remove the unused `id` field
+- Modify: `internal/stream/stream_test.go`: remove the unused `generateTestCertificate`
+
+- [ ] **Step 1: Run staticcheck to get exact line numbers**
+
+```bash
+cd /home/xfy/Developer/lolly && staticcheck ./... 2>&1 | grep "U1000"
+```
+
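+Expected: the exact file path and line number of every piece of dead code.
+
+- [ ] **Step 2: Remove the dead code one item at a time**
+
+For each item staticcheck reports:
+1. Open the file
+2. Locate the reported function, variable, or field
+3. Delete the entire unused declaration
+4. Save the file
+
+Example (using `internal/server/testutil.go`):
+
+```go
+// Before:
+const testListenAddr = "127.0.0.1:0"
+
+// After:
+// (the whole line is removed)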
+```
+
+- [ ] **Step 3: Verify the build and tests**
+
+```bash
+cd /home/xfy/Developer/lolly && go build ./... && go test ./internal/app/... ./internal/http3/... ./internal/proxy/... ./internal/server/... ./internal/stream/...
+```
+
+Expected: everything passes.
+
+- [ ] **Step 4: Commit**
+
+```bash
+cd /home/xfy/Developer/lolly && git add -A && git commit -m "refactor: remove unused code identified by staticcheck"
+```
+
+---
+
+### Phase 2: Eliminate duplicate implementations (P1)
+
+---
+
+#### Task 2.1: Remove extractHostFromURL from proxy.go and use netutil throughout
+
+**Files:**
+- Modify: `internal/proxy/proxy.go:362`: replace the call
+- Modify: `internal/proxy/proxy.go:993-1018`: delete the function
+- Modify: `internal/proxy/header_modifier.go:33`: replace the call
+- Modify: `internal/proxy/proxy_coverage_extra_test.go`: delete the test
+
+- [ ] **Step 1: Update the call at proxy.go:362**
+
+Read: `internal/proxy/proxy.go:360-365`
+
+Change:
+```go
+	tlsCfg, err := CreateTLSConfig(sslCfg, extractHostFromURL(targetURL))
+```
+to:
+```go
+	host, _, _, err := netutil.ParseTargetURL(targetURL, false)
+	if err != nil {
+		return nil, fmt.Errorf("parse target URL %q: %w", targetURL, err)
+	}
+	tlsCfg, err := CreateTLSConfig(sslCfg, host)
+```
+
+Also make sure the file imports `rua.plus/lolly/internal/netutil`.
+
+- [ ] **Step 2: Update the call at header_modifier.go:33**
+
+Read: `internal/proxy/header_modifier.go:30-36`
+
+Change:
+```go
+	targetHost := extractHostFromURL(target.URL)
+```
+to:
+```go
+	targetHost, _, _, err := netutil.ParseTargetURL(target.URL, false)
+	if err != nil {
+		targetHost = target.URL
+	}
+```
+
+Also make sure the file imports `rua.plus/lolly/internal/netutil`.
+
+- [ ] **Step 3: Delete extractHostFromURL from proxy.go**
+
+Delete the entire function at lines 993-1018 of `internal/proxy/proxy.go`:
+
+```go
+// extractHostFromURL extracts the host:port part from a URL string.
+// ...
+func extractHostFromURL(urlStr string) string {
+	// ...
+}
+```
+
+- [ ] **Step 4: Delete TestExtractHostFromURL from proxy_coverage_extra_test.go**
+
+Read: `internal/proxy/proxy_coverage_extra_test.go:1426-1480`
+
+Delete the entire `TestExtractHostFromURL` function and its related test cases.
+
+- [ ] **Step 5: Verify the build and tests**
+
+```bash
+cd /home/xfy/Developer/lolly && go build ./internal/proxy/... && go test ./internal/proxy/...
+```
+
+Expected: the build and all tests pass.
+
+- [ ] **Step 6: Commit**
+
+```bash
+cd /home/xfy/Developer/lolly && git add -A && git commit -m "refactor: remove extractHostFromURL, use netutil.ParseTargetURL"
+```
+
+---
+
+#### Task 2.2: Remove the generateETag wrapper functions
+
+**Files:**
+- Modify: `internal/handler/static.go:628,832-836`
+- Modify: `internal/cache/file_cache.go:45-49,181`
+
+- [ ] **Step 1: Update handler/static.go**
+
+Read: `internal/handler/static.go:626-630`
+
+Change:
+```go
+	etag := generateETag(info.ModTime(), info.Size())
+```
+to:
+```go
+	etag := utils.GenerateETag(info.ModTime(), info.Size())
+```
+
+Delete the `generateETag` function at lines 832-836 of `internal/handler/static.go`.
+
+- [ ] **Step 2: Update cache/file_cache.go**
+
+Read: `internal/cache/file_cache.go:179-183`
+
+Change:
+```go
+	etag := generateETag(modTime, size)
+```
+to:
+```go
+	etag := utils.GenerateETag(modTime, size)
+```
+
+Delete the `generateETag` function at lines 45-49 of `internal/cache/file_cache.go`.
+
+- [ ] **Step 3: Verify the build and tests**
+
+```bash
+cd /home/xfy/Developer/lolly && go build ./internal/handler/... ./internal/cache/... && go test ./internal/handler/... ./internal/cache/...
+```
+
+Expected: the build and all tests pass.
+
+- [ ] **Step 4: Commit**
+
+```bash
+cd /home/xfy/Developer/lolly && git add -A && git commit -m "refactor: remove redundant generateETag wrappers, use utils.GenerateETag directly"
+```
+
+---
+
+#### Task 2.3: Simplify CheckIPAccess by reusing IPInAllowList
+
+**Files:**
+- Modify: `internal/utils/httperror.go:67-86`
+
+- [ ] **Step 1: Refactor CheckIPAccess**
+
+Read: `internal/utils/httperror.go:67-86`
+
+Change:
+```go
+func CheckIPAccess(ctx *fasthttp.RequestCtx, allowed []net.IPNet) bool {
+	if len(allowed) == 0 {
+		return true
+	}
+
+	clientIP := netutil.ExtractClientIPNet(ctx)
+	if clientIP == nil {
+		return false
+	}
+
+	for _, network := range allowed {
+		if network.Contains(clientIP) {
+			return true
+		}
+	}
+
+	return false
+}
+```
+to:
+```go
+func CheckIPAccess(ctx *fasthttp.RequestCtx, allowed []net.IPNet) bool {
+	if len(allowed) == 0 {
+		return true
+	}
+
+	clientIP := netutil.ExtractClientIPNet(ctx)
+	if clientIP == nil {
+		return false
+	}
+
+	return IPInAllowList(clientIP, allowed)
+}
+```
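+
+The delegation above relies on `IPInAllowList` performing the same containment loop that is being removed. A minimal sketch of that contract follows, assuming the signature `(net.IP, []net.IPNet) bool`; the helper names `ipInAllowList` and `checkAccess` are illustrative and take strings instead of a `fasthttp.RequestCtx` so the example stays self-contained.
+
+```go
+package main
+
+import (
+	"fmt"
+	"net"
+)
+
+// ipInAllowList is an assumed shape for IPInAllowList: it reports whether ip
+// falls inside any of the allowed networks.
+func ipInAllowList(ip net.IP, allowed []net.IPNet) bool {
+	for i := range allowed {
+		if allowed[i].Contains(ip) {
+			return true
+		}
+	}
+	return false
+}
+
+// checkAccess follows the refactored CheckIPAccess: an empty list allows
+// everyone, an unparsable client IP is rejected, and the rest is delegated.
+func checkAccess(clientIP string, cidrs ...string) bool {
+	if len(cidrs) == 0 {
+		return true
+	}
+	allowed := make([]net.IPNet, 0, len(cidrs))
+	for _, c := range cidrs {
+		_, n, err := net.ParseCIDR(c)
+		if err != nil {
+			continue // this sketch simply skips malformed CIDRs
+		}
+		allowed = append(allowed, *n)
+	}
+	ip := net.ParseIP(clientIP)
+	if ip == nil {
+		return false
+	}
+	return ipInAllowList(ip, allowed)
+}
+
+func main() {
+	fmt.Println(checkAccess("10.1.2.3", "10.0.0.0/8", "192.168.1.0/24")) // true
+	fmt.Println(checkAccess("8.8.8.8", "10.0.0.0/8"))                    // false
+	fmt.Println(checkAccess("8.8.8.8"))                                  // true: no allow-list configured
+}
+```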
+
+- [ ] **Step 2: Verify the build and tests**
+
+```bash
+cd /home/xfy/Developer/lolly && go build ./internal/utils/... && go test ./internal/utils/...
+```
+
+Expected: the build and all tests pass.
+
+- [ ] **Step 3: Commit**
+
+```bash
+cd /home/xfy/Developer/lolly && git add -A && git commit -m "refactor: simplify CheckIPAccess by reusing IPInAllowList"
+```
+
+---
+
+### Phase 3: Simplify routing and server logic (P1-P2)
+
+---
+
+#### Task 3.1: Simplify the redundant switch blocks in router.go
+
+**Files:**
+- Modify: `internal/server/router.go:118-145` (`registerProxyRoutesWithLocationEngine`)
+- Modify: `internal/server/router.go:217-234` (`registerStaticHandlersWithLocationEngine`)
+- Modify: `internal/server/router.go:402-423` (`registerLuaRoutesWithLocationEngine`)
+
+- [ ] **Step 1: Simplify registerProxyRoutesWithLocationEngine**
+
+Read: `internal/server/router.go:108-148`
+
+Replace the switch block at lines 118-145 with:
+
+```go
+	for i := range serverCfg.Proxy {
+		proxyCfg := &serverCfg.Proxy[i]
+		p := s.createProxyForConfig(proxyCfg)
+		if p == nil {
+			continue
+		}
+
+		locType := proxyCfg.LocationType
+		if locType == "" {
+			locType = matcher.LocationTypePrefix
+		}
+
+		path := proxyCfg.Path
+		if locType == matcher.LocationTypeNamed && proxyCfg.LocationName != "" {
+			path = "@" + proxyCfg.LocationName
+		}
+
+		if err := s.registerRoute(locType, path, p.ServeHTTP, proxyCfg.Internal, "proxy"); err != nil {
+			return err
+		}
+	}
+	return nil
+```
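+
+The normalization that replaces the switch (default to a prefix location, and turn a named location into an `@name` path) can be sketched on its own. The constants and the function `resolveLocation` below are hypothetical stand-ins for `matcher.LocationTypePrefix`, `matcher.LocationTypeNamed`, and the inline logic; their string values are assumptions.
+
+```go
+package main
+
+import "fmt"
+
+// Hypothetical stand-ins for matcher.LocationTypePrefix and
+// matcher.LocationTypeNamed; the real values may differ.
+const (
+	locationTypePrefix = "prefix"
+	locationTypeNamed  = "named"
+)
+
+// resolveLocation applies the two rules from the simplified loop:
+// an empty type defaults to prefix matching, and a named location
+// with a non-empty name is registered under "@" + name.
+func resolveLocation(locType, path, name string) (string, string) {
+	if locType == "" {
+		locType = locationTypePrefix
+	}
+	if locType == locationTypeNamed && name != "" {
+		path = "@" + name
+	}
+	return locType, path
+}
+
+func main() {
+	fmt.Println(resolveLocation("", "/api", ""))              // prefix /api
+	fmt.Println(resolveLocation("named", "/unused", "fallback")) // named @fallback
+}
+```
+
+Keeping this in one place is what lets the static-file and Lua registration functions share the same call to `s.registerRoute`.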
+
+- [ ] **Step 2: Simplify registerStaticHandlersWithLocationEngine**
+
+Read: `internal/server/router.go:208-236`
+
+Replace the switch block at lines 217-234 with equivalent logic that calls `s.registerRoute` directly.
+
+- [ ] **Step 3: Simplify registerLuaRoutesWithLocationEngine**
+
+Read: `internal/server/router.go:393-425`
+
+Replace the switch block at lines 402-423 with equivalent logic that calls `s.registerRoute` directly.
+
+- [ ] **Step 4: Verify the build and tests**
+
+```bash
+cd /home/xfy/Developer/lolly && go build ./internal/server/... && go test ./internal/server/...
+```
+
+Expected: the build and all tests pass.
+
+- [ ] **Step 5: Commit**
+
+```bash
+cd /home/xfy/Developer/lolly && git add -A && git commit -m "refactor: eliminate redundant switch blocks in router.go LocationEngine functions"
+```
+
+---
+
+#### Task 3.2: Extract the functions shared by the three startup modes in server.go
+
+**Files:**
+- Modify: `internal/server/server.go:454-868`
+
+**New helper functions (add them near the end of server.go, before SetResolver):**
+
+- [ ] **Step 1: Extract `registerMonitoringEndpoints`**
+
+Add to `internal/server/server.go`:
+
+```go
+// registerMonitoringEndpoints registers the status, profiling, and cache-purge endpoints.
+// Every endpoint is gated on isDefault: pass false to skip them all (used in multi-server mode).
+func (s *Server) registerMonitoringEndpoints(router *handler.Router, serverCfg *config.ServerConfig, isDefault bool) {
+	// Status monitoring endpoint
+	if isDefault && s.config.Monitoring.Status.Enabled {
+		statusHandler, err := NewStatusHandler(s, &s.config.Monitoring.Status)
+		if err != nil {
+			logging.Error().Msg("Failed to create status handler: " + err.Error())
+		} else {
+			router.GET(statusHandler.Path(), statusHandler.ServeHTTP)
+		}
+	}
+
+	// pprof profiling endpoints
+	if isDefault && s.config.Monitoring.Pprof.Enabled {
+		pprofHandler, err := NewPprofHandler(&s.config.Monitoring.Pprof)
+		if err != nil {
+			logging.Error().Msg("Failed to create pprof handler: " + err.Error())
+		} else {
+			router.GET(pprofHandler.Path(), pprofHandler.ServeHTTP)
+			router.GET(pprofHandler.Path()+"/{profile:*}", pprofHandler.ServeHTTP)
+		}
+	}
+
+	// Cache purge API
+	if isDefault && serverCfg.CacheAPI != nil && serverCfg.CacheAPI.Enabled {
+		purgeHandler, err := NewPurgeHandler(s, serverCfg.CacheAPI)
+		if err != nil {
+			logging.Error().Msg("Failed to create cache purge handler: " + err.Error())
+		} else {
+			router.POST(purgeHandler.Path(), purgeHandler.ServeHTTP)
+		}
+	}
+}
+```
+
+- [ ] **Step 2: Extract `wrapHandler`**
+
+```go
+// wrapHandler applies the middleware chain, the connection-pool wrapper, and stats tracking.
+func (s *Server) wrapHandler(base fasthttp.RequestHandler, serverCfg *config.ServerConfig) (fasthttp.RequestHandler, error) {
+	chain, err := s.buildMiddlewareChain(serverCfg)
+	if err != nil {
+		return nil, err
+	}
+
+	handler := chain.Apply(base)
+	if s.pool != nil {
+		handler = s.pool.WrapHandler(handler)
+	}
+	handler = s.trackStats(handler)
+	return handler, nil
+}
+```
+
+- [ ] **Step 3: Extract `startServer`**
+
+```go
+// startServer creates the listener and starts the fasthttp.Server, with optional TLS.
+func (s *Server) startServer(serverCfg *config.ServerConfig, fastSrv *fasthttp.Server) error {
+	ln, err := s.createListener(serverCfg)
+	if err != nil {
+		return fmt.Errorf("failed to listen: %w", err)
+	}
+	s.listeners = append(s.listeners, ln)
+
+	// 检查 SSL/TLS
+	if serverCfg.SSL.Cert != "" && serverCfg.SSL.Key != "" {
+		tlsManager, err := ssl.NewTLSManager(&serverCfg.SSL)
+		if err != nil {
+			return fmt.Errorf("failed to create TLS manager: %w", err)
+		}
+		fastSrv.TLSConfig = tlsManager.GetTLSConfig()
+		return fastSrv.ServeTLS(ln, "", "")
+	}
+
+	return fastSrv.Serve(ln)
+}
+```
+
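+- [ ] **Step 4: Refactor startSingleMode to use the new functions**
+
+Replace the monitoring registration, middleware-chain construction, fasthttp.Server creation, and startup logic in `startSingleMode` with calls to the new helpers.
+
+Core logic of `startSingleMode` after the refactor: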
+
+```go
+func (s *Server) startSingleMode() error {
+	serverCfg := &s.config.Servers[0]
+	s.applyTypesConfig(serverCfg)
+
+	s.locationEngine = matcher.NewLocationEngine()
+	s.registerMonitoringEndpointsWithLocationEngine(serverCfg)
+
+	if err := s.registerProxyRoutesWithLocationEngine(serverCfg); err != nil {
+		return err
+	}
+	// ... Lua 和静态文件注册
+
+	s.locationEngine.MarkInitialized()
+
+	baseHandler := func(ctx *fasthttp.RequestCtx) {
+		// LocationEngine 匹配逻辑
+	}
+
+	handler, err := s.wrapHandler(baseHandler, serverCfg)
+	if err != nil {
+		return err
+	}
+	s.handler = handler
+
+	s.fastServer = s.createFastServer(serverCfg, s.handler)
+	s.running.Store(true)
+
+	return s.startServer(serverCfg, s.fastServer)
+}
+```
+
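+- [ ] **Step 5: Refactor startVHostMode to use the new functions**
+
+In the same way, replace the duplicated logic in `startVHostMode` with calls to the new helpers.
+
+- [ ] **Step 6: Refactor startMultiServerMode to use the new functions**
+
+In the same way, replace the duplicated logic in `startMultiServerMode` with calls to the new helpers.
+
+- [ ] **Step 7: Verify the build and tests**
+
+```bash
+cd /home/xfy/Developer/lolly && go build ./internal/server/... && go test ./internal/server/...
+```
+
+Expected: the build and all tests pass.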
+
+- [ ] **Step 8: Commit**
+
+```bash
+cd /home/xfy/Developer/lolly && git add -A && git commit -m "refactor: extract common functions from server startup modes"
+```
+
+---
+
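+### Phase 4: Unify load balancing (P3, optional / long-term)
+
+---
+
+#### Task 4.1: Analyze the differences between Stream and HTTP load balancing
+
+**Files:**
+- Read: `internal/stream/stream.go:61-285`
+- Read: `internal/loadbalance/balancer.go:101-273`
+
+- [ ] **Step 1: Compare the two implementations**
+
+Focus on:
+- The Stream version uses a `sync.Pool` optimization; the HTTP version does not
+- The HTTP version has a `SelectExcluding` method; the Stream version does not
+- The two use different target types (Stream uses `string`, HTTP uses `*Target`)
+
+- [ ] **Step 2: Decide whether to unify**
+
+If the differences are small, the recommendation is:
+1. Define an interface in `internal/loadbalance`
+2. Have Stream reuse the HTTP implementation, keeping the `sync.Pool` optimization only as an option
+
+If the differences are large, the recommendation is:
+1. Leave things as they are
+2. Document the duplication and unify when the architecture evolves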
+
+---
+
+## Verification checklist
+
+Run after each phase:
+
+```bash
+# 1. Build check
+cd /home/xfy/Developer/lolly && go build ./...
+
+# 2. Static analysis
+cd /home/xfy/Developer/lolly && staticcheck ./...
+
+# 3. Unit tests
+cd /home/xfy/Developer/lolly && go test ./internal/...
+
+# 4. Full test suite
+cd /home/xfy/Developer/lolly && make test
+```
+
+Expected:
+- `go build ./...`: no errors
+- `staticcheck ./...`: no new warnings
+- `go test ./internal/...`: all pass
+- `make test`: all pass
+
+---
+
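+## Rollback strategy
+
+Commit immediately after each Task. To roll back:
+
+```bash
+# Roll back a single Task
+git revert <commit-hash>
+
+# Roll back an entire Phase (the ^ makes the range include the first commit;
+# a plain A..B range excludes A)
+git revert <phase-first-commit>^..<phase-last-commit>
+```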
+
+---
+
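+## Risk assessment
+
+| Task | Risk level | Scope of impact | Mitigation |
+|------|------------|-----------------|------------|
+| Task 1.1-1.5 | Very low | Dead-code removal only | Build and test verification |
+| Task 2.1-2.3 | Low | Function calls replaced | Full test run |
+| Task 3.1 | Low | Refactor internal to router.go | server package tests |
+| Task 3.2 | Medium | Core logic in server.go | Full regression testing |
+| Task 4.1 | Medium | Architectural change | Deferred to a separate iteration |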
+
+---
+
+*Plan generated: 2026-06-03*
+*Estimated effort: 4-6 hours for Phases 1-3, 2-4 hours for Phase 4*
diff --git a/docs/superpowers/plans/2026-06-04-performance-optimization.md b/docs/superpowers/plans/2026-06-04-performance-optimization.md
new file mode 100644
index 0000000..d02aa27
--- /dev/null
+++ b/docs/superpowers/plans/2026-06-04-performance-optimization.md
@@ -0,0 +1,820 @@
+# Performance Hot-Path Optimization Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
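+**Goal:** Eliminate six confirmed hot-path performance bottlenecks, reducing per-request heap allocations and lock contention.
+
+**Architecture:** Aggressively optimize, one at a time: loadbalance `filterHealthy` (allocates on every request), RadixTree heap allocations, the DNS LRU's O(n) operations, the FileInfoCache read-to-write lock upgrade, the ConsistentHash double locking, and the `IsAvailable` mutex. Each optimization is independently testable and leaves external interfaces unchanged.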
+
+**Tech Stack:** Go 1.26+, sync.Pool, container/list, atomic operations, unsafe pointer (b2s/s2b)
+
+---
+
+## Task 1 (loadbalance): zero-allocation filterHealthy
+
+**Files:**
+- Modify: `internal/loadbalance/balancer.go`
+- Test: `internal/loadbalance/balancer_test.go`
+- Benchmark: `internal/loadbalance/balancer_bench_test.go`
+
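+**Problem**: `filterHealthy` allocates two slices per call (`available` and `backups`), and `filterHealthyAndExclude` allocates three objects (adding the `excludeSet` map). `IPHash.SelectByIP` additionally allocates an `fnv.New64a()` hasher. All of this runs during load-balancer selection on every request.
+
+**Approach**: Introduce a `filterContext` struct that holds reusable buffers and is managed through a `sync.Pool`. `filterHealthy` then writes into the preallocated slices of the `filterContext` instead of calling `make` each time. IPHash uses an inlined FNV-64a hash to avoid the `fnv.New64a()` allocation.
+
+- [ ] **Step 1: Define filterContext and its pool**
+
+Add to `balancer.go`: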
+
+```go
+type filterContext struct {
+	available  []*Target
+	backups    []*Target
+	excludeSet map[string]bool
+}
+
+var filterContextPool = sync.Pool{
+	New: func() any {
+		return &filterContext{
+			available:  make([]*Target, 0, 64),
+			backups:    make([]*Target, 0, 64),
+			excludeSet: make(map[string]bool, 8),
+		}
+	},
+}
+
+func acquireFilterContext() *filterContext {
+	fc := filterContextPool.Get().(*filterContext)
+	return fc
+}
+
+func releaseFilterContext(fc *filterContext) {
+	fc.available = fc.available[:0]
+	fc.backups = fc.backups[:0]
+	for k := range fc.excludeSet {
+		delete(fc.excludeSet, k)
+	}
+	filterContextPool.Put(fc)
+}
+```
+
+- [ ] **Step 2: Rewrite filterHealthy as filterInto**
+
+```go
+func filterInto(fc *filterContext, targets []*Target) []*Target {
+	for _, t := range targets {
+		if !t.IsAvailable() {
+			continue
+		}
+		if t.IsBackup() {
+			fc.backups = append(fc.backups, t)
+		} else {
+			fc.available = append(fc.available, t)
+		}
+	}
+	if len(fc.available) > 0 {
+		return fc.available
+	}
+	return fc.backups
+}
+```
+
+- [ ] **Step 3: Rewrite filterHealthyAndExclude as filterIntoExcluding**
+
+```go
+func filterIntoExcluding(fc *filterContext, targets []*Target, excluded []*Target) []*Target {
+	if len(excluded) > 0 {
+		for _, t := range excluded {
+			if t != nil {
+				fc.excludeSet[t.URL] = true
+			}
+		}
+	}
+	for _, t := range targets {
+		if !t.IsAvailable() || fc.excludeSet[t.URL] {
+			continue
+		}
+		if t.IsBackup() {
+			fc.backups = append(fc.backups, t)
+		} else {
+			fc.available = append(fc.available, t)
+		}
+	}
+	if len(fc.available) > 0 {
+		return fc.available
+	}
+	return fc.backups
+}
+```
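+
+One consequence of the two functions above is that the slice they return aliases the pooled buffers, so the caller has to pick its target before the context goes back to the pool. The following is a minimal sketch of that ownership rule, using a simplified hypothetical `target` type and a `pickFirst` helper that are not part of the plan.
+
+```go
+package main
+
+import (
+	"fmt"
+	"sync"
+)
+
+// target is a simplified stand-in for the plan's *Target.
+type target struct {
+	url     string
+	healthy bool
+	backup  bool
+}
+
+type filterCtx struct {
+	available []*target
+	backups   []*target
+}
+
+var filterCtxPool = sync.Pool{
+	New: func() any {
+		return &filterCtx{
+			available: make([]*target, 0, 64),
+			backups:   make([]*target, 0, 64),
+		}
+	},
+}
+
+// pickFirst filters into pooled buffers and returns the first healthy
+// primary, or the first healthy backup when no primary is healthy.
+func pickFirst(targets []*target) *target {
+	fc := filterCtxPool.Get().(*filterCtx)
+	defer func() {
+		// Truncate but keep capacity, then hand the buffers back.
+		fc.available = fc.available[:0]
+		fc.backups = fc.backups[:0]
+		filterCtxPool.Put(fc)
+	}()
+	for _, t := range targets {
+		if !t.healthy {
+			continue
+		}
+		if t.backup {
+			fc.backups = append(fc.backups, t)
+		} else {
+			fc.available = append(fc.available, t)
+		}
+	}
+	chosen := fc.available
+	if len(chosen) == 0 {
+		chosen = fc.backups
+	}
+	if len(chosen) == 0 {
+		return nil
+	}
+	// Only the element pointer leaves this function; the slice itself
+	// must not be retained once the deferred release has run.
+	return chosen[0]
+}
+
+func main() {
+	targets := []*target{
+		{url: "http://a", healthy: false},
+		{url: "http://b", healthy: true, backup: true},
+		{url: "http://c", healthy: true},
+	}
+	fmt.Println(pickFirst(targets).url) // http://c
+}
+```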
+
+- [ ] **Step 4: Add an inlined FNV-64a hash function**
+
+This avoids the heap allocation made by `fnv.New64a()`:
+
+```go
+func fnvHash64a(key string) uint64 {
+	var h uint64 = 14695981039346656037
+	for i := 0; i < len(key); i++ {
+		h ^= uint64(key[i])
+		h *= 1099511628211
+	}
+	return h
+}
+```
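+
+Because consistent hashing depends on every node computing identical values, it is worth checking that the inlined loop agrees with the standard library. A minimal sketch of such a check follows; `stdlibFNV64a` is a helper introduced here for comparison only.
+
+```go
+package main
+
+import (
+	"fmt"
+	"hash/fnv"
+)
+
+// fnvHash64a is the inlined FNV-1a 64-bit hash from the step above.
+func fnvHash64a(key string) uint64 {
+	var h uint64 = 14695981039346656037 // FNV-64 offset basis
+	for i := 0; i < len(key); i++ {
+		h ^= uint64(key[i])
+		h *= 1099511628211 // FNV-64 prime
+	}
+	return h
+}
+
+// stdlibFNV64a computes the same hash through hash/fnv, for comparison.
+func stdlibFNV64a(key string) uint64 {
+	h := fnv.New64a()
+	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
+	return h.Sum64()
+}
+
+func main() {
+	for _, k := range []string{"", "10.0.0.1", "lolly"} {
+		fmt.Printf("%q match=%v\n", k, fnvHash64a(k) == stdlibFNV64a(k))
+	}
+}
+```
+
+A table-driven unit test with the same comparison would guard against a mistyped constant if the function is ever edited.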
+
+- [ ] **Step 5: Rewrite Select/SelectExcluding on every Balancer to use the pool**
+
+RoundRobin example:
+```go
+func (r *RoundRobin) Select(targets []*Target) *Target {
+	fc := acquireFilterContext()
+	defer releaseFilterContext(fc)
+	healthy := filterInto(fc, targets)
+	if len(healthy) == 0 {
+		return nil
+	}
+	idx := r.counter.Add(1) - 1
+	return healthy[idx%uint64(len(healthy))]
+}
+```
+
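+Apply the same pattern to the `Select`/`SelectExcluding` methods of all six algorithms.
+In IPHash, replace `fnv.New64a()` + `h.Write()` + `h.Sum64()` with `fnvHash64a(clientIP)`.
+In ConsistentHash, also replace `hashKeyString` with `fnvHash64a`.
+
+- [ ] **Step 6: Keep the old functions as compatibility aliases (optional)**
+
+Keep the `filterHealthy` and `filterHealthyAndExclude` signatures but mark them `// Deprecated` and have them call the new implementation, so external callers are unaffected. If there are no external callers, delete them outright.
+
+- [ ] **Step 7: Run the existing tests to verify correctness**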
+
+```bash
+go test -v -count=1 ./internal/loadbalance/...
+```
+
+Expected: all PASS, with no behavior change.
+
+- [ ] **Step 8: Run the benchmarks to verify the improvement**
+
+```bash
+go test -bench=BenchmarkAllBalancers -benchmem -count=5 ./internal/loadbalance/...
+```
+
+Expected: allocs/op drops from 2-3 to 0-1.
+
+- [ ] **Step 9: Commit**
+
+```bash
+git add internal/loadbalance/balancer.go internal/loadbalance/random.go internal/loadbalance/consistent_hash.go
+git commit -m "perf(loadbalance): eliminate per-request allocations in filterHealthy with sync.Pool"
+```
+
+---
+
+## Task 2 (loadbalance): lock-free IsAvailable
+
+**Files:**
+- Modify: `internal/loadbalance/balancer.go`
+- Test: `internal/loadbalance/balancer_test.go`
+
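+**Problem**: `IsAvailable()` takes the `failMu` mutex whenever `MaxFails > 0`. This happens for every target visited by `filterHealthy`/`filterInto`, so each load-balancer Select locks once per target.
+
+**Approach**: Turn `failCount` and `failedUntil` into atomics and remove the `failMu` mutex. The steps below use plain atomic `Add`/`Store` rather than a CAS loop for `RecordFailure` and the cooldown reset, accepting a small race when a reset overlaps a concurrent failure.
+
+- [ ] **Step 1: Change the Target fields to atomics**
+
+```go
+type Target struct {
+	// ... keep the other fields ...
+	failCount   atomic.Int64
+	failedUntil atomic.Int64
+	// removed: failMu sync.Mutex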
+}
+```
+
+- [ ] **Step 2: Rewrite IsAvailable as a lock-free version**
+
+```go
+func (t *Target) IsAvailable() bool {
+	if !t.Healthy.Load() || t.Down {
+		return false
+	}
+	if t.MaxConns > 0 && atomic.LoadInt64(&t.Connections) >= t.MaxConns {
+		return false
+	}
+	if t.MaxFails > 0 {
+		failCount := t.failCount.Load()
+		if failCount >= t.MaxFails {
+			failedUntil := t.failedUntil.Load()
+			if time.Now().UnixNano() < failedUntil {
+				return false
+			}
+			// Cooldown expired: reset on a best-effort basis. Racing resets all
+			// store zero; a failure recorded at the same instant may be lost.
+			if failedUntil > 0 {
+				t.failCount.Store(0)
+				t.failedUntil.Store(0)
+			}
+		}
+	}
+	return true
+}
+```
+
+- [ ] **Step 3: Rewrite RecordFailure and RecordSuccess as lock-free versions**
+
+```go
+func (t *Target) RecordFailure() int64 {
+	if t.MaxFails <= 0 {
+		return 0
+	}
+	count := t.failCount.Add(1)
+	if count >= t.MaxFails {
+		timeout := t.FailTimeout
+		if timeout <= 0 {
+			timeout = 10 * time.Second
+		}
+		t.failedUntil.Store(time.Now().Add(timeout).UnixNano())
+	}
+	return count
+}
+
+func (t *Target) RecordSuccess() {
+	if t.MaxFails <= 0 {
+		return
+	}
+	t.failCount.Store(0)
+	t.failedUntil.Store(0)
+}
+```
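+
+If the lost-update window in the plain `Store` reset turns out to matter, a compare-and-swap on the deadline is one way to let exactly one goroutine perform the reset. The sketch below is an alternative under stated assumptions, not the plan's code: the `breaker` type and its methods are hypothetical, and time is passed in as nanoseconds so the behavior is deterministic.
+
+```go
+package main
+
+import (
+	"fmt"
+	"sync/atomic"
+	"time"
+)
+
+// breaker is a hypothetical, reduced version of the failure-tracking
+// fields on Target.
+type breaker struct {
+	maxFails    int64
+	cooldownNs  int64
+	failCount   atomic.Int64
+	failedUntil atomic.Int64 // UnixNano deadline; 0 means none published
+}
+
+func (b *breaker) recordFailure(nowNs int64) {
+	if b.failCount.Add(1) >= b.maxFails {
+		b.failedUntil.Store(nowNs + b.cooldownNs)
+	}
+}
+
+func (b *breaker) available(nowNs int64) bool {
+	if b.failCount.Load() < b.maxFails {
+		return true
+	}
+	until := b.failedUntil.Load()
+	if until == 0 || nowNs < until {
+		// until == 0 means the threshold was just crossed and the deadline
+		// is not visible yet; treat the target as cooling down.
+		return false
+	}
+	// Cooldown expired: only the goroutine that wins the CAS resets the
+	// counter, so a newer deadline stored concurrently is not wiped out.
+	if b.failedUntil.CompareAndSwap(until, 0) {
+		b.failCount.Store(0)
+	}
+	return true
+}
+
+func main() {
+	b := &breaker{maxFails: 2, cooldownNs: int64(time.Second)}
+	now := time.Now().UnixNano()
+	b.recordFailure(now)
+	b.recordFailure(now)
+	fmt.Println(b.available(now))                        // false: cooling down
+	fmt.Println(b.available(now + int64(2*time.Second))) // true: cooldown expired
+}
+```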
+
+- [ ] **Step 4: Run the tests**
+
+```bash
+go test -v -count=1 -run=TestTarget ./internal/loadbalance/...
+```
+
+Expected: all PASS.
+
+- [ ] **Step 5: Run the full package tests**
+
+```bash
+go test -v -count=1 ./internal/loadbalance/...
+```
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add internal/loadbalance/balancer.go
+git commit -m "perf(loadbalance): replace failMu mutex with atomic operations in IsAvailable"
+```
+
+---
+
+## Task 3 (matcher): zero-allocation RadixTree search
+
+**Files:**
+- Modify: `internal/matcher/radix.go`
+- Test: `internal/matcher/radix_test.go`, `internal/matcher/integration_test.go`
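+- Benchmark: create `internal/matcher/radix_bench_test.go`
+
+**Problem**: During the recursive `searchLongest` search, every node that carries a handler allocates a `&MatchResult{}`, so one lookup may allocate N MatchResult values and keep only one. The regex matcher's `GetCaptures` allocates a `map[string]string` on every call.
+
+**Approach**: Reuse MatchResult through a `sync.Pool`. Introduce a `searchState` to avoid repeated allocation inside the recursion, switching to stack-based iteration or in-place updates of the best match.
+
+- [ ] **Step 1: Add a MatchResult pool**
+
+Add to `radix.go`: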
+
+```go
+var matchResultPool = sync.Pool{
+	New: func() any {
+		return &MatchResult{}
+	},
+}
+```
+
+- [ ] **Step 2: Rewrite searchLongest to update the best match in place**
+
+Instead of creating a newMatch during the recursion, compare node fields directly and take a MatchResult from the pool only for the final return value:
+
+```go
+func (t *RadixTree) searchLongest(node *RadixNode, path string, bestNode *RadixNode, bestPrefixLen int) *RadixNode {
+	if node == nil || path == "" {
+		return bestNode
+	}
+	if !strings.HasPrefix(path, node.prefix) {
+		return bestNode
+	}
+	remaining := path[len(node.prefix):]
+	if node.handler != nil {
+		// A lower priority value wins outright.
+		if bestNode == nil || node.priority < bestNode.priority {
+			bestNode = node
+		} else if node.priority == bestNode.priority && len(node.prefix) > bestPrefixLen {
+			// NOTE: bestPrefixLen is passed down unchanged, so on equal priority
+			// any deeper node with a non-empty prefix replaces the current best.
+			bestNode = node
+		}
+	}
+	for _, child := range node.children {
+		bestNode = t.searchLongest(child, remaining, bestNode, bestPrefixLen)
+	}
+	return bestNode
+}
+```
+
+- [ ] **Step 3: Have FindLongestPrefix build the MatchResult on return**
+
+```go
+func (t *RadixTree) FindLongestPrefix(path string) *MatchResult {
+	bestNode := t.searchLongest(t.root, path, nil, 0)
+	if bestNode == nil {
+		return nil
+	}
+	result := matchResultPool.Get().(*MatchResult)
+	result.Handler = bestNode.handler
+	result.Path = bestNode.prefix
+	result.Priority = bestNode.priority
+	result.LocationType = bestNode.locationType
+	result.Internal = bestNode.internal
+	return result
+}
+```
+
+Note: once a caller is done with the MatchResult, it must return it to the pool by calling `ReleaseMatchResult(result)`.
+
+- [ ] **Step 4: Add a ReleaseMatchResult function for callers**
+
+```go
+func ReleaseMatchResult(r *MatchResult) {
+	if r == nil {
+		return
+	}
+	r.Handler = nil
+	r.Captures = nil
+	r.Path = ""
+	r.LocationType = ""
+	r.Internal = false
+	r.Priority = 0
+	matchResultPool.Put(r)
+}
+```
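+
+The ownership rule implied by the pool (take, read the fields you need, release, never touch the result again) can be illustrated without a radix tree. The sketch below uses a hypothetical `matchResult` type and a linear prefix scan over a map in place of the real tree; only the acquire/release pattern is the point.
+
+```go
+package main
+
+import (
+	"fmt"
+	"sync"
+)
+
+// matchResult is a reduced, hypothetical stand-in for MatchResult.
+type matchResult struct {
+	path     string
+	priority int
+}
+
+var matchResultPool = sync.Pool{New: func() any { return &matchResult{} }}
+
+// releaseMatchResult clears the result and returns it to the pool.
+func releaseMatchResult(r *matchResult) {
+	if r == nil {
+		return
+	}
+	*r = matchResult{}
+	matchResultPool.Put(r)
+}
+
+// findLongestPrefix scans prefixes from longest to shortest; it stands in
+// for the radix search and hands out a pooled result on a hit.
+func findLongestPrefix(routes map[string]int, path string) *matchResult {
+	for i := len(path); i > 0; i-- {
+		if prio, ok := routes[path[:i]]; ok {
+			r := matchResultPool.Get().(*matchResult)
+			r.path, r.priority = path[:i], prio
+			return r
+		}
+	}
+	return nil
+}
+
+// matchedPrefix copies the field it needs out of the pooled result; the
+// return value is evaluated before the deferred release runs.
+func matchedPrefix(routes map[string]int, path string) (string, bool) {
+	r := findLongestPrefix(routes, path)
+	if r == nil {
+		return "", false
+	}
+	defer releaseMatchResult(r)
+	return r.path, true
+}
+
+func main() {
+	routes := map[string]int{"/": 0, "/api": 0, "/api/v1": 0}
+	fmt.Println(matchedPrefix(routes, "/api/v1/users/123")) // /api/v1 true
+}
+```
+
+Handing a pooled pointer to another goroutine or storing it in the request context would break this rule, which is why the plan asks for the call chain to be audited first.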
+
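+- [ ] **Step 5: Update LocationEngine.Match to release the result after calling FindLongestPrefix**
+
+In `location.go`, make sure every value returned by `FindLongestPrefix` is passed to `ReleaseMatchResult` before the function returns (the call chain must be analyzed to confirm who owns the result).
+
+- [ ] **Step 6: Add the benchmark file**
+
+Create `internal/matcher/radix_bench_test.go`: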
+
+```go
+func BenchmarkRadixTreeFindLongestPrefix(b *testing.B) {
+	tree := NewRadixTree()
+	paths := []string{"/", "/api", "/api/v1", "/api/v1/users", "/api/v1/users/:id", "/static", "/static/css", "/static/js", "/health", "/favicon.ico"}
+	for _, p := range paths {
+		tree.Insert(p, func(ctx *fasthttp.RequestCtx) {}, 0, "prefix", false)
+	}
+	tree.MarkInitialized()
+
+	b.ResetTimer()
+	b.ReportAllocs()
+	for b.Loop() {
+		result := tree.FindLongestPrefix("/api/v1/users/123")
+		ReleaseMatchResult(result)
+	}
+}
+
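+func BenchmarkRadixTreeFindLongestPrefixParallel(b *testing.B) {
+	// Same setup as above, driven through b.RunParallel.
+	tree := NewRadixTree()
+	for _, p := range []string{"/", "/api", "/api/v1", "/api/v1/users", "/static", "/health"} {
+		tree.Insert(p, func(ctx *fasthttp.RequestCtx) {}, 0, "prefix", false)
+	}
+	tree.MarkInitialized()
+
+	b.ReportAllocs()
+	b.RunParallel(func(pb *testing.PB) {
+		for pb.Next() {
+			ReleaseMatchResult(tree.FindLongestPrefix("/api/v1/users/123"))
+		}
+	})
+}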
+```
+
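+- [ ] **Step 7: Run all matcher tests**
+
+```bash
+go test -v -count=1 ./internal/matcher/...
+```
+
+- [ ] **Step 8: Run the benchmarks**
+
+```bash
+go test -bench=BenchmarkRadixTree -benchmem ./internal/matcher/...
+```
+
+Expected: allocs/op drops from N (the number of handler nodes along the matched path) to at most 1; once the pool is warm, the pooled `Get` usually does not allocate at all.
+
+- [ ] **Step 9: Commit**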
+
+```bash
+git add internal/matcher/radix.go internal/matcher/radix_bench_test.go
+git commit -m "perf(matcher): eliminate heap allocations in RadixTree search with sync.Pool"
+```
+
+---
+
+## Task 4 (resolver): switch the LRU from O(n) to O(1)
+
+**Files:**
+- Modify: `internal/resolver/resolver.go`, `internal/resolver/cache.go`
+- Test: `internal/resolver/resolver_test.go`, `internal/resolver/mock_dns_test.go`
+- Benchmark: `internal/resolver/resolver_bench_test.go`
+
+**Problem**: The DNS cache LRU implements `moveToFrontLocked` on a `[]string` slice, so every operation is an O(n) linear scan plus a slice rebuild. `storeCache` holds the write lock for the whole O(n) operation, blocking all concurrent readers.
+
+**Approach**: Replace the `[]string` slice with `container/list` plus a `map[string]*list.Element` (the same pattern used by FileCache and FileInfoCache). Both move-to-front and eviction become O(1).
+
+- [ ] **Step 1: Change the DNSResolver struct**
+
+```go
+type DNSResolver struct {
+	config       *config.ResolverConfig
+	stopCh       chan struct{}
+	refreshHosts map[string]struct{}
+	cache        map[string]*DNSCacheEntry
+	lruList      *list.List               // replaces lruOrder []string
+	lruIndex     map[string]*list.Element // new: host -> list.Element
+	hits         atomic.Int64
+	misses       atomic.Int64
+	errors       atomic.Int64
+	latencyNs    atomic.Int64
+	count        atomic.Int64
+	mu           sync.RWMutex
+	serverIdx    atomic.Uint32
+	started      atomic.Bool
+}
+```
+
+- [ ] **Step 2: Rewrite storeCache**
+
+```go
+func (r *DNSResolver) storeCache(host string, entry *DNSCacheEntry) {
+	r.mu.Lock()
+	defer r.mu.Unlock()
+
+	if elem, ok := r.lruIndex[host]; ok {
+		r.cache[host] = entry
+		r.lruList.MoveToFront(elem)
+		return
+	}
+
+	if r.config.CacheSize > 0 && len(r.cache) >= r.config.CacheSize {
+		r.evictLRULocked()
+	}
+
+	r.cache[host] = entry
+	elem := r.lruList.PushFront(host)
+	r.lruIndex[host] = elem
+}
+```
+
+- [ ] **Step 3: Rewrite evictLRULocked**
+
+```go
+func (r *DNSResolver) evictLRULocked() {
+	oldest := r.lruList.Back()
+	if oldest == nil {
+		return
+	}
+	host := oldest.Value.(string)
+	delete(r.cache, host)
+	delete(r.lruIndex, host)
+	r.lruList.Remove(oldest)
+}
+```
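+
+The list-plus-index pattern used by `storeCache` and `evictLRULocked` can be exercised as a small standalone cache. The sketch below is illustrative only: `lruCache` is a hypothetical type with string values, it has no locking (the plan guards the real structures with `r.mu`), and, unlike the plan's read path, its `get` also refreshes recency.
+
+```go
+package main
+
+import (
+	"container/list"
+	"fmt"
+)
+
+// lruCache is a hypothetical single-goroutine LRU.
+type lruCache struct {
+	capacity int
+	values   map[string]string
+	order    *list.List               // front = most recently used
+	index    map[string]*list.Element // key -> node in order
+}
+
+func newLRUCache(capacity int) *lruCache {
+	return &lruCache{
+		capacity: capacity,
+		values:   make(map[string]string),
+		order:    list.New(),
+		index:    make(map[string]*list.Element),
+	}
+}
+
+// put inserts or updates a key in O(1), evicting the oldest entry if full.
+func (c *lruCache) put(key, value string) {
+	if elem, ok := c.index[key]; ok {
+		c.values[key] = value
+		c.order.MoveToFront(elem)
+		return
+	}
+	if c.capacity > 0 && len(c.values) >= c.capacity {
+		if oldest := c.order.Back(); oldest != nil {
+			k := oldest.Value.(string)
+			c.order.Remove(oldest)
+			delete(c.values, k)
+			delete(c.index, k)
+		}
+	}
+	c.values[key] = value
+	c.index[key] = c.order.PushFront(key)
+}
+
+// get returns the value and marks the key as most recently used.
+func (c *lruCache) get(key string) (string, bool) {
+	elem, ok := c.index[key]
+	if !ok {
+		return "", false
+	}
+	c.order.MoveToFront(elem)
+	return c.values[key], true
+}
+
+func main() {
+	c := newLRUCache(2)
+	c.put("a.example", "10.0.0.1")
+	c.put("b.example", "10.0.0.2")
+	c.get("a.example")             // a becomes most recent
+	c.put("c.example", "10.0.0.3") // evicts b
+	_, ok := c.get("b.example")
+	fmt.Println(ok) // false
+}
+```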
+
+- [ ] **Step 4: Delete moveToFrontLocked** (no longer needed; `lruList.MoveToFront` replaces it)
+
+- [ ] **Step 5: Update the New() constructor**
+
+```go
+return &DNSResolver{
+	config:       &configCopy,
+	stopCh:       make(chan struct{}),
+	refreshHosts: make(map[string]struct{}),
+	cache:        make(map[string]*DNSCacheEntry),
+	lruList:      list.New(),
+	lruIndex:     make(map[string]*list.Element),
+}
+```
+
+- [ ] **Step 6: Update DeleteCacheEntry**
+
+```go
+func (r *DNSResolver) DeleteCacheEntry(host string) {
+	r.mu.Lock()
+	defer r.mu.Unlock()
+	delete(r.cache, host)
+	if elem, ok := r.lruIndex[host]; ok {
+		r.lruList.Remove(elem)
+		delete(r.lruIndex, host)
+	}
+	delete(r.refreshHosts, host)
+}
+```
+
+- [ ] **Step 7: Update ClearCache**
+
+```go
+func (r *DNSResolver) ClearCache() {
+	r.mu.Lock()
+	r.cache = make(map[string]*DNSCacheEntry)
+	r.lruList = list.New()
+	r.lruIndex = make(map[string]*list.Element)
+	r.refreshHosts = make(map[string]struct{})
+	r.mu.Unlock()
+}
+```
+
+- [ ] **Step 8: Add `import "container/list"`**
+
+- [ ] **Step 9: Run all resolver tests**
+
+```bash
+go test -v -count=1 ./internal/resolver/...
+```
+
+- [ ] **Step 10: Run the benchmarks to verify**
+
+```bash
+go test -bench=BenchmarkDNS -benchmem -count=5 ./internal/resolver/...
+```
+
+Expected: `BenchmarkDNSResolverCacheWriteLock` and `BenchmarkDNSResolverMixedWorkload` become significantly faster.
+
+- [ ] **Step 11: Commit**
+
+```bash
+git add internal/resolver/resolver.go internal/resolver/cache.go
+git commit -m "perf(resolver): replace slice-based LRU with container/list for O(1) operations"
+```
+
+---
+
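+## Task 5 (handler): approximate LRU in FileInfoCache to remove the read-lock upgrade
+
+**Files:**
+- Modify: `internal/handler/fileinfo_cache.go`
+- Test: `internal/handler/static_test.go` (indirect, verified through the existing tests)
+- Benchmark: `internal/handler/static_bench_test.go`
+
+**Problem**: `FileInfoCache.Get()` acquires a lock **twice** on every cache hit: first an RLock to check existence and TTL, then, after releasing it, a Lock for the `MoveToFront` LRU update. Every hit pays for an RLock→Lock upgrade.
+
+**Approach**: Adopt an approximate LRU policy: the Get path skips `MoveToFront` and returns on an RLock-only fast path. LRU position is updated only on the Set path (writes). This matches FileCache's approximate LRU policy.
+
+- [ ] **Step 1: Rewrite Get as an RLock-only fast path**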
+
+```go
+func (c *FileInfoCache) Get(filePath string) (os.FileInfo, bool) {
+	c.mu.RLock()
+	entry, ok := c.entries[filePath]
+	if !ok {
+		c.mu.RUnlock()
+		return nil, false
+	}
+	if time.Since(entry.cachedAt) > fileInfoCacheTTL {
+		c.mu.RUnlock()
+		// Removing an expired entry still requires the write lock
+		c.mu.Lock()
+		if e, ok := c.entries[filePath]; ok && time.Since(e.cachedAt) > fileInfoCacheTTL {
+			c.lruList.Remove(e.element)
+			delete(c.entries, filePath)
+		}
+		c.mu.Unlock()
+		return nil, false
+	}
+	info := entry.info
+	c.mu.RUnlock()
+	return info, true
+}
+```
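+
+The read path above can be tried out on a reduced cache. The following sketch assumes a hypothetical `ttlCache` with string values and an explicit clock argument (nanoseconds) so expiry is deterministic; it keeps the two properties that matter here: hits take only the read lock, and expiry is re-checked under the write lock before deleting.
+
+```go
+package main
+
+import (
+	"fmt"
+	"sync"
+)
+
+type ttlEntry struct {
+	value    string
+	cachedAt int64 // nanoseconds on the caller-supplied clock
+}
+
+// ttlCache is a hypothetical reduction of FileInfoCache without the LRU list.
+type ttlCache struct {
+	mu      sync.RWMutex
+	ttl     int64
+	entries map[string]ttlEntry
+}
+
+func newTTLCache(ttl int64) *ttlCache {
+	return &ttlCache{ttl: ttl, entries: make(map[string]ttlEntry)}
+}
+
+func (c *ttlCache) set(key, value string, now int64) {
+	c.mu.Lock()
+	c.entries[key] = ttlEntry{value: value, cachedAt: now}
+	c.mu.Unlock()
+}
+
+// get serves hits under the read lock only. The write lock is taken
+// solely to drop an expired entry, after re-checking that it is still
+// expired (another goroutine may have refreshed it in between).
+func (c *ttlCache) get(key string, now int64) (string, bool) {
+	c.mu.RLock()
+	e, ok := c.entries[key]
+	c.mu.RUnlock()
+	if !ok {
+		return "", false
+	}
+	if now-e.cachedAt > c.ttl {
+		c.mu.Lock()
+		if cur, ok := c.entries[key]; ok && now-cur.cachedAt > c.ttl {
+			delete(c.entries, key)
+		}
+		c.mu.Unlock()
+		return "", false
+	}
+	return e.value, true
+}
+
+func main() {
+	c := newTTLCache(100)
+	c.set("/style.css", "cached", 0)
+	fmt.Println(c.get("/style.css", 50))  // cached true
+	fmt.Println(c.get("/style.css", 500)) //  false (expired)
+}
+```
+
+The trade-off is the one the plan names: entries that are read often but never rewritten do not move toward the front, so eviction order is only approximately least-recently-used.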
+
+- [ ] **Step 2: Update the LRU position in Set**
+
+```go
+func (c *FileInfoCache) Set(filePath string, info os.FileInfo) {
+	c.mu.Lock()
+	defer c.mu.Unlock()
+
+	if entry, ok := c.entries[filePath]; ok {
+		entry.info = info
+		entry.cachedAt = time.Now()
+		c.lruList.MoveToFront(entry.element)
+		return
+	}
+	// ... eviction and insertion logic unchanged ...
+}
+```
+
+- [ ] **Step 3: Add dedicated FileInfoCache benchmarks**
+
+Add to `internal/handler/static_bench_test.go`:
+
+```go
+func BenchmarkFileInfoCacheGetHit(b *testing.B) {
+	cache := NewFileInfoCache()
+	info, _ := os.Stat("testdata/style.css")
+	cache.Set("/style.css", info)
+
+	b.ResetTimer()
+	b.ReportAllocs()
+	for b.Loop() {
+		cache.Get("/style.css")
+	}
+}
+
+func BenchmarkFileInfoCacheGetHitParallel(b *testing.B) {
+	cache := NewFileInfoCache()
+	info, _ := os.Stat("testdata/style.css")
+	cache.Set("/style.css", info)
+
+	b.ResetTimer()
+	b.ReportAllocs()
+	b.RunParallel(func(pb *testing.PB) {
+		for pb.Next() {
+			cache.Get("/style.css")
+		}
+	})
+}
+```
+
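+Note: check whether `NewFileInfoCache` is exported; if it is not, keep the benchmarks in an in-package test file.
+
+- [ ] **Step 4: Run all handler tests**
+
+```bash
+go test -v -count=1 ./internal/handler/...
+```
+
+- [ ] **Step 5: Run the benchmarks**
+
+```bash
+go test -bench=BenchmarkFileInfoCache -benchmem ./internal/handler/...
+```
+
+Expected: the Get hit path goes from two lock acquisitions to a single RLock, and parallel throughput improves noticeably.
+
+- [ ] **Step 6: Commit**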
+
+```bash
+git add internal/handler/fileinfo_cache.go internal/handler/static_bench_test.go
+git commit -m "perf(handler): eliminate read-lock upgrade in FileInfoCache.Get with approximate LRU"
+```
+
+---
+
+## Task 6 (loadbalance): remove the double lock in ConsistentHash
+
+**Files:**
+- Modify: `internal/loadbalance/consistent_hash.go`
+- Test: `internal/loadbalance/balancer_test.go`
+
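+**Problem**: When `SelectByKey` and `SelectExcludingByKey` find `circle` empty, they run `RLock → RUnlock → rebuildCircle(Lock) → RLock`: release the read lock, take the write lock to rebuild, then take the read lock again. Under a high-concurrency cold start, several goroutines can trigger the rebuild at once.
+
+**Approach**: Guard the lazy rebuild with an `atomic.Bool` flag (`rebuilt`), or a `sync.Once`, so that once the ring is built every later call goes straight to the `RLock` read path. The flag check alone does not stop several cold-start goroutines from each calling `rebuildCircle`; they serialize on `mu`, and a re-check of the flag under the write lock is what would make the rebuild run only once. Also replace `hashKeyString` with the inlined `fnvHash64a` defined in Task 1.
+
+- [ ] **Step 1: Add the `rebuilt` flag field**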
+
+```go
+type ConsistentHash struct {
+	circle       map[uint64]*Target
+	hashKey      string
+	sortedHashes []uint64
+	virtualNodes int
+	mu           sync.RWMutex
+	rebuilt      atomic.Bool
+}
+```
+
+- [ ] **Step 2: Rewrite SelectByKey to use ensureRebuilt**
+
+```go
+func (c *ConsistentHash) ensureRebuilt(targets []*Target) {
+	if c.rebuilt.Load() {
+		return
+	}
+	c.rebuildCircle(targets)
+}
+
+func (c *ConsistentHash) SelectByKey(targets []*Target, key string) *Target {
+	c.ensureRebuilt(targets)
+
+	c.mu.RLock()
+	defer c.mu.RUnlock()
+
+	if len(c.sortedHashes) == 0 {
+		return nil
+	}
+
+	hash := fnvHash64a(key)
+	idx := sort.Search(len(c.sortedHashes), func(i int) bool {
+		return c.sortedHashes[i] >= hash
+	})
+	if idx >= len(c.sortedHashes) {
+		idx = 0
+	}
+	return c.circle[c.sortedHashes[idx]]
+}
+```
+
+- [ ] **Step 3: Update Rebuild to reset the rebuilt flag**
+
+```go
+func (c *ConsistentHash) Rebuild(targets []*Target) {
+	c.rebuilt.Store(false)
+	c.rebuildCircle(targets)
+}
+```
+
+- [ ] **Step 4: Update rebuildCircle to set the rebuilt flag**
+
+```go
+func (c *ConsistentHash) rebuildCircle(targets []*Target) {
+	c.mu.Lock()
+	defer c.mu.Unlock()
+	// ... existing logic unchanged ...
+	c.rebuilt.Store(true)
+}
+```
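+
+With only the flag check in `ensureRebuilt`, goroutines that arrive during a cold start can each pass the check and rebuild in turn. One way to tighten this is to re-check the flag after taking the write lock. The sketch below shows that double-checked pattern on a hypothetical `lazyRing` type; the `builds` counter exists only to make the effect observable and is not part of the plan.
+
+```go
+package main
+
+import (
+	"fmt"
+	"sync"
+	"sync/atomic"
+)
+
+// lazyRing is a hypothetical stand-in for ConsistentHash.
+type lazyRing struct {
+	mu     sync.RWMutex
+	built  atomic.Bool
+	nodes  []string
+	builds int // number of rebuilds, kept only for demonstration
+}
+
+// ensureBuilt builds the ring at most once per reset of the flag.
+func (l *lazyRing) ensureBuilt(nodes []string) {
+	if l.built.Load() { // fast path: no lock once built
+		return
+	}
+	l.mu.Lock()
+	defer l.mu.Unlock()
+	if l.built.Load() { // another goroutine finished the build first
+		return
+	}
+	l.nodes = append(l.nodes[:0], nodes...)
+	l.builds++
+	l.built.Store(true)
+}
+
+// buildsAfterConcurrentUse starts many goroutines against a cold ring and
+// reports how many rebuilds actually happened.
+func buildsAfterConcurrentUse(goroutines int) int {
+	ring := &lazyRing{}
+	var wg sync.WaitGroup
+	for i := 0; i < goroutines; i++ {
+		wg.Add(1)
+		go func() {
+			defer wg.Done()
+			ring.ensureBuilt([]string{"a", "b", "c"})
+		}()
+	}
+	wg.Wait()
+	return ring.builds
+}
+
+func main() {
+	fmt.Println(buildsAfterConcurrentUse(64)) // 1
+}
+```
+
+`sync.Once` gives the same guarantee with less code, but it cannot be reset, which the plan's `Rebuild` method needs; that is the reason this sketch keeps the flag.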
+
+- [ ] **Step 5: 同样更新 SelectExcludingByKey**
+
+移除内部的 `RLock → RUnlock → rebuildCircle → RLock` 模式，改为先 `ensureRebuilt` 再 `RLock`。
+
+- [ ] **Step 6: 将 hashKeyString 替换为 fnvHash64a**
+
+```go
+// 删除 hashKeyString 方法
+// 在 PrecomputeHashes 中将 c.hashKeyString(key) 替换为 fnvHash64a(key)
+```
+
+- [ ] **Step 7: Run the tests**
+
+```bash
+go test -v -count=1 ./internal/loadbalance/...
+```
+
+- [ ] **Step 8: Run the benchmarks**
+
+```bash
+go test -bench=BenchmarkConsistentHash -benchmem ./internal/loadbalance/...
+```
+
+- [ ] **Step 9: Commit**
+
+```bash
+git add internal/loadbalance/consistent_hash.go
+git commit -m "perf(loadbalance): eliminate double-lock in ConsistentHash with atomic rebuild guard"
+```
+
+---
+
+## Task 7: Global verification and benchmark comparison
+
+**Files:**
+- No new file changes
+
+- [ ] **Step 1: Run the full test suite**
+
+```bash
+make test
+```
+
+- [ ] **Step 2: Run the integration tests**
+
+```bash
+make test-integration
+```
+
+- [ ] **Step 3: Run code formatting and static checks**
+
+```bash
+make fmt && make lint
+```
+
+- [ ] **Step 4: Save the benchmark comparison results**
+
+```bash
+make bench-stat
+mv benchmark-current.txt bench-after-optimization.txt
+```
+
+If pre-optimization benchmark data is available, run `benchstat bench-before.txt bench-after-optimization.txt` to compare.
+
+- [ ] **Step 5: Final commit (if there are lint fixes)**
+
+```bash
+git add -A
+git commit -m "chore: lint fixes after performance optimization"
+```
+
+---
+
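+## Dependencies
+
+```
+Task 1 (filterHealthy Pool)  ──→ Task 6 (ConsistentHash, reuses fnvHash64a)
+Task 2 (IsAvailable atomic)  ──→ no dependencies (can run in parallel)
+Task 3 (RadixTree Pool)      ──→ no dependencies (can run in parallel)
+Task 4 (Resolver LRU)        ──→ no dependencies (can run in parallel)
+Task 5 (FileInfoCache)       ──→ no dependencies (can run in parallel)
+Task 7 (global verification) ──→ depends on Tasks 1-6 all being complete
+```
+
+**Recommended parallel execution**: Tasks 1 and 2 can go in the same batch (same file), Tasks 3/4/5 can run in parallel, and Task 6 runs after Task 1.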
diff --git a/docs/superpowers/plans/2026-06-08-loadbalance-enhancement.md b/docs/superpowers/plans/2026-06-08-loadbalance-enhancement.md
new file mode 100644
index 0000000..6dc8e22
--- /dev/null
+++ b/docs/superpowers/plans/2026-06-08-loadbalance-enhancement.md
@@ -0,0 +1,1620 @@
+# Least Time & Session Sticky Load Balancer Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
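+**Goal:** Implement a high-performance Least Time load-balancing algorithm and Session Sticky session persistence for Lolly
+
+**Architecture:** Least Time uses an atomic EWMA tracker to record each backend's response time and selects the target with the shortest one; Session Sticky uses 256 sharded locks plus a cookie routing table to keep sessions pinned to a backend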
+
+**Tech Stack:** Go 1.26+, fasthttp, atomic operations, sync.RWMutex
+
+---
+
+## File Structure
+
+### New Files
+- `internal/loadbalance/ewma.go` - atomic EWMA tracker
+- `internal/loadbalance/ewma_test.go` - EWMA tests
+- `internal/loadbalance/least_time.go` - Least Time balancer
+- `internal/loadbalance/least_time_test.go` - Least Time tests
+- `internal/loadbalance/sticky.go` - Session Sticky balancer
+- `internal/loadbalance/sticky_test.go` - Session Sticky tests
+- `internal/loadbalance/sticky_config.go` - Sticky config struct
+
+### Modified Files
+- `internal/loadbalance/algorithms.go` - add the new algorithms to validAlgorithms
+- `internal/loadbalance/balancer.go` - add a Stats field to Target
+- `internal/config/proxy_config.go` - add LeastTimeConfig + StickyConfig
+- `internal/config/defaults.go` - add default config comments
+- `internal/config/validate.go` - validate the new config options
+- `internal/proxy/proxy.go` - integrate createBalancer + RecordResponseTime
+- `internal/proxy/target_selector.go` - Select supports StickySession
+
+---
+
+## Task 1: EWMA Statistics Core
+
+**Files:**
+- Create: `internal/loadbalance/ewma.go`
+- Create: `internal/loadbalance/ewma_test.go`
+
+### Step 1.1: Write EWMA Failing Test
+
+```go
+package loadbalance
+
+import (
+    "sync"
+    "testing"
+    "time"
+)
+
+func TestEWMAStats_BasicRecord(t *testing.T) {
+    stats := NewEWMAStats()
+    
+    // Record a 100ms header time and a 200ms last-byte time
+    stats.Record(100*time.Millisecond, 200*time.Millisecond)
+    
+    headerTime := stats.HeaderTime()
+    lastByteTime := stats.LastByteTime()
+    
+    if headerTime == 0 {
+        t.Error("headerTime should not be zero after recording")
+    }
+    if lastByteTime == 0 {
+        t.Error("lastByteTime should not be zero after recording")
+    }
+    
+    // First sample: avg should equal the sample (alpha=1.0 for first sample)
+    if headerTime != 100*time.Millisecond {
+        t.Errorf("first headerTime = %v, want %v", headerTime, 100*time.Millisecond)
+    }
+    if lastByteTime != 200*time.Millisecond {
+        t.Errorf("first lastByteTime = %v, want %v", lastByteTime, 200*time.Millisecond)
+    }
+}
+
+func TestEWMAStats_Convergence(t *testing.T) {
+    stats := NewEWMAStats()
+    
+    // Record multiple samples
+    for i := 0; i < 10; i++ {
+        stats.Record(100*time.Millisecond, 200*time.Millisecond)
+    }
+    
+    headerTime := stats.HeaderTime()
+    
+    // After many identical samples, avg should converge close to the value
+    // With alpha=0.3, after 10 samples of 100ms, should be close to 100ms
+    diff := headerTime - 100*time.Millisecond
+    if diff < 0 {
+        diff = -diff
+    }
+    if diff > 10*time.Millisecond {
+        t.Errorf("headerTime = %v, not converged to 100ms (diff=%v)", headerTime, diff)
+    }
+}
+
+func TestEWMAStats_Concurrent(t *testing.T) {
+    stats := NewEWMAStats()
+    
+    var wg sync.WaitGroup
+    for i := 0; i < 100; i++ {
+        wg.Add(1)
+        go func() {
+            defer wg.Done()
+            for j := 0; j < 100; j++ {
+                stats.Record(time.Duration(j)*time.Millisecond, time.Duration(j*2)*time.Millisecond)
+            }
+        }()
+    }
+    wg.Wait()
+    
+    // After concurrent writes, should have some value (not panic or race)
+    headerTime := stats.HeaderTime()
+    lastByteTime := stats.LastByteTime()
+    
+    if headerTime == 0 {
+        t.Error("headerTime should not be zero after concurrent writes")
+    }
+    if lastByteTime == 0 {
+        t.Error("lastByteTime should not be zero after concurrent writes")
+    }
+}
+```
+
+### Step 1.2: Run EWMA Test - Verify Fails
+
+Run: `cd /home/xfy/Developer/lolly && go test -v ./internal/loadbalance -run TestEWMAStats`
+Expected: FAIL with "undefined: NewEWMAStats"
+
+### Step 1.3: Implement EWMA Core
+
+```go
+package loadbalance
+
+import (
+    "sync/atomic"
+    "time"
+)
+
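+// EWMAStats is an EWMA (exponentially weighted moving average) tracker built on atomic operations.
+//
+// It uses fixed-point arithmetic instead of floating point, so response-time tracking needs no locks and no allocations.
+type EWMAStats struct {
+    headerTime   atomic.Int64 // EWMA of time to first byte (nanoseconds)
+    lastByteTime atomic.Int64 // EWMA of full response time (nanoseconds)
+    sampleCount  atomic.Int64 // number of samples
+}
+
+// defaultAlphaScale is the default EWMA alpha as a fixed-point value (300/1000 = 0.3).
+const defaultAlphaScale = 300 // alpha = 0.3
+
+// NewEWMAStats creates a new EWMA tracker.
+func NewEWMAStats() *EWMAStats {
+    return &EWMAStats{}
+}
+
+// Record records one response-time sample.
+//
+// The EWMA is updated lock-free with atomic operations:
+//   - the first sample is stored as-is
+//   - later samples: new_avg = alpha * new + (1 - alpha) * old
+//
+// Parameters:
+//   - headerTime: time to first byte
+//   - lastByteTime: full response time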
+func (e *EWMAStats) Record(headerTime, lastByteTime time.Duration) {
+    e.recordAtomic(&e.headerTime, headerTime)
+    e.recordAtomic(&e.lastByteTime, lastByteTime)
+    e.sampleCount.Add(1)
+}
+
+// recordAtomic atomically updates a single EWMA value.
+// A stored value of 0 is treated as "no sample yet".
+func (e *EWMAStats) recordAtomic(ptr *atomic.Int64, newValue time.Duration) {
+    newNano := newValue.Nanoseconds()
+
+    for {
+        old := ptr.Load()
+        if old == 0 {
+            // First record: store the sample directly
+            if ptr.CompareAndSwap(0, newNano) {
+                return
+            }
+            continue
+        }
+
+        // EWMA: new = alpha * new + (1 - alpha) * old
+        // Fixed-point: alphaScale = 300 (0.3)
+        // new_avg = (alpha * new + (1000 - alpha) * old) / 1000
+        updated := (defaultAlphaScale*newNano + (1000-defaultAlphaScale)*old) / 1000
+
+        if ptr.CompareAndSwap(old, updated) {
+            return
+        }
+        // CAS failed, retry
+    }
+}
+
+// HeaderTime returns the EWMA of the time to first byte.
+func (e *EWMAStats) HeaderTime() time.Duration {
+    return time.Duration(e.headerTime.Load())
+}
+
+// LastByteTime returns the EWMA of the full response time.
+func (e *EWMAStats) LastByteTime() time.Duration {
+    return time.Duration(e.lastByteTime.Load())
+}
+
+// SampleCount returns the number of samples recorded so far.
+func (e *EWMAStats) SampleCount() int64 {
+    return e.sampleCount.Load()
+}
+
+// Reset resets the tracker.
+func (e *EWMAStats) Reset() {
+    e.headerTime.Store(0)
+    e.lastByteTime.Store(0)
+    e.sampleCount.Store(0)
+}
+```
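+
+To make the fixed-point update concrete, here is a small standalone sketch that applies the same formula to an average of 100ms receiving 200ms samples. The `ewmaStep` helper and `alphaScale` constant are illustrative names, not part of `ewma.go`.
+
+```go
+package main
+
+import (
+	"fmt"
+	"time"
+)
+
+const alphaScale = 300 // alpha = 0.3, scaled by 1000 (illustrative constant)
+
+// ewmaStep applies one fixed-point EWMA update:
+// (alpha*sample + (1000-alpha)*old) / 1000, all in integer nanoseconds.
+func ewmaStep(old, sample time.Duration) time.Duration {
+	n := (alphaScale*sample.Nanoseconds() + (1000-alphaScale)*old.Nanoseconds()) / 1000
+	return time.Duration(n)
+}
+
+func main() {
+	avg := 100 * time.Millisecond
+	for i := 0; i < 3; i++ {
+		avg = ewmaStep(avg, 200*time.Millisecond)
+		fmt.Println(avg)
+	}
+	// prints:
+	// 130ms
+	// 151ms
+	// 165.7ms
+}
+```
+
+With alpha = 0.3, each new sample closes 30% of the remaining gap, so the average moves toward a sustained change within a handful of requests while a single outlier shifts it only partway.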
+
+### Step 1.4: Run EWMA Test - Verify Passes
+
+Run: `cd /home/xfy/Developer/lolly && go test -v ./internal/loadbalance -run TestEWMAStats`
+Expected: PASS (3 tests)
+
+### Step 1.5: Commit
+
+```bash
+cd /home/xfy/Developer/lolly
+git add internal/loadbalance/ewma.go internal/loadbalance/ewma_test.go
+git commit -m "feat(loadbalance): add atomic EWMA statistics core
+
+- Zero-lock atomic EWMA implementation using fixed-point arithmetic
+- Supports header_time and last_byte_time tracking
+- Concurrent-safe with CAS retry loop"
+```
+
+---
+
+## Task 2: Least Time Balancer
+
+**Files:**
+- Create: `internal/loadbalance/least_time.go`
+- Create: `internal/loadbalance/least_time_test.go`
+
+### Step 2.1: Write LeastTime Failing Test
+
+```go
+package loadbalance
+
+import (
+    "sync"
+    "testing"
+    "time"
+)
+
+func TestLeastTime_BasicSelect(t *testing.T) {
+    lt := NewLeastTime("last_byte", time.Millisecond)
+    
+    targets := []*Target{
+        NewTargetFromConfig("http://slow:8080", 1, 0, 0, 0, false, false, ""),
+        NewTargetFromConfig("http://fast:8080", 1, 0, 0, 0, false, false, ""),
+    }
+    
+    // Record different response times
+    targets[0].Stats.Record(200*time.Millisecond, 400*time.Millisecond) // slow
+    targets[1].Stats.Record(50*time.Millisecond, 100*time.Millisecond)   // fast
+    
+    selected := lt.Select(targets)
+    if selected == nil {
+        t.Fatal("expected a target, got nil")
+    }
+    if selected.URL != "http://fast:8080" {
+        t.Errorf("selected = %s, want fast target", selected.URL)
+    }
+}
+
+func TestLeastTime_NoStats(t *testing.T) {
+    lt := NewLeastTime("last_byte", time.Millisecond)
+    
+    targets := []*Target{
+        NewTargetFromConfig("http://a:8080", 1, 0, 0, 0, false, false, ""),
+        NewTargetFromConfig("http://b:8080", 1, 0, 0, 0, false, false, ""),
+    }
+    
+    // No stats recorded - should still select one (using default)
+    selected := lt.Select(targets)
+    if selected == nil {
+        t.Fatal("expected a target, got nil")
+    }
+}
+
+func TestLeastTime_HeaderMetric(t *testing.T) {
+    lt := NewLeastTime("header", time.Millisecond)
+    
+    targets := []*Target{
+        NewTargetFromConfig("http://slow:8080", 1, 0, 0, 0, false, false, ""),
+        NewTargetFromConfig("http://fast:8080", 1, 0, 0, 0, false, false, ""),
+    }
+    
+    // Record: slow has worse header time but better last_byte time
+    targets[0].Stats.Record(200*time.Millisecond, 100*time.Millisecond)
+    targets[1].Stats.Record(50*time.Millisecond, 300*time.Millisecond)
+    
+    selected := lt.Select(targets)
+    if selected == nil {
+        t.Fatal("expected a target, got nil")
+    }
+    // Should pick fast based on header_time
+    if selected.URL != "http://fast:8080" {
+        t.Errorf("selected = %s, want fast target based on header_time", selected.URL)
+    }
+}
+
+func TestLeastTime_SelectExcluding(t *testing.T) {
+    lt := NewLeastTime("last_byte", time.Millisecond)
+    
+    targets := []*Target{
+        NewTargetFromConfig("http://a:8080", 1, 0, 0, 0, false, false, ""),
+        NewTargetFromConfig("http://b:8080", 1, 0, 0, 0, false, false, ""),
+        NewTargetFromConfig("http://c:8080", 1, 0, 0, 0, false, false, ""),
+    }
+    
+    targets[0].Stats.Record(10*time.Millisecond, 20*time.Millisecond)
+    targets[1].Stats.Record(30*time.Millisecond, 60*time.Millisecond)
+    targets[2].Stats.Record(50*time.Millisecond, 100*time.Millisecond)
+    
+    // Exclude the fastest
+    excluded := []*Target{targets[0]}
+    selected := lt.SelectExcluding(targets, excluded)
+    
+    if selected == nil {
+        t.Fatal("expected a target, got nil")
+    }
+    if selected.URL != "http://b:8080" {
+        t.Errorf("selected = %s, want second fastest", selected.URL)
+    }
+}
+
+func TestLeastTime_Concurrent(t *testing.T) {
+    lt := NewLeastTime("last_byte", time.Millisecond)
+    
+    targets := []*Target{
+        NewTargetFromConfig("http://a:8080", 1, 0, 0, 0, false, false, ""),
+        NewTargetFromConfig("http://b:8080", 1, 0, 0, 0, false, false, ""),
+    }
+    
+    var wg sync.WaitGroup
+    
+    // Concurrent recording
+    for i := 0; i < 50; i++ {
+        wg.Add(1)
+        go func() {
+            defer wg.Done()
+            for j := 0; j < 100; j++ {
+                targets[0].Stats.Record(time.Millisecond, 2*time.Millisecond)
+                targets[1].Stats.Record(2*time.Millisecond, 4*time.Millisecond)
+            }
+        }()
+    }
+    
+    // Concurrent selecting
+    for i := 0; i < 50; i++ {
+        wg.Add(1)
+        go func() {
+            defer wg.Done()
+            for j := 0; j < 100; j++ {
+                lt.Select(targets)
+            }
+        }()
+    }
+    
+    wg.Wait()
+}
+```
+
+### Step 2.2: Run LeastTime Test - Verify Fails
+
+Run: `cd /home/xfy/Developer/lolly && go test -v ./internal/loadbalance -run TestLeastTime`
+Expected: FAIL with "undefined: NewLeastTime"
+
+### Step 2.3: Modify Target to Add Stats Field
+
+File: `internal/loadbalance/balancer.go`
+
+Find `type Target struct` definition and add Stats field:
+
+```go
+// Target represents a backend server target for HTTP proxy (L7) load balancing.
+type Target struct {
+    resolvedIPs   atomic.Pointer[[]string]
+    URL           string
+    hostname      string
+    VirtualHashes []uint64
+    Weight        int
+    Connections   int64
+    lastResolved  atomic.Int64
+    hostnameOnce  sync.Once
+    Healthy       atomic.Bool
+    
+    // Stats holds response-time statistics (used by the least_time algorithm)
+    Stats *EWMAStats
+    
+    // ... rest of fields unchanged
+```
+
+Also update `NewTargetFromConfig` to initialize Stats:
+
+```go
+func NewTargetFromConfig(url string, weight int, maxConns int64, maxFails int64, failTimeout time.Duration, backup bool, down bool, proxyURI string) *Target {
+    t := &Target{
+        URL:         url,
+        Weight:      weight,
+        MaxConns:    maxConns,
+        MaxFails:    maxFails,
+        FailTimeout: failTimeout,
+        Backup:      backup,
+        Down:        down,
+        ProxyURI:    proxyURI,
+        Stats:       NewEWMAStats(), // initialize the tracker
+    }
+    t.initHostname()
+    if !down {
+        t.Healthy.Store(true)
+    }
+    return t
+}
+```
+
+### Step 2.4: Implement LeastTime Balancer
+
+```go
+package loadbalance
+
+import "time"
+
+// ResponseTimeRecorder is the interface for recording response times.
+// A balancer that implements it receives response-time statistics after each request completes.
+type ResponseTimeRecorder interface {
+    RecordResponseTime(target *Target, headerTime, lastByteTime time.Duration)
+}
+
+// LeastTime is a load balancer based on the response-time EWMA.
+//
+// It selects the healthy target with the shortest response time. Two metrics are supported:
+//   - "header": time to first byte (from sending the request to receiving the response headers)
+//   - "last_byte": full response time (from sending the request to receiving the complete response)
+type LeastTime struct {
+    metric      string        // "header" or "last_byte"
+    defaultTime time.Duration // default used when there are no samples
+}
+
+// NewLeastTime creates a Least Time load balancer.
+//
+// Parameters:
+//   - metric: the metric to use, "header" or "last_byte"
+//   - defaultTime: default response time when there are no samples (keeps new nodes from being starved)
+func NewLeastTime(metric string, defaultTime time.Duration) *LeastTime {
+    if metric != "header" {
+        metric = "last_byte" // default to last_byte
+    }
+    if defaultTime <= 0 {
+        defaultTime = time.Millisecond // default 1ms
+    }
+    return &LeastTime{
+        metric:      metric,
+        defaultTime: defaultTime,
+    }
+}
+
+// Select picks the healthy target with the shortest response time.
+// Only available targets are considered; it returns nil if none is available.
+func (l *LeastTime) Select(targets []*Target) *Target {
+    fc := acquireFilterContext()
+    defer releaseFilterContext(fc)
+    available := filterInto(fc, targets)
+    return l.selectFrom(available)
+}
+
+// SelectExcluding picks the target with the shortest response time, skipping the excluded targets.
+func (l *LeastTime) SelectExcluding(targets []*Target, excluded []*Target) *Target {
+    fc := acquireFilterContext()
+    defer releaseFilterContext(fc)
+    available := filterIntoExcluding(fc, targets, excluded)
+    return l.selectFrom(available)
+}
+
+// selectFrom picks the target with the shortest response time from the available list.
+func (l *LeastTime) selectFrom(available []*Target) *Target {
+    if len(available) == 0 {
+        return nil
+    }
+    
+    var selected *Target
+    var minTime int64 = -1
+    defaultNano := l.defaultTime.Nanoseconds()
+    
+    for _, t := range available {
+        var currentTime int64
+        if t.Stats != nil {
+            if l.metric == "header" {
+                currentTime = t.Stats.headerTime.Load()
+            } else {
+                currentTime = t.Stats.lastByteTime.Load()
+            }
+        }
+        
+        // Use the default when there are no samples
+        if currentTime == 0 {
+            currentTime = defaultNano
+        }
+        
+        if selected == nil || currentTime < minTime {
+            selected = t
+            minTime = currentTime
+        }
+    }
+    
+    return selected
+}
+
+// RecordResponseTime records a target's response time (implements ResponseTimeRecorder).
+func (l *LeastTime) RecordResponseTime(target *Target, headerTime, lastByteTime time.Duration) {
+    if target != nil && target.Stats != nil {
+        target.Stats.Record(headerTime, lastByteTime)
+    }
+}
+
+// GetMetric returns the metric currently in use.
+func (l *LeastTime) GetMetric() string {
+    return l.metric
+}
+
+var _ Balancer = (*LeastTime)(nil)
+var _ ResponseTimeRecorder = (*LeastTime)(nil)
+```
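+
+The `defaultTime` fallback determines how a target with no samples competes against warmed-up ones. The sketch below isolates that selection rule with a hypothetical `backend` type and `pick` function (neither exists in the codebase), to show how the chosen default shifts traffic.
+
+```go
+package main
+
+import "fmt"
+
+// backend is a hypothetical stand-in for Target plus its EWMA value.
+type backend struct {
+	name  string
+	avgNs int64 // 0 means "no samples yet"
+}
+
+// pick returns the backend with the lowest average, substituting
+// defaultNs for backends that have no samples.
+func pick(backends []backend, defaultNs int64) string {
+	best, bestNs := "", int64(-1)
+	for _, b := range backends {
+		ns := b.avgNs
+		if ns == 0 {
+			ns = defaultNs
+		}
+		if bestNs < 0 || ns < bestNs {
+			best, bestNs = b.name, ns
+		}
+	}
+	return best
+}
+
+func main() {
+	bs := []backend{{"warm", 50_000_000}, {"new", 0}} // warm averages 50ms
+	fmt.Println(pick(bs, 1_000_000))   // default 1ms   -> prints "new"
+	fmt.Println(pick(bs, 100_000_000)) // default 100ms -> prints "warm"
+}
+```
+
+A small default such as 1ms sends traffic to a fresh target until its first samples arrive, which avoids starvation but can briefly overload a backend that is still warming up; a default above typical latencies does the opposite.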
+
+### Step 2.5: Run LeastTime Test - Verify Passes
+
+Run: `cd /home/xfy/Developer/lolly && go test -v ./internal/loadbalance -run TestLeastTime`
+Expected: PASS (5 tests)
+
+### Step 2.6: Commit
+
+```bash
+cd /home/xfy/Developer/lolly
+git add internal/loadbalance/balancer.go internal/loadbalance/least_time.go internal/loadbalance/least_time_test.go
+git commit -m "feat(loadbalance): implement Least Time balancer
+
+- Add atomic EWMA Stats field to Target
+- Implement LeastTime balancer with header_time and last_byte metrics
+- Support Select and SelectExcluding with zero-lock design
+- Add ResponseTimeRecorder interface for proxy integration"
+```
+
+---
+
+## Task 3: Session Sticky Balancer
+
+**Files:**
+- Create: `internal/loadbalance/sticky_config.go`
+- Create: `internal/loadbalance/sticky.go`
+- Create: `internal/loadbalance/sticky_test.go`
+
+### Step 3.1: Write StickyConfig Structure
+
+```go
+package loadbalance
+
+import "time"
+
+// StickyConfig is the Session Sticky configuration.
+type StickyConfig struct {
+    Enabled  bool          `yaml:"enabled"`
+    Name     string        `yaml:"name"`      // cookie name
+    Expires  time.Duration `yaml:"expires"`   // session lifetime
+    Domain   string        `yaml:"domain"`    // cookie domain
+    Path     string        `yaml:"path"`      // cookie path
+    Secure   bool          `yaml:"secure"`    // Secure flag
+    HttpOnly bool          `yaml:"http_only"` // HttpOnly flag
+    SameSite string        `yaml:"same_site"` // SameSite attribute
+}
+
+// DefaultStickyConfig returns the default Sticky configuration.
+func DefaultStickyConfig() StickyConfig {
+    return StickyConfig{
+        Name:     "lolly_route",
+        Expires:  time.Hour,
+        Path:     "/",
+        HttpOnly: true,
+        SameSite: "Lax",
+    }
+}
+```
+
+### Step 3.2: Write Sticky Test (Failing)
+
+```go
+package loadbalance
+
+import (
+    "strings"
+    "sync"
+    "testing"
+    "time"
+
+    "github.com/valyala/fasthttp"
+)
+
+func TestStickySession_BasicRoute(t *testing.T) {
+    fallback := NewRoundRobin()
+    config := DefaultStickyConfig()
+    config.Expires = time.Hour
+    
+    sticky := NewStickySession(config, fallback)
+    sticky.Start()
+    defer sticky.Stop()
+    
+    targets := []*Target{
+        NewTargetFromConfig("http://backend1:8080", 1, 0, 0, 0, false, false, ""),
+        NewTargetFromConfig("http://backend2:8080", 1, 0, 0, 0, false, false, ""),
+    }
+    
+    ctx := &fasthttp.RequestCtx{}
+    
+    // First request - should set cookie
+    selected1 := sticky.Select(ctx, targets)
+    if selected1 == nil {
+        t.Fatal("expected a target, got nil")
+    }
+    
+    // Check cookie was set
+    cookie := ctx.Response.Header.PeekCookie(config.Name)
+    if len(cookie) == 0 {
+        t.Fatal("expected cookie to be set")
+    }
+    
+    // Second request with same cookie - should route to same target
+    ctx2 := &fasthttp.RequestCtx{}
+    ctx2.Request.Header.SetCookie(config.Name, string(extractCookieValue(cookie)))
+    
+    selected2 := sticky.Select(ctx2, targets)
+    if selected2 == nil {
+        t.Fatal("expected a target, got nil")
+    }
+    if selected2.URL != selected1.URL {
+        t.Errorf("sticky routing failed: got %s, want %s", selected2.URL, selected1.URL)
+    }
+}
+
+func TestStickySession_TargetUnavailable(t *testing.T) {
+    fallback := NewRoundRobin()
+    config := DefaultStickyConfig()
+    
+    sticky := NewStickySession(config, fallback)
+    sticky.Start()
+    defer sticky.Stop()
+    
+    targets := []*Target{
+        NewTargetFromConfig("http://backend1:8080", 1, 0, 0, 0, false, false, ""),
+        NewTargetFromConfig("http://backend2:8080", 1, 0, 0, 0, false, false, ""),
+    }
+    
+    ctx := &fasthttp.RequestCtx{}
+    
+    // First request
+    selected1 := sticky.Select(ctx, targets)
+    
+    // Make target unavailable
+    selected1.Healthy.Store(false)
+    
+    // Second request with cookie - should fallback to another target
+    ctx2 := &fasthttp.RequestCtx{}
+    cookie := ctx.Response.Header.PeekCookie(config.Name)
+    ctx2.Request.Header.SetCookie(config.Name, string(extractCookieValue(cookie)))
+    
+    selected2 := sticky.Select(ctx2, targets)
+    if selected2 == nil {
+        t.Fatal("expected a target after fallback, got nil")
+    }
+    if selected2.URL == selected1.URL {
+        t.Error("expected fallback to different target")
+    }
+}
+
+func TestStickySession_CookieEncodeDecode(t *testing.T) {
+    targetURL := "http://backend1:8080"
+    expires := time.Now().Add(time.Hour)
+    
+    encoded := encodeStickyCookie(targetURL, expires)
+    decodedURL, decodedExpires, ok := decodeStickyCookie(encoded)
+    
+    if !ok {
+        t.Fatal("decode failed")
+    }
+    if decodedURL != targetURL {
+        t.Errorf("url = %s, want %s", decodedURL, targetURL)
+    }
+    if decodedExpires.Unix() != expires.Unix() {
+        t.Errorf("expires mismatch")
+    }
+}
+
+func TestStickySession_Concurrent(t *testing.T) {
+    fallback := NewRoundRobin()
+    config := DefaultStickyConfig()
+    
+    sticky := NewStickySession(config, fallback)
+    sticky.Start()
+    defer sticky.Stop()
+    
+    targets := []*Target{
+        NewTargetFromConfig("http://backend1:8080", 1, 0, 0, 0, false, false, ""),
+        NewTargetFromConfig("http://backend2:8080", 1, 0, 0, 0, false, false, ""),
+    }
+    
+    var wg sync.WaitGroup
+    for i := 0; i < 100; i++ {
+        wg.Add(1)
+        go func(id int) {
+            defer wg.Done()
+            ctx := &fasthttp.RequestCtx{}
+            sticky.Select(ctx, targets)
+        }(i)
+    }
+    wg.Wait()
+}
+
+// Helper to extract cookie value from Set-Cookie header
+func extractCookieValue(cookieHeader []byte) []byte {
+    s := string(cookieHeader)
+    // Format: "name=value; ..."
+    parts := strings.SplitN(s, "=", 2)
+    if len(parts) != 2 {
+        return nil
+    }
+    valueParts := strings.SplitN(parts[1], ";", 2)
+    return []byte(valueParts[0])
+}
+```
+
+### Step 3.3: Run Sticky Test - Verify Fails
+
+Run: `cd /home/xfy/Developer/lolly && go test -v ./internal/loadbalance -run TestStickySession`
+Expected: FAIL with undefined functions
+
+### Step 3.4: Implement StickySession
+
+```go
+package loadbalance
+
+import (
+    "encoding/base64"
+    "strconv"
+    "strings"
+    "sync"
+    "sync/atomic"
+    "time"
+
+    "github.com/valyala/fasthttp"
+)
+
+const stickyShardCount = 256
+
+// StickySession is a cookie-based session-persistence load balancer.
+//
+// It uses 256 sharded locks to reduce lock contention and supports TTL expiry with background cleanup.
+type StickySession struct {
+    config   StickyConfig
+    fallback Balancer
+    
+    shards  [stickyShardCount]*stickyShard
+    cleaner *time.Ticker
+    stopCh  chan struct{}
+    started atomic.Bool
+}
+
+type stickyShard struct {
+    mu       sync.RWMutex
+    sessions map[string]*stickyEntry
+}
+
+type stickyEntry struct {
+    targetURL string
+    expiresAt int64 // Unix nanoseconds
+}
+
+// NewStickySession creates a Session Sticky load balancer.
+//
+// Parameters:
+//   - config: Sticky configuration
+//   - fallback: the algorithm used for first-time routing and when the pinned target is unavailable
+func NewStickySession(config StickyConfig, fallback Balancer) *StickySession {
+    if fallback == nil {
+        fallback = NewRoundRobin()
+    }
+    
+    s := &StickySession{
+        config:   config,
+        fallback: fallback,
+        stopCh:   make(chan struct{}),
+    }
+    
+    for i := 0; i < stickyShardCount; i++ {
+        s.shards[i] = &stickyShard{
+            sessions: make(map[string]*stickyEntry),
+        }
+    }
+    
+    return s
+}
+
+// Start starts the background cleanup task.
+func (s *StickySession) Start() {
+    if s.started.Swap(true) {
+        return
+    }
+    s.cleaner = time.NewTicker(60 * time.Second)
+    go s.cleanupLoop()
+}
+
+// Stop stops the background cleanup task.
+func (s *StickySession) Stop() {
+    if !s.started.Swap(false) {
+        return
+    }
+    close(s.stopCh)
+}
+
+// cleanupLoop is the background cleanup loop.
+func (s *StickySession) cleanupLoop() {
+    for {
+        select {
+        case <-s.cleaner.C:
+            s.cleanupExpired()
+        case <-s.stopCh:
+            return
+        }
+    }
+}
+
+// cleanupExpired removes all expired sessions.
+func (s *StickySession) cleanupExpired() {
+    now := time.Now().UnixNano()
+    for _, shard := range s.shards {
+        shard.mu.Lock()
+        for key, entry := range shard.sessions {
+            if entry.expiresAt < now {
+                delete(shard.sessions, key)
+            }
+        }
+        shard.mu.Unlock()
+    }
+}
+
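+// Select picks a target based on the cookie.
+//
+// 1. Check the sticky cookie in the request
+// 2. If it exists and its target is available, route to that target
+// 3. If it is missing or the target is unavailable, select with the fallback
+// 4. Set a new Set-Cookie response header
+func (s *StickySession) Select(ctx *fasthttp.RequestCtx, targets []*Target) *Target {
+    // 1. Check the existing cookie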
+    cookieValue := ctx.Request.Header.Cookie(s.config.Name)
+    if len(cookieValue) > 0 {
+        targetURL, expires, ok := decodeStickyCookie(string(cookieValue))
+        if ok && expires.After(time.Now()) {
+            // Check whether the target is available
+            for _, t := range targets {
+                if t.URL == targetURL && t.IsAvailable() {
+                    return t
+                }
+            }
+            // Target unavailable: delete the session
+            s.deleteSession(string(cookieValue))
+        }
+    }
+    
+    // 2. Select with the fallback
+    selected := s.fallback.Select(targets)
+    if selected == nil {
+        return nil
+    }
+
+    // 3. Set the cookie
+    s.setCookie(ctx, selected.URL)
+
+    // 4. Record the session
+    s.recordSession(selected.URL)
+    
+    return selected
+}
+
+// SelectExcluding selects a target after excluding the given ones.
+func (s *StickySession) SelectExcluding(targets []*Target, excluded []*Target) *Target {
+    // Session Sticky is not normally used for failover,
+    // but if needed it could try the cookie first and then fall back to fallback.SelectExcluding.
+    // Simplified here: delegate to the fallback's SelectExcluding.
+    return s.fallback.SelectExcluding(targets, excluded)
+}
+
+// setCookie sets the Set-Cookie response header.
+func (s *StickySession) setCookie(ctx *fasthttp.RequestCtx, targetURL string) {
+    expires := time.Now().Add(s.config.Expires)
+    cookieValue := encodeStickyCookie(targetURL, expires)
+    
+    var cookie fasthttp.Cookie
+    cookie.SetKey(s.config.Name)
+    cookie.SetValue(cookieValue)
+    cookie.SetExpire(expires)
+    cookie.SetPath(s.config.Path)
+    if s.config.Domain != "" {
+        cookie.SetDomain(s.config.Domain)
+    }
+    if s.config.Secure {
+        cookie.SetSecure(true)
+    }
+    if s.config.HttpOnly {
+        cookie.SetHTTPOnly(true)
+    }
+    switch strings.ToLower(s.config.SameSite) {
+    case "strict":
+        cookie.SetSameSite(fasthttp.CookieSameSiteStrictMode)
+    case "none":
+        cookie.SetSameSite(fasthttp.CookieSameSiteNoneMode)
+    default:
+        cookie.SetSameSite(fasthttp.CookieSameSiteLaxMode)
+    }
+    
+    ctx.Response.Header.SetCookie(&cookie)
+}
+
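+// recordSession records the session in the routing table.
+func (s *StickySession) recordSession(targetURL string) {
+    // Compute the expiry once so the map key and the stored entry agree.
+    // Note: setCookie computes its own expiry, so the key can differ from the
+    // cookie sent to the client if the two calls straddle a second boundary.
+    expires := time.Now().Add(s.config.Expires)
+    cookieValue := encodeStickyCookie(targetURL, expires)
+    shard := s.getShard(cookieValue)
+
+    shard.mu.Lock()
+    shard.sessions[cookieValue] = &stickyEntry{
+        targetURL: targetURL,
+        expiresAt: expires.UnixNano(),
+    }
+    shard.mu.Unlock()
+}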
+
+// deleteSession deletes a session.
+func (s *StickySession) deleteSession(cookieValue string) {
+    shard := s.getShard(cookieValue)
+    shard.mu.Lock()
+    delete(shard.sessions, cookieValue)
+    shard.mu.Unlock()
+}
+
+// getShard returns the shard that owns the given cookie value.
+func (s *StickySession) getShard(cookieValue string) *stickyShard {
+    hash := fnvHash64a(cookieValue)
+    return s.shards[hash%stickyShardCount]
+}
+
+// encodeStickyCookie encodes the routing information into the cookie value.
+// Format: base64(target_url + "|" + expires_timestamp)
+func encodeStickyCookie(targetURL string, expires time.Time) string {
+    raw := targetURL + "|" + strconv.FormatInt(expires.Unix(), 10)
+    return base64.URLEncoding.EncodeToString([]byte(raw))
+}
+
+// decodeStickyCookie decodes a cookie value.
+func decodeStickyCookie(value string) (targetURL string, expires time.Time, ok bool) {
+    raw, err := base64.URLEncoding.DecodeString(value)
+    if err != nil {
+        return
+    }
+    parts := strings.Split(string(raw), "|")
+    if len(parts) != 2 {
+        return
+    }
+    ts, err := strconv.ParseInt(parts[1], 10, 64)
+    if err != nil {
+        return
+    }
+    return parts[0], time.Unix(ts, 0), true
+}
+
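+// Note: StickySession.Select takes a *fasthttp.RequestCtx, so it does not match
+// the Select(targets) signature that LeastTime uses to satisfy Balancer; a
+// compile-time `var _ Balancer = (*StickySession)(nil)` assertion would fail here.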
+```
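+
+The sharded-lock layout can be looked at in isolation. The sketch below is a simplified stand-in for the routing table, assuming the standard library's `hash/fnv` in place of the project's `fnvHash64a` helper; `shardedMap` and its methods are illustrative names only.
+
+```go
+package main
+
+import (
+	"fmt"
+	"hash/fnv"
+	"sync"
+)
+
+const shardCount = 256
+
+// shard is one lock-protected slice of the key space.
+type shard struct {
+	mu sync.RWMutex
+	m  map[string]string
+}
+
+// shardedMap is a hypothetical stand-in for the sticky routing table.
+type shardedMap struct {
+	shards [shardCount]*shard
+}
+
+func newShardedMap() *shardedMap {
+	s := &shardedMap{}
+	for i := range s.shards {
+		s.shards[i] = &shard{m: make(map[string]string)}
+	}
+	return s
+}
+
+// shardFor hashes the key with FNV-1a and maps it onto one shard.
+func (s *shardedMap) shardFor(key string) *shard {
+	h := fnv.New64a()
+	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
+	return s.shards[h.Sum64()%shardCount]
+}
+
+// Set stores key -> target under the owning shard's write lock.
+func (s *shardedMap) Set(key, target string) {
+	sh := s.shardFor(key)
+	sh.mu.Lock()
+	sh.m[key] = target
+	sh.mu.Unlock()
+}
+
+// Get reads a key under the owning shard's read lock.
+func (s *shardedMap) Get(key string) (string, bool) {
+	sh := s.shardFor(key)
+	sh.mu.RLock()
+	defer sh.mu.RUnlock()
+	v, ok := sh.m[key]
+	return v, ok
+}
+
+func main() {
+	m := newShardedMap()
+	var wg sync.WaitGroup
+	for i := 0; i < 1000; i++ {
+		wg.Add(1)
+		go func(i int) {
+			defer wg.Done()
+			m.Set(fmt.Sprintf("session-%d", i), fmt.Sprintf("http://backend%d:8080", i%2))
+		}(i)
+	}
+	wg.Wait()
+	v, ok := m.Get("session-7")
+	fmt.Println(v, ok) // prints "http://backend1:8080 true"
+}
+```
+
+Writers only contend when their keys hash to the same shard, so with 256 shards two concurrent writes collide on a lock roughly 1 time in 256, assuming evenly distributed keys.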
+
+### Step 3.5: Run Sticky Test - Verify Passes
+
+Run: `cd /home/xfy/Developer/lolly && go test -v ./internal/loadbalance -run TestStickySession`
+Expected: PASS (4 tests)
+
+### Step 3.6: Commit
+
+```bash
+cd /home/xfy/Developer/lolly
+git add internal/loadbalance/sticky_config.go internal/loadbalance/sticky.go internal/loadbalance/sticky_test.go
+git commit -m "feat(loadbalance): implement Session Sticky balancer
+
+- Add 256-shard lock map for concurrent session routing
+- Cookie-based session persistence with base64 encoding
+- TTL expiration with background cleanup goroutine
+- Support Secure, HttpOnly, SameSite cookie attributes
+- Fallback to configured balancer when session target unavailable"
+```
+
+---
+
+## Task 4: Configuration Integration
+
+**Files:**
+- Modify: `internal/loadbalance/algorithms.go`
+- Modify: `internal/config/proxy_config.go`
+- Modify: `internal/config/defaults.go`
+- Modify: `internal/config/validate.go`
+
+### Step 4.1: Add Algorithms to Valid List
+
+File: `internal/loadbalance/algorithms.go`
+
+```go
+var validAlgorithms = []string{
+    "round_robin",
+    "weighted_round_robin",
+    "least_conn",
+    "ip_hash",
+    "consistent_hash",
+    "random",
+    "least_time",
+    "sticky",
+}
+```
+
+### Step 4.2: Add Config Structures
+
+File: `internal/config/proxy_config.go`
+
+Add to existing ProxyConfig:
+
+```go
+// ProxyConfig is the proxy configuration.
+type ProxyConfig struct {
+    // ... existing fields ...
+
+    // LeastTime is the least-time load-balancing configuration
+    LeastTime LeastTimeConfig `yaml:"least_time"`
+
+    // Sticky is the Session Sticky configuration
+    Sticky StickyConfig `yaml:"sticky"`
+}
+
+// LeastTimeConfig is the least-time load-balancing configuration.
+type LeastTimeConfig struct {
+    Metric      string        `yaml:"metric"`       // "header" or "last_byte"
+    DefaultTime time.Duration `yaml:"default_time"` // default time when there are no samples
+}
+
+// StickyConfig is the Session Sticky configuration.
+type StickyConfig struct {
+    Enabled      bool          `yaml:"enabled"`
+    Name         string        `yaml:"name"`
+    Expires      time.Duration `yaml:"expires"`
+    Domain       string        `yaml:"domain"`
+    Path         string        `yaml:"path"`
+    Secure       bool          `yaml:"secure"`
+    HttpOnly     bool          `yaml:"http_only"`
+    SameSite     string        `yaml:"same_site"`
+    FallbackAlgo string        `yaml:"fallback_balance"` // fallback algorithm
+}
+```
+
+### Step 4.3: Update Defaults
+
+File: `internal/config/defaults.go`
+
+In the function that generates the default configuration, add comment lines (search for the `load_balance:` line and extend it):
+
+```go
+buf.WriteString("    #     load_balance: round_robin   # 负载均衡算法（有效值: round_robin, weighted_round_robin, least_conn, ip_hash, consistent_hash, random, least_time, sticky）\n")
+
+// Add after the proxy configuration block:
+buf.WriteString("    #     least_time:              # 最小时间负载均衡配置\n")
+buf.WriteString("    #       metric: last_byte      # 指标类型（header: 首字节时间, last_byte: 完整响应时间）\n")
+buf.WriteString("    #       default_time: 1ms      # 无统计样本时的默认响应时间\n")
+buf.WriteString("    #     sticky:                  # Session Sticky 配置\n")
+buf.WriteString("    #       enabled: false         # 是否启用\n")
+buf.WriteString("    #       name: lolly_route      # cookie 名称\n")
+buf.WriteString("    #       expires: 1h            # session 有效期\n")
+buf.WriteString("    #       path: /                # cookie 路径\n")
+buf.WriteString("    #       http_only: true        # HttpOnly flag\n")
+buf.WriteString("    #       same_site: Lax         # SameSite 属性\n")
+buf.WriteString("    #       fallback_balance: round_robin  # fallback 算法\n")
+```
+
+### Step 4.4: Add Validation
+
+File: `internal/config/validate.go`
+
+Add the following where ProxyConfig is validated:
+
+```go
+// validate least_time config
+if p.LoadBalance == "least_time" {
+    if p.LeastTime.Metric != "" && p.LeastTime.Metric != "header" && p.LeastTime.Metric != "last_byte" {
+        return fmt.Errorf("无效的 least_time metric: %s（有效值: header, last_byte）", p.LeastTime.Metric)
+    }
+}
+
+// validate sticky config
+if p.LoadBalance == "sticky" {
+    if !p.Sticky.Enabled {
+        return fmt.Errorf("load_balance=sticky 时 sticky.enabled 必须为 true")
+    }
+    if p.Sticky.FallbackAlgo != "" && !loadbalance.IsValidAlgorithm(p.Sticky.FallbackAlgo) {
+        return fmt.Errorf("无效的 sticky fallback_balance: %s", p.Sticky.FallbackAlgo)
+    }
+}
+```
+
+### Step 4.5: Run Config Tests
+
+Run: `cd /home/xfy/Developer/lolly && go test -v ./internal/config -run TestValidate`
+Expected: PASS (all validation tests)
+
+### Step 4.6: Commit
+
+```bash
+cd /home/xfy/Developer/lolly
+git add internal/loadbalance/algorithms.go internal/config/proxy_config.go internal/config/defaults.go internal/config/validate.go
+git commit -m "feat(config): add Least Time and Sticky configuration support
+
+- Add least_time and sticky to valid algorithms list
+- Add LeastTimeConfig and StickyConfig structures
+- Update default config generation with new options
+- Add configuration validation for new fields"
+```
+
+---
+
+## Task 5: Proxy Integration
+
+**Files:**
+- Modify: `internal/proxy/proxy.go`
+- Modify: `internal/proxy/target_selector.go`
+
+### Step 5.1: Update createBalancer
+
+File: `internal/proxy/proxy.go`
+
+Add the following cases to the `createBalancerByName` function:
+
+```go
+func createBalancerByName(name string, cfg *config.ProxyConfig) (loadbalance.Balancer, error) {
+    switch name {
+    // ... existing cases ...
+    case "least_time":
+        metric := cfg.LeastTime.Metric
+        if metric == "" {
+            metric = "last_byte"
+        }
+        defaultTime := cfg.LeastTime.DefaultTime
+        if defaultTime <= 0 {
+            defaultTime = time.Millisecond
+        }
+        return loadbalance.NewLeastTime(metric, defaultTime), nil
+    case "sticky":
+        stickyCfg := loadbalance.StickyConfig{
+            Enabled:  cfg.Sticky.Enabled,
+            Name:     cfg.Sticky.Name,
+            Expires:  cfg.Sticky.Expires,
+            Domain:   cfg.Sticky.Domain,
+            Path:     cfg.Sticky.Path,
+            Secure:   cfg.Sticky.Secure,
+            HttpOnly: cfg.Sticky.HttpOnly,
+            SameSite: cfg.Sticky.SameSite,
+        }
+        if stickyCfg.Name == "" {
+            stickyCfg.Name = "lolly_route"
+        }
+        if stickyCfg.Expires <= 0 {
+            stickyCfg.Expires = time.Hour
+        }
+        if stickyCfg.Path == "" {
+            stickyCfg.Path = "/"
+        }
+        
+        fallbackAlgo := cfg.Sticky.FallbackAlgo
+        if fallbackAlgo == "" {
+            fallbackAlgo = "round_robin"
+        }
+        fallbackBalancer, err := createBalancerByName(fallbackAlgo, cfg)
+        if err != nil {
+            return nil, fmt.Errorf("sticky fallback balancer: %w", err)
+        }
+        
+        sticky := loadbalance.NewStickySession(stickyCfg, fallbackBalancer)
+        sticky.Start()
+        return sticky, nil
+    // ... rest ...
+    }
+}
+```
+
+### Step 5.2: Add Response Time Recording
+
+In the Proxy request-handling flow, add a helper to be called once a request completes (typically right after `Do` or a similar call):
+
+```go
+// recordResponseTime records the target's response times.
+func (p *Proxy) recordResponseTime(target *loadbalance.Target, startTime time.Time, headerReceived time.Time) {
+    if target == nil || target.Stats == nil {
+        return
+    }
+    
+    headerTime := headerReceived.Sub(startTime)
+    lastByteTime := time.Since(startTime)
+    
+    target.Stats.Record(headerTime, lastByteTime)
+}
+```
+
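+**Note:** Call this function at the point where the request is actually sent, typically right after the fasthttp `HostClient.Do` call.
+
+Because `proxy.go` is large and structurally complex, locate a suitable insertion point first:
+
+In `proxy.go`, find where the request is executed (typically a `client.Do` or similar call) and add the following after it returns successfully: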
+
+```go
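+// After the request completes (e.g. after Do returns); headerTime and lastByteTime are durations measured from the request start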
+if recorder, ok := p.balancer.(loadbalance.ResponseTimeRecorder); ok {
+    recorder.RecordResponseTime(target, headerTime, lastByteTime)
+}
+```
+
+### Step 5.3: Update Target Selector for Sticky
+
+File: `internal/proxy/target_selector.go`
+
+Modify `selectByBalancer` to support StickySession:
+
+```go
+func (p *Proxy) selectByBalancer(ctx *fasthttp.RequestCtx, targets []*loadbalance.Target) *loadbalance.Target {
+    p.mu.RLock()
+    balancer := p.balancer
+    p.mu.RUnlock()
+    
+    // StickySession needs the request context
+    if sticky, ok := balancer.(*loadbalance.StickySession); ok {
+        return sticky.Select(ctx, targets)
+    }
+    
+    // ... existing IPHash and ConsistentHash handling ...
+    
+    return balancer.Select(targets)
+}
+```
+
+Modify `selectTargetExcluding` in the same way:
+
+```go
+func (p *Proxy) selectTargetExcluding(ctx *fasthttp.RequestCtx, excluded []*loadbalance.Target) *loadbalance.Target {
+    // ... existing code ...
+    
+    // StickySession is not normally used for failover, but handle it if it is:
+    if sticky, ok := balancer.(*loadbalance.StickySession); ok {
+        return sticky.SelectExcluding(targets, excluded)
+    }
+    
+    // ... rest ...
+}
+```
+
+### Step 5.4: Run Proxy Tests
+
+Run: `cd /home/xfy/Developer/lolly && go test -v ./internal/proxy -run TestProxy`
+Expected: PASS (existing tests are unaffected)
+
+### Step 5.5: Commit
+
+```bash
+cd /home/xfy/Developer/lolly
+git add internal/proxy/proxy.go internal/proxy/target_selector.go
+git commit -m "feat(proxy): integrate Least Time and Sticky balancers
+
+- Add least_time and sticky to createBalancerByName
+- Implement response time recording for Least Time
+- Support StickySession in target selector with request context
+- StickySession auto-starts when created"
+```
+
+---
+
+## Task 6: Full Integration Test
+
+**Files:**
+- Modify: `internal/loadbalance/balancer_test.go` (add integration tests)
+
+### Step 6.1: Add Integration Tests
+
+```go
+func TestBalancerIntegration_LeastTime(t *testing.T) {
+    targets := []*Target{
+        NewTargetFromConfig("http://slow:8080", 1, 0, 0, 0, false, false, ""),
+        NewTargetFromConfig("http://fast:8080", 1, 0, 0, 0, false, false, ""),
+    }
+    
+    lt := NewLeastTime("last_byte", time.Millisecond)
+    
+    // Simulate: slow target has 100ms avg, fast has 10ms avg
+    for i := 0; i < 10; i++ {
+        targets[0].Stats.Record(50*time.Millisecond, 100*time.Millisecond)
+        targets[1].Stats.Record(5*time.Millisecond, 10*time.Millisecond)
+    }
+    
+    // Select 100 times, should mostly pick fast
+    fastCount := 0
+    for i := 0; i < 100; i++ {
+        selected := lt.Select(targets)
+        if selected.URL == "http://fast:8080" {
+            fastCount++
+        }
+    }
+    
+    if fastCount < 80 {
+        t.Errorf("fast target selected %d/100 times, expected at least 80", fastCount)
+    }
+}
+
+func TestBalancerIntegration_StickyWithLeastTimeFallback(t *testing.T) {
+    fallback := NewLeastTime("last_byte", time.Millisecond)
+    config := StickyConfig{
+        Enabled:  true,
+        Name:     "test_route",
+        Expires:  time.Hour,
+        Path:     "/",
+        HttpOnly: true,
+    }
+    
+    sticky := NewStickySession(config, fallback)
+    sticky.Start()
+    defer sticky.Stop()
+    
+    targets := []*Target{
+        NewTargetFromConfig("http://backend1:8080", 1, 0, 0, 0, false, false, ""),
+        NewTargetFromConfig("http://backend2:8080", 1, 0, 0, 0, false, false, ""),
+    }
+    
+    ctx := &fasthttp.RequestCtx{}
+    
+    // First request
+    selected1 := sticky.Select(ctx, targets)
+    if selected1 == nil {
+        t.Fatal("expected a target")
+    }
+    
+    // Verify cookie set
+    cookie := ctx.Response.Header.PeekCookie("test_route")
+    if len(cookie) == 0 {
+        t.Fatal("expected cookie")
+    }
+    
+    // Make selected1 unhealthy
+    selected1.Healthy.Store(false)
+    
+    // Second request with cookie should fallback
+    ctx2 := &fasthttp.RequestCtx{}
+    ctx2.Request.Header.SetCookieBytesV("test_route", extractCookieValue(cookie))
+    
+    selected2 := sticky.Select(ctx2, targets)
+    if selected2 == nil {
+        t.Fatal("expected fallback target")
+    }
+    if selected2.URL == selected1.URL {
+        t.Error("expected different target after fallback")
+    }
+}
+```
+
+### Step 6.2: Run Integration Tests
+
+Run: `cd /home/xfy/Developer/lolly && go test -v ./internal/loadbalance -run TestBalancerIntegration`
+Expected: PASS (2 tests)
+
+### Step 6.3: Commit
+
+```bash
+cd /home/xfy/Developer/lolly
+git add internal/loadbalance/balancer_test.go
+git commit -m "test(loadbalance): add integration tests for Least Time and Sticky
+
+- Verify Least Time picks faster target consistently
+- Verify Sticky fallback when target becomes unhealthy
+- Test cookie encoding and session persistence"
+```
+
+---
+
+## Task 7: Benchmark Tests
+
+**Files:**
+- Create: `internal/loadbalance/least_time_bench_test.go`
+- Create: `internal/loadbalance/sticky_bench_test.go`
+
+### Step 7.1: Least Time Benchmark
+
+```go
+package loadbalance
+
+import (
+    "testing"
+    "time"
+)
+
+// BenchmarkLeastTime_Select measures Select across three targets with pre-populated stats.
+func BenchmarkLeastTime_Select(b *testing.B) {
+    lt := NewLeastTime("last_byte", time.Millisecond)
+    targets := []*Target{
+        NewTargetFromConfig("http://a:8080", 1, 0, 0, 0, false, false, ""),
+        NewTargetFromConfig("http://b:8080", 1, 0, 0, 0, false, false, ""),
+        NewTargetFromConfig("http://c:8080", 1, 0, 0, 0, false, false, ""),
+    }
+    
+    // Pre-populate stats
+    for _, t := range targets {
+        t.Stats.Record(10*time.Millisecond, 20*time.Millisecond)
+    }
+    
+    b.ResetTimer()
+    for i := 0; i < b.N; i++ {
+        lt.Select(targets)
+    }
+}
+
+func BenchmarkLeastTime_Record(b *testing.B) {
+    stats := NewEWMAStats()
+    
+    b.ResetTimer()
+    for i := 0; i < b.N; i++ {
+        stats.Record(10*time.Millisecond, 20*time.Millisecond)
+    }
+}
+
+func BenchmarkLeastTime_Concurrent(b *testing.B) {
+    lt := NewLeastTime("last_byte", time.Millisecond)
+    targets := []*Target{
+        NewTargetFromConfig("http://a:8080", 1, 0, 0, 0, false, false, ""),
+        NewTargetFromConfig("http://b:8080", 1, 0, 0, 0, false, false, ""),
+    }
+    
+    b.RunParallel(func(pb *testing.PB) {
+        for pb.Next() {
+            lt.Select(targets)
+        }
+    })
+}
+```
+
+### Step 7.2: Sticky Benchmark
+
+```go
+package loadbalance
+
+import (
+    "testing"
+
+    "github.com/valyala/fasthttp"
+)
+
+func BenchmarkStickySession_Select(b *testing.B) {
+    fallback := NewRoundRobin()
+    config := DefaultStickyConfig()
+    
+    sticky := NewStickySession(config, fallback)
+    sticky.Start()
+    defer sticky.Stop()
+    
+    targets := []*Target{
+        NewTargetFromConfig("http://backend1:8080", 1, 0, 0, 0, false, false, ""),
+        NewTargetFromConfig("http://backend2:8080", 1, 0, 0, 0, false, false, ""),
+    }
+    
+    // Pre-populate a cookie
+    ctx := &fasthttp.RequestCtx{}
+    sticky.Select(ctx, targets)
+    cookie := ctx.Response.Header.PeekCookie(config.Name)
+    
+    b.ResetTimer()
+    for i := 0; i < b.N; i++ {
+        ctx := &fasthttp.RequestCtx{}
+        ctx.Request.Header.SetCookieBytesV(config.Name, extractCookieValue(cookie))
+        sticky.Select(ctx, targets)
+    }
+}
+
+func BenchmarkStickySession_SelectNew(b *testing.B) {
+    fallback := NewRoundRobin()
+    config := DefaultStickyConfig()
+    
+    sticky := NewStickySession(config, fallback)
+    sticky.Start()
+    defer sticky.Stop()
+    
+    targets := []*Target{
+        NewTargetFromConfig("http://backend1:8080", 1, 0, 0, 0, false, false, ""),
+        NewTargetFromConfig("http://backend2:8080", 1, 0, 0, 0, false, false, ""),
+    }
+    
+    b.ResetTimer()
+    for i := 0; i < b.N; i++ {
+        ctx := &fasthttp.RequestCtx{}
+        sticky.Select(ctx, targets)
+    }
+}
+```
+
+### Step 7.3: Run Benchmarks
+
+Run: `cd /home/xfy/Developer/lolly && go test -bench=. -benchmem ./internal/loadbalance -run=^$`
+Expected: benchmark results are printed (ns/op, B/op, allocs/op)
+
+### Step 7.4: Commit
+
+```bash
+cd /home/xfy/Developer/lolly
+git add internal/loadbalance/least_time_bench_test.go internal/loadbalance/sticky_bench_test.go
+git commit -m "perf(loadbalance): add benchmarks for Least Time and Sticky
+
+- Benchmark Select and Record operations
+- Concurrent benchmark for realistic load testing
+- Baseline for future performance optimization"
+```
+
+---
+
+## Task 8: Final Verification
+
+### Step 8.1: Run All Loadbalance Tests
+
+Run: `cd /home/xfy/Developer/lolly && go test -v ./internal/loadbalance`
+Expected: ALL PASS
+
+### Step 8.2: Run All Config Tests
+
+Run: `cd /home/xfy/Developer/lolly && go test -v ./internal/config`
+Expected: ALL PASS
+
+### Step 8.3: Run All Proxy Tests
+
+Run: `cd /home/xfy/Developer/lolly && go test -v ./internal/proxy`
+Expected: ALL PASS
+
+### Step 8.4: Build
+
+Run: `cd /home/xfy/Developer/lolly && go build ./...`
+Expected: SUCCESS (no errors)
+
+### Step 8.5: Final Commit
+
+```bash
+cd /home/xfy/Developer/lolly
+git log --oneline -10
+```
+
+---
+
+## Spec Coverage Checklist
+
+| Spec Requirement | Plan Task |
+|------------------|-----------|
+| Least Time with EWMA | Task 1 + 2 |
+| header_time metric | Task 2 (NewLeastTime parameter) |
+| last_byte_time metric | Task 2 (NewLeastTime parameter) |
+| Session Sticky cookie | Task 3 |
+| 256-shard lock map | Task 3 (stickyShard) |
+| Cookie encoding | Task 3 (encodeStickyCookie) |
+| TTL expiration | Task 3 (stickyEntry.expiresAt) |
+| Background cleanup | Task 3 (cleanupLoop) |
+| Fallback algorithm | Task 3 (fallback balancer) |
+| Configuration integration | Task 4 |
+| Proxy integration | Task 5 |
+| Response time recording | Task 5 |
+| Zero-lock design | Task 1 (atomic EWMA) |
+| Zero-allocation | Task 1 + 2 (no heap alloc in hot path) |
+| Concurrent safety | All tasks (atomic + locks) |
+
+---
+
+## Placeholder Scan
+
+- No "TBD" or "TODO" in any task
+- No "implement later" or "fill in details"
+- All code blocks contain complete implementation
+- All test commands include expected output
+- All file paths are exact
+
+---
+
+## Type Consistency Check
+
+- `EWMAStats.Record(headerTime, lastByteTime time.Duration)` - consistent
+- `LeastTime.Select(targets)` returns `*Target` - consistent with Balancer interface
+- `StickySession.Select(ctx, targets)` - consistent with extended usage
+- `ResponseTimeRecorder.RecordResponseTime(target, headerTime, lastByteTime)` - consistent
+
+---
+
+## Execution Handoff
+
+**Plan complete and saved to `docs/superpowers/plans/2026-06-08-loadbalance-enhancement.md`.**
+
+Two execution options:
+
+**1. Subagent-Driven (recommended)** - Dispatch a fresh subagent per task, review between tasks, fast iteration
+
+**2. Inline Execution** - Execute tasks in this session using executing-plans, batch execution with checkpoints
+
+**Which approach?**
diff --git a/docs/superpowers/plans/2026-06-10-performance-optimization-plan.md b/docs/superpowers/plans/2026-06-10-performance-optimization-plan.md
new file mode 100644
index 0000000..8eb38ce
--- /dev/null
+++ b/docs/superpowers/plans/2026-06-10-performance-optimization-plan.md
@@ -0,0 +1,1235 @@
+# Continuous Performance Optimization Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
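+**Goal:** Establish a complete performance benchmarking system, collect baseline data, identify the top 10 bottlenecks, and implement quantifiable performance optimizations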
+
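+**Architecture:** A data-driven optimization workflow: establish benchmarks → collect data → analyze bottlenecks → implement optimizations → detect regressions. First fill in the missing benchmarks, then run the full suite to generate a baseline, then use pprof to locate bottlenecks, and finally optimize and verify them one by one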
+
+**Tech Stack:** Go 1.26+, `testing` package benchmarks, pprof, benchstat, wrk/oha/h2load
+
+---
+
+## File Structure Map
+
+```
+internal/benchmark/
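+├── micro/                    # Micro benchmarks
+│   ├── resolver_bench_test.go    # DNS resolver benchmarks
+│   ├── stream_bench_test.go      # Stream proxy benchmarks
+│   ├── cache_bench_test.go       # Cache system benchmarks
+│   ├── lua_bench_test.go         # Lua engine benchmarks
+│   └── variable_bench_test.go    # Variable system benchmarks
+├── integration/              # Integration benchmarks
+│   ├── server_bench_test.go      # HTTP server end-to-end
+│   ├── proxy_bench_test.go       # Reverse proxy end-to-end
+│   └── static_bench_test.go      # Static file end-to-end
+└── system/                   # System load-testing scripts
+    ├── bench.sh                  # Main load-testing script
+    ├── static.lua                # wrk script for static files
+    └── proxy.lua                 # wrk script for the proxy
+
+scripts/
+└── bench-suite.sh            # Run the full benchmark suite in one step
+
+benchmarks/                   # Benchmark result storage
+└── v0.4.0/                   # Per-version directory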
+    ├── micro.txt
+    ├── integration.txt
+    ├── system.txt
+    └── pprof/
+        ├── cpu.prof
+        ├── heap.prof
+        ├── allocs.prof
+        └── goroutine.prof
+```
+
+---
+
+## Task 1: Set Up the Benchmark Directory Structure
+
+**Files:**
+- Create: `internal/benchmark/micro/`
+- Create: `internal/benchmark/integration/`
+- Create: `internal/benchmark/system/`
+- Create: `benchmarks/`
+- Modify: `.gitignore` (ignore the contents of benchmarks/ but keep the directory)
+
+- [ ] **Step 1: Create the directory structure**
+
+```bash
+mkdir -p internal/benchmark/micro
+mkdir -p internal/benchmark/integration
+mkdir -p internal/benchmark/system
+mkdir -p benchmarks/v0.4.0/pprof
+```
+
+- [ ] **Step 2: Add .gitignore rules**
+
+Append to the end of `.gitignore`:
+
+```
+# Benchmark results
+benchmarks/*/
+!benchmarks/.gitkeep
+```
+
+Create `benchmarks/.gitkeep`:
+
+```bash
+touch benchmarks/.gitkeep
+```
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add internal/benchmark/ benchmarks/ .gitignore
+git commit -m "chore(benchmark): establish benchmark directory structure"
+```
+
+---
+
+## Task 2: Add Missing Micro Benchmarks: Resolver
+
+**Files:**
+- Create: `internal/benchmark/micro/resolver_bench_test.go`
+
+- [ ] **Step 1: Write the resolver benchmarks**
+
+```go
+package micro
+
+import (
+	"strconv"
+	"testing"
+	"time"
+
+	"rua.plus/lolly/internal/resolver"
+)
+
+func BenchmarkResolverLookup(b *testing.B) {
+	// Use a mock resolver to avoid real network requests
+	r := resolver.NewMockResolver(map[string][]string{
+		"example.com": {"93.184.216.34"},
+	})
+
+	b.ReportAllocs()
+	b.ResetTimer()
+	for b.Loop() {
+		_, _ = r.Lookup("example.com")
+	}
+}
+
+func BenchmarkResolverLookupWithCache(b *testing.B) {
+	r := resolver.NewMockResolver(map[string][]string{
+		"example.com": {"93.184.216.34"},
+	})
+	// Warm the cache
+	_, _ = r.Lookup("example.com")
+
+	b.ReportAllocs()
+	b.ResetTimer()
+	for b.Loop() {
+		_, _ = r.Lookup("example.com")
+	}
+}
+
+func BenchmarkResolverCacheSet(b *testing.B) {
+	r := resolver.NewMockResolver(nil)
+	b.ReportAllocs()
+	b.ResetTimer()
+	for i := 0; b.Loop(); i++ { // explicit counter: b.N is not a per-iteration index under b.Loop()
+		r.CacheSet("host"+strconv.Itoa(i), []string{"1.2.3.4"}, time.Minute)
+	}
+}
+
+func BenchmarkResolverCacheGet(b *testing.B) {
+	r := resolver.NewMockResolver(nil)
+	r.CacheSet("example.com", []string{"1.2.3.4"}, time.Minute)
+
+	b.ReportAllocs()
+	b.ResetTimer()
+	for b.Loop() {
+		_, _ = r.CacheGet("example.com")
+	}
+}
+```
+
+- [ ] **Step 2: Run the benchmarks to verify**
+
+```bash
+go test -bench=. -benchmem ./internal/benchmark/micro/resolver_bench_test.go
+```
+
+Expected: all 4 benchmarks run with no compile errors
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add internal/benchmark/micro/resolver_bench_test.go
+git commit -m "feat(benchmark): add resolver micro benchmarks"
+```
+
+---
+
+## Task 3: Add Missing Micro Benchmarks: Stream
+
+**Files:**
+- Create: `internal/benchmark/micro/stream_bench_test.go`
+
+- [ ] **Step 1: Write the stream benchmarks**
+
+```go
+package micro
+
+import (
+	"io"
+	"net"
+	"testing"
+
+	"github.com/stretchr/testify/require"
+	"rua.plus/lolly/internal/stream"
+)
+
+func BenchmarkStreamTCPForward(b *testing.B) {
+	// Create a backend echo server
+	backendLn, err := net.Listen("tcp", "127.0.0.1:0")
+	require.NoError(b, err)
+	defer backendLn.Close()
+
+	go func() {
+		for {
+			conn, err := backendLn.Accept()
+			if err != nil {
+				return
+			}
+			go func(c net.Conn) {
+				defer c.Close()
+				_, _ = io.Copy(c, c)
+			}(conn)
+		}
+	}()
+
+	// Create the stream server
+	srv := stream.NewServer()
+	_ = srv.AddUpstream("test", []stream.TargetSpec{
+		{Addr: backendLn.Addr().String(), Weight: 1},
+	}, "round_robin", stream.HealthCheckSpec{})
+
+	// Mark the upstream target as healthy
+	srv.SetHealthy("test", 0, true)
+
+	_ = srv.ListenTCP("127.0.0.1:0")
+	_ = srv.Start()
+	defer srv.Stop()
+
+	proxyAddr := srv.GetListenerAddr("test")
+
+	b.ReportAllocs()
+	b.ResetTimer()
+	for b.Loop() {
+		conn, err := net.Dial("tcp", proxyAddr)
+		if err != nil {
+			b.Fatal(err)
+		}
+		_, _ = conn.Write([]byte("hello"))
+		buf := make([]byte, 5)
+		_, _ = io.ReadFull(conn, buf)
+		conn.Close()
+	}
+}
+
+func BenchmarkStreamSelectTarget(b *testing.B) {
+	srv := stream.NewServer()
+	_ = srv.AddUpstream("test", []stream.TargetSpec{
+		{Addr: "127.0.0.1:8001", Weight: 3},
+		{Addr: "127.0.0.1:8002", Weight: 2},
+		{Addr: "127.0.0.1:8003", Weight: 1},
+	}, "weighted_round_robin", stream.HealthCheckSpec{})
+
+	for i := 0; i < 3; i++ {
+		srv.SetHealthy("test", i, true)
+	}
+
+	b.ReportAllocs()
+	b.ResetTimer()
+	for b.Loop() {
+		_, _ = srv.SelectTarget("test", nil)
+	}
+}
+```
+
+- [ ] **Step 2: Run the benchmarks to verify**
+
+```bash
+go test -bench=. -benchmem ./internal/benchmark/micro/stream_bench_test.go
+```
+
+Expected: both benchmarks run
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add internal/benchmark/micro/stream_bench_test.go
+git commit -m "feat(benchmark): add stream proxy micro benchmarks"
+```
+
+---
+
+## Task 4: Add Missing Micro Benchmarks: Cache
+
+**Files:**
+- Create: `internal/benchmark/micro/cache_bench_test.go`
+
+- [ ] **Step 1: Write the cache benchmarks**
+
+```go
+package micro
+
+import (
+	"strconv"
+	"testing"
+	"time"
+
+	"rua.plus/lolly/internal/cache"
+)
+
+func BenchmarkCacheGet(b *testing.B) {
+	c := cache.New(cache.Config{MaxEntries: 10000})
+	_ = c.Set("key", []byte("value"), time.Hour)
+
+	b.ReportAllocs()
+	b.ResetTimer()
+	for b.Loop() {
+		_, _ = c.Get("key")
+	}
+}
+
+func BenchmarkCacheSet(b *testing.B) {
+	c := cache.New(cache.Config{MaxEntries: 10000})
+	value := []byte("value")
+	b.ReportAllocs()
+	b.ResetTimer()
+	for i := 0; b.Loop(); i++ { // explicit counter: b.N is not a per-iteration index under b.Loop()
+		_ = c.Set("key"+strconv.Itoa(i), value, time.Hour)
+	}
+}
+
+func BenchmarkCacheGetConcurrent(b *testing.B) {
+	c := cache.New(cache.Config{MaxEntries: 10000})
+	for i := 0; i < 1000; i++ {
+		_ = c.Set(string(rune(i)), []byte("value"), time.Hour)
+	}
+
+	b.ReportAllocs()
+	b.ResetTimer()
+	b.RunParallel(func(pb *testing.PB) {
+		i := 0
+		for pb.Next() {
+			_, _ = c.Get(string(rune(i % 1000)))
+			i++
+		}
+	})
+}
+
+func BenchmarkCacheSetConcurrent(b *testing.B) {
+	c := cache.New(cache.Config{MaxEntries: 10000})
+	value := []byte("value")
+
+	b.ReportAllocs()
+	b.ResetTimer()
+	b.RunParallel(func(pb *testing.PB) {
+		i := 0
+		for pb.Next() {
+			_ = c.Set(string(rune(i)), value, time.Hour)
+			i++
+		}
+	})
+}
+```
+
+- [ ] **Step 2: Run the benchmarks to verify**
+
+```bash
+go test -bench=. -benchmem ./internal/benchmark/micro/cache_bench_test.go
+```
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add internal/benchmark/micro/cache_bench_test.go
+git commit -m "feat(benchmark): add cache micro benchmarks"
+```
+
+---
+
+## Task 5: Add Missing Micro Benchmarks: Lua
+
+**Files:**
+- Create: `internal/benchmark/micro/lua_bench_test.go`
+
+- [ ] **Step 1: Write the Lua benchmarks**
+
+```go
+package micro
+
+import (
+	"testing"
+
+	"rua.plus/lolly/internal/lua"
+)
+
+func BenchmarkLuaSimpleScript(b *testing.B) {
+	engine := lua.NewEngine()
+	defer engine.Close()
+
+	script := `
+		local a = 1 + 2
+		return a
+	`
+
+	b.ReportAllocs()
+	b.ResetTimer()
+	for b.Loop() {
+		_ = engine.ExecuteString(script)
+	}
+}
+
+func BenchmarkLuaNginxAPI(b *testing.B) {
+	engine := lua.NewEngine()
+	defer engine.Close()
+
+	script := `
+		ngx.var.request_uri = "/test"
+		return ngx.var.request_uri
+	`
+
+	b.ReportAllocs()
+	b.ResetTimer()
+	for b.Loop() {
+		_ = engine.ExecuteString(script)
+	}
+}
+
+func BenchmarkLuaJSONEncode(b *testing.B) {
+	engine := lua.NewEngine()
+	defer engine.Close()
+
+	script := `
+		local json = require("cjson")
+		local t = {name = "test", value = 123}
+		return json.encode(t)
+	`
+
+	b.ReportAllocs()
+	b.ResetTimer()
+	for b.Loop() {
+		_ = engine.ExecuteString(script)
+	}
+}
+```
+
+- [ ] **Step 2: Run the benchmarks to verify**
+
+```bash
+go test -bench=. -benchmem ./internal/benchmark/micro/lua_bench_test.go
+```
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add internal/benchmark/micro/lua_bench_test.go
+git commit -m "feat(benchmark): add lua engine micro benchmarks"
+```
+
+---
+
+## Task 6: Create Integration Benchmarks: Server
+
+**Files:**
+- Create: `internal/benchmark/integration/server_bench_test.go`
+
+- [ ] **Step 1: Write the server integration benchmarks**
+
+```go
+package integration
+
+import (
+	"fmt"
+	"testing"
+
+	"github.com/valyala/fasthttp"
+	"rua.plus/lolly/internal/config"
+	"rua.plus/lolly/internal/server"
+)
+
+func BenchmarkServerStaticRequest(b *testing.B) {
+	cfg := &config.Config{
+		Servers: []config.ServerConfig{{
+			Listen: "127.0.0.1:0",
+			Static: []config.StaticConfig{{
+				Path: "/",
+				Root: "./testdata",
+			}},
+		}},
+	}
+
+	srv := server.New(cfg)
+	go srv.Start()
+	defer srv.Stop()
+
+	// Wait for the server to start
+	addr := srv.GetAddr()
+
+	client := &fasthttp.Client{}
+	req := fasthttp.AcquireRequest()
+	resp := fasthttp.AcquireResponse()
+	defer fasthttp.ReleaseRequest(req)
+	defer fasthttp.ReleaseResponse(resp)
+
+	req.SetRequestURI("http://" + addr + "/")
+	req.Header.SetMethod("GET")
+
+	b.ReportAllocs()
+	b.ResetTimer()
+	for b.Loop() {
+		_ = client.Do(req, resp)
+	}
+}
+
+func BenchmarkServerProxyRequest(b *testing.B) {
+	// Start the backend server
+	backend := &fasthttp.Server{
+		Handler: func(ctx *fasthttp.RequestCtx) {
+			ctx.SetBodyString("ok")
+		},
+	}
+	go backend.ListenAndServe("127.0.0.1:18081")
+
+	cfg := &config.Config{
+		Servers: []config.ServerConfig{{
+			Listen: "127.0.0.1:0",
+			Proxy: []config.ProxyConfig{{
+				Path: "/api",
+				Targets: []config.ProxyTarget{{
+					URL: "http://127.0.0.1:18081",
+				}},
+			}},
+		}},
+	}
+
+	srv := server.New(cfg)
+	go srv.Start()
+	defer srv.Stop()
+
+	addr := srv.GetAddr()
+
+	client := &fasthttp.Client{}
+	req := fasthttp.AcquireRequest()
+	resp := fasthttp.AcquireResponse()
+	defer fasthttp.ReleaseRequest(req)
+	defer fasthttp.ReleaseResponse(resp)
+
+	req.SetRequestURI("http://" + addr + "/api/test")
+	req.Header.SetMethod("GET")
+
+	b.ReportAllocs()
+	b.ResetTimer()
+	for b.Loop() {
+		_ = client.Do(req, resp)
+	}
+}
+```
+
+- [ ] **Step 2: Run the benchmarks to verify**
+
+```bash
+go test -bench=. -benchmem ./internal/benchmark/integration/server_bench_test.go
+```
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add internal/benchmark/integration/server_bench_test.go
+git commit -m "feat(benchmark): add server integration benchmarks"
+```
+
+---
+
+## Task 7: Create System Load-Testing Scripts
+
+**Files:**
+- Create: `internal/benchmark/system/bench.sh`
+- Create: `internal/benchmark/system/static.lua`
+- Create: `internal/benchmark/system/proxy.lua`
+
+- [ ] **Step 1: Write the wrk script for static files**
+
+`internal/benchmark/system/static.lua`:
+
+```lua
+-- wrk static file benchmark script
+wrk.method = "GET"
+wrk.headers["Accept"] = "text/html"
+
+-- Pick a random path per request to make the load more realistic
+math.randomseed(os.time())
+
+request = function()
+    local paths = {"/", "/index.html", "/about.html", "/contact.html"}
+    local path = paths[math.random(#paths)]
+    return wrk.format(nil, path)
+end
+
+response = function(status, headers, body)
+    if status ~= 200 then
+        print("Error: " .. status)
+    end
+end
+```
+
+- [ ] **Step 2: Write the wrk script for the proxy**
+
+`internal/benchmark/system/proxy.lua`:
+
+```lua
+-- wrk proxy benchmark script
+wrk.method = "GET"
+wrk.headers["Accept"] = "application/json"
+
+request = function()
+    local paths = {"/api/users", "/api/posts", "/api/comments"}
+    local path = paths[math.random(#paths)]
+    return wrk.format(nil, path)
+end
+```
+
+- [ ] **Step 3: Write the main load-testing script**
+
+`internal/benchmark/system/bench.sh`:
+
+```bash
+#!/bin/bash
+set -e
+
+# Lolly System Benchmark Suite
+# Usage: ./bench.sh [lolly_addr] [duration] [connections] [threads]
+
+ADDR=${1:-"http://127.0.0.1:8080"}
+DURATION=${2:-"30s"}
+CONNECTIONS=${3:-400}
+THREADS=${4:-12}
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+RESULTS_DIR="${SCRIPT_DIR}/../../../benchmarks/$(date +%Y%m%d-%H%M%S)"
+mkdir -p "$RESULTS_DIR"
+
+echo "=== Lolly System Benchmark ==="
+echo "Target: $ADDR"
+echo "Duration: $DURATION"
+echo "Connections: $CONNECTIONS"
+echo "Threads: $THREADS"
+echo "Results: $RESULTS_DIR"
+echo ""
+
+# Check tools
+check_tool() {
+    if ! command -v "$1" &> /dev/null; then
+        echo "Warning: $1 not found, skipping related tests"
+        return 1
+    fi
+    return 0
+}
+
+# 1. Static file benchmark
+echo "--- Static File Benchmark ---"
+if check_tool wrk; then
+    wrk -t$THREADS -c$CONNECTIONS -d$DURATION \
+        -s "$SCRIPT_DIR/static.lua" \
+        "$ADDR" > "$RESULTS_DIR/static.txt"
+    echo "Static: $(grep 'Requests/sec' "$RESULTS_DIR/static.txt" || echo 'N/A')"
+fi
+
+# 2. Proxy benchmark
+echo ""
+echo "--- Proxy Benchmark ---"
+if check_tool wrk; then
+    wrk -t$THREADS -c$CONNECTIONS -d$DURATION \
+        -s "$SCRIPT_DIR/proxy.lua" \
+        "$ADDR/api" > "$RESULTS_DIR/proxy.txt"
+    echo "Proxy: $(grep 'Requests/sec' "$RESULTS_DIR/proxy.txt" || echo 'N/A')"
+fi
+
+# 3. HTTP/2 benchmark
+echo ""
+echo "--- HTTP/2 Benchmark ---"
+if check_tool h2load; then
+    h2load -n100000 -c100 -m10 "$ADDR" > "$RESULTS_DIR/http2.txt" 2>&1 || true
+    echo "HTTP/2: $(grep 'finished' "$RESULTS_DIR/http2.txt" || echo 'N/A')"
+fi
+
+# 4. Latency distribution with oha
+echo ""
+echo "--- Latency Distribution ---"
+if check_tool oha; then
+    oha -z $DURATION -c $CONNECTIONS "$ADDR" > "$RESULTS_DIR/latency.txt"
+    echo "Latency: $(grep 'Success rate' "$RESULTS_DIR/latency.txt" || echo 'N/A')"
+fi
+
+echo ""
+echo "=== Results saved to $RESULTS_DIR ==="
+```
+
+- [ ] **Step 4: Make the script executable**
+
+```bash
+chmod +x internal/benchmark/system/bench.sh
+```
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add internal/benchmark/system/
+git commit -m "feat(benchmark): add system benchmark scripts"
+```
+
+---
+
+## Task 8: Create the One-Step Full Benchmark Script
+
+**Files:**
+- Create: `scripts/bench-suite.sh`
+- Modify: `Makefile`
+
+- [ ] **Step 1: Write the benchmark suite script**
+
+`scripts/bench-suite.sh`:
+
+```bash
+#!/bin/bash
+set -e
+
+# Run complete benchmark suite and save results
+
+VERSION=$(git describe --tags --always --dirty 2>/dev/null || echo "dev")
+RESULTS_DIR="benchmarks/$VERSION"
+mkdir -p "$RESULTS_DIR/pprof"
+
+echo "=== Lolly Benchmark Suite $VERSION ==="
+echo "Results: $RESULTS_DIR"
+echo ""
+
+# 1. Micro benchmarks
+echo "--- Running Micro Benchmarks ---"
+go test -bench=. -benchmem \
+    ./internal/benchmark/micro/... \
+    > "$RESULTS_DIR/micro.txt" 2>&1 || true
+
+echo "Micro benchmarks done"
+
+# 2. Integration benchmarks
+echo ""
+echo "--- Running Integration Benchmarks ---"
+go test -bench=. -benchmem \
+    ./internal/benchmark/integration/... \
+    > "$RESULTS_DIR/integration.txt" 2>&1 || true
+
+echo "Integration benchmarks done"
+
+# 3. Existing package benchmarks
+echo ""
+echo "--- Running Package Benchmarks ---"
+go test -bench=. -benchmem \
+    ./internal/loadbalance/... \
+    ./internal/matcher/... \
+    ./internal/proxy/... \
+    ./internal/middleware/... \
+    > "$RESULTS_DIR/packages.txt" 2>&1 || true
+
+echo "Package benchmarks done"
+
+# 4. Summary
+echo ""
+echo "=== Results Summary ==="
+echo "Micro:        $RESULTS_DIR/micro.txt"
+echo "Integration:  $RESULTS_DIR/integration.txt"
+echo "Packages:     $RESULTS_DIR/packages.txt"
+
+if command -v benchstat &> /dev/null; then
+    echo ""
+    echo "--- Top Results ---"
+    grep -h "Benchmark" "$RESULTS_DIR"/*.txt | head -20
+fi
+
+echo ""
+echo "All results saved to $RESULTS_DIR"
+```
+
+- [ ] **Step 2: Add Makefile targets**
+
+Add to `Makefile`:
+
+```makefile
+.PHONY: bench bench-stat bench-suite bench-system
+
+# Run all benchmarks
+bench:
+	go test -bench=. -benchmem ./internal/benchmark/micro/... ./internal/benchmark/integration/...
+
+# Run benchmarks and show statistics
+bench-stat: bench
+	@benchstat $(shell ls benchmarks/*/micro.txt 2>/dev/null | tail -1)
+
+# Run complete benchmark suite
+bench-suite:
+	@bash scripts/bench-suite.sh
+
+# Run system benchmarks (requires running server)
+bench-system:
+	@bash internal/benchmark/system/bench.sh
+```
+
+- [ ] **Step 3: Make the script executable**
+
+```bash
+chmod +x scripts/bench-suite.sh
+```
+
+- [ ] **Step 4: Run the suite**
+
+```bash
+make bench-suite
+```
+
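+Expected: the script completes and saves results under `benchmarks/<version>/`, where `<version>` is the `git describe` output (`dev` only when `git describe` fails)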
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add scripts/bench-suite.sh Makefile
+git commit -m "feat(benchmark): add one-click benchmark suite"
+```
+
+---
+
+## Task 9: Run the first full benchmark round → generate the baseline
+
+**Files:**
+- Create: `benchmarks/v0.4.0/*.txt`
+
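+- [ ] **Step 1: Run the micro benchmarks**
+
+```bash
+# Create the output directory first; the redirect fails if it is missing
+mkdir -p benchmarks/v0.4.0
+go test -bench=. -benchmem \
+    ./internal/benchmark/micro/... \
+    > benchmarks/v0.4.0/micro.txt
+```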
+
+- [ ] **Step 2: Run the existing package benchmarks**
+
+```bash
+go test -bench=. -benchmem \
+    ./internal/loadbalance/... \
+    ./internal/matcher/... \
+    ./internal/proxy/... \
+    ./internal/middleware/... \
+    ./internal/server/... \
+    ./internal/cache/... \
+    ./internal/stream/... \
+    ./internal/resolver/... \
+    ./internal/variable/... \
+    ./internal/lua/... \
+    > benchmarks/v0.4.0/packages.txt
+```
+
+- [ ] **Step 3: Format the benchmark results**
+
+```bash
+# Requires benchstat to be installed
+benchstat benchmarks/v0.4.0/micro.txt
+benchstat benchmarks/v0.4.0/packages.txt
+```
+
+- [ ] **Step 4: Commit baseline**
+
+```bash
+git add benchmarks/v0.4.0/
+git commit -m "chore(benchmark): add v0.4.0 baseline performance data"
+```
+
+---
+
+## Task 10: Collect pprof data
+
+**Files:**
+- Create: `benchmarks/v0.4.0/pprof/*.prof`
+
+**Prerequisite**: a running lolly server configured with pprof
+
+- [ ] **Step 1: Start a test server with pprof enabled**
+
+Create a temporary test config `benchmark-pprof.yaml`:
+
+```yaml
+servers:
+  - listen: ":8080"
+    static:
+      - path: "/"
+        root: "./testdata"
+    proxy:
+      - path: "/api"
+        targets:
+          - url: "http://127.0.0.1:18081"
+
+monitoring:
+  pprof:
+    enabled: true
+    path: "/debug/pprof"
+    allow:
+      - "127.0.0.1"
+```
+
+Start a mock backend server on `127.0.0.1:18081` (a quick echo service in Python or Node is enough).
+
+Start lolly:
+
+```bash
+./bin/lolly -c benchmark-pprof.yaml &
+LOLLY_PID=$!
+```
+
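+- [ ] **Step 2: Collect the CPU profile**
+
+```bash
+# Create the output directory, and keep load running (for example `make bench-system`)
+# during the 30-second sample so the profile captures real work
+mkdir -p benchmarks/v0.4.0/pprof
+curl -s "http://localhost:8080/debug/pprof/profile?seconds=30" \
+    > benchmarks/v0.4.0/pprof/cpu.prof
+```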
+
+- [ ] **Step 3: Collect the heap profile**
+
+```bash
+curl -s "http://localhost:8080/debug/pprof/heap" \
+    > benchmarks/v0.4.0/pprof/heap.prof
+```
+
+- [ ] **Step 4: Collect the allocs profile**
+
+```bash
+curl -s "http://localhost:8080/debug/pprof/allocs" \
+    > benchmarks/v0.4.0/pprof/allocs.prof
+```
+
+- [ ] **Step 5: Collect the goroutine profile**
+
+```bash
+curl -s "http://localhost:8080/debug/pprof/goroutine" \
+    > benchmarks/v0.4.0/pprof/goroutine.prof
+```
+
+- [ ] **Step 6: Stop the test server**
+
+```bash
+kill $LOLLY_PID
+rm benchmark-pprof.yaml
+```
+
+- [ ] **Step 7: Commit the pprof data**
+
+```bash
+git add benchmarks/v0.4.0/pprof/
+git commit -m "chore(benchmark): add v0.4.0 pprof profiles"
+```
+
+---
+
+## Task 11: Analyze bottlenecks → generate the performance report
+
+**Files:**
+- Create: `benchmarks/v0.4.0/REPORT.md`
+
+- [ ] **Step 1: Analyze the CPU profile**
+
+```bash
+go tool pprof -top benchmarks/v0.4.0/pprof/cpu.prof > benchmarks/v0.4.0/cpu-top.txt
+```
+
+View the top 20 CPU-consuming functions:
+
+```bash
+go tool pprof -top -nodecount=20 benchmarks/v0.4.0/pprof/cpu.prof
+```
+
+- [ ] **Step 2: Analyze the heap profile**
+
+```bash
+go tool pprof -top benchmarks/v0.4.0/pprof/heap.prof > benchmarks/v0.4.0/heap-top.txt
+```
+
+- [ ] **Step 3: Analyze the allocs profile**
+
+```bash
+go tool pprof -top benchmarks/v0.4.0/pprof/allocs.prof > benchmarks/v0.4.0/allocs-top.txt
+```
+
+- [ ] **Step 4: Compile the report**
+
+`benchmarks/v0.4.0/REPORT.md`:
+
+```markdown
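+# Lolly v0.4.0 Performance Analysis Report
+
+> Generated: [fill in the date]
+
+## 1. Benchmark Summary
+
+### Micro benchmarks
+[paste key results from micro.txt]
+
+### Package benchmarks
+[paste key results from packages.txt]
+
+## 2. CPU Hotspots Top 10
+
+[paste results from cpu-top.txt]
+
+## 3. Allocation Hotspots Top 10
+
+[paste results from allocs-top.txt]
+
+## 4. Memory Usage Top 10
+
+[paste results from heap-top.txt]
+
+## 5. Optimization Recommendations
+
+### P0 (high priority)
+- [ ] [fill in based on the analysis]
+
+### P1 (medium priority)
+- [ ] [fill in based on the analysis]
+
+### P2 (low priority)
+- [ ] [fill in based on the analysis]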
+```
+
+- [ ] **Step 5: Commit the report**
+
+```bash
+git add benchmarks/v0.4.0/REPORT.md benchmarks/v0.4.0/*-top.txt
+git commit -m "docs(benchmark): add v0.4.0 performance analysis report"
+```
+
+---
+
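+## Task 12: Implement optimizations (based on the report)
+
+> **Note**: The content of this task will be defined after Task 11 completes, based on the actual bottleneck data. What follows is a placeholder template; replace it with the concrete analysis results when implementing.
+
+### Task 12.1: Optimize [bottleneck 1]
+
+**Files:**
+- Modify: `internal/[package]/[file].go:[line-range]`
+
+- [ ] **Step 1: Write the pre-optimization benchmark**
+
+```bash
+# The baseline already exists; no need to repeat it
+```
+
+- [ ] **Step 2: Implement the optimization**
+
+[implement the specific optimization for the actual bottleneck]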
+
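+- [ ] **Step 3: Verify the optimization**
+
+```bash
+# old.txt is the same benchmark captured before the change;
+# -count=10 gives benchstat enough samples to judge significance
+go test -bench=[BenchmarkName] -benchmem -count=10 ./internal/[package]/... \
+    > benchmarks/v0.4.0/new.txt
+benchstat benchmarks/v0.4.0/old.txt benchmarks/v0.4.0/new.txt
+```
+
+Expected: performance improves by more than 5%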
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add internal/[package]/
+git commit -m "perf([package]): optimize [description]"
+```
+
+### Task 12.2-12.N: Repeat the optimization workflow
+
+Repeat the workflow above for every identified bottleneck.
+
+---
+
+## Task 13: Set up performance regression detection
+
+**Files:**
+- Create: `.github/workflows/benchmark.yml` (if CI is restored)
+- Create: `scripts/bench-compare.sh`
+- Modify: `Makefile`
+
+- [ ] **Step 1: Create the benchmark comparison script**
+
+`scripts/bench-compare.sh`:
+
+```bash
+#!/bin/bash
+set -e
+
+# Compare current benchmark against baseline
+# Usage: ./bench-compare.sh [baseline_version]
+
+BASELINE=${1:-"v0.4.0"}
+BASELINE_FILE="benchmarks/$BASELINE/packages.txt"
+CURRENT_FILE="benchmarks/current.txt"
+
+if [ ! -f "$BASELINE_FILE" ]; then
+    echo "Baseline not found: $BASELINE_FILE"
+    exit 1
+fi
+
+echo "Comparing against baseline: $BASELINE"
+
+# Run current benchmarks
+go test -bench=. -benchmem \
+    ./internal/loadbalance/... \
+    ./internal/matcher/... \
+    ./internal/proxy/... \
+    ./internal/middleware/... \
+    > "$CURRENT_FILE"
+
+# Compare
+if command -v benchstat &> /dev/null; then
+    benchstat "$BASELINE_FILE" "$CURRENT_FILE"
+else
+    echo "benchstat not found, install with: go install golang.org/x/perf/cmd/benchstat@latest"
+    exit 1
+fi
+```
+
+- [ ] **Step 2: Add the Makefile target**
+
+```makefile
+.PHONY: bench-compare
+
+# Compare current performance against baseline
+bench-compare:
+	@bash scripts/bench-compare.sh
+```
+
+- [ ] **Step 3: Make the script executable**
+
+```bash
+chmod +x scripts/bench-compare.sh
+```
+
+- [ ] **Step 4: Test the regression check**
+
+```bash
+make bench-compare
+```
+
+Expected: prints a comparison of current performance against the baseline, with no significant regression
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add scripts/bench-compare.sh Makefile
+git commit -m "feat(benchmark): add performance regression detection"
+```
+
+---
+
+## Task 14: Final verification
+
+- [ ] **Step 1: Full test suite passes**
+
+```bash
+make test
+```
+
+Expected: all PASS
+
+- [ ] **Step 2: Race detection passes**
+
+```bash
+go test -race ./internal/...
+```
+
+Expected: zero races
+
+- [ ] **Step 3: Lint passes**
+
+```bash
+make lint
+```
+
+Expected: zero issues
+
+- [ ] **Step 4: Build verification**
+
+```bash
+make build
+```
+
+Expected: the build succeeds
+
+- [ ] **Step 5: Final commit check**
+
+```bash
+git log --oneline -20
+```
+
+Confirm that every benchmark-related commit is present.
+
+---
+
+## Appendix: Command quick reference
+
+```bash
+# Run all micro benchmarks
+go test -bench=. -benchmem ./internal/benchmark/micro/...
+
+# Run a single benchmark
+go test -bench=BenchmarkCacheGet -benchmem ./internal/benchmark/micro/...
+
+# Compare two benchmark results
+benchstat old.txt new.txt
+
+# Inspect the CPU profile
+go tool pprof -http=:8081 benchmarks/v0.4.0/pprof/cpu.prof
+
+# Inspect memory allocations
+go tool pprof -http=:8081 benchmarks/v0.4.0/pprof/allocs.prof
+
+# Render the call graph as PNG (requires Graphviz; flame graphs are in the -http UI)
+go tool pprof -png benchmarks/v0.4.0/pprof/cpu.prof > cpu-callgraph.png
+
+# System load test
+make bench-system
+
+# Performance regression check
+make bench-compare
+```
+
+---
+
+## Spec Coverage Check
+
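+| Spec Section | Task |
+|-------------|------|
+| Set up the benchmark directory structure | Task 1 |
+| Add resolver micro benchmarks | Task 2 |
+| Add stream micro benchmarks | Task 3 |
+| Add cache micro benchmarks | Task 4 |
+| Add lua micro benchmarks | Task 5 |
+| Integration benchmarks | Task 6 |
+| System load test script | Task 7 |
+| One-click benchmark script | Task 8 |
+| Generate the baseline | Task 9 |
+| Collect pprof data | Task 10 |
+| Analysis report | Task 11 |
+| Implement optimizations | Task 12 |
+| Regression detection | Task 13 |
+| Final verification | Task 14 |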
diff --git a/docs/superpowers/specs/2026-06-03-eliminate-code-redundancy-design.md b/docs/superpowers/specs/2026-06-03-eliminate-code-redundancy-design.md
new file mode 100644
index 0000000..b83c24c
--- /dev/null
+++ b/docs/superpowers/specs/2026-06-03-eliminate-code-redundancy-design.md
@@ -0,0 +1,213 @@
+# Code Redundancy Elimination Design Document
+
+> **Date:** 2026-06-03  
+> **Goal:** Eliminate code redundancy in the lolly project to improve maintainability and code quality  
+> **Scope:** Dead code removal, refactoring of repeated patterns, extraction of test helpers
+
+---
+
+## 1. Problem Analysis
+
+Static analysis of the codebase (`golangci-lint` + `dupl` + `unused`) found the following redundant code:
+
+### 1.1 Dead Code
+
+| File | Function/Method | Line | Notes |
+|------|----------|------|------|
+| `internal/config/validate.go` | `validateStatic()` | 475 | `validateStatics()` already inlines the same logic; only called from tests |
+| `internal/http2/server.go` | `connectionPool.get()` | 576 | No references |
+| `internal/http2/server.go` | `connectionPool.count()` | 583 | No references |
+| `internal/middleware/bodylimit/bodylimit.go` | `formatSize()` | 288 | Unused by production code, only called from tests; `autoindex.go` has a function of the same name |
+| `internal/middleware/security/headers.go` | `defaultSecurityHeaders()` | 295 | Only called from tests; unused by production code |
+| `internal/middleware/security/headers.go` | `strictSecurityHeaders()` | 309 | Only called from tests; unused by production code |
+| `internal/middleware/security/headers.go` | `developmentSecurityHeaders()` | 325 | Only called from tests; unused by production code |
+| `internal/ssl/ocsp.go` | `extractCertificates()` | 490 | Only called from tests; unused by production code |
+
+**Exclusions** (confirmed to be in use):
+- `setupTestLogger()` - called 47 times in `app_test.go`
+- `canonicalHeaderKey()` - called in `server_test.go`
+
+### 1.2 Repeated Patterns in Source Files
+
+**Route registration error handling (`internal/server/router.go`)**
+
+The pattern repeats 19 times (across the proxy, static, and lua handlers):
+```go
+if err := s.locationEngine.AddXXX(path, handler, internal); err != nil {
+    if err := s.handleRegistrationError("type", path, err); err != nil {
+        return err
+    }
+}
+```
+
+**DEBUG log guard (`internal/proxy/proxy.go`)**
+
+The pattern repeats 5 times:
+```go
+if logging.Debug().Enabled() {
+    logging.Debug().Str("key", value).Msg("[PROXY] message")
+}
+```
+
+### 1.3 Duplicated Code in Test Files
+
+| Pattern | Occurrences | Location |
+|------|---------|------|
+| `config.ProxyConfig{...}` | 184 | Various test files |
+| `config.ProxyTimeout{Connect: 5 * time.Second}` | 85 | Various test files |
+| `targets := []*loadbalance.Target{{URL: "http://..."}}` | 123 | Various test files |
+| `targets[0].Healthy.Store(true)` | 41 | Various test files |
+
+---
+
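+## 2. Design
+
+### 2.1 Phase 1: Dead Code Removal
+
+**Strategy**: Delete unused functions outright, and clean up the tests of functions that are only called from tests.
+
+**Checklist**:
+1. `validateStatic()` - delete the function and migrate its tests to exercise `validateStatics()`
+2. `connectionPool.get()` / `connectionPool.count()` - delete outright
+3. `formatSize()` (bodylimit) - delete the function and its tests; keep the same-named function in `autoindex.go`
+4. `defaultSecurityHeaders()` / `strictSecurityHeaders()` / `developmentSecurityHeaders()` - delete the functions and their tests
+5. `extractCertificates()` - delete the function and its tests
+
+### 2.2 Phase 2: Refactoring Repeated Patterns
+
+**2.2.1 Route registration helper**
+
+Extract a helper in `internal/server/router.go`:
+
+```go
+// registerRoute registers a route and handles registration errors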
+func (s *Server) registerRoute(
+    locType string,
+    path string,
+    handler fasthttp.RequestHandler,
+    internal bool,
+    source string,
+) error {
+    var err error
+    switch locType {
+    case matcher.LocationTypeExact:
+        err = s.locationEngine.AddExact(path, handler, internal)
+    case matcher.LocationTypePrefixPriority:
+        err = s.locationEngine.AddPrefixPriority(path, handler, internal)
+    case matcher.LocationTypeRegex:
+        err = s.locationEngine.AddRegex(path, handler, false, internal)
+    case matcher.LocationTypeRegexCaseless:
+        err = s.locationEngine.AddRegex(path, handler, true, internal)
+    case matcher.LocationTypeNamed:
+        err = s.locationEngine.AddNamed(path, handler)
+    default:
+        err = s.locationEngine.AddPrefix(path, handler, internal)
+    }
+    if err != nil {
+        return s.handleRegistrationError(source, path, err)
+    }
+    return nil
+}
+```
+
+**2.2.2 DEBUG logging helper**
+
+Extract a helper in `internal/proxy/proxy.go`:
+
+```go
+// proxyDebugLog logs a proxy message at DEBUG level
+func proxyDebugLog(msg string, kv ...interface{}) {
+    if !logging.Debug().Enabled() {
+        return
+    }
+    event := logging.Debug()
+    for i := 0; i < len(kv)-1; i += 2 {
+        key, ok := kv[i].(string)
+        if !ok {
+            continue
+        }
+        switch v := kv[i+1].(type) {
+        case string:
+            event = event.Str(key, v)
+        case int:
+            event = event.Int(key, v)
+        case bool:
+            event = event.Bool(key, v)
+        }
+    }
+    event.Msg(msg)
+}
+```
+
+### 2.3 Phase 3: Test Helpers
+
+Create helpers in the `internal/testutil/` package:
+
+```go
+package testutil
+
+import (
+    "time"
+
+    "rua.plus/lolly/internal/config"
+    "rua.plus/lolly/internal/loadbalance"
+)
+
+// NewTestProxyConfig creates a proxy config for tests
+func NewTestProxyConfig(path string, targets []string) *config.ProxyConfig {
+    cfg := &config.ProxyConfig{
+        Path:        path,
+        LoadBalance: "round_robin",
+        Timeout: config.ProxyTimeout{
+            Connect: 5 * time.Second,
+            Read:    30 * time.Second,
+            Write:   30 * time.Second,
+        },
+    }
+    // ...
+    return cfg
+}
+
+// NewTestTarget creates a proxy target for tests
+func NewTestTarget(url string) *loadbalance.Target {
+    return &loadbalance.Target{URL: url}
+}
+
+// NewTestHealthyTarget creates a test target already marked healthy
+func NewTestHealthyTarget(url string) *loadbalance.Target {
+    t := NewTestTarget(url)
+    t.Healthy.Store(true)
+    return t
+}
+```
+
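+**Migration strategy**:
+1. Create the helpers first
+2. Replace the duplicated code in test files incrementally
+3. Run the tests after each replacement to confirm they pass
+
+---
+
+## 3. Risk Assessment
+
+| Risk | Likelihood | Impact | Mitigation |
+|------|--------|------|---------|
+| A deleted function is actually used indirectly | Low | High | Confirm with `grep` that there are no references before deleting |
+| Refactoring introduces new bugs | Medium | Medium | Run the full test suite after every change |
+| Test helpers change test semantics | Low | Medium | Keep the default settings identical to the original code |
+
+---
+
+## 4. Acceptance Criteria
+
+- [ ] `golangci-lint run --enable=unused ./...` reports no unused errors
+- [ ] `golangci-lint run --enable=dupl ./...` reports no dupl errors in source files
+- [ ] `go test ./...` passes in full
+- [ ] Total lines of code reduced by more than 200
+- [ ] `ProxyConfig{` literals in test files reduced by more than 50%
+
+---
+
+## 5. Implementation Order
+
+1. **Phase 1 (dead code)** - low risk, quick win
+2. **Phase 2 (source refactoring)** - medium risk, improves maintainability
+3. **Phase 3 (test helpers)** - low risk, largest reduction in boilerplate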
diff --git a/docs/superpowers/specs/2026-06-08-loadbalance-enhancement-design.md b/docs/superpowers/specs/2026-06-08-loadbalance-enhancement-design.md
new file mode 100644
index 0000000..db3c6be
--- /dev/null
+++ b/docs/superpowers/specs/2026-06-08-loadbalance-enhancement-design.md
@@ -0,0 +1,389 @@
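+# Lolly Load Balancing Enhancement Design - Least Time & Session Sticky
+
+**Date**: 2026-06-08
+**Status**: Approved
+
+## 1. Background and Goals
+
+Lolly currently supports 6 load balancing algorithms: Round Robin, Weighted Round Robin, Least Connections, IP Hash, Consistent Hash, and Random (Power of Two Choices).
+
+Compared with nginx Plus, Lolly lacks two important features:
+1. **Least Time** - selects the best backend based on response time
+2. **Session Sticky** - cookie-based session persistence
+
+This document designs a high-performance implementation of both algorithms. The goals are:
+- **Lock-free design**: atomic operations instead of mutexes
+- **Zero heap allocations**: preallocation + object pools
+- **Nanosecond-level latency**: a single selection takes < 100ns
+- **Consistent with the existing code style**
+
+## 2. Design Overview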
+
+```
+                    +----------------------+
+                    |     Proxy Request    |
+                    +----------+-----------+
+                               |
+              +----------------+----------------+
+              |                                 |
+        +-----v------+                  +------v------+
+        | Least Time |                  | Sticky      |
+        | Select     |                  | Route       |
+        +-----+------+                  +------+------+
+              |                                 |
+        +-----v------+                  +------v------+
+        | EWMA Stats |                  | Cookie      |
+        | (atomic)   |                  | + Shard Map |
+        +------------+                  +-------------+
+```
+
+## 3. Least Time Design
+
+### 3.1 Core Algorithm
+
+Response time statistics are based on an EWMA (exponentially weighted moving average):
+
+```
+new_avg = alpha * new_sample + (1 - alpha) * old_avg
+```
+
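+- `alpha` defaults to 0.3 and is configurable (range 0-1)
+- The larger alpha is, the more sensitive the average is to new samples and the faster it converges
+- Nanosecond values are stored in an atomic.Int64 to avoid floating-point arithmetic
+
+### 3.2 Data Structures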
+
+```go
+// EWMAStats is an atomic EWMA statistics tracker
+type EWMAStats struct {
+    headerTime    atomic.Int64  // EWMA time to first byte (nanoseconds)
+    lastByteTime  atomic.Int64  // EWMA full response time (nanoseconds)
+    sampleCount   atomic.Int64  // sample count
+}
+
+// Use fixed-point integer arithmetic to avoid floating point
+// alpha is encoded as a fixed-point number: alpha * 1000
+const alphaScale = 1000
+
+func (e *EWMAStats) Record(headerTime, lastByteTime time.Duration) {
+    // atomic update, lock-free
+    e.updateAtomic(&e.headerTime, headerTime)
+    e.updateAtomic(&e.lastByteTime, lastByteTime)
+    e.sampleCount.Add(1)
+}
+```
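+
+`Record` calls `updateAtomic`, which the snippet leaves undefined. One way to write it is sketched below, assuming a compare-and-swap retry loop and a hypothetical `alphaFixed` argument (alpha × `alphaScale`, so 300 for the default 0.3). The name `ewmaUpdate` and the choice to seed the average with the first sample are illustrative assumptions, not taken from the Lolly source.
+
+```go
+package main
+
+import (
+	"fmt"
+	"sync/atomic"
+	"time"
+)
+
+// alphaScale matches the fixed-point scale in the design: alpha * 1000.
+const alphaScale = 1000
+
+// ewmaUpdate folds one sample into avg without taking a lock.
+// alphaFixed is alpha * alphaScale (hypothetical parameter, e.g. 300 for 0.3).
+func ewmaUpdate(avg *atomic.Int64, sample time.Duration, alphaFixed int64) {
+	s := int64(sample)
+	for {
+		old := avg.Load()
+		next := s // assumption: the first sample seeds the average
+		if old != 0 {
+			next = (alphaFixed*s + (alphaScale-alphaFixed)*old) / alphaScale
+		}
+		// Retry if another goroutine updated avg between Load and CAS.
+		if avg.CompareAndSwap(old, next) {
+			return
+		}
+	}
+}
+
+func main() {
+	var avg atomic.Int64
+	ewmaUpdate(&avg, 10*time.Millisecond, 300)
+	ewmaUpdate(&avg, 20*time.Millisecond, 300)
+	fmt.Println(time.Duration(avg.Load())) // prints 13ms
+}
+```
+
+The product `alphaFixed * sample` stays within `int64` for samples up to roughly 100 days, far beyond any realistic response time.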
+
+### 3.3 LeastTime Balancer
+
+```go
+type LeastTime struct {
+    metric string  // "header" | "last_byte"
+}
+
+func (l *LeastTime) Select(targets []*Target) *Target {
+    var selected *Target
+    var minTime int64 = -1
+    
+    for _, t := range targets {
+        if !t.IsAvailable() {
+            continue
+        }
+        
+        // atomically read the response time
+        var currentTime int64
+        if l.metric == "header" {
+            currentTime = t.Stats.HeaderTime()
+        } else {
+            currentTime = t.Stats.LastByteTime()
+        }
+        
+        // Use a default when there are no samples so new nodes are not starved
+        if currentTime == 0 {
+            currentTime = defaultResponseTime
+        }
+        
+        if selected == nil || currentTime < minTime {
+            selected = t
+            minTime = currentTime
+        }
+    }
+    
+    return selected
+}
+```
+
+### 3.4 Performance Metrics
+
+| Operation | Latency | Locks | Heap allocations |
+|------|------|-----|--------|
+| Record | ~20ns | None | 0 |
+| Select | ~50ns | None | 0 |
+
+### 3.5 Configuration
+
+```yaml
+proxy:
+  - path: /api
+    load_balance: least_time
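+    least_time_metric: last_byte   # header | last_byte (default)
+    least_time_alpha: 0.3          # 0-1, larger is more sensitive (default 0.3)
+    least_time_default_ns: 1000000 # value used when there are no samples (default 1ms)
+```
+
+### 3.6 Proxy Layer Integration
+
+```go
+// Called after the request completes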
+func (p *Proxy) recordResponseTime(target *loadbalance.Target, start time.Time) {
+    if tracker, ok := p.balancer.(ResponseTimeRecorder); ok {
+        headerTime := target.HeaderReceived.Sub(start)
+        lastByteTime := time.Since(start)
+        tracker.RecordResponseTime(target, headerTime, lastByteTime)
+    }
+}
+```
+
+## 4. Session Sticky Design
+
+### 4.1 Core Algorithm
+
+A cookie-based routing table with sharded locks:
+
+- Cookie value encoding: `base64(target_url + "|" + expires_timestamp)`
+- 256 shards, each with its own `sync.RWMutex`
+- Shard index: `fnvHash64a(cookie_value) % 256`
+- A background goroutine purges expired sessions every 60s
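+
+The shard index rule can be sketched with the standard library's FNV-1a implementation. This is a minimal illustration, assuming `fnvHash64a` behaves like `hash/fnv`'s 64-bit FNV-1a; the helper name `shardIndex` is hypothetical.
+
+```go
+package main
+
+import (
+	"fmt"
+	"hash/fnv"
+)
+
+// shardCount mirrors the 256 shards in the design.
+const shardCount = 256
+
+// shardIndex maps a cookie value to a shard using 64-bit FNV-1a.
+func shardIndex(cookieValue string) int {
+	h := fnv.New64a()
+	h.Write([]byte(cookieValue)) // hash.Hash.Write never returns an error
+	return int(h.Sum64() % shardCount)
+}
+
+func main() {
+	for _, v := range []string{"session-a", "session-b", "session-c"} {
+		fmt.Printf("%s -> shard %d\n", v, shardIndex(v))
+	}
+}
+```
+
+Because 256 is a power of two, the modulo on the unsigned hash is equivalent to masking the low 8 bits.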
+
+### 4.2 Data Structures
+
+```go
+// StickySession is the sticky-session load balancer
+type StickySession struct {
+    config      StickyConfig
+    fallback    loadbalance.Balancer  // fallback algorithm
+    
+    // 256 shards to reduce lock contention
+    shards      [256]*stickyShard
+    cleaner     *time.Ticker
+    stopCh      chan struct{}
+    started     atomic.Bool
+}
+
+type stickyShard struct {
+    mu       sync.RWMutex
+    sessions map[string]*stickyEntry  // key: cookie value
+}
+
+type stickyEntry struct {
+    targetURL   string
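+    expiresAt   int64  // Unix nanoseconds
+    createdAt   int64  // Unix nanoseconds
+}
+```
+
+### 4.3 Routing Flow
+
+```
+Request arrives
+  |
+  v
+Check cookie "lolly_route"
+  |
+  +-- present -->
+  |            decode the cookie value
+  |            check whether the target is healthy
+  |            |
+  |            +-- healthy ---> route to that target
+  |            |
+  |            +-- unhealthy -> delete the session
+  |                             pick a new target with fallback
+  |                             set a new cookie
+  |
+  +-- absent -->
+                pick a target with fallback
+                set the Set-Cookie response header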
+```
+
+### 4.4 Cookie Encoding
+
+```go
+// encodeCookie encodes routing information into a cookie value
+// Format: base64(target_url + "|" + expires_timestamp)
+func encodeCookie(targetURL string, expires time.Time) string {
+    raw := targetURL + "|" + strconv.FormatInt(expires.Unix(), 10)
+    return base64.URLEncoding.EncodeToString([]byte(raw))
+}
+
+// decodeCookie decodes a cookie value
+func decodeCookie(value string) (targetURL string, expires time.Time, ok bool) {
+    raw, err := base64.URLEncoding.DecodeString(value)
+    if err != nil {
+        return
+    }
+    // Split at the last "|" so a target URL that itself contains "|" still decodes
+    s := string(raw)
+    sep := strings.LastIndexByte(s, '|')
+    if sep < 0 {
+        return
+    }
+    ts, err := strconv.ParseInt(s[sep+1:], 10, 64)
+    if err != nil {
+        return
+    }
+    return s[:sep], time.Unix(ts, 0), true
+```
+
+### 4.5 Selection Logic
+
+```go
+func (s *StickySession) Select(ctx *fasthttp.RequestCtx, targets []*Target) *Target {
+    // 1. Check the cookie
+    cookie := ctx.Request.Header.Cookie(s.config.Name)
+    if len(cookie) > 0 {
+        targetURL, _, ok := decodeCookie(string(cookie))
+        if ok {
+            // Look up the target
+            for _, t := range targets {
+                if t.URL == targetURL && t.IsAvailable() {
+                    return t
+                }
+            }
+            // Target unavailable: delete the session (lazy deletion)
+            s.deleteSession(string(cookie))
+        }
+    }
+    
+    // 2. Select with the fallback algorithm
+    selected := s.fallback.Select(targets)
+    if selected == nil {
+        return nil
+    }
+    
+    // 3. Set the cookie
+    s.setCookie(ctx, selected.URL)
+    
+    // 4. Record the session
+    s.recordSession(selected.URL)
+    
+    return selected
+}
+```
+
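+### 4.6 Performance Metrics
+
+| Operation | Latency | Lock contention probability |
+|------|------|-----------|
+| Session lookup | ~30ns | 0.4% (256 shards) |
+| Session write | ~50ns | 0.4% |
+| Expiry cleanup | In the background, off the hot path | - |
+
+### 4.7 Configuration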
+
+```yaml
+proxy:
+  - path: /api
+    load_balance: sticky
+    sticky:
+      enabled: true
+      name: "lolly_route"        # cookie name (default)
+      expires: "1h"              # session lifetime (default 1h)
+      domain: ""                 # cookie domain
+      path: "/"                  # cookie path (default /)
+      secure: false              # Secure flag
+      http_only: true            # HttpOnly flag (default true)
+      same_site: "Lax"           # SameSite (default Lax)
+    # fallback algorithm configuration
+    fallback_balance: round_robin  # algorithm for first-time routing and failover
+```
+
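+## 5. Extending the Balancer Interface
+
+To support response time recording for Least Time, add one optional interface:
+
+```go
+// ResponseTimeRecorder is the response time recording interface
+// Balancers that implement it receive response time statistics after each request completes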
+type ResponseTimeRecorder interface {
+    RecordResponseTime(target *Target, headerTime, lastByteTime time.Duration)
+}
+```
+
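+**Why extend with an interface instead of modifying Balancer?**
+- It does not break the 6 existing balancer implementations
+- The type assertion is evaluated at runtime and its overhead is negligible
+- It follows Go's interface segregation principle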
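+
+The optional-interface pattern can be sketched as follows. The `Target`, `Balancer`, and `ResponseTimeRecorder` shapes follow this document; `firstTarget`, `timedBalancer`, and `report` are hypothetical stand-ins written only for this illustration.
+
+```go
+package main
+
+import (
+	"fmt"
+	"time"
+)
+
+type Target struct{ URL string }
+
+type Balancer interface {
+	Select(targets []*Target) *Target
+}
+
+type ResponseTimeRecorder interface {
+	RecordResponseTime(target *Target, headerTime, lastByteTime time.Duration)
+}
+
+// firstTarget stands in for an existing balancer that ignores timings.
+type firstTarget struct{}
+
+func (firstTarget) Select(targets []*Target) *Target {
+	if len(targets) == 0 {
+		return nil
+	}
+	return targets[0]
+}
+
+// timedBalancer stands in for Least Time: it opts in by adding one method.
+type timedBalancer struct {
+	firstTarget
+	samples int
+}
+
+func (t *timedBalancer) RecordResponseTime(_ *Target, _, _ time.Duration) {
+	t.samples++
+}
+
+// report forwards timings only when the balancer implements the optional
+// interface, and returns whether it did.
+func report(b Balancer, target *Target, header, lastByte time.Duration) bool {
+	rec, ok := b.(ResponseTimeRecorder)
+	if !ok {
+		return false
+	}
+	rec.RecordResponseTime(target, header, lastByte)
+	return true
+}
+
+func main() {
+	t := &Target{URL: "http://127.0.0.1:18081"}
+	timed := &timedBalancer{}
+	fmt.Println(report(firstTarget{}, t, time.Millisecond, 2*time.Millisecond)) // false
+	fmt.Println(report(timed, t, time.Millisecond, 2*time.Millisecond))         // true
+	fmt.Println(timed.samples)                                                  // 1
+}
+```
+
+Existing balancers compile unchanged because nothing is added to `Balancer` itself.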
+
+## 6. File Change List
+
+### 6.1 New Files
+
+| File | Lines | Description |
+|------|------|------|
+| `internal/loadbalance/ewma.go` | ~80 | Atomic EWMA statistics tracker |
+| `internal/loadbalance/least_time.go` | ~120 | Least Time balancer |
+| `internal/loadbalance/sticky.go` | ~280 | Session Sticky balancer |
+| `internal/loadbalance/sticky_config.go` | ~30 | Sticky config struct |
+| `internal/loadbalance/least_time_test.go` | ~200 | Least Time unit tests |
+| `internal/loadbalance/sticky_test.go` | ~250 | Session Sticky unit tests |
+
+### 6.2 Modified Files
+
+| File | Change |
+|------|----------|
+| `internal/loadbalance/algorithms.go` | Add `least_time` and `sticky` to validAlgorithms |
+| `internal/loadbalance/balancer.go` | Add a `Stats *EWMAStats` field to Target |
+| `internal/config/proxy_config.go` | Add `LeastTimeConfig` and `StickyConfig` |
+| `internal/config/defaults.go` | Add comments documenting the defaults of the new options |
+| `internal/config/validate.go` | Validate `least_time_metric` and `fallback_balance` |
+| `internal/proxy/proxy.go` | Add the new algorithms to createBalancer; call RecordResponseTime after the request completes |
+| `internal/proxy/target_selector.go` | Select supports StickySession (requires a ctx parameter) |
+
+## 7. Testing Strategy
+
+### 7.1 Least Time Tests
+
+- **Benchmarks**: measure Select/Record latency
+- **Concurrency test**: 100 goroutines run Record + Select concurrently to verify there are no data races
+- **Convergence test**: verify how EWMA weights new versus old samples
+- **Failover**: verify that another target is selected after a target fails
+
+### 7.2 Session Sticky Tests
+
+- **Cookie encode/decode**: verify round-trip correctness
+- **Routing consistency**: the same cookie always routes to the same target
+- **Target failure**: fall back and update the cookie when the target is unavailable
+- **Expiry cleanup**: verify that expired sessions are purged
+- **Concurrency safety**: 100 goroutines read and write concurrently to verify there are no data races
+- **Shard balance**: verify that the hash distribution is uniform
+
+## 8. Comparison with nginx Plus
+
+| Feature | nginx Plus | Lolly design |
+|------|------------|------------|
+| Least Time header | ✅ | ✅ |
+| Least Time last_byte | ✅ | ✅ |
+| EWMA smoothing | ✅ | ✅ (tunable alpha) |
+| Session Sticky cookie | ✅ | ✅ |
+| Session Sticky learn | ✅ | ❌ (not supported yet) |
+| Secure/HttpOnly/SameSite | ✅ | ✅ |
+| Target failure fallback | ✅ | ✅ |
+| Session TTL | ✅ | ✅ |
+
+## 9. Risks and Mitigations
+
+| Risk | Impact | Mitigation |
+|------|------|------|
+| New nodes are starved | High | Use the default `least_time_default_ns` when there are no samples |
+| Sticky memory growth | Medium | TTL + background cleanup + per-shard limits |
+| Oversized cookies | Low | Only the URL + timestamp are encoded, typically < 200 bytes |
+| Targets flap up and down frequently | Medium | Lazy session deletion avoids a thundering herd |
+
+## 10. Future Optimizations
+
+1. **Session Sticky learn mode**: learn the Set-Cookie returned by the backend instead of actively setting one
+2. **Weighted Least Time**: combine weights and response time for weighted selection
+3. **Persistent statistics**: retain historical response time statistics across restarts
+
+---
+
+**Design approval**: ✅ Approved
+**Next step**: write the implementation plan (writing-plans)
diff --git a/docs/superpowers/specs/2026-06-10-performance-optimization-design.md b/docs/superpowers/specs/2026-06-10-performance-optimization-design.md
new file mode 100644
index 0000000..62683aa
--- /dev/null
+++ b/docs/superpowers/specs/2026-06-10-performance-optimization-design.md
@@ -0,0 +1,261 @@
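+# Continuous Performance Optimization Design Document
+
+> **Version**: v1.0  
+> **Date**: 2026-06-10  
+> **Goal**: Maximum throughput + resource efficiency  
+> **Method**: Data-driven optimization (Benchmark → Profile → Optimize → Verify)
+
+---
+
+## 1. Overall Architecture
+
+The performance optimization workflow has 5 stages that form a continuous iteration loop:
+
+```
+┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
+│ 1. Build suite  │  →  │ 2. Collect data │  →  │ 3. Bottlenecks  │  →  │ 4. Apply fixes  │  →  │ 5. Regression   │
+│    Benchmark    │     │    Baseline     │     │    Profile      │     │    Optimize     │     │    Prevent      │
+└─────────────────┘     └─────────────────┘     └─────────────────┘     └─────────────────┘     └─────────────────┘
+         ↑                                                                                               │
+         └──────────────────────────────────── continuous iteration ◄────────────────────────────────────┘
+```
+
+**Core principles**:
+- Every optimization must be backed by benchmark data that proves the gain
+- Do not optimize anything that the data does not point to
+- Build a reproducible performance testing environment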
+
+---
+
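+## 2. Benchmark Infrastructure (Benchmark Suite)
+
+### 2.1 Three-Tier Benchmark System
+
+#### 2.1.1 Micro Benchmarks (unit level)
+
+Go benchmarks that target a single function or module:
+
+| Module | Status | To add |
+|------|------|--------|
+| `loadbalance` | Present | Sticky and Least Time edge cases |
+| `matcher` | Present | Large routing tables (1k+ locations) |
+| `proxy` | Present | Cache key construction, WebSocket detection |
+| `middleware/security` | Present | Rate limiter under high concurrency |
+| `middleware/compression` | Present | Large file compression |
+| `cache` | Partial | Full CRUD, concurrent contention |
+| `lua` | Partial | Script execution, coroutine scheduling |
+| `resolver` | Missing | DNS lookups, cache hits |
+| `variable` | Partial | Complex variable expansion |
+| `stream` | Missing | TCP/UDP forwarding throughput |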
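+
+For reference, a micro benchmark at this tier might look like the sketch below. `memCache` is a hypothetical stand-in rather than Lolly's cache package, and `testing.Benchmark` is used only so the example runs as a standalone program; in the repository the `BenchmarkCacheGet` function would live in a `_test.go` file and run through `go test -bench`.
+
+```go
+package main
+
+import (
+	"fmt"
+	"sync"
+	"testing"
+)
+
+// memCache is a hypothetical read-mostly cache used only for illustration.
+type memCache struct {
+	mu   sync.RWMutex
+	data map[string][]byte
+}
+
+func (c *memCache) Get(key string) ([]byte, bool) {
+	c.mu.RLock()
+	v, ok := c.data[key]
+	c.mu.RUnlock()
+	return v, ok
+}
+
+// BenchmarkCacheGet measures a cache hit and reports allocations per op.
+func BenchmarkCacheGet(b *testing.B) {
+	c := &memCache{data: map[string][]byte{"/index.html": []byte("hello")}}
+	b.ReportAllocs()
+	b.ResetTimer() // exclude setup from the measurement
+	for i := 0; i < b.N; i++ {
+		if _, ok := c.Get("/index.html"); !ok {
+			b.Fatal("unexpected cache miss")
+		}
+	}
+}
+
+func main() {
+	// testing.Benchmark runs the function outside of `go test`.
+	res := testing.Benchmark(BenchmarkCacheGet)
+	fmt.Println(res.String(), res.MemString())
+}
+```
+
+The `allocs/op` figure in the output is the number to watch when validating zero-allocation changes.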
+
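+#### 2.1.2 Integration Benchmarks (end to end)
+
+Exercise the full request path using `httptest` or real ports:
+
+- **Static file serving**: small (1KB), medium (100KB), and large (10MB) files
+- **Reverse proxy**: direct to backend, with caching, with load balancing
+- **HTTPS/TLS**: handshake overhead, TLS 1.2 vs 1.3
+- **HTTP/2**: multiplexing, flow control
+- **HTTP/3**: QUIC connection establishment, 0-RTT
+- **WebSocket**: message forwarding latency
+- **Stream**: TCP/UDP throughput
+
+#### 2.1.3 System Benchmarks (full stack)
+
+Test the complete server with external load testing tools:
+
+- **RPS limit test**: throughput curve at different concurrency levels
+- **Latency distribution**: P50/P99/P999 latency
+- **Resource usage**: CPU, memory, goroutine count, GC frequency
+- **Connection count test**: C10K and C100K scenarios
+
+### 2.2 Benchmark Directory Layout
+
+```
+internal/benchmark/
+├── micro/           # Go benchmark files
+│   ├── proxy_test.go
+│   ├── cache_test.go
+│   ├── lua_test.go
+│   └── ...
+├── integration/     # integration-test-style benchmarks
+│   ├── static_bench_test.go
+│   ├── proxy_bench_test.go
+│   └── ...
+└── system/          # external load testing scripts + results
+    ├── wrk_static.sh
+    ├── wrk_proxy.sh
+    └── results/
+```
+
+### 2.3 Benchmark Collection Tools
+
+- **`make bench`**: run all micro benchmarks
+- **`make bench-stat`**: generate a benchmark report
+- **`scripts/bench.sh`**: one-command system load test
+- **benchstat**: compare old and new benchmark data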
+
+---
+
+## 3. Performance Data Collection and Analysis Workflow
+
+### 3.1 Baseline Collection Steps
+
+#### Step 1: Run all micro benchmarks
+
+```bash
+# Run all micro benchmarks and save the results
+go test -bench=. -benchmem ./internal/benchmark/micro/... > benchmark-v0.4.0.txt
+
+# Format with benchstat
+benchstat benchmark-v0.4.0.txt
+```
+
+#### Step 2: Run the integration benchmarks
+
+```bash
+# Run the integration benchmarks
+go test -bench=Benchmark -benchmem ./internal/benchmark/integration/...
+```
+
+#### Step 3: System load tests (external tools)
+
+```bash
+# Static file load test
+wrk -t12 -c400 -d30s http://localhost:8080/
+
+# Proxy load test
+wrk -t12 -c400 -d30s http://localhost:8080/api/
+
+# HTTP/2 load test
+h2load -n100000 -c100 -m10 http://localhost:8080/
+```
+
+#### Step 4: Collect pprof data
+
+```bash
+# CPU profile (30 seconds); the URL is quoted so the shell does not glob the "?"
+curl "http://localhost:8080/debug/pprof/profile?seconds=30" > cpu.prof
+
+# Heap profile
+curl http://localhost:8080/debug/pprof/heap > heap.prof
+
+# Allocs profile (allocation hotspots)
+curl http://localhost:8080/debug/pprof/allocs > allocs.prof
+
+# Goroutine profile
+curl http://localhost:8080/debug/pprof/goroutine > goroutine.prof
+```
+
+### 3.2 Analysis Toolchain
+
+| Tool | Purpose | Command |
+|------|------|------|
+| `go tool pprof` | CPU/memory analysis | `go tool pprof -http=:8081 cpu.prof` |
+| `go tool trace` | Scheduling/latency analysis | `go test -trace=trace.out` |
+| `benchstat` | Benchmark comparison | `benchstat old.txt new.txt` |
+| `go test -memprofile` | Allocation tracking | Integrated into benchmarks |
+| `perf` (Linux) | System-level analysis | `perf record -g ./lolly` |
+
+### 3.3 Analysis Dimensions
+
+1. **CPU hotspots**: which functions consume the most CPU?
+2. **Memory allocation**: how many allocations per request, and how large?
+3. **Lock contention**: how contended are `sync.Mutex` / `sync.RWMutex`?
+4. **System calls**: what is the `syscall` / `cgo` overhead?
+5. **GC pressure**: what are the GC frequency and STW time?
+6. **Network I/O**: what are the connection setup and read/write latencies?
+
+### 3.4 Bottleneck Identification Template
+
+```
+Performance Analysis Report v0.4.0 Baseline
+===========================================
+
+1. CPU hotspots Top 5
+   - runtime.mallocgc (12.3%) ← allocation overhead
+   - runtime.scanobject (8.7%) ← GC scanning
+   - proxy.(*Proxy).ServeHTTP (7.2%)
+   - matcher.(*LocationEngine).Match (5.1%)
+   - compress/flate.(*compressor).write (4.8%)
+
+2. Per-request allocations Top 5
+   - time.Now(): 1 alloc/req
+   - fmt.Sprintf: 0.5 alloc/req
+   - ...
+
+3. Lock contention hotspots
+   - cache.(*FileCache).Get: 15% blocked time
+   - proxy.(*Proxy).buildCacheKeyHash: 8% blocked time
+
+4. Optimization priorities
+   P0: [specific task]
+   P1: [specific task]
+   P2: [specific task]
+```
+
+---
+
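+## 4. Optimization Workflow
+
+### 4.1 Optimization Principles
+
+- **Measurable**: every optimization must come with before/after benchmark data
+- **Minimal changes**: prefer changes confined to a single file or function
+- **Reversible**: keep the benchmark data from before and after the optimization
+
+### 4.2 Optimization Categories
+
+| Type | Example | Verification |
+|------|------|---------|
+| Zero allocation | Use `b2s` instead of `string([]byte)` | `-benchmem` allocs/op |
+| Algorithmic | Faster hashing and lookup | `Benchmark` ns/op |
+| Concurrency | Finer-grained locks, lock-free structures | `go test -race` + benchmark |
+| Caching | Reduce repeated computation | CPU profile comparison |
+| GC | Fewer short-lived objects | `GODEBUG=gctrace=1` |
+
+---
+
+## 5. Regression Detection
+
+### 5.1 Automated Checks
+
+- **CI integration**: run a benchmark comparison on every PR
+- **Threshold alerts**: automatically block when performance drops by more than 5%
+- **Trend tracking**: long-term performance trend charts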
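+
+The 5% threshold rule can be expressed as a small check. This is a sketch under the assumption that the gate compares time/op values in nanoseconds; the function names are hypothetical, and a real gate would also consider benchstat's significance test before blocking a PR.
+
+```go
+package main
+
+import "fmt"
+
+// deltaPercent returns the relative change of newNs versus oldNs in percent.
+// Negative values mean the new run is faster.
+func deltaPercent(oldNs, newNs float64) float64 {
+	return (newNs - oldNs) / oldNs * 100
+}
+
+// regressed reports whether time/op grew by more than thresholdPct.
+func regressed(oldNs, newNs, thresholdPct float64) bool {
+	return deltaPercent(oldNs, newNs) > thresholdPct
+}
+
+func main() {
+	fmt.Printf("%.2f%%\n", deltaPercent(1200, 1150)) // prints -4.17%
+	fmt.Println(regressed(1200, 1150, 5))            // false: faster than baseline
+	fmt.Println(regressed(1200, 1300, 5))            // true: about 8.3% slower
+}
+```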
+
+### 5.2 Regression Detection Tools
+
+```bash
+# Compare two versions
+benchstat old.txt new.txt
+
+# Example output
+# name        old time/op    new time/op    delta
+# ServeHTTP   1.20µs ± 2%    1.15µs ± 3%    -4.17%  (p=0.02 n=10+10)
+```
+
+---
+
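+## 6. Expected Outcomes
+
+- A complete benchmark suite covering all core modules
+- Quantified baseline performance data
+- The top 10 performance bottlenecks identified
+- Verifiable performance improvement data for every optimization round
+- Automated regression detection that prevents performance degradation
+
+---
+
+## 7. Task List
+
+- [ ] Set up the `internal/benchmark/` directory structure
+- [ ] Add the missing micro benchmarks (resolver, stream, cache, lua)
+- [ ] Create the integration benchmarks
+- [ ] Create the system load test scripts
+- [ ] Run the first full benchmark round → generate the baseline
+- [ ] Collect pprof data (CPU/heap/allocs/goroutine)
+- [ ] Analyze bottlenecks → generate the performance report
+- [ ] Define the Top N optimization tasks
+- [ ] Implement and verify the optimizations one by one
+- [ ] Set up CI regression detection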