Abuse detection + auto-revoke
Scope: volumetric abuse only
TL;DR
18 Go tests added (120 total), all PASS under go test -race. The 102 Sprint 1–7 tests are kept identical. go vet, staticcheck, govulncheck: clean. 0 reachable CVEs.
Threat model — before / after
| Threat | Before | After |
|---|---|---|
| Attacker ignoring the rate limit | 429 in a loop, CPU cycles burned, shared buckets saturated for other users. | Threshold of 5000 denials/min → auto-revoke + close of active tunnels, instantly. |
| Amplification vector (score shared across users) | n/a | Strict isolation per tokenID, dedicated test. An attacker cannot DoS a legitimate user through their own behavior. |
| Evasion by patience (revoke lost via TTL/window) | n/a | GC NEVER evicts a revoked state, even TTL-old. Dedicated tests. |
| DoS of the tracker via invented tokenIDs | n/a | Cap MaxTrackedTokens=100k + fail-secure. Memory bounded even under unforeseen attack. |
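The "evasion by patience" row can be illustrated with a small sketch of a garbage-collection pass that skips revoked entries. This is not the actual tracker code: the `entry` type, the `gc` function and the `survivors` helper are hypothetical names introduced here, and the real eviction logic may differ.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// entry is a hypothetical per-token record; the real Tracker's fields are
// not shown in this report.
type entry struct {
	lastSeen time.Time
	revoked  bool
}

// gc drops entries idle for longer than ttl, except revoked ones, so that
// waiting out the TTL never lifts a revoke.
func gc(entries map[string]*entry, now time.Time, ttl time.Duration) {
	for id, e := range entries {
		if e.revoked {
			continue // revoked entries are never evicted
		}
		if now.Sub(e.lastSeen) > ttl {
			delete(entries, id)
		}
	}
}

// survivors runs gc over three sample entries and returns the remaining
// token IDs, sorted.
func survivors() []string {
	now := time.Now()
	entries := map[string]*entry{
		"tok_idle":    {lastSeen: now.Add(-time.Hour)},
		"tok_revoked": {lastSeen: now.Add(-time.Hour), revoked: true},
		"tok_active":  {lastSeen: now},
	}
	gc(entries, now, 10*time.Minute)

	ids := make([]string, 0, len(entries))
	for id := range entries {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	fmt.Println(survivors()) // [tok_active tok_revoked]
}
```

Only the idle, non-revoked entry is evicted. Because revoked entries accumulate, a design like this presumably relies on the MaxTrackedTokens cap to keep memory bounded.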
1. The internal/abuse module — Tracker
Minimal, isolated API. No external dependency (just time + sync) — auditing the code takes 15 minutes.
type Tracker struct { /* in-memory state per tokenID */ }
func (t *Tracker) RecordDenial(tokenID string) Decision
func (t *Tracker) IsRevoked(tokenID string) bool
func (t *Tracker) IsThrottled(tokenID string) bool
func (t *Tracker) Score(tokenID string) int
func (t *Tracker) Reset(tokenID string)
func (t *Tracker) SetOnAction(f func(Action, string, int))
2. Input signal — proxy.rate_limit scope=token
The tracker is fed by the Sprint 7 TraversingLimiter.SetOnDenied callback, only on scope=token. The reasoning:
- scope=ip is not attributable to a user (anonymous DDoS);
- scope=tunnel is, but only indirectly (a user can have several tunnels with different behaviors);
- scope=token is the natural unit of identity.
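A minimal sketch of the scope filter, assuming a callback of the form `func(scope, key string)`. The real SetOnDenied signature is not shown in this report, so the `onDenied` type and the `tokenOnly` helper are hypothetical.

```go
package main

import "fmt"

// onDenied is a hypothetical callback shape; the real SetOnDenied signature
// is not shown in this report.
type onDenied func(scope, key string)

// tokenOnly wraps record so that only scope=token denials reach it.
func tokenOnly(record func(tokenID string)) onDenied {
	return func(scope, key string) {
		if scope != "token" {
			return // ip and tunnel denials are not attributed to a token
		}
		record(key)
	}
}

func main() {
	counts := map[string]int{}
	cb := tokenOnly(func(id string) { counts[id]++ })

	cb("ip", "203.0.113.7")
	cb("tunnel", "myapp")
	cb("token", "tok_abc")
	cb("token", "tok_abc")

	fmt.Println(counts) // map[tok_abc:2]
}
```

Filtering at the callback boundary keeps the tracker itself unaware of scopes: it only ever sees token IDs.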
3. Graduated thresholds
| Threshold | V1 action | Future action |
|---|---|---|
| ThrottleScore = 2000 | abuse.threshold event, tracking-only | quota reduction (MAR-48 plans) |
| RevokeScore = 5000 | IsRevoked, validator invalidates, close tunnels | same (revoke = final) |
Calibration: with RPSPerToken=200 (Sprint 7), an attacker spamming 1000 rps generates ~800 denials/s, i.e. 48k/min. The 5000 threshold is hit in about 6 seconds (5000 / 800 ≈ 6.25 s). A legitimate user does not exceed 200 denials/min, because RetryAfter bounds the retry rate.
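The calibration arithmetic can be checked with a few lines of Go. The helper names `deniedPerSecond` and `secondsToThreshold` are illustrative only, and the model is deliberately simple: it assumes every request above the limit is denied.

```go
package main

import (
	"fmt"
	"math"
)

// deniedPerSecond returns how many requests per second are rejected when a
// client sends attackRPS against a limit of limitRPS.
func deniedPerSecond(attackRPS, limitRPS int) int {
	if attackRPS <= limitRPS {
		return 0
	}
	return attackRPS - limitRPS
}

// secondsToThreshold returns how long it takes to accumulate threshold
// denials at a steady denial rate, or +Inf if nothing is being denied.
func secondsToThreshold(threshold, deniedPerSec int) float64 {
	if deniedPerSec <= 0 {
		return math.Inf(1)
	}
	return float64(threshold) / float64(deniedPerSec)
}

func main() {
	d := deniedPerSecond(1000, 200)
	fmt.Println(d, d*60)                              // 800 48000
	fmt.Printf("%.2f\n", secondsToThreshold(5000, d)) // 6.25
}
```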
4. Decorator pattern on the validator
Zero cross-import between the tokens and abuse packages:
type abuseAwareValidator struct {
	inner   tokens.Validator
	tracker *abuse.Tracker
}

func (a *abuseAwareValidator) Validate(secret string) (tokens.Info, bool) {
	info, ok := a.inner.Validate(secret)
	if !ok {
		return info, false
	}
	if a.tracker.IsRevoked(info.TokenID) {
		return tokens.Info{}, false
	}
	return info, true
}
Enabled only if the tracker exists (nil otherwise). A revoked token is invalidated at the next handshake: the yamux session is never even opened.
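The conditional wrapping might look like the following sketch. `Info`, `Validator`, `Tracker` and `staticValidator` are local stand-ins for the real tokens and abuse types, declared here only so the example compiles on its own, and `wrap` is a hypothetical helper name.

```go
package main

import "fmt"

// Local stand-ins for tokens.Info, tokens.Validator and abuse.Tracker.
type Info struct{ TokenID string }

type Validator interface {
	Validate(secret string) (Info, bool)
}

type Tracker struct{ revoked map[string]bool }

func (t *Tracker) IsRevoked(tokenID string) bool { return t != nil && t.revoked[tokenID] }

// staticValidator is a toy Validator mapping secrets to token IDs.
type staticValidator map[string]string

func (s staticValidator) Validate(secret string) (Info, bool) {
	id, ok := s[secret]
	return Info{TokenID: id}, ok
}

type abuseAwareValidator struct {
	inner   Validator
	tracker *Tracker
}

func (a *abuseAwareValidator) Validate(secret string) (Info, bool) {
	info, ok := a.inner.Validate(secret)
	if !ok {
		return info, false
	}
	if a.tracker.IsRevoked(info.TokenID) {
		return Info{}, false
	}
	return info, true
}

// wrap decorates v only when both a validator and a tracker exist;
// otherwise v is returned untouched.
func wrap(v Validator, t *Tracker) Validator {
	if v == nil || t == nil {
		return v
	}
	return &abuseAwareValidator{inner: v, tracker: t}
}

func main() {
	base := staticValidator{"s3cret": "tok_abc"}
	tr := &Tracker{revoked: map[string]bool{"tok_abc": true}}

	_, ok := wrap(base, nil).Validate("s3cret")
	fmt.Println(ok) // true: no tracker, validator used as-is

	_, ok = wrap(base, tr).Validate("s3cret")
	fmt.Println(ok) // false: token is revoked
}
```

Taking the tracker as a concrete pointer rather than an interface avoids the classic Go pitfall where an interface holding a nil pointer compares as non-nil.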
5. Instant teardown of active tunnels
registry.closeTunnelsByToken(tokenID): under lock, collect the sessions; outside the lock, call Session.Close(). Cleanup (map removal, counter decrement) is then triggered by the handleTunnelConnect defer.
Closing outside the lock matters: the yamux GoAway frame can briefly block on send — under lock, that would block every other registry operation.
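The collect-then-close pattern described above can be sketched as follows. The `registry` type here is a reduced, hypothetical stand-in: sessions are modelled as `io.Closer` values rather than yamux sessions, and map cleanup is intentionally left out because the report attributes it to the handleTunnelConnect defer.

```go
package main

import (
	"fmt"
	"io"
	"sync"
)

// registry is a reduced stand-in for the tunnel registry.
type registry struct {
	mu       sync.Mutex
	sessions map[string][]io.Closer // tokenID -> open sessions
}

func newRegistry() *registry {
	return &registry{sessions: map[string][]io.Closer{}}
}

func (r *registry) add(tokenID string, s io.Closer) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.sessions[tokenID] = append(r.sessions[tokenID], s)
}

// closeTunnelsByToken copies the token's sessions under the lock, then
// closes them after releasing it, so a slow Close cannot stall other
// registry operations. It returns the number of sessions closed.
func (r *registry) closeTunnelsByToken(tokenID string) int {
	r.mu.Lock()
	toClose := append([]io.Closer(nil), r.sessions[tokenID]...)
	r.mu.Unlock()

	for _, s := range toClose {
		_ = s.Close() // errors ignored in this sketch
	}
	return len(toClose)
}

// fakeSession records whether Close was called.
type fakeSession struct{ closed bool }

func (f *fakeSession) Close() error { f.closed = true; return nil }

func main() {
	r := newRegistry()
	victim := &fakeSession{}
	r.add("tok_victim", victim)
	for i := 0; i < 3; i++ {
		r.add("tok_attacker", &fakeSession{})
	}
	fmt.Println(r.closeTunnelsByToken("tok_attacker"), victim.closed) // 3 false
}
```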
6. Audit events
// Throttle threshold (V1, tracking-only)
{
"level": "WARN",
"event": "abuse.threshold",
"action": "throttle",
"token_id": "tok_abc",
"score": 2500
}
// Auto-revoke
{
"level": "WARN",
"event": "abuse.token_revoked",
"token_id": "tok_attacker",
"score": 5234,
"closed_tunnels": 3
}
Always WARN (never INFO): a repeated denial at this level is never “normal”. closed_tunnels lets us quantify the user impact of a revocation.
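For illustration, the auto-revoke event could be produced with a plain struct and encoding/json. This is a sketch of the event shape only: the logger tunld actually uses is not stated in this report, and `revokedEvent`, `newRevokedEvent` and `encode` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// revokedEvent mirrors the fields of the abuse.token_revoked event above.
type revokedEvent struct {
	Level         string `json:"level"`
	Event         string `json:"event"`
	TokenID       string `json:"token_id"`
	Score         int    `json:"score"`
	ClosedTunnels int    `json:"closed_tunnels"`
}

// newRevokedEvent always uses WARN, matching the rule stated above.
func newRevokedEvent(tokenID string, score, closedTunnels int) revokedEvent {
	return revokedEvent{
		Level:         "WARN",
		Event:         "abuse.token_revoked",
		TokenID:       tokenID,
		Score:         score,
		ClosedTunnels: closedTunnels,
	}
}

// encode renders the event as a single JSON line.
func encode(e revokedEvent) (string, error) {
	b, err := json.Marshal(e)
	return string(b), err
}

func main() {
	line, err := encode(newRevokedEvent("tok_attacker", 5234, 3))
	if err != nil {
		panic(err)
	}
	fmt.Println(line)
	// {"level":"WARN","event":"abuse.token_revoked","token_id":"tok_attacker","score":5234,"closed_tunnels":3}
}
```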
Security properties — dedicated tests
The brief asked for “security tests to the max of the max”. Each critical property has its test.
Cardinal property: strict per-token isolation
TestIsolation_TokensIndependent: 10 denials on tok_attacker. Verifies that tok_victim stays at score=0, not revoked, not throttled. Otherwise: a trivial amplification vector where an attacker who knows they are watched could DoS a legitimate user by trapping them.

| Property | Test |
|---|---|
| Strict per-token isolation | TestIsolation_TokensIndependent |
| Idempotent actions under race (100×100 goroutines) | TestRace_ConcurrentRecord |
| Revoke survives window rollovers | TestIsRevoked_StaysTrueAfterDenialsExpire |
| GC NEVER evicts a revoked entry | TestCap_GCNeverEvictsRevoked |
| Fail-secure when cap is reached | TestCap_MaxTrackedTokens_FailSecure |
| nil tracker = nil-safe on all methods | TestNilTracker_AllMethodsSafe |
| Callback outside the lock — re-entrancy OK | TestOnAction_InvokedOutsideLock |
| Score frozen after revoke (anti O(n²)) | TestRevokeFreezesScore |
| Full disable = no-op (zero regression) | TestNew_DisabledReturnsNil |
| No panic on exotic tokenID | TestRecordDenial_NeverPanicOnWeirdTokenID |
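Several of these properties (per-token isolation, a revoke that fires once under concurrency, a callback invoked outside the lock, a score frozen after revoke) can be demonstrated on a reduced model. The `miniTracker` below is not the real abuse.Tracker: it has no time window, no throttle level and no GC, and all of its names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// miniTracker is a deliberately reduced stand-in for abuse.Tracker.
type miniTracker struct {
	mu          sync.Mutex
	revokeScore int
	scores      map[string]int
	revoked     map[string]bool
	onRevoke    func(tokenID string, score int)
}

func newMiniTracker(revokeScore int) *miniTracker {
	return &miniTracker{
		revokeScore: revokeScore,
		scores:      map[string]int{},
		revoked:     map[string]bool{},
	}
}

// recordDenial bumps the token's score and fires onRevoke exactly once,
// on the call that crosses the threshold.
func (t *miniTracker) recordDenial(tokenID string) {
	t.mu.Lock()
	if t.revoked[tokenID] {
		t.mu.Unlock()
		return // score is frozen once revoked
	}
	t.scores[tokenID]++
	score := t.scores[tokenID]
	crossed := score >= t.revokeScore
	if crossed {
		t.revoked[tokenID] = true
	}
	cb := t.onRevoke
	t.mu.Unlock()

	if crossed && cb != nil {
		cb(tokenID, score) // outside the lock: the callback may call back into t
	}
}

func (t *miniTracker) isRevoked(tokenID string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.revoked[tokenID]
}

func (t *miniTracker) score(tokenID string) int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.scores[tokenID]
}

// hammer sends goroutines*perGoroutine concurrent denials for tok_attacker
// and reports how often onRevoke fired plus the final scores.
func hammer(goroutines, perGoroutine, revokeScore int) (fired int32, attackerScore, victimScore int) {
	t := newMiniTracker(revokeScore)
	t.onRevoke = func(tokenID string, _ int) {
		atomic.AddInt32(&fired, 1)
		_ = t.isRevoked(tokenID) // re-entrant call; would deadlock under t.mu
	}

	var wg sync.WaitGroup
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perGoroutine; i++ {
				t.recordDenial("tok_attacker")
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&fired), t.score("tok_attacker"), t.score("tok_victim")
}

func main() {
	fired, attacker, victim := hammer(100, 100, 5000)
	fmt.Println(fired, attacker, victim) // 1 5000 0
}
```

The threshold check and the revoked flag are updated in the same critical section, which is what makes the action idempotent: only one goroutine can observe the crossing.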
Zero regression — how it is guaranteed
1. Existing tests kept identical: 102 Sprint 1–7 tests all PASS under go test -race -count=1 ./...
2. Tracker disableable via env vars: TUNLD_ABUSE_REVOKE_SCORE=0 and TUNLD_ABUSE_THROTTLE_SCORE=0 → New returns nil → wiring fully skipped.
3. Conditional validator wrapping: if validator != nil && abuseTracker != nil. No tracker means the validator is used as-is.
4. Additive OnDenied callback: the Sprint 7 callback (audit) is kept as-is; the Sprint 8 code is appended after, replacing nothing.
5. Conservative defaults: 5000 denials/min to revoke. A legit user doing 100 rps of traffic generates on the order of 50–100 denials/min at peak. ×50 margin.
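The "disabled means nil" guarantee relies on a Go idiom: methods with pointer receivers can be called on a nil pointer as long as they check for it. A minimal sketch, using a stand-in `Tracker` type and a simplified `New` whose parameters are assumptions rather than the real constructor signature:

```go
package main

import "fmt"

// Tracker is a stand-in for abuse.Tracker with only the state needed here.
type Tracker struct {
	revoked map[string]bool
}

// New returns nil when both thresholds are zero, which callers treat as
// "abuse tracking disabled".
func New(throttleScore, revokeScore int) *Tracker {
	if throttleScore == 0 && revokeScore == 0 {
		return nil
	}
	return &Tracker{revoked: map[string]bool{}}
}

// IsRevoked is nil-safe: a nil tracker never reports a revoked token.
func (t *Tracker) IsRevoked(tokenID string) bool {
	if t == nil {
		return false
	}
	return t.revoked[tokenID]
}

func main() {
	disabled := New(0, 0)
	fmt.Println(disabled == nil, disabled.IsRevoked("tok_abc")) // true false

	enabled := New(2000, 5000)
	fmt.Println(enabled == nil, enabled.IsRevoked("tok_abc")) // false false
}
```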
Explicit scope — delivered vs deferred
Ticket MAR-94 is broad. The sprint delivered the functional core (detection + auto-revoke + tunnel teardown). Explicitly deferred, to land in a single sprint without regression:
- Email (Resend) + Discord webhook (notifications) — dedicated ticket post Resend setup
- Admin reset/unblock endpoint — coupled to the future admin dashboard
- Supabase persistence — table + sync at boot
- Effective throttle (quota reduction) — coupled to MAR-48 (Stripe + plans)
- Tunnel cycling detection < 10s — separate signal
- Phishing URL patterns — fragile heuristic, dedicated ticket if observed
- Multi-signup from the same IP — signal on the Supabase auth side
Configuration
| Variable | Default |
|---|---|
| TUNLD_ABUSE_WINDOW | 60s |
| TUNLD_ABUSE_THROTTLE_SCORE | 2000 |
| TUNLD_ABUSE_REVOKE_SCORE | 5000 |
| TUNLD_ABUSE_TTL | 10m |
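A sketch of how these variables might be loaded, with the defaults from the table. The parsing rules are assumptions: durations are read with time.ParseDuration and scores with strconv.Atoi, which may not match what tunld does. `config`, `loadConfig` and `disabled` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

type config struct {
	Window        time.Duration
	ThrottleScore int
	RevokeScore   int
	TTL           time.Duration
}

// loadConfig starts from the documented defaults and overrides any value
// that getenv returns as non-empty.
func loadConfig(getenv func(string) string) (config, error) {
	cfg := config{
		Window:        60 * time.Second,
		ThrottleScore: 2000,
		RevokeScore:   5000,
		TTL:           10 * time.Minute,
	}

	durations := map[string]*time.Duration{
		"TUNLD_ABUSE_WINDOW": &cfg.Window,
		"TUNLD_ABUSE_TTL":    &cfg.TTL,
	}
	for name, dst := range durations {
		if v := getenv(name); v != "" {
			d, err := time.ParseDuration(v)
			if err != nil {
				return cfg, fmt.Errorf("%s: %w", name, err)
			}
			*dst = d
		}
	}

	scores := map[string]*int{
		"TUNLD_ABUSE_THROTTLE_SCORE": &cfg.ThrottleScore,
		"TUNLD_ABUSE_REVOKE_SCORE":   &cfg.RevokeScore,
	}
	for name, dst := range scores {
		if v := getenv(name); v != "" {
			n, err := strconv.Atoi(v)
			if err != nil || n < 0 {
				return cfg, fmt.Errorf("%s: invalid score %q", name, v)
			}
			*dst = n
		}
	}
	return cfg, nil
}

// disabled reports whether both thresholds are zero.
func (c config) disabled() bool {
	return c.ThrottleScore == 0 && c.RevokeScore == 0
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v disabled=%v\n", cfg, cfg.disabled())
}
```

Passing `getenv` as a parameter keeps the loader testable without touching the process environment.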
Disable entirely
TUNLD_ABUSE_THROTTLE_SCORE=0
TUNLD_ABUSE_REVOKE_SCORE=0
Strict scope — what this sprint does NOT address
This sprint addresses volumetric abuse: pass-through DDoS, spam, abusive scraping, free proxying. Everything rests on the accumulation of denials from the traversing rate limit.
Qualitative abuse — phishing, malware distributed through a tunnel, illegal content — is out of scope and will be handled differently. A tunnel serving a fake bank page typically generates low traffic, under every volumetric threshold, and stays invisible to this mechanism.
Naming this limit is deliberate: it is in fact the #1 reputational risk of a tunneling service, and the answer requires a separate strategy (abuse reports, ToS, possibly an ML side-channel). Tracked under MAR-110.
Accepted operational tensions
Irreversibility has an operational cost
On the security side, a revoke that is not reversible without intervention is healthy (an attacker does not un-ban themselves). On the operational side, it is dangerous: a false positive — a misconfigured webhook retrying in a loop, a mobile client behind carrier-grade NAT, an aggressive CI script — can only be repaired today by restarting tunld, which cuts every active tunnel of every user to unblock one. Individual sanction, collective cost.
Explicit band-aid to ship before the paid public launch: MAR-108 — authenticated admin reset/unblock endpoint, High priority.
The in-memory model breaks when going multi-node
Each tunld node has its own tracker isolated in RAM. The day we add redundancy (cf. the Multi-node & failover milestone), an attacker revoked on node A reconnects on node B with a zero score. The whole auto-revoke mechanism becomes bypassable by a simple reconnect.
This is not a V1 defect — it is correct for the current single-node setup — but it is an architectural debt that will wake up exactly when we scale. Resolution: MAR-107 — persistence of revocations in Supabase, High priority. Double benefit: it also fixes the revoke lost on restart.
Residual risks — accepted and tracked
Missing persistence
Revoke lost on tunld restart, plus a trivial bypass as soon as multi-node lands (see the multi-node tension above). Tracking: MAR-107 (High).
No admin reset endpoint
The only repair today is restarting tunld. Tracking: MAR-108 (High).
No user notification
A revoked user finds out at the next handshake, with no context or visible recourse. Tracking: MAR-109 (Medium).
Empirical calibration pending
The 5000 denials/min threshold is calibrated on a 'solo, human, one tunnel' assumption. Pathological-but-legitimate cases (carrier-grade NAT, distributed webhooks) are not covered by the ×50 margin. To observe in prod and tighten or relax based on the first real cases.
Throttle not effective in V1
Tracking-only. Per-plan mechanism coupled to MAR-48.
Qualitative abuse not covered
See strict scope above. Tracking: MAR-110 (Medium).
Verify locally
Reproduce the test battery:
go test -race -v ./internal/abuse/
Abuse simulation with a tunl client:
TUNLD_ABUSE_REVOKE_SCORE=50 ./tunld
# in another shell, after establishing a tunnel "myapp":
for i in {1..200}; do
curl -s -o /dev/null https://myapp.localhost:8080/path-$i
done
# tunnel closed, journalctl:
# event=abuse.token_revoked closed_tunnels=1