Security engineering log
Sprint 8 · 2026-05-29 · MAR-94

Abuse detection + auto-revoke

abuse · auto-revoke · behavior · audit

Scope: volumetric abuse only

This sprint penalizes anomalous traffic on the quantitative axis (pass-through DDoS, scraping, cycling). Qualitative abuse — phishing, malware, illegal content served at low volume — is not covered by any automatic mechanism on the tunld side and goes through human reporting. See the abuse policy for details.

TL;DR

18 Go tests added (120 total), all PASS under go test -race. The 102 tests from Sprints 1–7 are kept identical. go vet, staticcheck, govulncheck: clean. 0 reachable CVEs.

Threat model — before / after

Attacker ignoring the rate limit

Before

429 in a loop, CPU cycles burned, shared buckets saturated for other users.

After

Threshold of 5000 denials/min → auto-revoke + close of active tunnels, instantly.

Amplification vector (score shared across users)

Before

n/a

After

Strict isolation per tokenID, dedicated test. An attacker cannot DoS a legitimate user through their own behavior.

Evasion by patience (revoke lost via TTL/window)

Before

n/a

After

GC NEVER evicts a revoked state, even one older than the TTL. Dedicated tests.

DoS of the tracker via invented tokenIDs

Before

n/a

After

Cap MaxTrackedTokens=100k + fail-secure. Memory bounded even under unforeseen attack.

1. The internal/abuse module — Tracker

Minimal, isolated API. No external dependency (just time + sync) — auditing the code takes 15 minutes.

internal/abuse/tracker.go
type Tracker struct { /* in-memory state per tokenID */ }

func (t *Tracker) RecordDenial(tokenID string) Decision
func (t *Tracker) IsRevoked(tokenID string) bool
func (t *Tracker) IsThrottled(tokenID string) bool
func (t *Tracker) Score(tokenID string) int
func (t *Tracker) Reset(tokenID string)
func (t *Tracker) SetOnAction(f func(Action, string, int))

2. Input signal — proxy.rate_limit scope=token

The tracker is fed by the Sprint 7 TraversingLimiter.SetOnDenied callback, only on scope=token. The reasoning:

  • scope=ip is not attributable to a user (anonymous DDoS);
  • scope=tunnel is, but only indirectly (a user can have several tunnels with different behaviors);
  • scope=token is the natural unit of identity.
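
The scope filter described above could look roughly like the following. The real SetOnDenied callback signature is not shown in this log, so DeniedEvent and tokenOnly are hypothetical names used only to illustrate the filtering rule.

```go
package main

import "fmt"

// DeniedEvent is a hypothetical shape for what the limiter reports.
type DeniedEvent struct {
	Scope string // "ip", "tunnel" or "token"
	Key   string // the IP, tunnel name or tokenID, depending on Scope
}

// tokenOnly wraps record so that only scope=token denials reach it.
func tokenOnly(record func(tokenID string)) func(DeniedEvent) {
	return func(ev DeniedEvent) {
		if ev.Scope != "token" || ev.Key == "" {
			return // not attributable to a user identity: ignore
		}
		record(ev.Key)
	}
}

func main() {
	onDenied := tokenOnly(func(tokenID string) {
		fmt.Println("denial recorded for", tokenID)
	})
	onDenied(DeniedEvent{Scope: "ip", Key: "203.0.113.7"})
	onDenied(DeniedEvent{Scope: "tunnel", Key: "myapp"})
	onDenied(DeniedEvent{Scope: "token", Key: "tok_abc"})
}
```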

3. Graduated thresholds

Threshold | V1 action | Future action
ThrottleScore = 2000 | abuse.threshold event, tracking-only | quota reduction (MAR-48 plans)
RevokeScore = 5000 | IsRevoked, validator invalidates, close tunnels | same (revoke = final)

Calibration: with RPSPerToken=200 (Sprint 7), an attacker spamming 1000 rps generates ~800 denials/s, i.e. 48k/min. The 5000 threshold is hit in about 6 seconds (5000 / 800 ≈ 6.25 s). A legitimate user does not exceed 200 denials/min, since RetryAfter bounds the retries.

4. Decorator pattern on the validator

Zero cross-import between the tokens and abuse packages:

abuse_aware_validator.go
type abuseAwareValidator struct {
    inner   tokens.Validator
    tracker *abuse.Tracker
}

// Validate delegates to the inner validator, then rejects revoked tokens.
func (a *abuseAwareValidator) Validate(secret string) (tokens.Info, bool) {
    info, ok := a.inner.Validate(secret)
    if !ok {
        return info, false
    }
    if a.tracker.IsRevoked(info.TokenID) {
        return tokens.Info{}, false
    }
    return info, true
}

Enabled only if the tracker exists (nil otherwise). A revoked token is invalidated at the next handshake: the yamux session is never even opened.
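
A self-contained version of the decorator, including the conditional wrapping, might look like this. Info, Validator, stubTracker, staticValidator and wrap are stand-ins invented for the sketch; the real types live in the tokens and abuse packages.

```go
package main

import "fmt"

// Info and Validator mimic the shapes used by the tokens package (assumed).
type Info struct{ TokenID string }

type Validator interface {
	Validate(secret string) (Info, bool)
}

// stubTracker stands in for *abuse.Tracker; IsRevoked is nil-safe.
type stubTracker struct{ revoked map[string]bool }

func (t *stubTracker) IsRevoked(tokenID string) bool {
	return t != nil && t.revoked[tokenID]
}

// staticValidator maps a secret to a tokenID (illustration only).
type staticValidator map[string]string

func (v staticValidator) Validate(secret string) (Info, bool) {
	id, ok := v[secret]
	return Info{TokenID: id}, ok
}

// abuseAware rejects tokens that the tracker has revoked.
type abuseAware struct {
	inner   Validator
	tracker *stubTracker
}

func (a *abuseAware) Validate(secret string) (Info, bool) {
	info, ok := a.inner.Validate(secret)
	if !ok {
		return info, false
	}
	if a.tracker.IsRevoked(info.TokenID) {
		return Info{}, false
	}
	return info, true
}

// wrap decorates inner only when both a validator and a tracker exist;
// otherwise the validator is returned unchanged.
func wrap(inner Validator, tr *stubTracker) Validator {
	if inner == nil || tr == nil {
		return inner
	}
	return &abuseAware{inner: inner, tracker: tr}
}

func main() {
	base := staticValidator{"s3cret": "tok_attacker"}
	tr := &stubTracker{revoked: map[string]bool{"tok_attacker": true}}

	_, ok := wrap(base, tr).Validate("s3cret")
	fmt.Println("with tracker:", ok)

	_, ok = wrap(base, nil).Validate("s3cret")
	fmt.Println("without tracker:", ok)
}
```

Taking the tracker as a concrete pointer rather than an interface keeps the nil check reliable: a nil pointer stored in an interface value would not compare equal to nil.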

5. Instant teardown of active tunnels

registry.closeTunnelsByToken(tokenID): under lock, collect the sessions; outside the lock, call Session.Close(). Cleanup (map removal, counter decrement) is then triggered by the handleTunnelConnect defer.

Closing outside the lock matters: the yamux GoAway frame can briefly block on send — under lock, that would block every other registry operation.

6. Audit events

audit.jsonl
// Throttle threshold (V1, tracking-only)
{
  "level": "WARN",
  "event": "abuse.threshold",
  "action": "throttle",
  "token_id": "tok_abc",
  "score": 2500
}

// Auto-revoke
{
  "level": "WARN",
  "event": "abuse.token_revoked",
  "token_id": "tok_attacker",
  "score": 5234,
  "closed_tunnels": 3
}

Always WARN (never INFO): a repeated denial at this level is never “normal”. closed_tunnels lets us quantify the user impact of a revocation.
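
For reference, the two event shapes above can be produced from a single Go struct. This is a sketch using encoding/json; the auditEvent type and the helper names are assumptions, and the real audit writer may be structured differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// auditEvent mirrors the JSON fields shown above. ClosedTunnels is a pointer
// so that a revoke with zero open tunnels still emits "closed_tunnels": 0,
// while threshold events omit the field entirely.
type auditEvent struct {
	Level         string `json:"level"`
	Event         string `json:"event"`
	Action        string `json:"action,omitempty"`
	TokenID       string `json:"token_id"`
	Score         int    `json:"score"`
	ClosedTunnels *int   `json:"closed_tunnels,omitempty"`
}

// thresholdLine encodes a tracking-only throttle event.
func thresholdLine(tokenID string, score int) (string, error) {
	b, err := json.Marshal(auditEvent{
		Level: "WARN", Event: "abuse.threshold", Action: "throttle",
		TokenID: tokenID, Score: score,
	})
	return string(b), err
}

// revokedLine encodes an auto-revoke event.
func revokedLine(tokenID string, score, closedTunnels int) (string, error) {
	b, err := json.Marshal(auditEvent{
		Level: "WARN", Event: "abuse.token_revoked",
		TokenID: tokenID, Score: score, ClosedTunnels: &closedTunnels,
	})
	return string(b), err
}

func main() {
	line, _ := thresholdLine("tok_abc", 2500)
	fmt.Println(line)
	line, _ = revokedLine("tok_attacker", 5234, 3)
	fmt.Println(line)
}
```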

Security properties — dedicated tests

The brief asked for “security tests to the max of the max”. Each critical property has its test.

Cardinal property: strict per-token isolation

TestIsolation_TokensIndependent: records 10 denials on tok_attacker, then verifies that tok_victim stays at score=0, not revoked, not throttled. Without this guarantee there would be a trivial amplification vector: an attacker who knows they are being watched could get a legitimate user revoked through the attacker's own traffic.

Property | Test
Strict per-token isolation | TestIsolation_TokensIndependent
Idempotent actions under race (100×100 goroutines) | TestRace_ConcurrentRecord
Revoke survives window rollovers | TestIsRevoked_StaysTrueAfterDenialsExpire
GC NEVER evicts a revoked entry | TestCap_GCNeverEvictsRevoked
Fail-secure when cap is reached | TestCap_MaxTrackedTokens_FailSecure
nil tracker = nil-safe on all methods | TestNilTracker_AllMethodsSafe
Callback outside the lock (re-entrancy OK) | TestOnAction_InvokedOutsideLock
Score frozen after revoke (anti O(n²)) | TestRevokeFreezesScore
Full disable = no-op (zero regression) | TestNew_DisabledReturnsNil
No panic on exotic tokenID | TestRecordDenial_NeverPanicOnWeirdTokenID

Zero regression — how it is guaranteed

  1. Existing tests kept identical: the 102 tests from Sprints 1–7 all PASS under go test -race -count=1 ./....
  2. Tracker disableable via env vars: with TUNLD_ABUSE_REVOKE_SCORE=0 and TUNLD_ABUSE_THROTTLE_SCORE=0, New returns nil → wiring fully skipped.
  3. Conditional validator wrapping: if validator != nil && abuseTracker != nil. No tracker means the validator is used as-is.
  4. Additive OnDenied callback: the Sprint 7 callback (audit) is kept as-is; the Sprint 8 code is appended after it and replaces nothing.
  5. Conservative defaults: 5000 denials/min to revoke. A legitimate user doing 100 rps of traffic generates on the order of 50–100 denials/min at peak, which leaves a ×50 margin.
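
The "additive callback" idea from the list above can be illustrated with a small helper that runs callbacks in order. The chain function and the single-string callback signature are assumptions for the sketch, not the actual wiring code.

```go
package main

import "fmt"

// chain returns a callback that invokes every non-nil callback in order,
// so a new hook can be appended without replacing an existing one.
func chain(cbs ...func(tokenID string)) func(tokenID string) {
	return func(tokenID string) {
		for _, cb := range cbs {
			if cb != nil {
				cb(tokenID)
			}
		}
	}
}

func main() {
	audit := func(id string) { fmt.Println("audit:", id) }     // existing behavior
	record := func(id string) { fmt.Println("tracker:", id) } // appended hook

	onDenied := chain(audit, record)
	onDenied("tok_abc")

	// With the tracker disabled, the appended hook is simply nil.
	chain(audit, nil)("tok_abc")
}
```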

Explicit scope — delivered vs deferred

Ticket MAR-94 is broad. The sprint delivered the functional core (detection + auto-revoke + tunnel teardown). The following items were explicitly deferred so that the core could land in a single sprint without regression:

  • Email (Resend) + Discord webhook (notifications) — dedicated ticket post Resend setup
  • Admin reset/unblock endpoint — coupled to the future admin dashboard
  • Supabase persistence — table + sync at boot
  • Effective throttle (quota reduction) — coupled to MAR-48 (Stripe + plans)
  • Tunnel cycling detection < 10s — separate signal
  • Phishing URL patterns — fragile heuristic, dedicated ticket if observed
  • Multi-signup from the same IP — signal on the Supabase auth side

Configuration

Variable | Default
TUNLD_ABUSE_WINDOW | 60s
TUNLD_ABUSE_THROTTLE_SCORE | 2000
TUNLD_ABUSE_REVOKE_SCORE | 5000
TUNLD_ABUSE_TTL | 10m

Disable entirely

TUNLD_ABUSE_THROTTLE_SCORE=0
TUNLD_ABUSE_REVOKE_SCORE=0

Strict scope — what this sprint does NOT address

This sprint addresses volumetric abuse: pass-through DDoS, spam, abusive scraping, free proxying. Everything rests on the accumulation of denials from the traversing rate limit.

Qualitative abuse — phishing, malware distributed through a tunnel, illegal content — is out of scope and will be handled differently. A tunnel serving a fake bank page typically generates low traffic, under every volumetric threshold, and stays invisible to this mechanism.

Naming this limit is deliberate: it is in fact the #1 reputational risk of a tunneling service, and the answer requires a separate strategy (abuse reports, ToS, possibly an ML side-channel). Tracked under MAR-110.

Accepted operational tensions

Irreversibility has an operational cost

On the security side, a revoke that is not reversible without intervention is healthy (an attacker does not un-ban themselves). On the operational side, it is dangerous: a false positive — a misconfigured webhook retrying in a loop, a mobile client behind carrier-grade NAT, an aggressive CI script — can only be repaired today by restarting tunld, which cuts every active tunnel of every user to unblock one. Individual sanction, collective cost.

Explicit band-aid to ship before the paid public launch: MAR-108 — authenticated admin reset/unblock endpoint, High priority.

The in-memory model breaks when going multi-node

Each tunld node has its own tracker isolated in RAM. The day we add redundancy (cf. the Multi-node & failover milestone), an attacker revoked on node A reconnects on node B with a zero score. The whole auto-revoke mechanism becomes bypassable by a simple reconnect.

This is not a V1 defect — it is correct for the current single-node setup — but it is an architectural debt that will wake up exactly when we scale. Resolution: MAR-107 — persistence of revocations in Supabase, High priority. Double benefit: it also fixes the revoke lost on restart.

Residual risks — accepted and tracked

Missing persistence

Revoke lost on tunld restart + trivial bypass as soon as multi-node lands (see tension above). Tracking: MAR-107 (High).

No admin reset endpoint

The only repair today is restarting tunld. Tracking: MAR-108 (High).

No user notification

A revoked user finds out at the next handshake, with no context or visible recourse. Tracking: MAR-109 (Medium).

Empirical calibration pending

The 5000 denials/min threshold is calibrated on a 'solo, human, one tunnel' assumption. Pathological-but-legitimate cases (carrier-grade NAT, distributed webhooks) are not covered by the ×50 margin. To observe in prod and tighten or relax based on the first real cases.

Throttle not effective in V1

Tracking-only. Per-plan mechanism coupled to MAR-48.

Qualitative abuse not covered

See strict scope above. Tracking: MAR-110 (Medium).

Verify locally

Reproduce the test battery:

go test -race -v ./internal/abuse/

Abuse simulation with a tunl client:

bash
TUNLD_ABUSE_REVOKE_SCORE=50 ./tunld
# in another shell, after establishing a tunnel "myapp":
for i in {1..200}; do
  curl -s -o /dev/null https://myapp.localhost:8080/path-$i
done
# tunnel closed, journalctl:
# event=abuse.token_revoked closed_tunnels=1
Audit verdict: pass