# Husky — Task Tracker

Local-First Job Scheduler with Dependency Graphs

## Phase 0 — Project Bootstrap

Weeks 0–1 · Set up the repository, toolchain, and all dependencies before any feature work begins.
### Repository & Tooling

- Initialise Git repository and add `.gitignore` (Go, binaries, `.env`)
- Create top-level directory structure (`cmd/`, `internal/`, `web/`, `docs/`, `scripts/`, `tests/`)
- Initialise Go module (`go mod init github.com/husky-scheduler/husky`)
- Pin Go toolchain version in `go.mod` / `.tool-versions` (go 1.24.0)
- Add `Makefile` with targets: `build`, `test`, `lint`, `clean`, `run`
- Configure `golangci-lint` (`.golangci.yml`)
- Add `pre-commit` hooks (lint + `go vet` on staged files)
- Set up GitHub Actions CI skeleton (lint + test on push)
### Dependencies

- Add YAML parser — `gopkg.in/yaml.v3`
- Add SQLite driver — `modernc.org/sqlite` (pure-Go, CGO-free)
- Add JSON Schema validator for `husky.yaml` validation — `santhosh-tekuri/jsonschema/v5`
- Add structured logger — `log/slog` (stdlib, Go 1.21+)
- Add CLI framework — `spf13/cobra`
- Add test assertion library — `testify/assert` + `testify/require`
- Run `go mod tidy` and commit `go.sum`
### Skeleton

- Create embedded daemon runtime entry point (initially `cmd/huskyd/main.go`, later invoked via `husky daemon run`)
- Create `cmd/husky/main.go` entry point (CLI)
- Add placeholder `VERSION` constant and `--version` flag (`internal/version` package)
- Create `internal/config/` package stub
- Create `internal/store/` package stub
- Verify `make build` produces the `husky` binary without errors
## Phase 1 — Core Engine

Weeks 1–3 · YAML parser, schedule evaluator, SQLite schema, basic executor, and core CLI commands.
### Step 1: YAML Parser & Config

- Define `Config` and `Job` Go structs matching the `husky.yaml` schema (§2.1, §2.4)
- Implement YAML unmarshalling into the `Config` struct
- Implement JSON Schema validation of `husky.yaml` at load time
- Validate `frequency` enum values (§2.2.1)
- Validate `time` field — 4-char military format, valid hour/minute range (§2.2.2)
- Validate `on_failure` enum (alert | skip | stop | ignore)
- Validate `retry_delay` format (`exponential` or `fixed:<duration>`)
- Validate `concurrency` enum (allow | forbid | replace)
- Interpolate `${env:HOST_VAR}` references in the `env` map at runtime (§7.3)
- Apply the `defaults` block to all jobs that do not override a field (§2.1)
- Return structured parse errors with file location context
- Add `timezone` field to the `Job` struct — IANA timezone identifier string (Feature 7)
- Add `timezone` field to the `Defaults` struct as a global fallback (Feature 7)
- Validate timezone identifiers at parse time using Go's embedded `time/tzdata` package; reject unknown identifiers (Feature 7)
- Add `sla` field to the `Job` struct — optional duration string (Feature 1)
- Validate at parse time that `sla < timeout` when both are set; return a structured error if not (Feature 1)
- Add `tags` field to the `Job` struct — list of strings (Feature 4)
- Validate tags: lowercase alphanumeric with hyphens only, max 10 tags per job, max 32 characters per tag (Feature 4)
- Expand the `Notify` struct to the full rich notification schema — `on_failure`, `on_success`, `on_sla_breach`, `on_retry` sub-objects, each with `channel`, `message`, `attach_logs`, and `only_after_failure` fields (Feature 3)
- Maintain backward compatibility: the shorthand string form (`notify.on_failure: slack:#channel`) continues to parse correctly (Feature 3)
- Add `healthcheck` block to the `Job` struct: `command` string, `timeout` duration string, `on_fail` enum (`mark_failed` | `warn_only`) (Feature 6)
- Add `output` block to the `Job` struct: map of variable name → capture mode string (`last_line`, `first_line`, `json_field:<key>`, `regex:<pattern>`, `exit_code`) (Feature 2)
- Add `Integration` struct with provider-specific credential fields (`webhook_url`, `routing_key`, `host`, `port`, `username`, `password`, `from`) (Feature 3)
- Add `Integrations map[string]*Integration` top-level field to the `Config` struct (Feature 3)
- Infer the provider from the map key when the key matches a known provider name (`slack`, `pagerduty`, `discord`, `smtp`, `webhook`); require an explicit `provider` field for custom key names (Feature 3)
- Validate each integration: required credential fields per provider (`webhook_url` for slack/discord/webhook, `routing_key` for pagerduty, `host` + `from` for smtp), SMTP port range 1–65535 (Feature 3)
- Extend `${env:VAR}` interpolation to cover all integration credential fields (`webhook_url`, `routing_key`, `username`, `password`) (Feature 3)
- Implement `LoadDotEnv(dir)` — source the `.env` file from the same directory as `husky.yaml` before env interpolation; non-destructive (process env vars take precedence) (Feature 3)
- Update the JSON Schema to allow an `integrations:` top-level block with an `integration` `$def` (Feature 3)
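The config surface above can be collected into a rough struct sketch. This is non-authoritative: the YAML keys come from the task list, but the Go type names, field groupings, and struct tags are assumptions about how `internal/config` might shape them.

```go
package main

import "fmt"

// Healthcheck mirrors the per-job healthcheck block (Feature 6).
type Healthcheck struct {
	Command string `yaml:"command"`
	Timeout string `yaml:"timeout"` // duration string, default 30s
	OnFail  string `yaml:"on_fail"` // mark_failed | warn_only
}

// Job mirrors one entry under jobs: in husky.yaml. Field names on the Go
// side are illustrative assumptions.
type Job struct {
	Frequency   string            `yaml:"frequency"`
	Time        string            `yaml:"time"`     // 4-char military format, e.g. "0930"
	Timezone    string            `yaml:"timezone"` // IANA identifier (Feature 7)
	SLA         string            `yaml:"sla"`      // optional duration (Feature 1)
	Timeout     string            `yaml:"timeout"`
	Tags        []string          `yaml:"tags"` // Feature 4
	DependsOn   []string          `yaml:"depends_on"`
	OnFailure   string            `yaml:"on_failure"`  // alert | skip | stop | ignore
	RetryDelay  string            `yaml:"retry_delay"` // "exponential" or "fixed:<duration>"
	Concurrency string            `yaml:"concurrency"` // allow | forbid | replace
	Env         map[string]string `yaml:"env"`
	Output      map[string]string `yaml:"output"` // var name -> capture mode (Feature 2)
	Healthcheck *Healthcheck      `yaml:"healthcheck"`
}

// Integration holds provider credentials (Feature 3).
type Integration struct {
	Provider   string `yaml:"provider"`
	WebhookURL string `yaml:"webhook_url"`
	RoutingKey string `yaml:"routing_key"`
	Host       string `yaml:"host"`
	Port       int    `yaml:"port"`
	Username   string `yaml:"username"`
	Password   string `yaml:"password"`
	From       string `yaml:"from"`
}

type Config struct {
	Defaults     Job                     `yaml:"defaults"`
	Jobs         map[string]Job          `yaml:"jobs"`
	Integrations map[string]*Integration `yaml:"integrations"`
}

func main() {
	cfg := Config{Jobs: map[string]Job{"ingest": {Frequency: "daily", Time: "0930"}}}
	fmt.Println(cfg.Jobs["ingest"].Frequency)
}
```

With `gopkg.in/yaml.v3`, these tags let `yaml.Unmarshal` fill the struct directly; validation (enums, `sla < timeout`, tag rules) would then run as a separate pass over the populated `Config`.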
### Step 2: Schedule Evaluator

- Implement `NextRunTime(job Job, now time.Time) time.Time` for all `frequency` values
  - `hourly` → next `:00`
  - `daily`, `weekly`, `monthly`, `weekdays`, `weekends` → next matching wall-clock tick using the `time` field
  - `manual` → never schedules automatically
  - `after:<job>` → dependency-driven (no time calculation)
- Log the resolved UTC equivalent of every scheduled time at daemon startup
- Implement a 1-second ticker loop that evaluates all jobs each tick
- Resolve the `time` field relative to the per-job `timezone`; fall back to `defaults.timezone`, then the system timezone (Feature 7)
- Use Go's embedded `time/tzdata` for timezone resolution — no system tzdata dependency (Feature 7)
- Handle DST gap: when the scheduled wall-clock time does not exist, run at the next valid time after the gap and log a warning (Feature 7)
- Handle DST overlap: when the scheduled time occurs twice, run once on the first occurrence and log a warning (Feature 7)
- Include the timezone abbreviation (e.g. `EST`, `JST`) alongside the UTC equivalent in the daemon startup log (Feature 7)
### Step 3: SQLite State Store

- Implement `store` package with a WAL-mode SQLite connection
- Create schema migrations for the `job_runs`, `job_state`, `run_logs`, `alerts` tables (§4.1)
- Add `reason TEXT` and `triggered_by TEXT DEFAULT "scheduler"` columns to `job_runs` (Feature 5)
- Add `sla_breached INTEGER DEFAULT 0` column to `job_runs` (Feature 1)
- Add `hc_status TEXT` column to `job_runs` — values: `null | pass | fail | warn` (Feature 6)
- Create `run_outputs` table: `(id, run_id, job_name, var_name, value, cycle_id, created_at)` with an index on `cycle_id` (Feature 2)
- Implement a serialised writer goroutine channel for all writes
- Implement `RecordRunStart`, `RecordRunEnd`, `RecordLog`, `GetJobState` operations
- Implement `UpdateJobState` (`last_success`, `last_failure`, `next_run`, `lock_pid`)
### Step 4: Executor

- Implement a bounded goroutine pool (default: 8 concurrent jobs) (§3.2.3)
- Launch each job as a subprocess via `/bin/sh -c` (or direct exec for array form)
- Set `Setpgid = true` on all subprocesses (§7.4)
- Capture `stdout` and `stderr` line-by-line and write to `run_logs` in real time
- Implement timeout: send `SIGTERM`, then `SIGKILL` after a grace period
- Kill the entire process group on timeout/cancel
### Step 5: Core CLI (`husky`)

- `husky start` — launch the embedded Husky daemon in the background, write `daemon.pid`, detach
- Hidden `husky daemon run` entry point for direct foreground daemon execution by service managers and development workflows
- `husky stop` — graceful shutdown (drain running jobs)
- `husky stop --force` — immediate `SIGKILL` to all running jobs
- `husky status` — tabular display of all job states from `job_state`
- `husky run <job>` — trigger a job immediately, bypassing its schedule
- Unix socket IPC between `husky` (CLI) and the embedded daemon runtime
### Step 6: Testing

- Unit tests for YAML parser (valid configs, missing required fields, bad enum values)
- Unit tests for `time` field validation (all rows from Table §2.2.2)
- Unit tests for schedule evaluator (`NextRunTime` for each frequency value)
- Unit tests for SQLite store (CRUD, WAL concurrency)
- Integration test: parse the full example `husky.yaml` (§2.3) without errors
## Phase 2 — DAG + Reliability

Weeks 4–6 · Dependency resolution, retry state machine, crash recovery, hot-reload, catchup.
### DAG Resolver

- Implement topological sort using Kahn's algorithm on `depends_on` declarations (§3.2.2)
- Detect cycles at startup and emit the full cycle path in the error message
- Reject daemon start if any cycle is found
- Resolve `after:<job>` frequency as an implicit `depends_on` edge
- At runtime, check all upstream job states are `SUCCESS` before dispatching a dependent job
- `husky dag` — print an ASCII DAG of all jobs and their dependencies
- `husky dag --json` — emit a machine-readable DAG structure
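Kahn's algorithm over `depends_on` edges can be sketched as below: repeatedly dequeue jobs with no unprocessed dependencies; if a cycle exists, the queue empties before every job is placed, which doubles as the startup cycle check. The map shape is an assumption.

```go
package main

import "fmt"

// topoSort takes deps (job -> jobs it depends on) and returns an execution
// order; ok is false when a cycle prevents completion. The jobs missing from
// order are exactly the cycle participants, usable for the error message.
func topoSort(deps map[string][]string) (order []string, ok bool) {
	indegree := map[string]int{}
	dependents := map[string][]string{} // upstream -> downstream
	for job := range deps {
		indegree[job] += 0 // ensure every job has an entry
		for _, up := range deps[job] {
			indegree[job]++
			dependents[up] = append(dependents[up], job)
			if _, seen := deps[up]; !seen {
				indegree[up] += 0 // dependency mentioned but not declared
			}
		}
	}
	queue := []string{}
	for job, d := range indegree {
		if d == 0 {
			queue = append(queue, job)
		}
	}
	for len(queue) > 0 {
		job := queue[0]
		queue = queue[1:]
		order = append(order, job)
		for _, down := range dependents[job] {
			indegree[down]--
			if indegree[down] == 0 {
				queue = append(queue, down)
			}
		}
	}
	return order, len(order) == len(indegree)
}

func main() {
	// transform depends on ingest; report depends on transform.
	order, ok := topoSort(map[string][]string{
		"ingest": {}, "transform": {"ingest"}, "report": {"transform"},
	})
	fmt.Println(order, ok)
	_, ok = topoSort(map[string][]string{"a": {"b"}, "b": {"a"}}) // cycle
	fmt.Println(ok)
}
```

The same `order` feeds both the ASCII `husky dag` rendering and the runtime dispatch check (a job is eligible only when every upstream in `deps[job]` is `SUCCESS`).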
### Retry FSM

- Implement job run state machine: `PENDING → RUNNING → SUCCESS | FAILED → RETRYING` (§3.2.4)
- Implement exponential backoff with ±25% jitter (doubles each attempt: 30s, 60s, 120s…)
- Implement `fixed:<duration>` retry delay
- On max retries exceeded, execute the `on_failure` action:
  - `alert` — fire configured notify channels
  - `skip` — mark run `SKIPPED`, continue other jobs
  - `stop` — halt the entire pipeline (cancel all downstream dependents)
  - `ignore` — log the failure silently, take no action
- Implement `concurrency: forbid` — skip the run if the previous one is still running (§2.4)
- Implement `concurrency: replace` — kill the running instance and start fresh
- `husky retry <job>` — retry the last failed run from scratch
- `husky cancel <job>` — send `SIGTERM` to the running job
- `husky skip <job>` — mark the pending run as `SKIPPED`
### Healthchecks

- After the main command exits with code 0, run `healthcheck.command` as a separate process (Feature 6)
- Apply `healthcheck.timeout` (default: 30s); send `SIGTERM` + `SIGKILL` to the healthcheck if exceeded (Feature 6)
- `on_fail: mark_failed` (default): mark the run `FAILED`, append healthcheck stderr to `run_logs`, trigger the retry policy (Feature 6)
- `on_fail: warn_only`: mark the run `SUCCESS` with `hc_status = warn`, fire the `on_sla_breach` notify event (Feature 6)
- Skip the healthcheck entirely when the main command exits non-zero (Feature 6)
- Capture healthcheck stdout/stderr into `run_logs` with a `stream = "healthcheck"` stream tag (Feature 6)
- Expose a `husky logs <job> --include-healthcheck` flag to include healthcheck log lines (Feature 6)
### Output Passing

- After job completion, evaluate the job's `output` block and write each captured value to `run_outputs` (Feature 2)
- Implement all capture modes: `last_line`, `first_line`, `json_field:<key>`, `regex:<pattern>`, `exit_code` (Feature 2)
- Generate a `cycle_id` UUID at the root of each trigger chain (schedule tick or manual run); propagate it to all downstream dependent jobs (Feature 2)
- At job dispatch time, render `{{ outputs.<job_name>.<var_name> }}` template expressions in `command` and `env` values (Feature 2)
- Return a descriptive error at dispatch time if a referenced output variable has no recorded value for the current `cycle_id` (Feature 2)
- Scope all output values to a single cycle; never bleed values across independent trigger chains (Feature 2)
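The capture modes can be sketched as a single dispatch function over the collected stdout and exit code. Two details here are assumptions: `json_field` is shown parsing the last stdout line, and `regex` is shown returning the first capture group.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// captureOutput applies one capture mode to a finished job's stdout/exit code.
func captureOutput(mode, stdout string, exitCode int) (string, error) {
	lines := strings.Split(strings.TrimRight(stdout, "\n"), "\n")
	switch {
	case mode == "last_line":
		return lines[len(lines)-1], nil
	case mode == "first_line":
		return lines[0], nil
	case mode == "exit_code":
		return strconv.Itoa(exitCode), nil
	case strings.HasPrefix(mode, "json_field:"):
		var m map[string]any
		if err := json.Unmarshal([]byte(lines[len(lines)-1]), &m); err != nil {
			return "", err
		}
		return fmt.Sprint(m[strings.TrimPrefix(mode, "json_field:")]), nil
	case strings.HasPrefix(mode, "regex:"):
		re, err := regexp.Compile(strings.TrimPrefix(mode, "regex:"))
		if err != nil {
			return "", err
		}
		if m := re.FindStringSubmatch(stdout); len(m) > 1 {
			return m[1], nil // first capture group
		}
		return "", fmt.Errorf("pattern %q matched nothing", mode)
	}
	return "", fmt.Errorf("unknown capture mode %q", mode)
}

func main() {
	out := "starting\nwrote /tmp/data.csv\n{\"rows\": 42}\n"
	v, _ := captureOutput("json_field:rows", out, 0)
	fmt.Println(v)
	v, _ = captureOutput(`regex:wrote (\S+)`, out, 0)
	fmt.Println(v)
}
```

Each returned value would be written to `run_outputs` keyed by `(job_name, var_name, cycle_id)`, which is what makes the later `{{ outputs.<job>.<var> }}` lookup cycle-scoped.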
### Crash Recovery

- On startup, acquire the `daemon.pid` lock (§6 step 2)
- Detect stale lock (PID dead) vs. live daemon (exit with "daemon already running")
- Reconcile orphaned runs: query `job_state WHERE lock_pid IS NOT NULL`, probe each PID with `kill -0` (§6 step 3)
- For dead PIDs: mark the run `FAILED`, increment the attempt, schedule a retry
- Reconcile missed schedules based on the `catchup` flag (§6 step 4)
  - `catchup: true` — trigger the missed run immediately
  - `catchup: false` — log the skip, advance to the next scheduled tick
### Config Hot-Reload (SIGHUP)

- Handle `SIGHUP` — parse the full new config before applying it
- Atomic swap of the running config under a mutex
- Ensure running jobs are never interrupted during reload
- Re-run cycle detection on the new config; reject the reload (keep the old config) if invalid
- `husky reload` — send `SIGHUP` to the daemon via the Unix socket
### Testing

- Unit tests for Kahn's algorithm (linear chain, fan-in, fan-out, cycle detection)
- Unit tests for Retry FSM state transitions
- Unit tests for exponential backoff timing + jitter bounds
- Unit tests for each `on_failure` handler
- Integration test: crash daemon mid-run, restart, verify orphan reconciliation
- Integration test: introduce a cycle into the config, verify the daemon rejects it
- Unit tests for healthcheck execution flow: main success + healthcheck pass, main success + healthcheck fail (`mark_failed`), main success + healthcheck fail (`warn_only`), main fail skips healthcheck (Feature 6)
- Unit tests for healthcheck timeout enforcement (Feature 6)
- Unit tests for each output capture mode: `last_line`, `first_line`, `json_field`, `regex`, `exit_code` (Feature 2)
- Unit tests for `{{ outputs.<job>.<var> }}` template rendering in `command` and `env` (Feature 2)
- Unit tests for `cycle_id` scoping — values from cycle A are not visible in cycle B (Feature 2)
- Unit tests for timezone scheduling: correct wall-clock resolution for a sample of IANA zones, DST gap and overlap behaviour (Feature 7)
- Unit tests for `sla < timeout` validation at parse time (Feature 1)
## Phase 3 — Observability

Weeks 7–9 · REST API, WebSocket log streaming, embedded web dashboard, webhook alerting.
### REST API

- Bind the API server to `127.0.0.1:8420` by default (§7.1)
- `GET /api/jobs` — list all jobs with current state
- `GET /api/jobs?tag=<tag>` — filter the job list by tag (Feature 4)
- `GET /api/jobs/:name` — single job detail (config + last N runs)
- `POST /api/jobs/:name/run` — trigger a manual run; accept an optional `reason` body field (Feature 5)
- `POST /api/jobs/:name/cancel` — cancel a running job
- `GET /api/runs/:id` — run detail with exit code, duration, `sla_breached`, `hc_status` fields (Features 1, 6)
- `GET /api/runs/:id/logs` — paginated log lines for a run
- `GET /api/runs/:id/outputs` — output variables captured for a run (Feature 2)
- `GET /api/audit` — searchable run history; query params: `job`, `trigger`, `status`, `since`, `reason`, `tag` (Feature 5)
- `GET /api/tags` — list all defined tags (Feature 4)
- `GET /api/dag` — DAG structure as JSON
- `GET /api/status` — daemon health + uptime
### WebSocket Log Streaming

- `GET /ws/logs/:run_id` — stream live `run_logs` lines over WebSocket as they are written
- Backfill existing log lines on connection open, then stream new lines
- Close the WebSocket cleanly when the run reaches a terminal state (`SUCCESS | FAILED | SKIPPED`)
- `husky logs <job> --tail` — consume the WebSocket stream in the terminal
### CLI Observability Commands

- `husky logs <job>` — print last run logs
- `husky logs <job> --run=<id>` — logs for a specific run ID
- `husky logs <job> --include-healthcheck` — include the healthcheck stream in output (Feature 6)
- `husky logs --tag <tag> --tail` — stream live logs for all jobs matching a tag (Feature 4)
- `husky history <job>` — table of last N runs (status, duration, trigger, reason, vs-SLA columns) (Features 1, 5)
- `husky history <job> --last=<n>` — configurable run count
- `husky validate` — lint `husky.yaml` (schema, cycles, field types, timezone identifiers, `sla < timeout`)
- `husky validate --strict` — also warn on missing descriptions and notify configs
- `husky config show` — print the effective config with defaults applied
- `husky export --format=json` — export a full state snapshot
- `husky status --tag <tag>` — filter the job status table by tag (Feature 4)
- `husky pause --tag <tag>` — pause all jobs matching a tag (Feature 4)
- `husky resume --tag <tag>` — resume all paused jobs matching a tag (Feature 4)
- `husky run --tag <tag>` — trigger all jobs matching a tag immediately (Feature 4)
- `husky tags list` — print all defined tags with the count of jobs carrying each one (Feature 4)
- `husky run <job> --reason "<text>"` — attach a free-text annotation (max 500 chars) to the manual run record (Feature 5)
- `husky audit` — searchable run history across all jobs (Feature 5)
- `husky audit --job <name>` — filter by job (Feature 5)
- `husky audit --trigger manual|schedule|dependency` — filter by trigger type (Feature 5)
- `husky audit --status failed|success|skipped` — filter by run status (Feature 5)
- `husky audit --since "<date>"` — filter runs after a given date (Feature 5)
- `husky audit --reason "<text>"` — full-text search on the reason field (Feature 5)
- `husky audit --tag <tag>` — filter by job tag (Feature 5)
- `husky audit --export csv` — export results as CSV (Feature 5)
### Web Dashboard

- Scaffold a Preact app in the `web/` directory
- DAG view — dependency edges and topological execution order, renders from `GET /api/dag` JSON
- Job list view — all jobs, last status, last run time, next scheduled run; tag filter dropdown; run/cancel actions
- Run history view — recent runs per job in expandable row detail; run pills colour-coded by status
- Extend run history with a `vs SLA` column showing "✓ under" when within budget or "+Δ" overrun when breached (Feature 1)
- Add a third run-state colour: yellow for "running but past SLA" in addition to green/red (Feature 1)
- Tag filter — tag dropdown on the jobs view filters to matching jobs; tag dropdown on the audit view (Feature 4)
- Tag filter sidebar with aggregate health counts per tag (Feature 4)
- Audit log view — searchable, filterable table (job, status, trigger, tag, since) with click-to-expand log viewer (Feature 5)
- Live log viewer — streams WebSocket log output; stderr tinted red; auto-scroll; historic fallback for completed runs (Feature 6)
- Job detail view — show captured output variables per run under a collapsible "Outputs" section (Feature 2)
- Healthcheck output — show healthcheck log lines in a collapsible section below the main output in run detail (Feature 6)
- Embed compiled web assets into the binary via `go:embed` (§3.3)
- Serve the dashboard from `GET /` on the API server
- Add `make web` target — `npm ci && vite build` writes the Tailwind + Preact bundle to `internal/api/dashboard/`; `make build` depends on `web`
- Style the dashboard with Tailwind CSS (purged bundle ≈ 15 KB gzip)
- Add a timezone column to the job list view (Feature 7)
- Redact `env` var values in job config views (§7.3)
### Rich Notifications

- Implement notification dispatch for all four events: `on_success`, `on_failure`, `on_sla_breach`, `on_retry` (Feature 3)
- Render `message` templates using `{{ job.* }}` and `{{ run.* }}` variables at dispatch time (Feature 3)
- Attach the last N lines of `run_logs` to the notification when `attach_logs: last_N_lines` is set; send the full log when `all` (Feature 3)
- Implement `only_after_failure: true` on `on_success` — suppress the notification unless the previous completed run was `FAILED` (Feature 3)
- Support `slack:#channel` and `slack:@user` — post a formatted message via a Slack Incoming Webhook URL (Feature 3)
- Support `pagerduty:<severity>` — trigger the PagerDuty Events API v2; severity values: `p1`, `p2`, `p3`, `p4` (Feature 3)
- Support `discord:#channel` — post a plain text message via a Discord Incoming Webhook URL (Feature 3)
- Support `webhook:<url>` — HTTP POST with JSON payload to a custom URL (Feature 3)
- Support `email:<address>` — send an SMTP email; requires an `smtp` config block in `husky.yaml` defaults (Feature 3)
- Implement the SLA breach timer: when a job stays `RUNNING` past its `sla` duration, fire `on_sla_breach` (falls back to `on_failure` if not set); set `sla_breached = 1` on the run record (Feature 1)
- Write each dispatched alert to the `alerts` table (audit log) (§4.1)
- Retry alert delivery up to 3 times with backoff on HTTP error
- Backward compatibility: the shorthand string form (`notify.on_failure: slack:#channel`) continues to dispatch correctly (Feature 3)
- Resolve integration credentials at notification dispatch time: look up the integration name embedded in the channel string prefix (e.g. `slack_ops:#alerts` → `cfg.Integrations["slack_ops"]`) (Feature 3)
- Fall back to the provider-named key when the channel string uses a bare provider prefix (e.g. `slack:#channel` → `cfg.Integrations["slack"]`) (Feature 3)
- Support `husky integrations list` — display all configured integrations as a table: name, provider, and credential status (set/missing) (Feature 3)
- Support `husky integrations test <name>` — send a live test message/event to the named integration and report success or failure (Feature 3)
### Testing

- Unit tests for each REST endpoint (table-driven, `httptest`)
- Integration test: trigger a run via REST, stream logs via WebSocket, verify terminal state
- Unit tests for notification template rendering — all `{{ job.* }}` and `{{ run.* }}` variables (Feature 3)
- Unit tests for Slack, PagerDuty, Discord, and custom webhook payload construction (Feature 3)
- Unit tests for integration credential resolution at dispatch time — named key, bare provider fallback, missing integration (Feature 3)
- Integration tests for `husky integrations test slack` and `husky integrations test pagerduty` (Feature 3)
- Unit tests for `only_after_failure` suppression logic (Feature 3)
- Unit tests for the SLA breach timer: fires `on_sla_breach` at the correct elapsed time, sets the `sla_breached` flag (Feature 1)
- Unit tests for tag filtering on `GET /api/jobs` and `husky status` (Feature 4)
- Unit tests for `husky audit` filter combinations (Feature 5)
- Unit tests verifying `husky run --reason` stores the reason in `job_runs.reason` (Feature 5)
- End-to-end test: full pipeline run (§2.3 example), verify all alert rows written
- End-to-end test: output-passing pipeline — ingest captures `file_path`, transform receives it via template (Feature 2)
## Phase 3.4 — Dashboard Enhancements

Complete · All backend endpoints, Data / Alerts / Config, Run Detail page, Enhanced Jobs tab, DAG SVG, Health tab, Integrations panel, and UI quality-of-life items implemented.
### Backend: New API Endpoints

- `GET /api/daemon/info` — return PID, config file path, `started_at` timestamp, SQLite DB path, active job count, and paused job count (§1.3)
- `GET /api/db/job_runs` — paginated, sortable, filterable table of `job_runs`; query params: `job`, `status`, `trigger`, `since`, `until`, `sla_breached`, `page`, `page_size` (§2.1)
- `GET /api/db/job_state` — return the full `job_state` table (§2.2)
- `POST /api/db/job_state/:job/clear_lock` — set `lock_pid = NULL` for a job; return `409` if the job is currently `RUNNING` (§2.2)
- `GET /api/db/run_logs` — full-text searchable log explorer; query params: `run_id`, `job`, `stream` (`stdout|stderr|healthcheck`), `q`, `limit`, `offset` (§2.3)
- `GET /api/db/alerts` — paginated alerts history; query params: `job`, `status`, `limit`, `offset` (§2.5)
- `POST /api/db/alerts/:id/retry` — re-dispatch a failed alert without re-running the job; return `409` if the alert is already `pending` (§2.5)
- `POST /api/jobs/:name/pause` — pause a single job by name (§5.1)
- `POST /api/jobs/:name/resume` — resume a single paused job by name (§5.1)
- `POST /api/jobs/:name/retry` — retry the last failed run for this job (§4.2)
- `POST /api/jobs/:name/skip` — mark a `PENDING` run as `SKIPPED` (§4.2)
- `GET /api/integrations` — list all configured integrations with name, provider, credential status, last test result and timestamp (§8.1)
- `POST /api/integrations/:name/test` — fire a test delivery to the named integration; return the HTTP status and response payload (§8.2)
### Dashboard: Daemon Info Panel (§1.3)

- Extend the top bar with a clickable version badge that toggles an info dropdown panel displaying: PID, uptime, config file path, `started_at` full timestamp, SQLite DB path, total / active / paused job counts
- Fetch daemon info from `GET /api/daemon/info` on mount and on each 30-second poll cycle
### Dashboard: Data Tab (§2)

- Add a Data tab to the navigation alongside Jobs | Audit | Outputs | Alerts | DAG | Config
- `job_runs` sub-tab (§2.1): paginated table with columns `id | job_name | status | attempt | trigger | triggered_by | reason | started_at | finished_at | exit_code | sla_breached | hc_status`; pagination controls (25 / 50 / 100 rows); filter bar (job, status, trigger, date range, `sla_breached` flag)
- `job_state` sub-tab (§2.2): table with columns `job_name | last_success | last_failure | next_run | lock_pid`; highlight rows with a non-null `lock_pid` using a warning badge; "Clear lock" action button per such row → `POST /api/db/job_state/:job/clear_lock`
- `run_logs` sub-tab (§2.3): searchable log explorer across all runs; filters: run ID, job name, stream type, full-text keyword; rows colour-coded by stream (stderr = red tint, healthcheck = amber tint)
### Dashboard: Alerts Tab Completion (§2.5)

- Add a `status` column with colour-coded badges: `delivered` = green, `failed` = red, `pending` = yellow
- Add `attempts`, `last_attempt_at`, and `error` columns to the Alerts table
- Add a "Retry" action button on rows where `status = failed` → `POST /api/db/alerts/:id/retry`; disable the button while the request is in flight
- Add pagination / load-more to the Alerts table (currently loads a fixed number)
### Dashboard: Config Editor Improvements (§3)

- Inline error markers (§3.2): when "Validate" returns errors that include line numbers, scroll to and highlight the offending line(s) in the textarea; show the error description below the highlighted line
- Config diff view (§3.4): after a successful "Save & Reload", show a unified diff between the previous config text and the newly saved config; highlight added/removed job blocks so the operator can confirm what changed
### Dashboard: Run Detail Page (§4)

- Implement SPA routing using the hash (`#/runs/:id`); the root `App` component parses `window.location.hash` and renders a `RunDetailView` instead of the tab panel when a run route is matched
- `RunDetailView` (§4.1): full-page view showing — job name (links back to the Jobs tab filtered to that job); status badge, attempt, trigger, triggered_by, reason; started_at / finished_at / duration; exit code; SLA progress bar (green if within budget, amber → red if over); `hc_status` badge; `cycle_id` (hyperlinked to Data → run_outputs filtered by cycle); full log viewer with stream colouring and healthcheck toggle; output variables table; previous / next run links for the same job
- Run actions (§4.2): Retry button (shown when status is `FAILED`) → `POST /api/jobs/:name/retry`; Skip button (shown when status is `PENDING`) → `POST /api/jobs/:name/skip`; Cancel button (shown when status is `RUNNING`) → `POST /api/jobs/:name/cancel`
- Deep links (§4.3): update run pills in the Jobs tab to navigate to `#/runs/:id` instead of expanding inline; update Audit table rows to link to `#/runs/:id`; make `run_id` values in the Data tab clickable links to `#/runs/:id`
### Dashboard: Enhanced Jobs Tab (§5)

- Per-job pause/resume (§5.1): add a Pause / Resume button to each job row action group (next to Run / Cancel); show a `PAUSED` badge in the Status column for paused jobs; wire to `POST /api/jobs/:name/pause` and `POST /api/jobs/:name/resume`
- Full job config details (§5.2): extend the expanded row config panel to include `frequency`, `on_success`, `on_retry`, `on_sla_breach` notification settings, env var keys (values redacted as `***`), output capture rules, and healthcheck command + timeout
- Run history depth control (§5.3): add a "Show more" button below the run history pills that fetches the next page; support incremental loading (`?runs=40` then `?runs=80`)
- Bulk trigger by tag (§5.4): add a "Run all" button to the tag health strip for each tag; show a confirmation dialog listing the jobs that will be triggered; fire one `POST /api/jobs/:name/run` per matching job on confirm
### Dashboard: DAG Enhancements (§6)

- Visual graph (§6.1): replace the current text edge list with a rendered directed graph using a lightweight library (Dagre-D3, D3-dag, or ELK.js); nodes show job name, current status badge, and last run time; directed edges with arrowheads; node borders colour-coded (green = SUCCESS, red = FAILED, blue = RUNNING, grey = PENDING); support scroll and zoom for large graphs
- Node actions (§6.2): clicking a node navigates to `#/runs/:id` for the most recent run of that job; right-click context menu with Trigger / Cancel / View logs options
- Cycle trace overlay (§6.3): from the Run Detail page or Data tab, a "Show in DAG" button that highlights all nodes and edges that participated in a specific `cycle_id`, with each node annotated with the outputs it captured in that cycle
### Dashboard: Health & SLA Tab (§7)

- Add a Health tab to the navigation
- SLA compliance table (§7.1): for all jobs with an SLA configured, render `job | SLA budget | last duration | Δ | breach rate (30 d)`; Δ shown in red when positive (breach) and green when negative (headroom); default sort by breach rate descending to surface the worst offenders
- Timeline swimlane (§7.2): chronological chart with one row per job showing the last N runs as coloured bars (bar length = duration, bar colour = status, amber border if SLA breached); hovering a bar shows a full run detail tooltip
### Dashboard: Integrations Panel (§8)

- Add an Integrations section as a sub-tab within Config or as a dedicated Settings tab
- Integration health list (§8.1): display all configured integrations with name, provider, credential status (`configured` / `missing required fields` / `env var unset`), last test result and timestamp; fetched from `GET /api/integrations`
- Test delivery button (§8.2): "Test" button per integration row → `POST /api/integrations/:name/test`; display the test payload sent and the HTTP response code / body returned
### Dashboard: UI Quality-of-Life (§9)

- Dark mode (§9.1): (→ Backlog)
- Toast notifications (§9.2): ensure every user-initiated action (trigger, cancel, pause/resume, save config, retry alert, clear lock) shows a short-lived success or error toast with descriptive text
- Keyboard shortcuts (§9.3): `r` refreshes the current view, `/` focuses the active filter/search input, `Esc` closes expanded rows or modals; register via a `useEffect` on `document` in the root `App` component; show a shortcut hint in the UI
- Column and filter persistence (§9.4): (→ Backlog)
- Copy to clipboard (§9.5): one-click copy button on run IDs, cycle IDs, and inside the log viewer (copies visible log text) via `navigator.clipboard.writeText`
- Auto-polling control (§9.6): toggle button to pause/resume auto-refresh of the jobs table; show a "last refreshed N seconds ago" label; when paused, no polling until the user manually refreshes or re-enables
### Testing

- Unit tests: `GET /api/daemon/info` — returns correct PID, config path, start time, and job counts (§1.3)
- Unit tests: `GET /api/db/job_runs` — pagination boundaries, sort by each column, all filter param combinations (§2.1)
- Unit tests: `POST /api/db/job_state/:job/clear_lock` — clears the lock when the job is idle, returns 409 when the job is `RUNNING` (§2.2)
- Unit tests: `GET /api/db/run_logs` — full-text `q` search, stream filter, pagination (§2.3)
- Unit tests: `POST /api/db/alerts/:id/retry` — re-queues a failed alert; returns 409 if the alert is already pending; returns 404 for an unknown ID (§2.5)
- Unit tests: `POST /api/jobs/:name/pause` and `/resume` — correct state transitions, 404 on unknown job (§5.1)
- Unit tests: `POST /api/jobs/:name/retry` and `/skip` — correct preconditions enforced (status must be `FAILED` / `PENDING` respectively) (§4.2)
- Unit tests: `GET /api/integrations` — returns all configured integrations with the correct `status` field (§8.1)
- Unit tests: `POST /api/integrations/:name/test` — fires test delivery, returns response details, 404 on unknown integration (§8.2)
- E2E: hash routing — navigate directly to `#/runs/:id`, verify `RunDetailView` renders with correct data and back-navigation returns to the Jobs tab (§4.3)
- E2E: bulk trigger by tag — confirmation dialog lists the correct jobs, all matching jobs reach `RUNNING` state after confirm (§5.4)
- E2E: dark mode toggle — `dark` class applied to `<html>`, preference survives page reload (§9.1) (→ Backlog)
## Phase 3.5 — Daemon Runtime Configuration (huskyd.yaml)

Complete · All `huskyd.yaml` parsing, API binding, authentication (bearer, basic auth, RBAC), TLS, CORS, structured logging with rotation, storage retention vacuum, executor shell + global env, scheduler jitter, and full unit + integration test coverage implemented. OIDC, metrics, tracing, secrets backends, and advanced executor resource limits deferred to the Phase 5 backlog.
### Dashboard

- `huskyd.yaml` sub-tab: if the daemon exposes a daemon-config path via `GET /api/daemon/info`, add a second sub-tab in the Config view displaying its path and raw YAML content (read-only initially)
### Discovery & Parsing

- Auto-assign the API port with `127.0.0.1:0` (OS picks a free port); write the bound address to `<data>/api.addr` on startup; remove the file on clean shutdown
- `husky dash` — read `<data>/api.addr` and open the dashboard URL in the default system browser
- Define a `DaemonConfig` Go struct in a new `internal/daemoncfg` package mirroring the full `huskyd.yaml` shape
- Load `huskyd.yaml` from the same directory as `husky.yaml` when the file exists; silently apply all defaults when it does not
- Support a `--daemon-config <path>` flag on `huskyd` to override the default discovery path
- Validate `huskyd.yaml` with a JSON Schema at load time; emit structured errors with field path context
- All fields optional — absence of any section falls back to the documented default; no breaking change for v1.0 users
- `husky validate` extends to also validate `huskyd.yaml` when present
### API Server

- `api.addr` — bind the HTTP server to the specified address when set; overrides the `127.0.0.1:0` default (enables fixed-port deployments and binding to `0.0.0.0` for remote access)
- Emit a prominent startup `WARN` log when `api.addr` binds to `0.0.0.0` without both TLS and auth enabled
- `api.base_path` — mount the dashboard and all API routes under a URL prefix (e.g. `/husky`); strip the prefix transparently so internal handlers see clean paths
- `api.tls.enabled` — call `tls.Listen` instead of `net.Listen` when true; refuse to start if cert/key files are missing or unreadable
- `api.tls.cert` / `api.tls.key` — absolute paths to the PEM-encoded certificate and private key
- `api.tls.min_version` — enforce a minimum TLS version; accept `"1.2"` (default) or `"1.3"`
- `api.tls.client_ca` — when set, require mTLS; reject connections whose client certificate is not signed by the given CA
- `api.cors.allowed_origins` — inject CORS headers for listed origins; wildcard `"*"` accepted but emits a startup warning
- `api.cors.allow_credentials` — set `Access-Control-Allow-Credentials: true` when needed for browser-based OAuth flows
- `api.timeouts.read_header`, `.read`, `.write`, `.idle` — pass through to the corresponding `http.Server` fields; defaults match the current hardcoded values
Authentication
- `auth.type: none` (default) — no authentication; all endpoints open
- `auth.type: bearer` — require `Authorization: Bearer <token>` on all REST + WebSocket requests
  - `auth.bearer.token_file` — read one token per line; all listed tokens are valid; hot-reload on SIGHUP
  - `auth.bearer.token` — inline token string for simple setups (warn in logs that file-based is preferred)
- `auth.type: basic` — HTTP Basic Auth
  - `auth.basic.users` — list of `{username, password_hash}` entries; `password_hash` is a bcrypt hash; plaintext passwords rejected at load time
- `auth.type: oidc` — startup stub: returns a clear error; full OIDC implementation deferred to Phase 5
- `auth.rbac` — define per-role allowed HTTP method + path patterns; built-in roles `viewer`, `operator`, `admin` with sensible defaults; custom roles accepted
- Auth middleware applied uniformly to all routes served by the API server
- Auth disabled entirely when `auth.type: none`; no performance overhead in the default case
- `husky` CLI reads the `HUSKY_TOKEN` env var and injects an `Authorization: Bearer` header for all HTTP-based commands (`pause`, `resume`, `dash`) when auth is enabled
Logging
- `log.level` — `debug | info | warn | error`; default `info`; hot-reloadable via SIGHUP
- `log.format: json` — swap the `slog` handler from `slog.NewTextHandler` to `slog.NewJSONHandler`; produces structured log lines consumable by Datadog, CloudWatch, Loki, Splunk
- `log.output: stdout | stderr | file` — default `stdout`; `file` requires `log.file.path`
- `log.file.path` — write logs to this file; create parent directories if absent
- `log.file.max_size_mb`, `.max_backups`, `.max_age_days`, `.compress` — log rotation via `lumberjack` (or equivalent); compress rotated files when `compress: true`
- `log.audit_log.enabled` — write a separate newline-delimited JSON audit log to `log.audit_log.path`; each line is one job state-transition event with full context (job, run ID, status, trigger, duration, reason)
- `log.audit_log.max_size_mb`, `.max_backups` — independent rotation settings for the audit log file
Storage
- `storage.sqlite.path` — override the default `<data>/husky.db` location
- `storage.sqlite.wal_autocheckpoint` — set the SQLite `wal_autocheckpoint` pragma; default 1000 pages
- `storage.sqlite.busy_timeout` — set the SQLite busy-handler timeout; default `5s`
- `storage.retention.max_age` — background vacuum goroutine deletes `job_runs` (and cascade: `run_logs`, `run_outputs`) older than the specified duration; runs every 24 h
- `storage.retention.max_runs_per_job` — after pruning by age, also enforce a per-job cap; keep only the N most recent completed runs; PENDING and RUNNING runs are never pruned
- Background vacuum goroutine: run once at startup (after a short delay so as not to contend with catchup), then on a 24 h ticker; log the number of rows pruned at `INFO` level
- `storage.engine: postgres` stub — parse the config field and emit a clear `not yet supported` error at startup; prevents a confusing failure mode when users copy a future config forward
- `husky export --format=json` honours the configured storage path when reading state
Scheduler
- `scheduler.max_concurrent_jobs` — global ceiling on concurrently executing jobs across all job pools; default `32`; the per-job `concurrency` setting still applies at the job level
- `scheduler.catchup_window` — maximum look-back window when reconciling missed schedules on restart; overrides per-job `catchup: true` for runs missed longer ago than this; default `24h`
- `scheduler.shutdown_timeout` — grace period given to in-flight jobs when a graceful stop is requested, before SIGKILL is sent to remaining jobs; default `60s`
- `scheduler.schedule_jitter` — random jitter (up to this duration) added to each scheduled trigger to avoid thundering-herd bursts when many jobs share the same tick; default `0s` (no jitter)
Executor
- `executor.pool_size` — size of the bounded goroutine worker pool; default `8`; surfaced in `huskyd.yaml` so production deployments can tune without recompiling
- `executor.shell` — shell used to run job commands; default `/bin/sh`; common override `/bin/bash`
- `executor.global_env` — key-value pairs injected into every job's environment; lower priority than per-job `env` in `husky.yaml`; higher priority than the host environment
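The `global_env` precedence chain (host < `executor.global_env` < per-job `env`) reduces to applying the three layers in order. A sketch with a hypothetical `mergeEnv` helper:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// mergeEnv builds the subprocess environment with the documented
// precedence: host env is overridden by executor.global_env, which is
// overridden by the per-job env from husky.yaml.
func mergeEnv(host []string, global, perJob map[string]string) []string {
	merged := map[string]string{}
	for _, kv := range host {
		if i := strings.IndexByte(kv, '='); i > 0 {
			merged[kv[:i]] = kv[i+1:]
		}
	}
	for k, v := range global {
		merged[k] = v
	}
	for k, v := range perJob {
		merged[k] = v
	}
	out := make([]string, 0, len(merged))
	for k, v := range merged {
		out = append(out, k+"="+v)
	}
	sort.Strings(out) // deterministic order for tests and logs
	return out
}

func main() {
	env := mergeEnv(
		[]string{"PATH=/usr/bin", "REGION=host"},
		map[string]string{"REGION": "global", "TEAM": "data"}, // executor.global_env
		map[string]string{"TEAM": "billing"},                  // per-job env
	)
	fmt.Println(env) // [PATH=/usr/bin REGION=global TEAM=billing]
}
```

The result slice is in the `KEY=value` form `exec.Cmd.Env` expects.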
Testing
- Unit tests: bearer token middleware — valid token accepted, invalid token → 401, missing header → 401, auth disabled → all requests pass
- Unit tests: basic auth middleware — valid credentials, wrong password, unknown user
- Unit tests: RBAC — viewer can GET, viewer cannot POST trigger, operator can trigger, admin can do everything
- Unit tests: storage retention vacuum — rows older than `max_age` deleted, rows within the window kept, RUNNING rows never deleted
- Unit tests: `max_runs_per_job` cap — correct rows pruned, most recent N retained
- Unit tests: `executor.global_env` — values present in subprocess env, overridden by per-job env
- Integration test: bearer auth enabled — `husky status` (HTTP commands) fails without `HUSKY_TOKEN`, succeeds with the correct token
- Integration test: `huskyd.yaml` absent — daemon starts with all defaults; no error or warning beyond "no huskyd.yaml found, using defaults"
Phase 4 — Hardening & Distribution
Weeks 10–12 · Cross-compilation, packaging, auth, TLS, integration test suite.
Cross-Compilation
- Build matrix in the `Makefile`: `linux/amd64`, `linux/arm64`, `darwin/amd64`, `darwin/arm64`, `windows/amd64`
- Verify the `modernc.org/sqlite` CGO-free build works on all five targets
- Produce reproducible builds (embed `VERSION`, `COMMIT`, `BUILD_DATE` via `ldflags`)
- Add a `make dist` target that produces a `dist/` directory with all platform archives (`.tar.gz`) — each archive contains the single `husky` binary
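The `ldflags` embedding works by overwriting package-level string variables at link time. A sketch — in Husky these variables would live in the `internal/version` package mentioned in Phase 0; `package main` is used here only to keep the example self-contained:

```go
package main

import "fmt"

// Overwritten at link time, e.g. in the Makefile:
//   go build -ldflags "-X main.Version=1.0.0 \
//                      -X main.Commit=$(git rev-parse --short HEAD) \
//                      -X main.BuildDate=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
// Note: -X only works on package-level string variables.
var (
	Version   = "dev"
	Commit    = "unknown"
	BuildDate = "unknown"
)

func versionString() string {
	return fmt.Sprintf("husky %s (commit %s, built %s)", Version, Commit, BuildDate)
}

func main() {
	fmt.Println(versionString()) // husky dev (commit unknown, built unknown)
}
```

For fully reproducible archives, the `BuildDate` injection is usually paired with `-trimpath` and a pinned toolchain version.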
Packaging & Distribution
- Write Homebrew formula `husky.rb` (binary download + checksum)
- Build a `.deb` package for Debian/Ubuntu (using `goreleaser` or hand-rolled `dpkg-deb`)
- Build an `.rpm` package for RHEL/Fedora
- Create an `install.sh` one-liner installer (detects platform, downloads the correct binary)
- Write a `systemd` unit file example (`huskyd.service`) — `ExecStart` points to `husky daemon run`
- Write a `launchd` plist example for macOS autostart (invokes `husky daemon run`)
- Publish GitHub Release via a GoReleaser CI step with checksums and signatures
Token Authentication
Superseded by Phase 3.5 (`auth.type: bearer` + `auth.type: basic` + RBAC in `huskyd.yaml`).
- Implement token auth middleware for all REST + WebSocket endpoints (§7.2)
- Read the token from `auth.token` in `husky.yaml` or the `HUSKY_TOKEN` env var
- Store a bcrypt hash of the token in the `job_state` database; never store plaintext
- Require an `Authorization: Bearer <token>` header when auth is enabled
- Token auth disabled by default; emit a startup log note when disabled
- `husky` reads the token from the `HUSKY_TOKEN` env var for socket/HTTP calls
TLS
Superseded by Phase 3.5 (`api.tls.*` block in `huskyd.yaml`; the `--daemon-config` flag provides an override path).
- Support `--tls-cert` and `--tls-key` flags on daemon start (§7.1)
- Verify the cert/key pair at startup; refuse to start on invalid files
- Serve HTTPS when TLS flags are provided; HTTP otherwise
- Document self-signed cert generation steps in the `README`
External Binding
Superseded by Phase 3.5 (`api.addr` in `huskyd.yaml`).
- Support a `--bind` flag to override the default `127.0.0.1:8420` (§7.1)
- Emit a prominent startup warning when bound to `0.0.0.0` without TLS + auth enabled
Integration Test Suite
- Test: parse, validate, and execute the full §2.3 example pipeline end-to-end
- Test: DAG execution order is correct (ingest → transform → report)
- Test: `on_failure: stop` halts downstream jobs
- Test: retry with exponential backoff reaches max retries and fires an alert
- Test: `concurrency: forbid` skips overlapping runs
- Test: daemon crash mid-run → restart → orphan reconciled → retry fires
- Test: `catchup: true` triggers the missed run after restart
- Test: `catchup: false` skips the missed run after restart
- Test: `SIGHUP` hot-reload swaps config without interrupting a running job
- Test: a cycle in the config is rejected at startup and reload
- Test: token auth rejects requests without a valid bearer token
- Test: `husky validate` catches all invalid field combinations
- Test: job with `sla` reaches threshold, `on_sla_breach` fires, run completes with `sla_breached = 1` and `SUCCESS` status (Feature 1)
- Test: job with `healthcheck`, main succeeds, healthcheck fails → run marked `FAILED`, retries triggered (Feature 6)
- Test: job with `healthcheck on_fail: warn_only`, healthcheck fails → run marked `SUCCESS` with `hc_status = warn` (Feature 6)
- Test: output-passing pipeline, downstream job receives the correct `{{ outputs.* }}` value (Feature 2)
- Test: output values from cycle A are not visible in independently triggered cycle B (Feature 2)
- Test: `husky run <job> --reason` persists the reason; `husky audit` returns it; `{{ run.reason }}` renders in the notification template (Features 3, 5)
- Test: per-job timezone — a job scheduled at wall-clock time in `America/New_York` fires at the correct UTC moment before and after a DST transition (Feature 7)
- Test: `husky pause --tag <tag>` pauses all matching jobs; `husky resume --tag <tag>` resumes them (Feature 4)
- Benchmark: scheduler tick latency under 100 concurrent jobs < 10 ms
Documentation
- Write `README.md` — quickstart, install, `husky.yaml` and `huskyd.yaml` examples, CLI reference
- Write `docs/configuration.md` — full field reference including all v1.0 fields: `sla`, `tags`, `timezone`, `healthcheck`, `output`, expanded `notify` schema (Features 1–7)
- Write `docs/security.md` — auth, TLS, process isolation guidance
- Write `docs/crash-recovery.md` — startup sequence walkthrough
- Write `docs/output-passing.md` — guide to capture modes, `cycle_id` scoping, and template syntax (Feature 2)
- Write `docs/notifications.md` — all channel formats, template variables, `only_after_failure` behaviour (Feature 3)
- Add `CHANGELOG.md` for v1.0.0
- Use Docusaurus to create a documentation website
Phase 5: Backlog
Deferred from Phase 3.5
Items below were descoped from the alpha release. They are fully specified and ready to implement post-launch.
OIDC Authentication
- `auth.type: oidc` — full implementation: fetch and cache JWKS from `auth.oidc.jwks_uri`; verify JWT signatures; refresh on key rotation (background goroutine)
- `auth.oidc.issuer`, `.client_id`, `.audience`, `.jwks_uri` — standard OIDC discovery fields
- `auth.oidc.role_claim` — JWT claim name mapped to Husky roles (`viewer`, `operator`, `admin`)
- `auth.oidc.default_role` — role assigned when the claim is absent
Executor Advanced Config
- `executor.working_dir: config_dir | <absolute_path>` — working directory for all child processes; `config_dir` (default) means the directory containing `husky.yaml`
- `executor.resource_limits.max_memory_mb` — Linux: set `RLIMIT_AS`; macOS: best-effort via `setrlimit`; 0 = unlimited
- `executor.resource_limits.max_open_files` — set `RLIMIT_NOFILE` on the child process; default `1024`
- `executor.resource_limits.max_pids` — Linux: set `RLIMIT_NPROC`; limits forking within a job
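A sketch of what applying `max_open_files` could look like via `syscall.Getrlimit`/`Setrlimit`. Two caveats, both assumptions of this example rather than the roadmap's design: the real executor would apply the limit in the child process (via a pre-exec shim or `SysProcAttr`), not in the daemon, and this is Unix-only code.

```go
package main

import (
	"fmt"
	"syscall"
)

// capOpenFiles lowers the soft RLIMIT_NOFILE to max (if max is lower than
// the current soft limit) and returns the effective limit. The hard limit
// is left untouched, so the change is reversible within the process.
func capOpenFiles(max uint64) (uint64, error) {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return 0, err
	}
	if max > 0 && max < lim.Cur {
		lim.Cur = max
		if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
			return 0, err
		}
	}
	return lim.Cur, nil
}

func main() {
	cur, err := capOpenFiles(1024) // the documented default
	fmt.Println(cur, err)
}
```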
Metrics
- `metrics.enabled` — expose a Prometheus-compatible scrape endpoint; default `false`
- `metrics.addr` — bind address for the `/metrics` HTTP server; default `127.0.0.1:9091` (separate from the dashboard port)
- `metrics.path` — URL path for the scrape endpoint; default `/metrics`
- `metrics.auth` — when `true`, protect `/metrics` with the same bearer token as the main API
- Instrument: `husky_job_runs_total{job,status,trigger}`, `husky_job_duration_seconds{job}`, `husky_job_sla_breaches_total{job}`, `husky_job_retries_total{job}`, `husky_scheduler_tick_duration_seconds`, `husky_running_jobs`, `husky_daemon_uptime_seconds`
- Use `prometheus/client_golang` — add to `go.mod`
- Metrics server lifecycle tied to the daemon context (shut down cleanly on SIGTERM)
Tracing
- `tracing.enabled` — emit OpenTelemetry spans; default `false`
- `tracing.exporter` — `otlp | jaeger | zipkin | stdout`; default `otlp`
- `tracing.endpoint`, `.service_name`, `.sample_rate`
- Instrument spans for: job dispatch, executor subprocess wait, retry backoff sleep, IPC request handling, notification delivery
- Use `go.opentelemetry.io/otel` — add to `go.mod`
- Tracer provider shut down gracefully on daemon context cancellation
Secrets Backends
- `secrets.provider: vault` — resolve `${secret:NAME}` placeholders via HashiCorp Vault KV v2 (`addr`, `token_file`, `mount`, `path_prefix`)
- `secrets.provider: aws_ssm` — AWS Systems Manager Parameter Store (ambient IAM)
- `secrets.provider: aws_secrets_manager` — AWS Secrets Manager (ambient IAM)
- `secrets.provider: gcp_secret_manager` — GCP Secret Manager (Application Default Credentials)
- Secret resolution at startup + SIGHUP; never written to disk or logs
- Add `${secret:NAME}` interpolation syntax alongside `${env:VAR}` in `internal/config/env.go`
Daemon-Level Alerts
- `alerts.on_daemon_start` — fire notification(s) when `huskyd` starts successfully
- `alerts.on_sla_breach` — daemon-level fallback for SLA breach notifications when the job does not define `notify.on_sla_breach`
- `alerts.on_forced_kill` — fire a notification when a job is killed by the `shutdown_timeout` SIGKILL
Dashboard Customisation
- `dashboard.enabled: false` — disable the dashboard while keeping the REST + WebSocket API; respond to `/` with `404`
- `dashboard.title` — custom title injected into `<title>` and the header; served via `GET /api/config`
- `dashboard.accent_color` — CSS custom property for the primary accent colour
- `dashboard.log_backfill_lines` — historical lines sent on WebSocket connect; default `200`
- `dashboard.poll_interval` — frontend poll interval; served via `GET /api/config`; default `5s`
- `GET /api/config` — new endpoint returning dashboard runtime config (no auth required)
HTTP Client (Outbound)
- `http_client.timeout` — global timeout for all outbound HTTP calls; default `15s`
- `http_client.max_retries` / `.retry_backoff` — notification delivery retries
- `http_client.proxy` — optional HTTP proxy URL for all outbound calls
- `http_client.ca_bundle` — PEM CA bundle appended to the system cert pool
Process / System
- `process.user` / `process.group` — drop privileges after port bind (Linux `setuid`/`setgid`; warning on macOS/Windows)
- `process.pid_file` — override the PID file location
- `process.ulimit_nofile` — `setrlimit(RLIMIT_NOFILE)` on the daemon process at startup
- `process.watchdog_interval` — `sd_notify WATCHDOG=1` pings for systemd `WatchdogSec=`
Phase 3.5 Tests (deferred)
- Unit tests: `DaemonConfig` parsing — all fields, defaults, unknown fields rejected
- Unit tests: JSON Schema validation of `huskyd.yaml` — valid, invalid, missing optional sections
- Unit tests: `api.base_path` prefix stripping — all routes reachable under the prefix
- Unit tests: log format switch — structured JSON output when `log.format: json`
- Unit tests: `scheduler.schedule_jitter` — dispatched times fall within `[tick, tick + jitter]` bounds
- Unit tests: Prometheus metrics — counters increment on run completion, gauge reflects running count
- Unit tests: `${secret:NAME}` interpolation — resolved from a mock provider, not leaked to logs
- Unit tests: `dashboard.poll_interval` + `dashboard.title` served by `GET /api/config`
- Integration test: `api.addr: 127.0.0.1:19999` — verify the daemon binds to the configured port
- Integration test: TLS cert + key — `https://` succeeds, `http://` rejected
- Integration test: SIGHUP with updated `log.level: debug` — debug lines appear without restart
- Integration test: storage vacuum fires; pruned row count matches the `max_age` config
- Distributed execution across multiple machines
- External queue integrations (Kafka, Redis, RabbitMQ)
- Dynamic job creation via API at runtime
- Plugin SDK / extensible executor types
- Per-job log retention config (default: 30 days / 1,000 runs) with background vacuum
- Optional log shipping to external sinks
- DST ambiguous window warnings (ยง9 Risk Register)
Multi-Project Daemon Mode
Target: after v1.0 ships and real-world usage patterns are established.
Goal: A single `huskyd` process manages multiple projects, each with its own `husky.yaml`, SQLite store, and job namespace. One binary, one process, one dashboard — regardless of how many projects are registered.
Rationale: v1.0 ships as a per-project daemon (one daemon per `husky.yaml`). This is correct for the simple case and for developers using Husky locally. For production deployments — ops teams, servers, organisations running many pipelines — a per-project model multiplies resource overhead and eliminates the possibility of a unified operational view. V2 closes this gap without breaking v1.0 users.
Project Registry
- Define a top-level `huskyd.yaml` server config file (distinct from per-project `husky.yaml`) — specifies bind address, TLS, auth, log retention, and the list of registered projects
- Add a `projects:` block to `huskyd.yaml`: a list of entries with `name` (unique identifier) and `config` (path to the project's `husky.yaml`)
- Implement a `husky register <path>` command — adds a project entry to `huskyd.yaml` and hot-reloads the daemon
- Implement a `husky deregister <name>` command — removes the project entry; waits for running jobs to complete before tearing down the project context
- Implement `husky projects list` — tabular output: project name, config path, job count, status (active/paused/error)
- Validate that project names are unique, lowercase alphanumeric with hyphens, max 64 characters
- Detect duplicate `husky.yaml` paths at registration time; reject with a clear error
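The name rule above fits in one regular expression. A sketch (`checkProjectName` is a hypothetical helper; rejecting leading/trailing hyphens is an extra assumption, in the spirit of DNS-label-style identifiers):

```go
package main

import (
	"fmt"
	"regexp"
)

// validName: lowercase alphanumeric with hyphens, max 64 characters.
// The pattern is 1 + up-to-62 + 1 characters, so 64 is the ceiling.
var validName = regexp.MustCompile(`^[a-z0-9]([a-z0-9-]{0,62}[a-z0-9])?$`)

func checkProjectName(name string) error {
	if !validName.MatchString(name) {
		return fmt.Errorf("invalid project name %q: lowercase alphanumeric with hyphens, max 64 chars", name)
	}
	return nil
}

func main() {
	fmt.Println(checkProjectName("billing-prod")) // <nil>
	fmt.Println(checkProjectName("Billing"))      // error: uppercase
}
```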
Project Isolation
- Each registered project runs in an isolated context: independent scheduler goroutine, independent executor pool, independent SQLite database under `<data-dir>/<project-name>/husky.db`
- Per-project goroutine pool size — inherits the global default from `huskyd.yaml` but overridable per project
- A panic or fatal error in one project context must not crash or stall other projects — recover, mark the project as `error` state, and emit an alert
- Per-project `SIGHUP`-style config reload — the daemon watches each `husky.yaml` for changes and reloads only the affected project context
- A global file-watcher (using `fsnotify`) replaces per-project polling; each `husky.yaml` change triggers an isolated reload pipeline
Job Namespacing
- All job identifiers are namespaced as `<project-name>::<job-name>` internally
- CLI commands accept a `--project <name>` flag to scope operations: `husky run --project billing daily-report`
- Without `--project`, the CLI auto-detects the project by walking up from CWD until a `husky.yaml` is found, then resolves it to the registered project name — preserves the v1.0 UX in the terminal
- `after:` dependencies within a project remain unqualified (`after: ingest`) — no breaking change to v1.0 config syntax
- Cross-project `after:` dependencies expressed as `after: "billing::daily-report"` — optional v2 feature, gated behind a config flag `allow_cross_project_deps: true`
- `husky status` without `--project` shows only the auto-detected project; `husky status --all` shows all projects
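The qualification rules above can be sketched as a single resolver: a `::`-qualified reference names a project explicitly, a bare name stays in the current project. `splitJobRef` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// splitJobRef resolves a job reference against the current project:
// "billing::daily-report" is fully qualified; a bare "ingest" resolves
// within the current project, preserving v1.0 after: syntax.
func splitJobRef(ref, currentProject string) (project, job string) {
	if i := strings.Index(ref, "::"); i >= 0 {
		return ref[:i], ref[i+2:]
	}
	return currentProject, ref
}

func main() {
	p, j := splitJobRef("billing::daily-report", "ops")
	fmt.Println(p, j) // billing daily-report
	p, j = splitJobRef("ingest", "ops")
	fmt.Println(p, j) // ops ingest
}
```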
Unified Web Dashboard
- A single web dashboard serves all projects — project selector dropdown in the top navigation
- Default landing page shows a cross-project summary: total jobs, running count, recent failures across all projects
- Per-project view mirrors the v1.0 dashboard (job table, log stream, run history)
- WebSocket log streams are scoped per project; subscribing to one project's stream does not receive logs from other projects
- `GET /api/projects` — list all registered projects with summary stats
- `GET /api/projects/<name>/jobs` — scoped equivalent of v1.0 `GET /api/jobs`
- `GET /api/projects/<name>/jobs/<job>/runs` — scoped run history
- Retain v1.0 unscoped endpoints (`GET /api/jobs`) as aliases that resolve via CWD auto-detection or an `X-Husky-Project` header
Unified Integration Credentials
- Move Slack, PagerDuty, Discord, SMTP integration credentials to `huskyd.yaml` as global integrations — available to all projects without duplication
- Per-project `husky.yaml` may still define local integrations that supplement the global ones; a project-level key of the same name explicitly overrides the global key, which otherwise applies
- `husky integrations list` shows global integrations from `huskyd.yaml`; `husky integrations list --project <name>` shows the merged view for that project
Single Unix Socket
- Replace per-project Unix sockets with a single socket at a fixed path (e.g. `/var/run/husky/huskyd.sock` or `~/.husky/huskyd.sock`)
- All IPC messages include a `project` field; the daemon routes to the correct project context
- v1.0 CLI compatibility: if the `project` field is absent from an IPC message, the daemon resolves the project from the `cwd` field included in the message
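The routing rule — explicit `project` field wins, otherwise fall back to `cwd` — can be sketched as follows. The wire shape and the directory-prefix matching are assumptions; the roadmap only fixes that `project` exists and that v1.0 clients send `cwd` instead.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ipcMessage is an assumed wire shape for the single-socket IPC.
type ipcMessage struct {
	Project string `json:"project,omitempty"`
	Cwd     string `json:"cwd,omitempty"`
	Command string `json:"command"`
}

// resolveProject routes a message to a project context: an explicit
// project field wins; otherwise match cwd against each registered
// project's config directory (v1.0 CLI compatibility path).
func resolveProject(msg ipcMessage, registry map[string]string /* name -> config dir */) (string, error) {
	if msg.Project != "" {
		return msg.Project, nil
	}
	for name, dir := range registry {
		if msg.Cwd == dir || strings.HasPrefix(msg.Cwd, dir+"/") {
			return name, nil
		}
	}
	return "", fmt.Errorf("no registered project for cwd %q", msg.Cwd)
}

func main() {
	registry := map[string]string{"billing": "/srv/billing"}
	raw := `{"cwd":"/srv/billing/reports","command":"status"}` // v1.0 client: no project field
	var msg ipcMessage
	if err := json.Unmarshal([]byte(raw), &msg); err != nil {
		panic(err)
	}
	project, err := resolveProject(msg, registry)
	fmt.Println(project, err) // billing <nil>
}
```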
Migration from V1
- Document the upgrade path in `docs/upgrading-to-v2.md`: existing per-project `huskyd` users move to running a single `huskyd` with a `huskyd.yaml` that registers their project
- `husky migrate` command — scans CWD for `husky.yaml`, generates a minimal `huskyd.yaml` with the project registered, and prints next steps
- v1.0 invocation (`huskyd --config ./husky.yaml`) continues to work unchanged as a single-project shorthand — no breaking change for existing users
Testing
- Unit tests: project registry — register, deregister, duplicate detection, name validation
- Unit tests: job namespacing — auto-detect project from CWD, `--project` flag override, cross-project `after:` resolution
- Integration test: two projects registered, run jobs in both simultaneously, verify isolation (no log cross-contamination, independent retry state)
- Integration test: one project panics — verify the other project continues running unaffected
- Integration test: reload one project's `husky.yaml` via the file-watcher — verify the other project is not restarted
- Integration test: `husky status --all` returns the correct aggregated view across projects
- Integration test: cross-project `after:` dependency — downstream job in project B fires after the upstream job in project A completes
- Integration test: `husky migrate` generates a valid `huskyd.yaml` from an existing `husky.yaml`
- Benchmark: scheduler tick latency under 10 projects × 100 jobs (1,000 total) < 20 ms