0.18.2
Changelog
- d214a34 chore: bump version: 0.18.2-rc2 -> 0.18.2
- 10b172b docs: add release notes for 0.18.2 (#4318)
- 38e0efb chore: bump version: 0.18.2-rc1 -> 0.18.2-rc2
- c5cb23f fix: harness azure dependencies (#4319)
- f623c1c chore: bump version: 0.18.2-rc0 -> 0.18.2-rc1
- 350de38 chore: cleanup container terminations resent on agent exit (#4309) [DET-7533]
- f02f11b fix: skip allocation check on checkpoint save to fix pulling preemption (#4308)
- cfc0f81 fix: remove limit of length 4 for task id (#4304)
- 0645ba9 fix: harness dependency class for azure (#4302)
- 9991528 chore: bump version: 0.18.2-dev0 -> 0.18.2-rc0
- 86b6c28 chore: lock api state for backward compatibility check
- 271ae41 chore: ensure container terminations aren't lost in the event of network failures [DET-7440, DET-7441] (#4272)
- 2d5a118 fix: check for errors from actor asks in createResourcePoolSummary [DET-7494] (#4294)
- daee036 fix: stale agent state crashes resource pool [DET-7493] (#4295)
- 4722caa fix: limit waiting for logger to 30 seconds (#4289)
- b8f9463 fix: jupyterlab modal stale values, not sending values to api
- 42615b4 feat: add cli and api for delete checkpoints [DET-7119], [DET-7120] (#4246)
- ad9bcd8 chore: restore determined_version in Trial checkpoint metadata (#4288)
- 0b7dcdb chore: widen node version detection to 16.29 (#4290)
- 118a850 fix: pass signals through wrapper processes (#4286)
- 0089ede fix: enable Full Configuration in appropriate modals [DET-7495] (#4280)
- 45c62f2 feat: show only selected trials in learning curve
- 922b113 chore: reduce use of any in InteractiveTable
- 5b03672 chore: update docs to accurate node support (#4279)
- 59f686c fix: make trials table sort properly (#4275)
- 695b301 feat: incrementally release resources (#4278)
- c0f53fa perf: avoid Seq Scanning raw steps (#4244)
- b36596f fix: backslashes in show_ssh_command for windows (#4260)
- ec503e0 fix: correct error message when command list fails (#4235)
- 15b0525 feat: add master-side verification for agent mTLS (#4220)
- c6e8249 fix: don't hardcode /bin/which in entrypoints (#4257)
- 236c241 refactor: migrating tables to use InteractiveTable component [DET-7382] (#4229)
- 156ff50 fix: reshow drop targets for customize columns modal (#4199)
- 5e99626 docs: delete mnist_tf_layers example (#4263)
- 9d82231 Revert "add autosync action"
- 8288461 add autosync action
- 70e534f fix: det task logs unable to use trial task IDs and checkpoint GC task IDs [DET-7424] (#4258)
- 12d21c2 fix: gcs storage upstream test failure (#4253)
- b09ec40 feat: improve trial logs to have some system events [DET-5885] (#4215)
- c20ad43 ci: Lints migrations to ensure new migrations have higher timestamps than old ones [DET-7146] (#4250)
- da83e65 chore: small cleanups for slurm (#4211)
- b71f179 feat: det deploy now can use --yes to skip prompts [DET-7408] (#4255)
- 7b3fee4 chore: better slurm option override support (#4254)
- 7af3cfe feat: add google cloud storage (gcs) prefix support [DET-6883] (#4238)
- b5c2b14 docs: add enhanced launcher user guide (#4248)
- eb1d9aa chore: add dev server support for embedded tasks view (#4243)
- 9d19a87 perf: don't repeatedly reprocess profiler data
- b6582c0 ci: add check to prevent ssh git url (#4240)
- 50bdbc3 chore: bumpenvs (#4239)
- 1065314 chore: add node_modules to eslintignore (#4237)
- c74dee6 feat: Add theme toggle to user settings [DET-7321] (#4204)
- 3016fc1 chore: remove legacy code/docs for NCCL/Gloo port range config (#4187)
- f1e8d3c ci: check ulimit before 4x4 distributed test on macs (#4234)
- 7d9e999 fix: adjust page to preserve props.children (#4231)
- 35258ef feat: Enable sending empty string for displayname with fallback to username [DET-7031] (#4140)
- 2e77ec5 fix: det shell start/open, in windows (#4227)
- 7217cfd feat: k8s detect non-det tasks (#4154)
- a88f5c4 chore: share webui base page (#4218)
- 15bd758 fix: use custom image for tensor board [DET-7242] (#4123)
- de926b1 chore: fix rendezvous timeout logic (#4226)
- c29e97a chore: base Dockerfile TensorFlow 2.6, 2.7, 2.8 security patches [DET-7325] (#4223)
- 7882381 fix: authenticate pprof endpoints [DET-7402]
- 2b0ccb8 chore: bump version: 0.18.1-dev0 -> 0.18.2-dev0
- bcbab4d docs: add release notes for 0.18.1 (#4216)
- 034b957 chore: revert rename of RestoreResourcesFailure -> ResourcesFailure. (#4210)
- a7c4c2a feat: enable agent-side mTLS for connection to master (#4212)
- c9e13b6 feat: save connection in context (#4213)
- 15f65ab feat: pix2pix example (#4125)
- db987de chore: delete "conditional" json-schema extension (#4177)
- 9b33f82 fix: use bigint for checkpoint size in proto_get_trials_plus (#4208)
- 2c4c847 docs: update release note instructions with important admonition (#4207)
- 138caf8 fix: pool detail page tab count when loading (#4200)
- c5f685f feat: move task logs to embedded view [DET-7169] (#4179)
- 54982c9 perf: tweak proto_get_trials_plus plan (#4206)
- e585ebb refactor: cleanup task logging shell scripts (#4113)
- 4e2913f chore: update entrypoint in expconf docs (#4198)
- fcff1c2 fix: agent panic on commands with unusually formatted environment variables [DET-6649] (#4202)
- 4172a46 refactor: pull in user service code changes from EE (#4183)
- 1c48fa6 docs: improve OpenTelemetry docs slightly (#4182)
- 7f508e4 fix: allow internal: null for pre-0.15.6 experiments (#4197)
- 55957fe fix: add restarts back to get_trial_ids for sorting
- dd8d3f3 feat: add det experiment logs <EXP_ID> [DET-7145] (#4190)
- 3ef36d5 chore: refactor action dropdown comp to be reused [DET-7171] (#4164)
- 623e60d ci: bust circleci cache (#4189)
- af56e01 docs: document using AWS Load Balancer on EKS [DET-6669] (#4174)
- 36e5667 feat: allow enabling Prometheus monitoring through helm [DET-6993] (#4158)
- 3bb7bb1 style: minor theme fixes and style adjustments [DET-7349] (#4161)
- 2166dfc docs: update screen shots for cluster UI (#4188)
- b7a3278 style: address new flake8-comprehensions, pyzmq==23.0.0. (#4185)
- 5e1a81c feat: allow setting of checkpointStorage.prefix through helm [DET-7152] (#4152)
- 526e1dc feat: display trial restarts [DET-7347] (#4160)
- 2fdc6d7 fix: agent can now be control-C while connecting to master [DET-6287] (#4178)
- d339f7b chore: migrate det a list to new api and bindings (#4186)
- ed257d3 refactor: rip out UseFluentLogging. (#4184)
- 21b8590 docs: update fluent-bit version. (#4181)
- 4b325a5 docs: document database SSL options (#4169)
- 5d228f8 chore: make core-api tutorial Windows-friendly (#4176)
- 1875051 fix: sync slot usage for k8s [DET-7350] (#4172)
- 92a944f chore: add .dccache to .gitignore (#4173)
- 5e7a30c docs: fix typo in release note (#4170)
- a52210f feat: chart sync provider [DET-7309] (#4139)
- a7fafbb fix: enable currently active side nav item (#4167)
- 0a5d54d chore: fix hardcoded url in schema logic (#4171)
- 7df89b2 chore: allow deleting delete failed experiments (#4141) [DET-7070]
- 4ef2b67 perf: fixup query for latest training per trial (#4166) [DET-7352]
- 3d3fe1c fix: include both old and new checkpoints in total checkpoint size (#4165)
- 4d98cee fix: replace carriage returns with newlines in task output [DET-5302] (#3945)
- a2f878a chore: only warn on invalid calls to daemonize resources for slurm (#4108)
- c00ce0a chore: check git state in lock-api-state.sh (#4163)
- ec743d7 ci: turn off github annotations. (#4146)
- 0efd44d doc: fix a broken file reference (#4131)
- b911d85 fix: avoid potential race between AllocationReady and Running state (#4159)
- cc59985 revert: partial revert of 96e0e58 (#4162)
- f07633b fix: port collisions for multiple shared-non distributed jobs (#4120) [HAL-2894]
- 80d1bb1 feat: Add embedded experience for JupyterLab and TensorBoard [DET-7162] (#4134)
- 83cc1ec fix: prevent experiment name in header from flowing entire vertical space of screen during resize (#4157)
Docker images
docker pull determinedai/determined-master:0.18.2
docker pull determinedai/determined-master:d214a34df
docker pull determinedai/determined-master:d214a34df0c0eb2e5e38ae63d1359862fd2af8f1
docker pull determinedai/determined-dev:determined-master-d214a34df
docker pull determinedai/determined-dev:determined-master-d214a34df0c0eb2e5e38ae63d1359862fd2af8f1
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:0.18.2
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:d214a34df
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:d214a34df0c0eb2e5e38ae63d1359862fd2af8f1