diff --git a/doc/notifications.md b/doc/notifications.md index 5dbd8f34..450f17b2 100644 --- a/doc/notifications.md +++ b/doc/notifications.md @@ -1,4 +1,308 @@ # Notifications -**TODO(stevvooe)** Cover use and deployment of webhook notifications. Link to -description in architecture documentation. \ No newline at end of file +> **TODO:** Link out to the architecture document on notification support. + +The Registry supports sending webhook notifications in response to events +happening within the registry. Notifications are sent in response to manifest +pushes and pulls and layer pushes and pulls. These actions are serialized into +events. The events are queued into a registry-internal broadcast system which +queues and dispatches events to [_Endpoints_](#endpoints). + +> **TODO:** Insert diagram of event system. + +## Endpoints + +Notifications are sent to _endpoints_ via HTTP requests. Each configurated +endpoint has isolated queues, retry configuration and http targets within each +instance of a registry. When an action happens within the registry, it is +converted into an event which is dropped into an inmemory queue. When the +event reaches the end of the queue, an http request is made to the endpoint +until the request succeeds. The events are sent serially to each endpoint but +order is not guaranteed. + +## Configuration + +To setup a registry instance to send notifications to endpoints, one must add +them to the configuration. A simple example follows: + +```yaml +notifications: + endpoints: + - name: alistener + url: https://mylistener.example.com/event + headers: + Authorization: [Bearer ] + timeout: 500ms + threshold: 5 + backoff: 1s +``` + +The above would configure the registry with an endpoint to send events to +"https://mylistener.example.com/event", with the header "Authorization: Bearer +". The request would timeout after 500 milliseconds. If +5 failures happen consecutively, the registry will backoff for 1 second before +trying again. + +For details on the fields, please see the [configuration documentation](configuration.md#notifications). + +A properly configured endpoint should lead to a log message from the registry +upon startup: + +``` +INFO[0000] configuring endpoint alistener (https://mylistener.example.com/event), timeout=500ms, headers=map[Authorization:[Bearer ]] app.id=812bfeb2-62d6-43cf-b0c6-152f541618a3 environment=development service=registry +``` + +## Events + +Events have a well-defined JSON structure and are sent as the body of +notification requests. One or more events are sent in a structure called an +envelope. Each event has a unique id that can be used to uniqify incoming +requests, if required. Along with that, an _action_ is provided with a +_target, identifying the object mutated during the event. + +The fields available in an event are described in detail in the +[godoc](http://godoc.org/github.com/docker/distribution/notifications#Event). + +> **TODO:** Let's break out the fields here rather than rely on the godoc. + +The following is an example of a JSON event, sent in response to the push of a +manifest: + +```json +{ + "id": "asdf-asdf-asdf-asdf-0", + "timestamp": "2006-01-02T15:04:05Z", + "action": "push", + "target": { + "mediaType": "application/vnd.docker.distribution.manifest.v1+json", + "length": 1, + "digest": "sha256:0123456789abcdef0", + "repository": "library/test", + "url": "http://example.com/v2/library/test/manifests/latest" + }, + "request": { + "id": "asdfasdf", + "addr": "client.local", + "host": "registrycluster.local", + "method": "PUT", + "useragent": "test/0.1" + }, + "actor": { + "name": "test-actor" + }, + "source": { + "addr": "hostname.local:port" + } +} +``` + +## Envelope + +The envelope contains one or more events, with the following json structure: + +```json +{ + "events": [ ... ], +} +``` + +While events may be sent in the same envelope, the set of events within that +envelope have no implied relationship. For example, the registry may choose to +group unrelated events and send them in the same envelope to reduce the total +number of requests. + +The full package has the mediatype +"application/vnd.docker.distribution.events.v1+json", which will be set on the +request coming to an endpoint. + +An example of a full event may look as follows: + +```json +GET /callback +Host: application/vnd.docker.distribution.events.v1+json +Authorization: Bearer +Content-Type: application/vnd.docker.distribution.events.v1+json + +{ + "events": [ + { + "id": "asdf-asdf-asdf-asdf-0", + "timestamp": "2006-01-02T15:04:05Z", + "action": "push", + "target": { + "mediaType": "application/vnd.docker.distribution.manifest.v1+json", + "length": 1, + "digest": "sha256:0123456789abcdef0", + "repository": "library/test", + "url": "http://example.com/v2/library/test/manifests/latest" + }, + "request": { + "id": "asdfasdf", + "addr": "client.local", + "host": "registrycluster.local", + "method": "PUT", + "useragent": "test/0.1" + }, + "actor": { + "name": "test-actor" + }, + "source": { + "addr": "hostname.local:port" + } + }, + { + "id": "asdf-asdf-asdf-asdf-1", + "timestamp": "2006-01-02T15:04:05Z", + "action": "push", + "target": { + "mediaType": "application/vnd.docker.container.image.rootfs.diff+x-gtar", + "length": 2, + "digest": "tarsum.v2+sha256:0123456789abcdef1", + "repository": "library/test", + "url": "http://example.com/v2/library/test/manifests/latest" + }, + "request": { + "id": "asdfasdf", + "addr": "client.local", + "host": "registrycluster.local", + "method": "PUT", + "useragent": "test/0.1" + }, + "actor": { + "name": "test-actor" + }, + "source": { + "addr": "hostname.local:port" + } + }, + { + "id": "asdf-asdf-asdf-asdf-2", + "timestamp": "2006-01-02T15:04:05Z", + "action": "push", + "target": { + "mediaType": "application/vnd.docker.container.image.rootfs.diff+x-gtar", + "length": 3, + "digest": "tarsum.v2+sha256:0123456789abcdef2", + "repository": "library/test", + "url": "http://example.com/v2/library/test/manifests/latest" + }, + "request": { + "id": "asdfasdf", + "addr": "client.local", + "host": "registrycluster.local", + "method": "PUT", + "useragent": "test/0.1" + }, + "actor": { + "name": "test-actor" + }, + "source": { + "addr": "hostname.local:port" + } + } + ] +} +``` + +## Responses + +The registry is fairly accepting of the response codes from endpoints. If an +endpoint responds with any 2xx or 3xx response code (after following +redirects), the message will be considered delivered and discarded. + +In turn, it is recommended that endpoints are accepting of incoming responses, +as well. While the format of event envelopes are standardized by media type, +any "pickyness" about validation may cause the queue to backup on the +registry. + +## Monitoring + +The state of the endpoints are reported via the debug/vars http interface, +usually configured to "http://localhost:5001/debug/vars". Information such as +configuration and metrics are available by endpoint. + +The following provides and example of a few endpoints that have experience +several failures and have since recovered: + +```json +"notifications":{ + "endpoints":[ + { + "name":"local-8082", + "url":"http://localhost:5003/callback", + "Headers":{ + "Authorization":[ + "Bearer \u003can example token\u003e" + ] + }, + "Timeout":1000000000, + "Threshold":10, + "Backoff":1000000000, + "Metrics":{ + "Pending":76, + "Events":76, + "Successes":0, + "Failures":0, + "Errors":46, + "Statuses":{ + + } + } + }, + { + "name":"local-8083", + "url":"http://localhost:8083/callback", + "Headers":null, + "Timeout":1000000000, + "Threshold":10, + "Backoff":1000000000, + "Metrics":{ + "Pending":0, + "Events":76, + "Successes":76, + "Failures":0, + "Errors":28, + "Statuses":{ + "202 Accepted":76 + } + } + } + ] +} +``` + +If using notification as part of a larger application, it is _critical_ to +monitor the size ("Pending" above) of the endpoint queues. If failures or +queue sizes are increasing, it can indicate a larger problem. + +The logs are also a valuable resource for monitoring problems. A failing +endpoint will lead to messages similar to the following: + +``` +ERRO[0340] retryingsink: error writing events: httpSink{http://localhost:5003/callback}: error posting: Post http://localhost:5003/callback: dial tcp 127.0.0.1:5003: connection refused, retrying +WARN[0340] httpSink{http://localhost:5003/callback} encountered too many errors, backing off +``` + +The above indicates that several errors have led to a backoff and the registry +will wait before retrying. + +## Considerations + +Currently, the queues are inmemory, so endpoints should be _reasonably +reliable_. They are designed to make a best-effort to send the messages but if +an instance is lost, messages may be dropped. If an endpoint goes down, care +should be taken to ensure that the registry instance is not terminated before +the endpoint comes back up or messages will be lost. + +This can be mitigated by running endpoints in close proximity to the registry +instances. One could run an endpoint that pages to disk and then forwards a +request to provide better durability. + +The notification system is designed around a series of interchangeable _sinks_ +which can be wired up to achieve interesting behavior. If this system doesn't +provide acceptable guarantees, adding a transactional `Sink` to the registry +is a possibility, although it may have an effect on request service time. +Please see the +[godoc](http://godoc.org/github.com/docker/distribution/notifications#Sink) +for more information. +