Merge pull request #599 from stevvooe/clarify-deletion-by-digest-constraint

Clarify digest in API specification
2015-06-08 19:04:53 -07:00 · 2015-06-08 19:04:53 -07:00 · f63313de1f
commit f63313de1f
parent 89e0955d4c 7e6b4e8c52
3 changed files with 177 additions and 9 deletions
--- a/docs/spec/api.md
+++ b/docs/spec/api.md
@@ -116,6 +116,12 @@ indicating what is different. Optionally, we may start marking parts of the
 specification to correspond with the versions enumerated here.

 <dl>
+  <dt>2.0.2</dt>
+  <dd>
+    <li>Added section covering digest format.</li>
+    <li>Added more clarification that manifest cannot be deleted by tag.</li>
+  </dd>
+
 	<dt>2.0.1</dt>
 	<dd>
 		<ul>
@@ -238,6 +244,84 @@ When a `200 OK` or `401 Unauthorized` response is returned, the
 Clients may require this header value to determine if the endpoint serves this
 API. When this header is omitted, clients may fall back to an older API version.

+### Content Digests
+
+This API design is driven heavily by [content addressability](http://en.wikipedia.org/wiki/Content-addressable_storage).
+The core of this design is the concept of a content-addressable identifier. It
+uniquely identifies content by taking a collision-resistant hash of the bytes.
+Such an identifier can be independently calculated and verified by selection
+of a common _algorithm_. If such an identifier can be communicated in a secure
+manner, one can retrieve the content from an insecure source, calculate the
+identifier independently and be certain that the correct content was obtained.
+Put simply, the identifier is a property of the content.
+
+To disambiguate from other concepts, we call this identifier a _digest_. A
+_digest_ is a serialized hash result, consisting of an _algorithm_ and _hex_
+portion. The _algorithm_ identifies the methodology used to calculate the
+digest. The _hex_ portion is the hex-encoded result of the hash.
+
+We define a _digest_ string to match the following grammar:
+
+  digest      := algorithm ":" hex
+  algorithm   := /[A-Za-z0-9_+.-]+/
+  hex         := /[A-Fa-f0-9]+/
+
+Some examples of _digests_ include the following:
+
+digest                                                                            | description                                   |
+----------------------------------------------------------------------------------|------------------------------------------------
+sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b           | Common sha256 based digest                    |
+tarsum.v1+sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b | Tarsum digest, used for legacy layer digests. |
+
+> __NOTE:__ While we show an example of using a `tarsum` digest, the security
+> of tarsum has not been verified. It is recommended that most implementations
+> use sha256 for interoperability.
+
+While the _algorithm_ does allow one to implement a wide variety of
+algorithms, compliant implementations should use sha256. Heavy processing of
+input before calculating a hash is discouraged to avoid degrading the
+uniqueness of the _digest_, but some canonicalization may be performed to
+ensure consistent identifiers.
+
+Let's use a simple example in pseudo-code to demonstrate a digest calculation:
+
+```
+let C = 'a small string'
+let B = sha256(C)
+let D = 'sha256:' + EncodeHex(B)
+let ID(C) = D
+```
+
+Above, we have bytestring _C_ passed into a function, _sha256_, that returns a
+bytestring _B_, which is the hash of _C_. _D_ is the algorithm name and a colon
+concatenated with the hex encoding of _B_. We then define the identifier of
+_C_, written _ID(C)_, to be equal to _D_. A digest can be verified by
+independently calculating _D_ and comparing it with the identifier _ID(C)_.
+
+#### Digest Header
+
+To provide verification of HTTP content, any response may include a
+`Docker-Content-Digest` header. This will include the digest of the target
+entity returned in the response. For blobs, this is the entire blob content.
+For manifests, this is the manifest body without the signature content, also
+known as the JWS payload. Note that the commonly used canonicalization for
+digest calculation may be dependent on the media type of the content, such as
+with manifests.
+
+The client may choose to ignore the header or may verify it to ensure content
+integrity and transport security. This is most important when fetching by a
+digest. To ensure security, the content should be verified against the digest
+used to fetch the content. At times, the returned digest may differ from that
+used to initiate a request. Such digests are considered to be from different
+_domains_, meaning they have different values for _algorithm_. In such a case,
+the client may choose to verify the digests in both domains or ignore the
+server's digest. To maintain security, the client _must_ always verify the
+content against the _digest_ used to fetch the content.
+
+> __IMPORTANT:__ If a _digest_ is used to fetch content, the client should use
+> the same digest used to fetch the content to verify it. The header
+> `Docker-Content-Digest` should not be trusted over the "local" digest.
+
 ### Pulling An Image

 An "image" is a combination of a JSON manifest and individual layer files. The
@@ -717,7 +801,7 @@ A list of methods and URIs are covered in the table below:
 | GET | `/v2/<name>/tags/list` | Tags | Fetch the tags under the repository identified by `name`. |
 | GET | `/v2/<name>/manifests/<reference>` | Manifest | Fetch the manifest identified by `name` and `reference` where `reference` can be a tag or digest. |
 | PUT | `/v2/<name>/manifests/<reference>` | Manifest | Put the manifest identified by `name` and `reference` where `reference` can be a tag or digest. |
-| DELETE | `/v2/<name>/manifests/<reference>` | Manifest | Delete the manifest identified by `name` and `reference` where `reference` can be a tag or digest. |
+| DELETE | `/v2/<name>/manifests/<reference>` | Manifest | Delete the manifest identified by `name` and `reference`. Note that a manifest can _only_ be deleted by `digest`. |
 | GET | `/v2/<name>/blobs/<digest>` | Blob | Retrieve the blob from the registry identified by `digest`. A `HEAD` request can also be issued to this endpoint to obtain resource information without receiving all data. |
 | POST | `/v2/<name>/blobs/uploads/` | Initiate Blob Upload | Initiate a resumable blob upload. If successful, an upload location will be provided to complete the upload. Optionally, if the `digest` parameter is present, the request body will be used to complete the upload in a single request. |
 | GET | `/v2/<name>/blobs/uploads/<uuid>` | Blob Upload | Retrieve status of upload identified by `uuid`. The primary purpose of this endpoint is to resolve the current status of a resumable upload. |
@@ -1324,7 +1408,7 @@ The error codes that may be included in the response body are enumerated below:

 #### DELETE Manifest

-Delete the manifest identified by `name` and `reference` where `reference` can be a tag or digest.
+Delete the manifest identified by `name` and `reference`. Note that a manifest can _only_ be deleted by `digest`.



@@ -1361,7 +1445,7 @@ The following parameters should be specified on the request:



-###### On Failure: Invalid Name or Tag
+###### On Failure: Invalid Name or Reference

 ```
 400 Bad Request
@@ -1379,7 +1463,7 @@ Content-Type: application/json; charset=utf-8
 }
 ```

-The specified `name` or `tag` were invalid and the delete was unable to proceed.
+The specified `name` or `reference` were invalid and the delete was unable to proceed.



@@ -1449,7 +1533,7 @@ Content-Type: application/json; charset=utf-8
 }
 ```

-The specified `name` or `tag` are unknown to the registry and the delete was unable to proceed. Clients can assume the manifest was already deleted if this response is returned.
+The specified `name` or `reference` are unknown to the registry and the delete was unable to proceed. Clients can assume the manifest was already deleted if this response is returned.
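
Reviewer note: the digest calculation and grammar added in the Content Digests section of this diff can be sketched in Go as follows. This is a minimal illustration, not code from the registry. The names `computeDigest`, `isValidDigest`, and `digestPattern` are hypothetical, and the pattern assumes the _algorithm_ production is meant to admit any ASCII letter so that `sha256` itself matches.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"regexp"
)

// digestPattern follows the grammar `digest := algorithm ":" hex`.
// Assumption: algorithm accepts [A-Za-z0-9_+.-] so names such as
// "sha256" and "tarsum.v1+sha256" are valid.
var digestPattern = regexp.MustCompile(`^[A-Za-z0-9_+.-]+:[A-Fa-f0-9]+$`)

// computeDigest mirrors the pseudo-code: D = "sha256:" + EncodeHex(sha256(C)).
func computeDigest(content []byte) string {
	sum := sha256.Sum256(content)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// isValidDigest reports whether s has the `algorithm ":" hex` shape.
// It checks syntax only and says nothing about the content behind the digest.
func isValidDigest(s string) bool {
	return digestPattern.MatchString(s)
}

func main() {
	d := computeDigest([]byte("a small string"))
	fmt.Println(d, isValidDigest(d))
}
```

Because the digest is a pure function of the bytes, two parties that agree on the algorithm can each call something like `computeDigest` and compare the resulting strings.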



--- a/docs/spec/api.md.tmpl
+++ b/docs/spec/api.md.tmpl
@@ -116,6 +116,12 @@ indicating what is different. Optionally, we may start marking parts of the
 specification to correspond with the versions enumerated here.

 <dl>
+  <dt>2.0.2</dt>
+  <dd>
+    <li>Added section covering digest format.</li>
+    <li>Added more clarification that manifest cannot be deleted by tag.</li>
+  </dd>
+
 	<dt>2.0.1</dt>
 	<dd>
 		<ul>
@@ -238,6 +244,84 @@ When a `200 OK` or `401 Unauthorized` response is returned, the
 Clients may require this header value to determine if the endpoint serves this
 API. When this header is omitted, clients may fall back to an older API version.

+### Content Digests
+
+This API design is driven heavily by [content addressability](http://en.wikipedia.org/wiki/Content-addressable_storage).
+The core of this design is the concept of a content-addressable identifier. It
+uniquely identifies content by taking a collision-resistant hash of the bytes.
+Such an identifier can be independently calculated and verified by selection
+of a common _algorithm_. If such an identifier can be communicated in a secure
+manner, one can retrieve the content from an insecure source, calculate the
+identifier independently and be certain that the correct content was obtained.
+Put simply, the identifier is a property of the content.
+
+To disambiguate from other concepts, we call this identifier a _digest_. A
+_digest_ is a serialized hash result, consisting of an _algorithm_ and _hex_
+portion. The _algorithm_ identifies the methodology used to calculate the
+digest. The _hex_ portion is the hex-encoded result of the hash.
+
+We define a _digest_ string to match the following grammar:
+
+  digest      := algorithm ":" hex
+  algorithm   := /[A-Za-z0-9_+.-]+/
+  hex         := /[A-Fa-f0-9]+/
+
+Some examples of _digests_ include the following:
+
+digest                                                                            | description                                   |
+----------------------------------------------------------------------------------|------------------------------------------------
+sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b           | Common sha256 based digest                    |
+tarsum.v1+sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b | Tarsum digest, used for legacy layer digests. |
+
+> __NOTE:__ While we show an example of using a `tarsum` digest, the security
+> of tarsum has not been verified. It is recommended that most implementations
+> use sha256 for interoperability.
+
+While the _algorithm_ does allow one to implement a wide variety of
+algorithms, compliant implementations should use sha256. Heavy processing of
+input before calculating a hash is discouraged to avoid degrading the
+uniqueness of the _digest_, but some canonicalization may be performed to
+ensure consistent identifiers.
+
+Let's use a simple example in pseudo-code to demonstrate a digest calculation:
+
+```
+let C = 'a small string'
+let B = sha256(C)
+let D = 'sha256:' + EncodeHex(B)
+let ID(C) = D
+```
+
+Above, we have bytestring _C_ passed into a function, _sha256_, that returns a
+bytestring _B_, which is the hash of _C_. _D_ is the algorithm name and a colon
+concatenated with the hex encoding of _B_. We then define the identifier of
+_C_, written _ID(C)_, to be equal to _D_. A digest can be verified by
+independently calculating _D_ and comparing it with the identifier _ID(C)_.
+
+#### Digest Header
+
+To provide verification of HTTP content, any response may include a
+`Docker-Content-Digest` header. This will include the digest of the target
+entity returned in the response. For blobs, this is the entire blob content.
+For manifests, this is the manifest body without the signature content, also
+known as the JWS payload. Note that the commonly used canonicalization for
+digest calculation may be dependent on the media type of the content, such as
+with manifests.
+
+The client may choose to ignore the header or may verify it to ensure content
+integrity and transport security. This is most important when fetching by a
+digest. To ensure security, the content should be verified against the digest
+used to fetch the content. At times, the returned digest may differ from that
+used to initiate a request. Such digests are considered to be from different
+_domains_, meaning they have different values for _algorithm_. In such a case,
+the client may choose to verify the digests in both domains or ignore the
+server's digest. To maintain security, the client _must_ always verify the
+content against the _digest_ used to fetch the content.
+
+> __IMPORTANT:__ If a _digest_ is used to fetch content, the client should use
+> the same digest used to fetch the content to verify it. The header
+> `Docker-Content-Digest` should not be trusted over the "local" digest.
+
 ### Pulling An Image

 An "image" is a combination of a JSON manifest and individual layer files. The
--- a/registry/api/v2/descriptors.go
+++ b/registry/api/v2/descriptors.go
@@ -639,7 +639,7 @@ var routeDescriptors = []RouteDescriptor{
 			},
 			{
 				Method:      "DELETE",
-				Description: "Delete the manifest identified by `name` and `reference` where `reference` can be a tag or digest.",
+				Description: "Delete the manifest identified by `name` and `reference`. Note that a manifest can _only_ be deleted by `digest`.",
 				Requests: []RequestDescriptor{
 					{
 						Headers: []ParameterDescriptor{
@@ -657,8 +657,8 @@ var routeDescriptors = []RouteDescriptor{
 						},
 						Failures: []ResponseDescriptor{
 							{
-								Name:        "Invalid Name or Tag",
-								Description: "The specified `name` or `tag` were invalid and the delete was unable to proceed.",
+								Name:        "Invalid Name or Reference",
+								Description: "The specified `name` or `reference` were invalid and the delete was unable to proceed.",
 								StatusCode:  http.StatusBadRequest,
 								ErrorCodes: []ErrorCode{
 									ErrorCodeNameInvalid,
@@ -690,7 +690,7 @@ var routeDescriptors = []RouteDescriptor{
 							},
 							{
 								Name:        "Unknown Manifest",
-								Description: "The specified `name` or `tag` are unknown to the registry and the delete was unable to proceed. Clients can assume the manifest was already deleted if this response is returned.",
+								Description: "The specified `name` or `reference` are unknown to the registry and the delete was unable to proceed. Clients can assume the manifest was already deleted if this response is returned.",
 								StatusCode:  http.StatusNotFound,
 								ErrorCodes: []ErrorCode{
 									ErrorCodeNameUnknown,
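
Reviewer note: since this change documents that a manifest can only be deleted by digest, a client may want to reject tag references before sending the request. The following Go sketch shows one way to do that. `newManifestDelete` and `digestRef` are hypothetical names, the registry URL is a placeholder, and the pattern assumes the digest grammar with an algorithm class of `[A-Za-z0-9_+.-]`.

```go
package main

import (
	"fmt"
	"net/http"
	"regexp"
)

// digestRef matches `algorithm ":" hex`. Tags cannot contain ":",
// so a tag such as "latest" never matches.
var digestRef = regexp.MustCompile(`^[A-Za-z0-9_+.-]+:[A-Fa-f0-9]+$`)

// newManifestDelete builds a DELETE /v2/<name>/manifests/<reference> request,
// refusing any reference that is not a digest.
func newManifestDelete(registry, name, reference string) (*http.Request, error) {
	if !digestRef.MatchString(reference) {
		return nil, fmt.Errorf("manifest delete requires a digest, got %q", reference)
	}
	url := fmt.Sprintf("%s/v2/%s/manifests/%s", registry, name, reference)
	return http.NewRequest(http.MethodDelete, url, nil)
}

func main() {
	// registry.example.com is a placeholder host, not a real endpoint.
	_, err := newManifestDelete("https://registry.example.com", "library/ubuntu", "latest")
	fmt.Println(err)
}
```

The server remains the authority on what it accepts; a local check like this only avoids a round trip that would end in a `400 Bad Request`.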