The repodata.RepoDataQueryResult is supposed to be a bytes API and
that's what our users (see build module) expect.
Note that the repodata.RepoDataQueryResult.path method still returns
a str. That's what the rpmquery.RpmQuery, debquery.DebQuery, and
archquery.ArchQuery classes also do (if the "path" was initially
passed as a str).
Fixes: #760 ("osc build fails when called with --prefer-pkgs where the
passed directory is a repodata repository or a subdirectory of one")
The packagequery.PackageQueryResult class is supposed to provide a
bytes API. Hence, packagequery.PackageQueryResult.evr() should return
bytes instead of a str. Also, adjust the single caller in the build
module.
This is a follow-up commit for commit
6dbf103e10 ("Use html.escape instead
removed cgi.escape"), which breaks the python2 backward compatibility
(since the "html" module is not available by default) and also breaks
the code in general (due to missing html imports).
The fix is based on the proposed fix in [1].
Fixes: boo#1166537 ("osc rq accept - forwarding request causes backtrace")
[1] https://github.com/openSUSE/osc/pull/764
The correct zst magic is b'(\xb5/\xfd' (4 bytes) (that's what obs-build
is also using).
Kudos to Tobias Ellinghaus for spotting this.
Fixes: #756 ("zst detection fails")
In some rare cases the chardet encoding detection detects
a wrong encoding standard. Then we switch to latin-1 which
covers most if utf-8 does not work.
No functional changes. Note that we cannot simply decode the control's
fields as ascii/utf-8 because a field is not necessarily a valid
ascii/utf-8 encoding (it is possible to register _arbitrary_ custom
fields via a 'register-custom-fields' hook when building a deb
package).
Note: DebQuery.debvercmp really deserves a cleanup:/
cmp(a, b) returns
-1 if a < b
0 if a == 0
1 if a > b
This is needed since python3 has no cmp function anymore.
All credits for this go to Marco Strigl <mstrigl@suse.com> (see
PR#483 [1]).
[1] https://github.com/openSUSE/osc/pull/483
The None argument is always <= than the other argument. We need this
in case of a broken/pathological package where version() or release()
return None (see vercmp (which calls rpmvercmp)).
Returning None breaks ArchQuery.vercmp. Returning b'0' is ok because
an epoch, if present, is always supposed to be an integer (at least
in a "valid" arch package (see scripts/libmakepkg/lint_pkgbuild/epoch.sh.in
in the pacman sources)). Hence, if we compare the epoch of a package,
which has no explicit epoch set, with the epoch of a package, which
has an explicit epoch set, we always have a <= relation.
Now, CpioWrite provides a bytes-only API. It would be also possible
that the API accepts bytes and str (we would need to explicitly
encode the latter) but this would be a bit inconsistent wrt.
cpio.CpioRead (which is bytes-only).
Also, by using a bytesarray instead of a [] we avoid several
intermediate ''.join(...)s.
This is a bytes only API because a filename in a cpio archive can
contain, for instance, illegal utf-8 sequences. A user can decode
the filename/content as she wishes.
A ValueError is more appropriate because there is no issue with the
ar archive itself. Also, the old codepath never worked because the
fn parameter was missing.
Since an ar archive can contain arbitary filenames (that is a
filename can be an invalid utf-8 encoding (for instance,
"foo\xff\xffbar")), the ar module provides a bytes only API. A
user can decode filenames as she wishes.
Note: if a "fn" parameter is passed to Ar.__init__ it should be a
bytes (a str is also ok, but then be aware that an ArError's file
attribute might be a str or a bytes).
There is no need to unpack a single byte because it is not
affected by (byte) endianness (and that's what struct.unpack is
about). Moreover, rpmquery.unpack_string now supports an optional
encoding parameter, which could be used by the python3 port to
decode a string. Note: in general we cannot assume that all strings
in a rpm are utf-8 encoded (it is possible to build a rpm that
contains illegal utf-8 sequences).
This functions are used in the whole code and are
mandatory for the python3 support to work. In python2
case nothing is touched.
* cmp_to_key:
converts a cmp= into a key= function
* decode_list:
decodes each element of a list. This is needed if
we have a mixed list with strings and bytes.
* decode_it:
Takes the input and checks if it is not a string.
Then it uses chardet to get the encoding.