mirror of https://github.com/openSUSE/osc.git synced 2024-09-20 09:16:16 +02:00
Commit Graph

122 Commits

Author SHA1 Message Date
Marcus Huewe
1933da5bcc Use os.getcwdb() instead of os.getcwd().encode() in util.cpio.CpioRead
Using os.getcwd() in combination with a subsequent .encode() is error
prone:

marcus@linux:~> mkdir illegal_utf-8_encoding_$'\xff'_dir
marcus@linux:~> cd illegal_utf-8_encoding_$'\xff'_dir/
marcus@linux:~/illegal_utf-8_encoding_ÿ_dir> python3
Python 3.8.6 (default, Nov 09 2020, 12:09:06) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.getcwd().encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' in position 36: surrogates not allowed
>>>

Hence, use os.getcwdb(), which returns a bytes, instead of
os.getcwd().encode().

Fixes: commit 36f7b8ffe9 ("Fix a potential TypeError in CpioRead.copyin
and CpioRead.copyin_file")
2020-11-22 17:39:54 +01:00
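The failure described in commit 1933da5bcc can be reproduced without creating such a directory. The following is a minimal sketch, assuming a str that contains the lone surrogate '\udcff' (the form in which Python reports an undecodable 0xff byte in a path on a typical Linux setup); the variable names are illustrative only and are not taken from util.cpio.

```python
import os

# A path as Python would report it for a directory whose name contains
# the raw byte 0xff (the byte is mapped to the lone surrogate U+DCFF).
path = "illegal_utf-8_encoding_\udcff_dir"

try:
    path.encode()  # strict utf-8 encoding rejects lone surrogates
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# The surrogateescape error handler restores the original byte.
raw = path.encode("utf-8", "surrogateescape")

# os.getcwdb() hands back bytes directly, so no encoding step is needed.
cwd = os.getcwdb()
```

Using os.getcwdb() sidesteps the question of which error handler to pass to encode().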
Marcus Huewe
674ea78815 Avoid a potential TypeError in util.ArFile.saveTo
If no dir is passed to util.ArFile.saveTo, dir is set to os.getcwd(),
which returns a str. Since self.name is a bytes, the subsequent
os.path.join(dir, self.name) results in a TypeError.
To fix this, use os.getcwdb(), which returns a bytes instead of a
str.
2020-11-22 17:36:17 +01:00
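The TypeError mentioned in commit 674ea78815 comes from os.path.join itself. The following is a minimal sketch, assuming a hypothetical bytes member name in place of self.name; it is not the ArFile.saveTo code.

```python
import os

name = b"member.txt"  # hypothetical archive member name (bytes)

try:
    os.path.join(os.getcwd(), name)  # str joined with bytes
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Joining bytes with bytes works and yields a bytes path.
target = os.path.join(os.getcwdb(), name)
```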
Marcus Huewe
36f7b8ffe9 Fix a potential TypeError in CpioRead.copyin and CpioRead.copyin_file
If no "dest" argument is specified when calling CpioRead.copyin or
CpioRead.copyin_file, a TypeError occurs in CpioRead._copyin_file
because os.getcwd(), which returns a str, is used as dest and, hence,
the subsequent os.path.join(...) fails (because it tries to join a
str and a bytes).
In order to avoid this, encode the result of os.getcwd().

Note that the existing

archive.copyin_file(hdr.filename,
                    os.path.dirname(tmpfile),
                    os.path.basename(tmpfile))

was OK because CpioRead._copyin_file os.path.join()s "dest" and
"new_fn", which are both str. It is just changed to stress that
CpioRead is a bytes-only API.

Fixes: #865 ("Traceback in osc/util/cpio.py line 128: TypeError:
Can't mix strings and bytes in path components")
2020-11-20 09:55:09 +01:00
Marcus Huewe
d85030b72d Fix python2 regression in util.helper.decode_it
In commit 276d6e2439 ("Do not use the
chardet module in util.helper.decode_it") util.helper.decode_it was
changed to always decode the passed object if it has a decode method.
Since a python2 str has a decode method, the new code tries to utf-8
decode the passed str. As a result, a unicode object is returned (if
the decoding worked). Since a unicode object is not an instance of
type str, all subsequent isinstance(decoded_obj, str) checks evaluate
to False, which breaks some codepaths.
In order to fix this, restore the old python2 behavior (that is, if
the passed object is a str, it is not decoded). This change does not
affect the python3 codepaths.

Fixes: #814 ("osc log | fails")
2020-06-25 15:38:14 +02:00
Marcus Huewe
276d6e2439 Do not use the chardet module in util.helper.decode_it
In general, decode_it is used to get a str from an arbitrary bytes
instance. For this, decode_it used the chardet module (if present)
to detect the underlying encoding (if the bytes instance corresponds
to a "supported" encoding). The drawback of this detection is that
it can take quite some time in case of a large bytes instance, which
represents no "supported" encoding (see #669 and #746).
Instead of doing a potentially "time consuming" detection, either
assume a utf-8 encoding or a latin-1 encoding. Rationale: it is just
not worth the effort to detect a _potential_ encoding because we have
no clue what the _correct_ encoding is. For instance, consider the
following bytes instance:

b'This character group is not supported: [abc\xc3\xbf]'

It represents a valid utf-8 and latin-1 encoding. What is the "correct"
one? We don't know... Even if you interpret the bytes instance as a
human you cannot give a definite answer (implicit assumption: there is
no additional context available).
That is, if we cannot give a definite answer in case of two potential
encodings, there is no point in bringing even more potential encodings
into play. Hence, do not use the chardet module.

Note: the rationale for trying utf-8 first is that utf-8 is pretty
much in vogue these days and, hence, the chances are "high" that we
guess the "correct" encoding.

Fixes: #669 ("check in huge shell archives is insanely slow")
Fixes: #746 ("Very slow local buildlog parsing")
2020-06-04 13:12:22 +02:00
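The "utf-8 first, latin-1 second" strategy from commit 276d6e2439 can be sketched as follows. The function name decode_sketch is hypothetical and the body is an assumption about the general approach, not the actual util.helper.decode_it code; it only handles bytes input.

```python
def decode_sketch(data):
    """Hypothetical stand-in: try utf-8 first, then fall back to latin-1."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps every byte to a code point, so this cannot fail
        return data.decode("latin-1")

# The ambiguous example from the commit message: valid in both encodings.
ambiguous = b"This character group is not supported: [abc\xc3\xbf]"
as_utf8 = decode_sketch(ambiguous)
as_latin1 = ambiguous.decode("latin-1")

# A byte sequence that is not valid utf-8 takes the fallback path.
only_latin1 = decode_sketch(b"caf\xe9")
```

Both interpretations of the ambiguous bytes succeed but yield different text, which is the point the commit message makes about not knowing the "correct" encoding.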
Adam Williamson
13a13a87c4 Fix ElementTree imports for Python 3.9
Importing `cElementTree` has been deprecated since Python 3.3 -
importing `ElementTree` automatically uses the fastest
implementation available - and is finally removed in Python 3.9.
Importing cElementTree directly (not as part of xml) is an even
older relic, it's for Ye Time Before ElementTree Was Added To
Python and it was instead an external module...which was before
Python 2.5.

We still need to work with Python 2.7 for now, so we use a try/
except to handle both 2.7 and 3.9 cases. Also, let's not repeat
this import 12 times in one file for some reason.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2020-06-02 15:13:10 -07:00
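The try/except import described in commit 13a13a87c4 could look roughly like this. It is a sketch of the general pattern, assuming the alias ET; the exact form used in osc may differ.

```python
try:
    # Available on Python 2.7 and on Python 3 up to 3.8
    from xml.etree import cElementTree as ET
except ImportError:
    # Python 3.9 and later, where cElementTree no longer exists
    from xml.etree import ElementTree as ET

# Either import exposes the same API under the ET alias.
root = ET.fromstring('<package name="osc"/>')
name = root.get("name")
```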
Marcus Huewe
55aef1a014 Convert repodata.RepoDataQueryResult to a bytes API
The repodata.RepoDataQueryResult is supposed to be a bytes API and
that's what our users (see build module) expect.
Note that the repodata.RepoDataQueryResult.path method still returns
a str. That's what the rpmquery.RpmQuery, debquery.DebQuery, and
archquery.ArchQuery classes also do (if the "path" was initially
passed as a str).

Fixes: #760 ("osc build fails when called with --prefer-pkgs where the
       passed directory is a repodata repository or a subdirectory of one")
2020-03-15 18:30:28 +01:00
Marcus Huewe
cd51f47a77 Return bytes in packagequery.PackageQueryResult.evr() instead of a str
The packagequery.PackageQueryResult class is supposed to provide a
bytes API. Hence, packagequery.PackageQueryResult.evr() should return
bytes instead of a str. Also, adjust the single caller in the build
module.
2020-03-15 18:30:00 +01:00
Marcus Huewe
33bbc57b5f Fix the previously introduced escaping via the html module
This is a follow-up commit for commit
6dbf103e10 ("Use html.escape instead
removed cgi.escape"), which breaks the python2 backward compatibility
(since the "html" module is not available by default) and also breaks
the code in general (due to missing html imports).

The fix is based on the proposed fix in [1].

Fixes: boo#1166537 ("osc rq accept - forwarding request causes backtrace")

[1] https://github.com/openSUSE/osc/pull/764
2020-03-12 23:00:47 +01:00
Marcus Huewe
4e8e0492e8 Fix arch zst magic in util.packagequery
The correct zst magic is b'(\xb5/\xfd' (4 bytes) (that's what obs-build
is also using).

Kudos to Tobias Ellinghaus for spotting this.

Fixes: #756 ("zst detection fails")
2020-02-26 20:04:26 +01:00
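The magic from commit 4e8e0492e8 is easier to check when written out byte by byte. The following is a minimal sketch; the constant name and the helper looks_like_zstd are hypothetical and not part of util.packagequery.

```python
# b'(' is 0x28 and b'/' is 0x2f, so both spellings are the same 4 bytes.
ZSTD_MAGIC = b"(\xb5/\xfd"

def looks_like_zstd(data):
    """Return True if data starts with the zstd magic (hypothetical helper)."""
    return data[:4] == ZSTD_MAGIC
```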
lethliel
95c68dc3f0 import oscerr in helper.py
2020-02-20 08:45:02 +01:00
5f2721d8f6 - support zstd arch linux files in local build
Note: This requires a tar executable supporting zstd
2020-01-09 15:49:54 +01:00
lethliel
c9d85ac248 move raw_input function to helper module
2019-08-27 15:17:53 +02:00
Marcus Huewe
e5c4a10673 Merge branch 'dont_decode_None' of https://github.com/lethliel/osc
Do not try to decode None in decode_it (in this case None is returned).
2019-07-26 14:35:03 +02:00
lethliel
a802df15ad return the obj if None type is passed to decode_it
If an object of type None is passed to decode_it, just
return it and do not try to decode it, as this will fail
2019-07-26 14:22:26 +02:00
lethliel
2aa6e998d2 fix and unify building of local package cache
* all filename functions now return bytes-like objects
* the caller does the decoding
* the caller in build.py passes encoded arguments
2019-07-26 13:38:45 +02:00
lethliel
5841bf759f add exception if encoding fails and try ISO-8859-1
In some rare cases the chardet encoding detection reports the
wrong encoding. In that case we fall back to latin-1, which covers
most cases in which utf-8 does not work.
2019-04-16 14:40:13 +02:00
Marco Strigl
71770555ac
Merge pull request #526 from lethliel/python3_fix_debquery_decoding
[python3] fix decoding issue in debquery.py
2019-04-15 15:10:21 +02:00
Marco Strigl
6c074fce20
Merge pull request #524 from lethliel/python3_packagequery_fix_decoding
[python3] fix decoding for packagequery.py
2019-04-15 15:09:59 +02:00
Marco Strigl
41ced89fcc
Merge pull request #522 from lethliel/python3_repodata_module
[python3] fix epoch encoding in repodata.py
2019-04-15 15:07:50 +02:00
Marco Strigl
0086bcfa64
Merge pull request #483 from lethliel/python3_rpmquery_module
[python3] rpmquery.py now python3 ready
2019-04-15 15:02:59 +02:00
Marco Strigl
d5108c7536
Merge pull request #464 from lethliel/python3_utils_helper
[python3] add helper functions for python3 support
2019-04-15 15:00:58 +02:00
lethliel
60c6ec2b52 [python3] fix decoding issue in debquery.py
name, version, release and arch are strings, not bytes
2019-04-07 14:44:25 -05:00
lethliel
c235148180 [python3] now python3 ready:
* new function cmp (not available in python3)
* fix decoding in canonname function
2019-04-07 11:05:13 -05:00
lethliel
c6d3870942 [python3] fix decoding for packagequery.py
name, arch, version and release need to be decoded
2019-04-07 10:31:23 -05:00
lethliel
87628a4150 [python3] fix epoch encoding in repodata.py
other.epoch() needs to be encoded to work with the vercmp callers.
2019-04-07 10:19:47 -05:00
Marcus Huewe
c534d7e990 Fix logic error in DebQuery.vercmp
res is never None, because DebQuery.rpmvercmp always returns -1, 0,
or 1.
2019-01-27 19:43:38 +01:00
Marcus Huewe
cd5f46984d Port debquery module to python3
No functional changes. Note that we cannot simply decode the control's
fields as ascii/utf-8 because a field is not necessarily a valid
ascii/utf-8 encoding (it is possible to register _arbitrary_ custom
fields via a 'register-custom-fields' hook when building a deb
package).

Note: DebQuery.debvercmp really deserves a cleanup:/
2019-01-27 19:31:47 +01:00
Marcus Huewe
bb9f9a7fde Refactor DebQuery.__parse_control a bit
No functional changes. This just simplifies the upcoming python3
port a bit.
2019-01-27 17:35:47 +01:00
Marcus Huewe
f63a0957af Remove superfluous try-except block in the archquery module
ArchQuery.query never raises an ArchError exception.
2019-01-27 16:51:58 +01:00
Marcus Huewe
2074a1c01d Make ArchQuery.canonname more robust against None values
Use ArchQuery.filename to construct the filename and raise an
ArchError exception if we are unable to construct a filename.
2019-01-27 16:46:52 +01:00
Marcus Huewe
8c1cb190bd Port the missing pieces of the archquery module to python3
This is a follow-up commit for commit
21eca9e3f1 ("[python3] switch
ArchQuery to bytestrings").
2019-01-27 16:27:30 +01:00
Marcus Huewe
2d0c974296 Add cmp function to packagequery module
cmp(a, b) returns
-1 if a < b
 0 if a == b
 1 if a > b

This is needed since python3 has no cmp function anymore.

All credits for this go to Marco Strigl <mstrigl@suse.com> (see
PR#483 [1]).

[1] https://github.com/openSUSE/osc/pull/483
2019-01-27 16:12:57 +01:00
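A three-way comparison like the one added in commit 2d0c974296 is commonly written with the idiom below. This is a sketch of that common idiom, not necessarily the implementation in the packagequery module.

```python
def cmp(a, b):
    """Return -1, 0, or 1 depending on how a compares to b."""
    # Booleans are ints in Python, so the difference is -1, 0, or 1.
    return (a > b) - (a < b)
```

It works for any pair of mutually comparable values, including bytes such as b"1.0" and b"1.1".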
Marcus Huewe
a3720c5286 Fix ArchQuery.rpmvercmp if one of its arguments is None
The None argument is always <= the other argument. We need this
in case of a broken/pathological package where version() or release()
return None (see vercmp (which calls rpmvercmp)).
2019-01-27 15:50:35 +01:00
Marcus Huewe
5c639db805 ArchQuery.epoch should never return None
Returning None breaks ArchQuery.vercmp. Returning b'0' is ok because
an epoch, if present, is always supposed to be an integer (at least
in a "valid" arch package (see scripts/libmakepkg/lint_pkgbuild/epoch.sh.in
in the pacman sources)). Hence, if we compare the epoch of a package,
which has no explicit epoch set, with the epoch of a package, which
has an explicit epoch set, we always have a <= relation.
2019-01-27 15:39:07 +01:00
Marcus Huewe
deee8ef6cb Fix logic error in ArchQuery.vercmp
res is never None, because ArchQuery.rpmvercmp always returns -1, 0,
or 1.
2019-01-27 15:00:36 +01:00
Marcus Huewe
562374f045 Simplify ArchQuery.read a bit
No functional changes - just to improve readability.
2019-01-27 14:57:47 +01:00
Marcus Huewe
e580769757 Merge branch 'python3_archquery_module' of https://github.com/lethliel/osc
Initial port of the archquery module to python3 (ArchQuery.__init__,
ArchQuery.read, and ArchQuery.canonname are ported - the rest is missing).
2019-01-27 14:55:01 +01:00
lethliel
21eca9e3f1 [python3] switch ArchQuery to bytestrings
decode explicit (ascii)
2019-01-23 22:59:55 +01:00
Marco Strigl
f233066448
Merge pull request #482 from lethliel/python3_packagequery_module
[python3] magic is now a bytestring in python3
2019-01-18 14:34:43 +01:00
Marcus Huewe
e60af6f120 Use with statement in CpioRead._copyin_file
This makes sure that the file is closed in case of an exception.
2019-01-15 20:49:26 +01:00
Marcus Huewe
5387744d36 Port CpioWrite to python3
Now, CpioWrite provides a bytes-only API. It would be also possible
that the API accepts bytes and str (we would need to explicitly
encode the latter) but this would be a bit inconsistent wrt.
cpio.CpioRead (which is bytes-only).
Also, by using a bytearray instead of a list we avoid several
intermediate ''.join(...)s.
2019-01-15 20:48:42 +01:00
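The buffer choice mentioned in commit 5387744d36 can be illustrated with two hypothetical helpers. The chunk values are placeholders and the functions are not taken from CpioWrite; the sketch only contrasts collecting bytes in a list with appending to a bytearray.

```python
def build_with_list(chunks):
    """Collect the pieces in a list and join them at the end."""
    parts = []
    for chunk in chunks:
        parts.append(chunk)
    return b"".join(parts)

def build_with_bytearray(chunks):
    """Append the pieces to one mutable buffer, with no join step."""
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
    return bytes(buf)

chunks = [b"header", b"filename\x00", b"payload"]  # placeholder data
joined = build_with_list(chunks)
buffered = build_with_bytearray(chunks)
```

Both helpers produce the same bytes; the bytearray variant simply keeps a single growing buffer.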
Marcus Huewe
3e326b1bb4 Port CpioRead and CpioHdr to python3
This is a bytes only API because a filename in a cpio archive can
contain, for instance, illegal utf-8 sequences. A user can decode
the filename/content as she wishes.
2019-01-15 20:05:47 +01:00
Marcus Huewe
54ac438eb0 Do not mmap a cpio archive
There is simply no need for a mmap.
2019-01-15 19:47:27 +01:00
Marcus Huewe
1c4385a579 Run a small demo when the cpio module is invoked as a script
It just reads in a cpio archive and prints the headers.
2019-01-15 19:46:00 +01:00
Marcus Huewe
5c19425c9b Use with statement in ArFile.saveTo
This makes sure that the file is closed in case of an exception.
2019-01-15 17:18:50 +01:00
Marcus Huewe
b26a4a967d Raise a ValueError if neither fn nor fh is passed to Ar.__init__
A ValueError is more appropriate because there is no issue with the
ar archive itself. Also, the old codepath never worked because the
fn parameter was missing.
2019-01-15 17:18:50 +01:00
Marcus Huewe
6fdce86fc9 Port the ar module to python3
Since an ar archive can contain arbitrary filenames (that is, a
filename can be an invalid utf-8 encoding (for instance,
"foo\xff\xffbar")), the ar module provides a bytes only API. A
user can decode filenames as she wishes.
Note: if a "fn" parameter is passed to Ar.__init__ it should be a
bytes (a str is also ok, but then be aware that an ArError's file
attribute might be a str or a bytes).
2019-01-15 17:18:37 +01:00
Marcus Huewe
68cf974c78 Do not mmap the ar archive
There is really no need for a mmap here. Also, the comment in the
docstr does not apply/is nonsense (there is no performance gain).
2019-01-15 17:18:19 +01:00
Marcus Huewe
e12181b11d An ext fn header in an ar file has no mode
Use a dummy mode of 0 in this case (internally, the mode is never
used).
2019-01-15 17:18:19 +01:00