There seem to be a bug in how GitHub generates archives.
"Format:" and "$" characters get removed from the version string,
setting it to:
version = "%(describe:tags=true)"
Current OBS is delivering hdrmd5 in buildinfo. It turns out
that osc has already code for validating cached files, but it
invalidates all local files atm with python 3.x
Using os.getcwd() in combination with a subsequent .encode() is error
prone:
marcus@linux:~> mkdir illegal_utf-8_encoding_$'\xff'_dir
marcus@linux:~> cd illegal_utf-8_encoding_$'\xff'_dir/
marcus@linux:~/illegal_utf-8_encoding_ÿ_dir> python3
Python 3.8.6 (default, Nov 09 2020, 12:09:06) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.getcwd().encode()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' in position 36: surrogates not allowed
>>>
Hence, use os.getcwdb(), which returns a bytes, instead of
os.getcwd().encode().
Fixes: commit 36f7b8ffe9 ("Fix a
potential TypeError in CpioRead.copyin and CpioRead.copyin_file")
If no dir is passed to util.ArFile.saveTo, dir is set to os.getcwd(),
which returns a str. Since self.name is a bytes, the subsequent
os.path.join(dir, self.name) results in a TypeError.
To fix this, use os.getcwdb(), which returns a bytes instead of a
str.
If no "dest" argument is specified when calling CpioRead.copyin or
CpioRead.copyin_file, a TypeError occurs in CpioRead._copyin_file
because os.getcwd(), which returns a str, is used as dest and, hence,
the subsequent os.path.join(...) fails (because it tries to join a
str and a bytes).
In order to avoid this, encode the result of os.getcwd().
Note that the existing
archive.copyin_file(hdr.filename,
os.path.dirname(tmpfile),
os.path.basename(tmpfile))
was OK because CpioRead._copyin_file os.path.join()s "dest" and
"new_fn", which are both str. It is just changed to stress that
CpioRead is a bytes-only API.
Fixes: #865 ("Traceback in osc/util/cpio.py line 128: TypeError:
Can't mix strings and bytes in path components")
In commit 276d6e2439 ("Do not use the
chardet module in util.helper.decode_it") util.helper.decode_it was
changed to always decode the passed object if it has a decode method.
Since a python2 str has a decode method, the new code tries to utf-8
decode the passed str. As a result, a unicode object is returned (if
the decoding worked). Since a unicode object is not an instance of
type str, all subsequent isinstance(decoded_obj, str) checks evaluate
to False, which break some codepaths.
In order to fix this, restore the old python2 behavior (that is, if
the passed object is a str, it is not decode it). This change does not
affect the python3 codepaths.
Fixes: #814 ("osc log | fails")