github.com_openSUSE_osc

mirror of https://github.com/openSUSE/osc.git synced 2024-11-11 07:06:16 +01:00

Author	SHA1	Message	Date
Marcus Huewe	d85030b72d	Fix python2 regression in util.helper.decode_it In commit `276d6e2439` ("Do not use the chardet module in util.helper.decode_it") util.helper.decode_it was changed to always decode the passed object if it has a decode method. Since a python2 str has a decode method, the new code tries to utf-8 decode the passed str. As a result, a unicode object is returned (if the decoding worked). Since a unicode object is not an instance of type str, all subsequent isinstance(decoded_obj, str) checks evaluate to False, which breaks some codepaths. In order to fix this, restore the old python2 behavior (that is, if the passed object is a str, it is not decoded). This change does not affect the python3 codepaths. Fixes: #814 ("osc log | fails")	2020-06-25 15:38:14 +02:00
Marcus Huewe	276d6e2439	Do not use the chardet module in util.helper.decode_it In general, decode_it is used to get a str from an arbitrary bytes instance. For this, decode_it used the chardet module (if present) to detect the underlying encoding (if the bytes instance corresponds to a "supported" encoding). The drawback of this detection is that it can take quite some time in case of a large bytes instance, which represents no "supported" encoding (see #669 and #746). Instead of doing a potentially "time consuming" detection, either assume a utf-8 encoding or a latin-1 encoding. Rationale: it is just not worth the effort to detect a _potential_ encoding because we have no clue what the _correct_ encoding is. For instance, consider the following bytes instance: b'This character group is not supported: [abc\xc3\xbf]' It represents a valid utf-8 and latin-1 encoding. What is the "correct" one? We don't know... Even if you interpret the bytes instance as a human you cannot give a definite answer (implicit assumption: there is no additional context available). That is, if we cannot give a definite answer in case of two potential encodings, there is no point in bringing even more potential encodings into play. Hence, do not use the chardet module. Note: the rationale for trying utf-8 first is that utf-8 is pretty much in vogue these days and, hence, the chances are "high" that we guess the "correct" encoding. Fixes: #669 ("check in huge shell archives is insanely slow") Fixes: #746 ("Very slow local buildlog parsing")	2020-06-04 13:12:22 +02:00
Marcus Huewe	33bbc57b5f	Fix the previously introduced escaping via the html module This is a follow-up commit for commit `6dbf103e10` ("Use html.escape instead removed cgi.escape"), which breaks the python2 backward compatibility (since the "html" module is not available by default) and also breaks the code in general (due to missing html imports). The fix is based on the proposed fix in [1]. Fixes: boo#1166537 ("osc rq accept - forwarding request causes backtrace") [1] https://github.com/openSUSE/osc/pull/764	2020-03-12 23:00:47 +01:00
lethliel	95c68dc3f0	import oscerr in helper.py	2020-02-20 08:45:02 +01:00
lethliel	c9d85ac248	move raw_input function to helper module	2019-08-27 15:17:53 +02:00
lethliel	a802df15ad	return the obj if None type is passed to decode_it If an obj of type None is passed to decode_it, just return it and do not try to decode it, as this will fail.	2019-07-26 14:22:26 +02:00
lethliel	5841bf759f	add exception if encoding fails and try ISO-8859-1 In some rare cases the chardet encoding detection detects the wrong encoding. In that case we fall back to latin-1, which covers most inputs for which utf-8 does not work.	2019-04-16 14:40:13 +02:00
lethliel	4b29e1c543	add helper functions for python3 support These functions are used in the whole code and are mandatory for the python3 support to work. In the python2 case nothing is touched. * cmp_to_key: converts a cmp= into a key= function * decode_list: decodes each element of a list. This is needed if we have a mixed list with strings and bytes. * decode_it: Takes the input and checks if it is not a string. Then it uses chardet to get the encoding.	2018-11-08 09:55:07 +01:00

8 Commits
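
The decoding behavior these commit messages describe can be sketched roughly as follows. This is a minimal illustration written only from the messages above, not the actual code in osc's helper module: the function bodies are assumptions, and only the names decode_it and decode_list, the utf-8-then-latin-1 order, the None pass-through, and the "leave a str alone" rule come from the log. The last lines reproduce the ambiguous bytes example from commit 276d6e2439.

```python
def decode_it(obj):
    """Sketch (assumed body): return text for bytes, pass other objects through."""
    # Per a802df15ad, None is returned as-is; per d85030b72d, a str is not decoded.
    if obj is None or isinstance(obj, str):
        return obj
    # Anything without a decode method is not bytes-like; leave it alone.
    if not hasattr(obj, "decode"):
        return obj
    try:
        # Per 276d6e2439, utf-8 is tried first because it is the most likely guess.
        return obj.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps every byte value to a character, so this cannot fail.
        return obj.decode("latin-1")


def decode_list(items):
    """Sketch (assumed body): decode each element of a mixed str/bytes list."""
    return [decode_it(item) for item in items]


# The bytes instance quoted in 276d6e2439 is valid in both encodings.
ambiguous = b'This character group is not supported: [abc\xc3\xbf]'
as_utf8 = ambiguous.decode("utf-8")      # ends with "[abc\u00ff]"
as_latin1 = ambiguous.decode("latin-1")  # ends with "[abc\u00c3\u00bf]"
```

Both decodings succeed and yield different text, which is the point of the rationale: without extra context no detector can tell which one is "correct", so the sketch simply prefers utf-8 and falls back to latin-1.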