1
0
mirror of https://github.com/openSUSE/osc.git synced 2024-12-28 02:36:15 +01:00
Commit Graph

8 Commits

Author SHA1 Message Date
Marcus Huewe
d85030b72d Fix python2 regression in util.helper.decode_it
In commit 276d6e2439 ("Do not use the
chardet module in util.helper.decode_it") util.helper.decode_it was
changed to always decode the passed object if it has a decode method.
Since a python2 str has a decode method, the new code tries to utf-8
decode the passed str. As a result, a unicode object is returned (if
the decoding worked). Since a unicode object is not an instance of
type str, all subsequent isinstance(decoded_obj, str) checks evaluate
to False, which break some codepaths.
In order to fix this, restore the old python2 behavior (that is, if
the passed object is a str, it is not decode it). This change does not
affect the python3 codepaths.

Fixes: #814 ("osc log | fails")
2020-06-25 15:38:14 +02:00
Marcus Huewe
276d6e2439 Do not use the chardet module in util.helper.decode_it
In general, decode_it is used to get a str from an arbitrary bytes
instance. For this, decode_it used the chardet module (if present)
to detect the underlying encoding (if the bytes instance corresponds
to a "supported" encoding). The drawback of this detection is that
it can take quite some time in case of a large bytes instance, which
represents no "supported" encoding (see #669 and #746).
Instead of doing a potentially "time consuming" detection, either
assume an utf-8 encoding or a latin-1 encoding. Rationale: it is just
not worth the effort to detect a _potential_ encoding because we have
no clue what the _correct_ encoding is. For instance, consider the
following bytes instance:

b'This character group is not supported: [abc\xc3\xbf]'

It represents a valid utf-8 and latin-1 encoding. What is the "correct"
one? We don't know... Even if you interpret the bytes instance as a
human you cannot give a definite answer (implicit assumption: there is
no additional context available).
That is, if we cannot give a definite answer in case of two potential
encodings, there is no point in bringing even more potential encodings
into play. Hence, do not use the chardet module.

Note: the rationale for trying utf-8 first is that utf-8 is pretty
much in vogue these days and, hence, the chances are "high" that we
guess the "correct" encoding.

Fixes: #669 ("check in huge shell archives is insanely slow")
Fixes: #746 ("Very slow local buildlog parsing")
2020-06-04 13:12:22 +02:00
Marcus Huewe
33bbc57b5f Fix the previously introduced escaping via the html module
This is a follow-up commit for commit
6dbf103e10 ("Use html.escape instead
removed cgi.escape"), which breaks the python2 backward compatibility
(since the "html" module is not available by default) and also breaks
the code in general (due to missing html imports).

The fix is based on the proposed fix in [1].

Fixes: boo#1166537 ("osc rq accept - forwarding request causes backtrace")

[1] https://github.com/openSUSE/osc/pull/764
2020-03-12 23:00:47 +01:00
lethliel
95c68dc3f0 import oscerr in helper.py 2020-02-20 08:45:02 +01:00
lethliel
c9d85ac248 move raw_input function to helper module 2019-08-27 15:17:53 +02:00
lethliel
a802df15ad return the obj if None type is passed to decode_it
If a obj of type None is passed to decode_it just
return it and do not try to decode it as this will fail
2019-07-26 14:22:26 +02:00
lethliel
5841bf759f add exception if encoding fails and try ISO-8859-1
In some rare cases the chardet encoding detection detects
a wrong encoding standard. Then we switch to latin-1 which
covers most if utf-8 does not work.
2019-04-16 14:40:13 +02:00
lethliel
4b29e1c543 add helper functions for python3 support
This functions are used in the whole code and are
mandatory for the python3 support to work. In python2
case nothing is touched.

* cmp_to_key:
  converts a cmp= into a key= function

* decode_list:
  decodes each element of a list. This is needed if
  we have a mixed list with strings and bytes.

* decode_it:
  Takes the input and checks if it is not a string.
  Then it uses chardet to get the encoding.
2018-11-08 09:55:07 +01:00