github.com_openSUSE_osc/osc/util/helper.py

# Copyright (C) 2018 SUSE Linux.  All rights reserved.
# This program is free software; it may be used, copied, modified
# and distributed under the terms of the GNU General Public Licence,
# either version 2, or (at your option) any later version.


import html

from .. import oscerr


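def decode_list(ilist):
    """Return a new list in which every non-str element has been decoded.

    Elements that are already str are kept as-is; all other elements are
    passed through decode_it. This is needed for mixed lists of str and bytes.
    """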

    dlist = []
    for elem in ilist:
        if not isinstance(elem, str):
            dlist.append(decode_it(elem))
        else:
            dlist.append(elem)
    return dlist


def decode_it(obj):
    """Decode the given object unless it is a str.

    If the given object is a str or has no decode method, the object itself is
    returned. Otherwise, try to decode the object using utf-8. If this
    fails due to a UnicodeDecodeError, try to decode the object using
    latin-1.
    """
    if isinstance(obj, str) or not hasattr(obj, 'decode'):
        return obj
    try:
        return obj.decode('utf-8')
    except UnicodeDecodeError:
        return obj.decode('latin-1')


def raw_input(*args):
    """Prompt via builtins.input and raise oscerr.UserAbort on EOF (ctrl-d)."""
    import builtins
    func = builtins.input

    try:
        return func(*args)
    except EOFError:
        # interpret ctrl-d as user abort
        raise oscerr.UserAbort()


def _html_escape(data):
    """Escape &, < and > in data; quote characters are left untouched."""
    return html.escape(data, quote=False)
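
The effect of `quote=False` in `_html_escape` can be illustrated with a small standalone sketch. The helper name `escape_text` and the sample string are hypothetical and only mirror the one-line body shown above; they are not part of osc.

```python
import html


def escape_text(data):
    # Hypothetical stand-in mirroring _html_escape: escape &, < and >,
    # but leave single and double quotes untouched.
    return html.escape(data, quote=False)


sample = '<a href="x">Tom & Jerry\'s</a>'

# Quotes survive because quote=False.
escaped = escape_text(sample)

# For comparison: the default (quote=True) also escapes " and '.
escaped_with_quotes = html.escape(sample)

print(escaped)
print(escaped_with_quotes)
```

Leaving quotes unescaped is sufficient for text placed in element content; it would presumably not be enough for text placed inside an attribute value.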

# Commit history recovered from the blame view, oldest first.
#
# 2018-11-07 15:03:43 +01:00
#   add helper functions for python3 support
#   These functions are used in the whole code and are mandatory for the
#   python3 support to work. In python2 case nothing is touched.
#   * cmp_to_key: converts a cmp= into a key= function
#   * decode_list: decodes each element of a list. This is needed if we have
#     a mixed list with strings and bytes.
#   * decode_it: Takes the input and checks if it is not a string. Then it
#     uses chardet to get the encoding.
#
# 2019-08-27 15:07:41 +02:00
#   move raw_input function to helper module
#
# 2020-03-12 23:00:47 +01:00
#   Fix the previously introduced escaping via the html module
#   This is a follow-up commit for commit
#   6dbf103e1030494381c2fbb384f9648a78b68ce6 ("Use html.escape instead
#   removed cgi.escape"), which breaks the python2 backward compatibility
#   (since the "html" module is not available by default) and also breaks the
#   code in general (due to missing html imports). The fix is based on the
#   proposed fix in [1].
#   Fixes: boo#1166537 ("osc rq accept - forwarding request causes backtrace")
#   [1] https://github.com/openSUSE/osc/pull/764
#
# 2020-06-04 13:12:22 +02:00
#   Do not use the chardet module in util.helper.decode_it
#   In general, decode_it is used to get a str from an arbitrary bytes
#   instance. For this, decode_it used the chardet module (if present) to
#   detect the underlying encoding (if the bytes instance corresponds to a
#   "supported" encoding). The drawback of this detection is that it can take
#   quite some time in case of a large bytes instance, which represents no
#   "supported" encoding (see #669 and #746).
#   Instead of doing a potentially "time consuming" detection, either assume
#   an utf-8 encoding or a latin-1 encoding. Rationale: it is just not worth
#   the effort to detect a _potential_ encoding because we have no clue what
#   the _correct_ encoding is. For instance, consider the following bytes
#   instance:
#       b'This character group is not supported: [abc\xc3\xbf]'
#   It represents a valid utf-8 and latin-1 encoding. What is the "correct"
#   one? We don't know... Even if you interpret the bytes instance as a human
#   you cannot give a definite answer (implicit assumption: there is no
#   additional context available). That is, if we cannot give a definite
#   answer in case of two potential encodings, there is no point in bringing
#   even more potential encodings into play. Hence, do not use the chardet
#   module.
#   Note: the rationale for trying utf-8 first is that utf-8 is pretty much
#   in vogue these days and, hence, the chances are "high" that we guess the
#   "correct" encoding.
#   Fixes: #669 ("check in huge shell archives is insanely slow")
#   Fixes: #746 ("Very slow local buildlog parsing")
#
# 2020-06-25 15:38:14 +02:00
#   Fix python2 regression in util.helper.decode_it
#   In commit 276d6e2439c8c53c182dbe785b038919e64da9f3 ("Do not use the
#   chardet module in util.helper.decode_it") util.helper.decode_it was
#   changed to always decode the passed object if it has a decode method.
#   Since a python2 str has a decode method, the new code tries to utf-8
#   decode the passed str. As a result, a unicode object is returned (if the
#   decoding worked). Since a unicode object is not an instance of type str,
#   all subsequent isinstance(decoded_obj, str) checks evaluate to False,
#   which breaks some codepaths.
#   In order to fix this, restore the old python2 behavior (that is, if the
#   passed object is a str, it is not decoded). This change does not affect
#   the python3 codepaths.
#   Fixes: #814 ("osc log | fails")
#
# 2022-07-28 12:28:33 +02:00
#   Clean imports up, drop python 2 fallbacks
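
The utf-8 first, latin-1 second fallback used by `decode_it`, and its use on mixed lists as in `decode_list`, can be sketched in isolation. The name `decode_bytes` and the sample values are hypothetical; the function body copies the logic shown in the listing so that the snippet runs without osc installed.

```python
def decode_bytes(obj):
    # Hypothetical stand-in with the same logic as decode_it:
    # str and objects without a decode method are returned unchanged.
    if isinstance(obj, str) or not hasattr(obj, 'decode'):
        return obj
    try:
        return obj.decode('utf-8')
    except UnicodeDecodeError:
        # latin-1 maps every byte to a code point, so this cannot fail.
        return obj.decode('latin-1')


# A mixed list: str, valid utf-8 bytes, bytes that are only valid latin-1,
# and an object that has no decode method at all.
mixed = ['spec', b'caf\xc3\xa9', b'caf\xe9', 42]
decoded = [decode_bytes(elem) for elem in mixed]

# Bytes that are valid in both encodings are always read as utf-8,
# which is a guess rather than a detection.
ambiguous = b'[abc\xc3\xbf]'
as_utf8 = decode_bytes(ambiguous)
as_latin1 = ambiguous.decode('latin-1')

print(decoded)
print(as_utf8, as_latin1)
```

Because latin-1 accepts any byte sequence, the fallback always yields a str, at the cost of possibly producing the wrong characters when the input was in neither encoding.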