# mirror of https://github.com/openSUSE/osc.git
# osc/util/helper.py (92 lines, 2.2 KiB, Python)

# Copyright (C) 2018 SUSE Linux. All rights reserved.
# This program is free software; it may be used, copied, modified
# and distributed under the terms of the GNU General Public Licence,
# either version 2, or (at your option) any later version.
try:
import html
except ImportError:
import cgi as html
# 2020-02-20 08:45:02 +01:00
from osc import oscerr
def cmp_to_key(mycmp):
    """Convert a cmp= style comparison function into a key= function.

    The returned class wraps a single value; instances order themselves
    by calling *mycmp* on the wrapped values, which makes the class
    usable as the ``key`` argument of ``sorted()`` / ``list.sort()``.
    """
    class _CmpKey(object):
        def __init__(self, obj, *args):
            self.obj = obj

        def _compare(self, other):
            # Single dispatch point: every rich comparison funnels
            # through one call to the user-supplied cmp function.
            return mycmp(self.obj, other.obj)

        def __lt__(self, other):
            return self._compare(other) < 0

        def __le__(self, other):
            return self._compare(other) <= 0

        def __eq__(self, other):
            return self._compare(other) == 0

        def __ne__(self, other):
            return self._compare(other) != 0

        def __gt__(self, other):
            return self._compare(other) > 0

        def __ge__(self, other):
            return self._compare(other) >= 0

        def __hash__(self):
            # Wrapper objects are order-only; hashing is unsupported.
            raise TypeError('hash not implemented')
    return _CmpKey
def decode_list(ilist):
    """Return a copy of *ilist* with every non-str element decoded.

    str elements are passed through unchanged; anything else is run
    through decode_it() (i.e. bytes are decoded to str).
    """
    return [item if isinstance(item, str) else decode_it(item)
            for item in ilist]
def decode_it(obj):
    """Decode the given object.

    If the given object has no decode method, the object itself is
    returned. Otherwise, try to decode the object using utf-8. If this
    fails due to a UnicodeDecodeError, fall back to latin-1, which
    cannot fail because every byte value is a valid latin-1 code point.
    (chardet-based detection was removed on purpose: it was very slow
    on large inputs and cannot determine the "correct" encoding anyway.)
    """
    if not hasattr(obj, 'decode'):
        # Already a str (or some non-bytes object) -- nothing to do.
        return obj
    try:
        return obj.decode('utf-8')
    except UnicodeDecodeError:
        # latin-1 maps all 256 byte values, so this always succeeds.
        return obj.decode('latin-1')
def raw_input(*args):
    """Prompt the user on stdin, portably across Python 2 and 3.

    Uses the Python 3 ``builtins.input``; on Python 2.7 it falls back
    to ``__builtin__.raw_input``. A ctrl-d (EOF) from the user is
    translated into ``oscerr.UserAbort``.
    """
    try:
        import builtins
    except ImportError:
        # python 2.7
        import __builtin__
        prompt = __builtin__.raw_input
    else:
        prompt = builtins.input
    try:
        return prompt(*args)
    except EOFError:
        # interpret ctrl-d as user abort
        raise oscerr.UserAbort()
def _html_escape(data):
return html.escape(data, quote=False)