42 Commits

Author SHA1 Message Date
94fdb3b442 Revert "Remove devel branch import"
This reverts commit 1318f9e0c4.
2024-08-11 22:54:47 +02:00
f2358446da don't create gitea repos with importer 2024-08-11 22:54:44 +02:00
9660e633af Parametrizes git import parameters 2024-08-08 17:56:41 +02:00
85b9ed5e75 disable LFS check for testing 2024-08-08 17:56:17 +02:00
86f82325d8 Stop importing/exporting scmsync packages/projects
Also, allow other-than Factory projects
2024-08-08 10:35:53 +02:00
39ba616226 Merge pull request 'Add ability to specify non-Factory' (#31) from adamm/git-importer:option_for_non_factory into main
Reviewed-on: #31
Reviewed-by: Dirk Mueller <dirkmueller@noreply@src.opensuse.org>
2024-08-07 18:27:11 +02:00
531dbc7c1b Add ability to specify non-Factory
This is important for devel-project only imports
non-factory is still blocked by assert
2024-08-07 16:55:05 +02:00
1318f9e0c4 Remove devel branch import
this for yet undefined reason screws up systemd history import
2024-08-07 09:47:54 +02:00
d563076d9e add explicit conversion to string to fix the concatenation 2024-08-07 09:47:18 +02:00
b11b3f1adb Add and remove literal files
pathspec in git has special characters that we should not trigger.
Assume every filespec as literal
2024-08-01 16:53:46 +02:00
479738d4b2 ruff format run 2024-07-10 10:34:20 +02:00
2d04136ca5 Make sure we create devel branch, when no diff to Factory 2024-06-13 15:36:59 +02:00
40ad64ddff Ignore .osc directory 2024-06-10 18:13:51 +02:00
6bd5d72100 New branch is empty
New branches must be born empty
2024-06-10 17:06:15 +02:00
022ae5ab58 remember failed tasks in a separate directory 2024-06-10 17:04:43 +02:00
2ff8ed76d0 Reconnect to the AMQP bus when the connection breaks down 2024-06-10 17:04:25 +02:00
5f228dc046 enable robust push 2024-05-17 21:47:35 +02:00
4e07d8272e don't loop over failed packages 2024-05-17 21:47:15 +02:00
2a3475ab6e Create with sha256 enabled 2024-05-17 20:39:55 +02:00
574bc9aa10 Avoid guessing in switch 2024-05-17 20:07:16 +02:00
0414b33206 Fix testing for origin
The previous code path was untested and not working
2024-05-17 20:06:25 +02:00
b9670821a9 Only init the repository if it doesn't exist already
harmless, but avoids a scary warning
2024-05-17 20:05:54 +02:00
073550825c Fixups to improve the conversion process 2024-05-17 14:41:42 +02:00
5a353c98d3 Add tasks 2024-05-17 11:46:18 +02:00
1fc466d15b Add monitor for commits 2024-05-17 11:40:19 +02:00
39fde7744a Code cleanup 2024-05-16 15:47:45 +02:00
f5ffc83a69 Remove double quoting of url parameters
makeurl quotes by itself, so this was messing it up
2024-05-16 11:49:14 +02:00
d0ccf83684 Revert "Try to fetch the element as deleted if initial access failed"
The OBS api has been fixed to provide an automatic fallback via
https://github.com/openSUSE/open-build-service/pull/15655

This reverts commit c9e07e536f.
2024-05-16 11:49:14 +02:00
b0ffb01c59 cleanups 2024-05-16 11:49:14 +02:00
28d5c6e606 Switch to psycopg rather than psycopg2
It's a bit more modern and uses dedicated c bindings
2024-05-16 11:49:14 +02:00
1e22c2895a Merge pull request 'Switch to sha-256 git repo and use git tools again' (#23) from adamm/git-importer:main into main
Reviewed-on: #23
2024-05-16 11:48:36 +02:00
5da7861c2a Switch to sha-256 git repo and use git tools again 2024-04-09 11:40:26 +02:00
c9e07e536f Try to fetch the element as deleted if initial access failed
The reference to the object might be already deleted by when the
request is failing. plus setting deleted=0 is rejected by the API.
So try with deleted=1 if and only if the previous access failed.
2023-12-07 18:30:36 +01:00
dc0f33354e Failing to LFS register should abort the import 2023-12-07 18:29:56 +01:00
56cbe0a125 Avoid multi-threading races on import
There seems to be races when using db cursors from multiple threads. as
found by import issues after switching to a newer computer that has
performance and energy efficient cores.

As this is not particularly performance critical, convert to single
threaded use which makes it work again
2023-11-28 23:36:44 +01:00
4353f015c8 Switch to localhost:9999 which is provided via a ssh tunnel
The port is no longer directly exposed, so we need to ssh tunnel it
2023-11-22 14:39:55 +01:00
9cbe0899bc Remove unused import 2023-06-19 13:19:52 +02:00
9e80a64fe0 Change hostname references from gitea.opensuse.org to src.opensuse.org 2023-06-19 10:59:56 +02:00
12001b1640 Commit local changes 2023-04-18 22:31:38 +02:00
3797ea178a Merge pull request 'Add a list of packages no longer existing' (#22) from add_gone into main
Reviewed-on: https://gitea.opensuse.org/importers/git-importer/pulls/22
2023-02-09 10:23:35 +01:00
999dcabcfa Add a list of packages no longer existing
I made this a file and not a DB that is automatically maintained as I think
for now adding an entry in there should be done manually - OBS being OBS
packages might look have gone for a brief moment and reappar the day after.
2022-12-02 11:00:31 +01:00
9962673eff Merge pull request 'Add force push for the devel branch' (#21) from add_force into main
Reviewed-on: https://gitea.opensuse.org/importers/git-importer/pulls/21
2022-12-02 09:35:40 +01:00
16 changed files with 1698 additions and 153 deletions

View File

@ -1,5 +1,18 @@
sudo zypper in python3-psycopg2
sudo su - postgres
# `createdb -O <LOCAL_USER> imported_git`
Installation
------------
sudo zypper in python3-psycopg
sudo su - postgres
createdb -O <LOCAL_USER> imported_git`
To reset the database, drop table scheme
Gitea parameters
----------------
* `GITEA_HOST` - default: src.opensuse.org
* `GITEA_USER` - Used to generate SSH links for push. Default: gitea
* `GITEA_ORG` - target organization to push to
* `GITEA_DEFAULT_BRANCH` - default branch

View File

@ -42,8 +42,8 @@ PROJECTS = [
]
def export_package(package, repodir, cachedir, gc):
exporter = GitExporter(URL_OBS, "openSUSE:Factory", package, repodir, cachedir)
def export_package(project, package, repodir, cachedir, gc):
exporter = GitExporter(URL_OBS, project, package, repodir, cachedir)
exporter.set_gc_interval(gc)
exporter.export_as_git()
@ -51,6 +51,12 @@ def export_package(package, repodir, cachedir, gc):
def main():
parser = argparse.ArgumentParser(description="OBS history importer into git")
parser.add_argument("packages", help="OBS package names", nargs="*")
parser.add_argument(
"-p",
"--project",
default="openSUSE:Factory",
help="Project to import/export, default is openSUSE:Factory",
)
parser.add_argument(
"-r",
"--repodir",
@ -110,10 +116,13 @@ def main():
if not args.cachedir:
args.cachedir = pathlib.Path("~/.cache/git-import/").expanduser()
importer = Importer(URL_OBS, "openSUSE:Factory", args.packages)
importer = Importer(URL_OBS, args.project, args.packages)
importer.import_into_db()
for package in args.packages:
export_package(package, args.repodir, args.cachedir, args.gc)
if not importer.package_with_scmsync(package):
export_package(args.project, package, args.repodir, args.cachedir, args.gc)
else:
logging.debug(f"{args.project}/{package} has scmsync links - skipping export")
if __name__ == "__main__":

1355
gone-packages.txt Normal file

File diff suppressed because it is too large Load Diff

View File

@ -14,8 +14,6 @@ def config(filename="database.ini", section="production"):
for param in params:
db[param[0]] = param[1]
else:
raise Exception(
"Section {0} not found in the {1} file".format(section, filename)
)
raise Exception(f"Section {section} not found in the {filename} file")
return db

View File

@ -1,7 +1,6 @@
import logging
import psycopg2
from psycopg2.extras import LoggingConnection
import psycopg
from lib.config import config
@ -17,22 +16,20 @@ class DB:
# read the connection parameters
params = config(section=self.config_section)
# connect to the PostgreSQL server
self.conn = psycopg2.connect(connection_factory=LoggingConnection, **params)
logger = logging.getLogger(__name__)
self.conn.initialize(logger)
self.conn = psycopg.connect(conninfo=f"dbname={params['database']}")
logging.getLogger("psycopg.pool").setLevel(logging.INFO)
except (Exception, psycopg2.DatabaseError) as error:
except (Exception, psycopg.DatabaseError) as error:
print(error)
raise error
def schema_version(self):
# create a cursor
with self.conn.cursor() as cur:
# execute a statement
try:
cur.execute("SELECT MAX(version) from scheme")
except psycopg2.errors.UndefinedTable as error:
except psycopg.errors.UndefinedTable:
cur.close()
self.close()
self.connect()
@ -146,9 +143,9 @@ class DB:
)
schemes[10] = (
"ALTER TABLE revisions ADD COLUMN request_id INTEGER",
"""ALTER TABLE revisions
"""ALTER TABLE revisions
ADD CONSTRAINT request_id_foreign_key
FOREIGN KEY (request_id)
FOREIGN KEY (request_id)
REFERENCES requests (id)""",
"UPDATE scheme SET version=10",
)
@ -273,7 +270,7 @@ class DB:
cur.execute(command)
# commit the changes
self.conn.commit()
except (Exception, psycopg2.DatabaseError) as error:
except (Exception, psycopg.DatabaseError) as error:
print(error)
self.close()
raise error

View File

@ -2,7 +2,6 @@ from __future__ import annotations
from hashlib import md5
from pathlib import Path
from typing import Optional
from lib.db import DB
from lib.obs_revision import OBSRevision
@ -206,7 +205,7 @@ class DBRevision:
):
continue
cur.execute(
"""INSERT INTO files (name, md5, size, mtime, revision_id)
"""INSERT INTO files (name, md5, size, mtime, revision_id)
VALUES (%s,%s,%s,%s,%s)""",
(
entry.get("name"),
@ -255,7 +254,7 @@ class DBRevision:
self._files.sort(key=lambda x: x["name"])
return self._files
def calc_delta(self, current_rev: Optional[DBRevision]):
def calc_delta(self, current_rev: DBRevision | None):
"""Calculate the list of files to download and to delete.
Param current_rev is the revision that's currently checked out.
If it's None, the repository is empty.

View File

@ -4,7 +4,6 @@ import os
import pathlib
import subprocess
import pygit2
import requests
from lib.binary import BINARY
@ -20,11 +19,6 @@ class Git:
self.committer = committer
self.committer_email = committer_email
self.repo = None
def is_open(self):
return self.repo is not None
def exists(self):
"""Check if the path is a valid git repository"""
return (self.path / ".git").exists()
@ -34,36 +28,70 @@ class Git:
self.path.mkdir(parents=True, exist_ok=True)
self.open()
def git_run(self, args, **kwargs):
"""Run a git command"""
if "env" in kwargs:
envs = kwargs["env"].copy()
del kwargs["env"]
else:
envs = os.environ.copy()
envs["GIT_LFS_SKIP_SMUDGE"] = "1"
envs["GIT_CONFIG_GLOBAL"] = "/dev/null"
return subprocess.run(
["git"] + args,
cwd=self.path,
check=True,
env=envs,
**kwargs,
)
def open(self):
# Convert the path to string, to avoid some limitations in
# older pygit2
self.repo = pygit2.init_repository(str(self.path))
if not self.exists():
self.git_run(["init", "--object-format=sha256", "-b", "factory"])
self.git_run(["config", "lfs.allowincompletepush", "true"])
def is_dirty(self):
"""Check if there is something to commit"""
assert self.is_open()
return self.repo.status()
status_str = self.git_run(
["status", "--porcelain=2"],
stdout=subprocess.PIPE,
).stdout.decode("utf-8")
return len(list(filter(None, status_str.split("\n")))) > 0
def branches(self):
return list(self.repo.branches)
br = (
self.git_run(
["for-each-ref", "--format=%(refname:short)", "refs/heads/"],
stdout=subprocess.PIPE,
)
.stdout.decode("utf-8")
.split()
)
if len(br) == 0:
br.append("factory") # unborn branch?
return br
def branch(self, branch, commit=None):
if not commit:
commit = self.repo.head
else:
commit = self.repo.get(commit)
self.repo.branches.local.create(branch, commit)
def branch(self, branch, commit="HEAD"):
commit = (
self.git_run(
["rev-parse", "--verify", "--end-of-options", commit + "^{commit}"],
stdout=subprocess.PIPE,
)
.stdout.decode("utf-8")
.strip()
)
return self.git_run(["branch", branch, commit])
def checkout(self, branch):
"""Checkout into the branch HEAD"""
new_branch = False
ref = f"refs/heads/{branch}"
if branch not in self.branches():
self.repo.references["HEAD"].set_target(ref)
self.git_run(["switch", "-q", "--orphan", branch])
new_branch = True
else:
self.repo.checkout(ref)
ref = f"refs/heads/{branch}"
if (self.path / ".git" / ref).exists():
self.git_run(["switch", "--no-guess", "-q", branch])
return new_branch
def commit(
@ -87,51 +115,79 @@ class Git:
committer_time = committer_time if committer_time else user_time
if self.is_dirty():
self.repo.index.add_all()
self.git_run(["add", "--all", "."])
self.repo.index.write()
author = pygit2.Signature(user, user_email, int(user_time.timestamp()))
committer = pygit2.Signature(
committer, committer_email, int(committer_time.timestamp())
tree_id = (
self.git_run(["write-tree"], stdout=subprocess.PIPE)
.stdout.decode("utf-8")
.strip()
)
tree = self.repo.index.write_tree()
return self.repo.create_commit(
"HEAD", author, committer, message, tree, parents
parent_array = []
if isinstance(parents, list):
for parent in filter(None, parents):
parent_array = parent_array + ["-p", parent]
elif isinstance(parents, str):
parent_array = ["-p", parents]
commit_id = (
self.git_run(
["commit-tree"] + parent_array + [tree_id],
env={
"GIT_AUTHOR_NAME": user,
"GIT_AUTHOR_EMAIL": user_email,
"GIT_AUTHOR_DATE": f"{int(user_time.timestamp())} +0000",
"GIT_COMMITTER_NAME": committer,
"GIT_COMMITTER_EMAIL": committer_email,
"GIT_COMMITTER_DATE": f"{int(committer_time.timestamp())} +0000",
},
input=message.encode("utf-8"),
stdout=subprocess.PIPE,
)
.stdout.decode("utf-8")
.rstrip()
)
self.git_run(["reset", "--soft", commit_id])
return commit_id
def last_commit(self):
try:
return self.repo.head.target
except:
return None
def branch_head(self, branch):
return self.repo.references["refs/heads/" + branch].target
def branch_head(self, branch="HEAD"):
return (
self.git_run(
["rev-parse", "--verify", "--end-of-options", branch],
stdout=subprocess.PIPE,
)
.stdout.decode("utf-8")
.strip()
)
def set_branch_head(self, branch, commit):
self.repo.references["refs/heads/" + branch].set_target(commit)
return self.git_run(["update-ref", f"refs/heads/{branch}", commit])
def gc(self):
logging.debug(f"Garbage recollect and repackage {self.path}")
subprocess.run(
["git", "gc", "--auto"],
cwd=self.path,
self.git_run(
["gc", "--auto"],
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
)
def clean(self):
for path, _ in self.repo.status().items():
logging.debug(f"Cleaning {path}")
try:
(self.path / path).unlink()
self.repo.index.remove(path)
except Exception as e:
logging.warning(f"Error removing file {path}: {e}")
# def clean(self):
# for path, _ in self.repo.status().items():
# logging.debug(f"Cleaning {path}")
# try:
# (self.path / path).unlink()
# self.repo.index.remove(path)
# except Exception as e:
# logging.warning(f"Error removing file {path}: {e}")
def add(self, filename):
self.repo.index.add(filename)
self.git_run(["add", ":(literal)" + str(filename)])
def add_default_gitignore(self):
if not (self.path / ".gitignore").exists():
with (self.path / ".gitignore").open("w") as f:
f.write(".osc\n")
self.add(".gitignore")
def add_default_lfs_gitattributes(self, force=False):
if not (self.path / ".gitattributes").exists() or force:
@ -185,9 +241,9 @@ class Git:
return any(fnmatch.fnmatch(filename, line) for line in patterns)
def remove(self, file: pathlib.Path):
self.repo.index.remove(file.name)
(self.path / file).unlink()
self.git_run(
["rm", "-q", "-f", "--ignore-unmatch", ":(literal)" + file.name],
)
patterns = self.get_specific_lfs_gitattributes()
if file.name in patterns:
patterns.remove(file.name)
@ -196,15 +252,27 @@ class Git:
def add_gitea_remote(self, package):
repo_name = package.replace("+", "_")
org_name = "rpm"
gitea_user = "gitea"
gitea_host = "src.opensuse.org"
default_branch = "factory"
if os.getenv("GITEA_HOST"):
gitea_host = getenv("GITEA_HOST")
if os.getenv("GITEA_USER"):
gitea_user = getenv("GITEA_USER")
if os.getenv("GITEA_ORG"):
org_name = getenv("GITEA_ORG")
if os.getenv("GITEA_DEFAULT_BRANCH"):
default_branch = getenv("GITEA_DEFAULT_BRANCH")
if not os.getenv("GITEA_TOKEN"):
logging.warning("Not adding a remote due to missing $GITEA_TOKEN")
return
url = f"https://gitea.opensuse.org/api/v1/org/{org_name}/repos"
url = f"https://{gitea_host}/api/v1/org/{org_name}/repos"
response = requests.post(
url,
data={"name": repo_name},
data={"name": repo_name, "object_format_name": "sha256", "default_branch": default_branch},
headers={"Authorization": f"token {os.getenv('GITEA_TOKEN')}"},
timeout=10,
)
@ -212,20 +280,21 @@ class Git:
# 201 Created
if response.status_code not in (201, 409):
print(response.data)
url = f"gitea@gitea.opensuse.org:{org_name}/{repo_name}.git"
self.repo.remotes.create("origin", url)
url = f"{gitea_user}@{gitea_host}:{org_name}/{repo_name}.git"
self.git_run(
["remote", "add", "origin", url],
)
def push(self, force=False):
remo = self.repo.remotes["origin"]
if "origin" not in self.git_run(
["remote"],
stdout=subprocess.PIPE,
).stdout.decode("utf-8"):
logging.warning("Not pushing to remote because no 'origin' configured")
return
keypair = pygit2.KeypairFromAgent("gitea")
callbacks = pygit2.RemoteCallbacks(credentials=keypair)
refspecs = ["refs/heads/factory"]
develspec = "refs/heads/devel"
if develspec in self.repo.references:
if force:
refspecs.append(f"+{develspec}:{develspec}")
else:
refspecs.append("{develspec}:{develspec}")
remo.push(refspecs, callbacks=callbacks)
cmd = ["push"]
if force:
cmd.append("-f")
cmd += ["origin", "--all"]
self.git_run(cmd)

View File

@ -29,7 +29,7 @@ class GitExporter:
self.git.open()
else:
self.git.create()
self.git.add_gitea_remote(package)
# self.git.add_gitea_remote(package)
self.state_file = os.path.join(self.git.path, ".git", "_flat_state.yaml")
self.gc_interval = 200
self.cachedir = cachedir
@ -40,9 +40,9 @@ class GitExporter:
def check_repo_state(self, flats, branch_state):
state_data = dict()
if os.path.exists(self.state_file):
with open(self.state_file, "r") as f:
with open(self.state_file) as f:
state_data = yaml.safe_load(f)
if type(state_data) != dict:
if not isinstance(state_data, dict):
state_data = {}
left_to_commit = []
for flat in reversed(flats):
@ -86,6 +86,11 @@ class GitExporter:
logging.debug(f"Committing {flat}")
self.commit_flat(flat, branch_state)
# make sure that we create devel branch
if not branch_state["devel"]:
logging.debug("force creating devel")
self.git.set_branch_head("devel", self.git.branch_head("factory"))
self.git.push(force=True)
def run_gc(self):
@ -150,6 +155,7 @@ class GitExporter:
# create file if not existant
self.git.add_default_lfs_gitattributes(force=False)
self.git.add_default_gitignore()
to_download, to_delete = flat.commit.calc_delta(branch_state[flat.branch])
for file in to_delete:

View File

@ -1,5 +1,5 @@
import concurrent.futures
import logging
import pathlib
import xml.etree.ElementTree as ET
from lib.db import DB
@ -26,11 +26,15 @@ class Importer:
# Import multiple Factory packages into the database
self.packages = packages
self.project = project
self.scmsync_cache = dict()
self.packages_with_scmsync = set()
self.db = DB()
self.obs = OBS(api_url)
assert project == "openSUSE:Factory"
assert not self.has_scmsync(project)
self.refreshed_packages = set()
self.gone_packages_set = None
def import_request(self, number):
self.obs.request(number).import_into_db(self.db)
@ -105,7 +109,7 @@ class Importer:
with self.db.cursor() as cur:
cur.execute(
"""SELECT * FROM revisions WHERE id IN
(SELECT revision_id from linked_revs WHERE linked_id=%s)
(SELECT revision_id from linked_revs WHERE linked_id=%s)
AND commit_time <= %s ORDER BY commit_time""",
(prev.dbid, rev.commit_time),
)
@ -138,7 +142,7 @@ class Importer:
fake_rev = linked.rev + rev.rev / 1000.0
comment = f"Updating link to change in {rev.project}/{rev.package} revision {int(rev.rev)}"
cur.execute(
"""INSERT INTO revisions (project,package,rev,unexpanded_srcmd5,
"""INSERT INTO revisions (project,package,rev,unexpanded_srcmd5,
commit_time, userid, comment, api_url) VALUES(%s,%s,%s,%s,%s,%s,%s,%s) RETURNING id""",
(
linked.project,
@ -161,10 +165,12 @@ class Importer:
(rev.dbid, linked.dbid),
)
def revisions_without_files(self):
def revisions_without_files(self, package):
logging.debug(f"revisions_without_files({package})")
with self.db.cursor() as cur:
cur.execute(
"SELECT * FROM revisions WHERE broken=FALSE AND expanded_srcmd5 IS NULL"
"SELECT * FROM revisions WHERE package=%s AND broken=FALSE AND expanded_srcmd5 IS NULL",
(package,),
)
return [DBRevision(self.db, row) for row in cur.fetchall()]
@ -178,11 +184,11 @@ class Importer:
linked_rev = cur.fetchone()
if linked_rev:
linked_rev = linked_rev[0]
list = self.obs.list(
obs_dir_list = self.obs.list(
rev.project, rev.package, rev.unexpanded_srcmd5, linked_rev
)
if list:
rev.import_dir_list(list)
if obs_dir_list:
rev.import_dir_list(obs_dir_list)
md5 = rev.calculate_files_hash()
with self.db.cursor() as cur:
cur.execute(
@ -196,53 +202,47 @@ class Importer:
self.find_linked_revs()
self.find_fake_revisions()
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
fs = [
executor.submit(import_rev, self, rev)
for rev in self.revisions_without_files()
]
concurrent.futures.wait(fs)
for package in self.packages:
for rev in self.revisions_without_files(package):
print(f"rev {rev} is without files")
self.import_rev(rev)
def refresh_package(self, project, package):
key = f"{project}/{package}"
if key in self.refreshed_packages:
# refreshing once is good enough
return
if self.package_gone(key):
return
logging.debug(f"Refresh {project}/{package}")
self.refreshed_packages.add(key)
if self.has_scmsync(project) or self.has_scmsync(key):
self.packages_with_scmsync.add(package)
logging.debug(f"{project}/{package} already in Git - skipping")
return
self.update_db_package(project, package)
self.fetch_all_linked_packages(project, package)
def import_into_db(self):
for package in self.packages:
refresh_package(self, self.project, package)
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
fs = [
executor.submit(refresh_package, self, self.project, package)
for package in self.packages
]
concurrent.futures.wait(fs)
self.db.conn.commit()
self.db.conn.commit()
for number in DBRevision.requests_to_fetch(self.db):
self.import_request(number)
fs = [
executor.submit(import_request, self, number)
for number in DBRevision.requests_to_fetch(self.db)
]
concurrent.futures.wait(fs)
self.db.conn.commit()
self.db.conn.commit()
with self.db.cursor() as cur:
cur.execute(
"""SELECT DISTINCT source_project,source_package FROM requests
WHERE id IN (SELECT request_id FROM revisions WHERE project=%s and package = ANY(%s));""",
(self.project, self.packages),
)
for project, package in cur.fetchall():
self.refresh_package(project, package)
with self.db.cursor() as cur:
cur.execute(
"""SELECT DISTINCT source_project,source_package FROM requests
WHERE id IN (SELECT request_id FROM revisions WHERE project=%s and package = ANY(%s));""",
(self.project, self.packages),
)
fs = [
executor.submit(refresh_package, self, project, package)
for project, package in cur.fetchall()
]
concurrent.futures.wait(fs)
self.db.conn.commit()
missing_users = User.missing_users(self.db)
@ -254,3 +254,26 @@ class Importer:
self.fill_file_lists()
self.db.conn.commit()
def package_gone(self, key):
if not self.gone_packages_set:
self.gone_packages_set = set()
with open(pathlib.Path(__file__).parent.parent / "gone-packages.txt") as f:
for line in f.readlines():
self.gone_packages_set.add(line.strip())
return key in self.gone_packages_set
def has_scmsync(self, key):
if key in self.scmsync_cache:
return self.scmsync_cache[key]
root = self.obs._meta(key)
scmsync_exists = False
if root is not None:
scmsync_exists = root.find('scmsync') is not None
self.scmsync_cache[key] = scmsync_exists
return scmsync_exists
def package_with_scmsync(self, package):
return package in self.packages_with_scmsync

View File

@ -68,7 +68,7 @@ class LFSOid:
row = cur.fetchone()
lfs_oid_id = row[0]
cur.execute(
"""INSERT INTO lfs_oid_in_package (package,filename,lfs_oid_id)
"""INSERT INTO lfs_oid_in_package (package,filename,lfs_oid_id)
VALUES (%s,%s,%s)""",
(package, filename, lfs_oid_id),
)
@ -83,7 +83,8 @@ class LFSOid:
self.register()
def check(self):
url = f"http://gitea.opensuse.org:9999/check/{self.sha256}/{self.size}"
return True
url = f"http://localhost:9999/check/{self.sha256}/{self.size}"
response = requests.get(
url,
timeout=10,
@ -127,12 +128,13 @@ class LFSOid:
"size": self.size,
}
url = "http://gitea.opensuse.org:9999/register"
url = "http://localhost:9999/register"
response = requests.post(
url,
json=data,
timeout=10,
)
response.raise_for_status()
logging.info(f"Register LFS returned {response.status_code}")
@ -167,7 +169,7 @@ if __name__ == "__main__":
cur.execute(
"""
CREATE TEMPORARY TABLE lfs_oid_in_revision (
revision_id INTEGER,
revision_id INTEGER,
lfs_oid_id INTEGER NOT NULL,
name VARCHAR(255) NOT NULL
)

View File

@ -73,11 +73,11 @@ class OBS:
logging.debug(f"GET {url}")
return ET.parse(osc.core.http_GET(url)).getroot()
def _meta(self, project, package, **params):
def _meta(self, key, **params):
try:
root = self._xml(f"source/{project}/{package}/_meta", **params)
root = self._xml(f"source/{key}/_meta", **params)
except HTTPError:
logging.error(f"Package [{project}/{package} {params}] has no meta")
logging.error(f"Project/Package [{key} {params}] has no meta")
return None
return root
@ -118,13 +118,13 @@ class OBS:
return root
def exists(self, project, package):
root = self._meta(project, package)
root = self._meta(f"{project}/{package}")
if root is None:
return False
return root.get("project") == project
def devel_project(self, project, package):
root = self._meta(project, package)
root = self._meta(f"{project}/{package}")
devel = root.find("devel")
if devel is None:
return None
@ -150,7 +150,7 @@ class OBS:
def _download(self, project, package, name, revision):
url = osc.core.makeurl(
self.url,
["source", project, package, urllib.parse.quote(name)],
["source", project, package, name],
{"rev": revision, "expand": 1},
)
return osc.core.http_GET(url)
@ -165,7 +165,6 @@ class OBS:
cachedir: str,
file_md5: str,
) -> None:
cached_file = self._path_from_md5(name, cachedir, file_md5)
if not self.in_cache(name, cachedir, file_md5):
with (dirpath / name).open("wb") as f:

View File

@ -7,8 +7,6 @@ except:
print("Install python3-python-magic, not python3-magic")
raise
import requests
from lib.db import DB
from lib.lfs_oid import LFSOid
from lib.obs import OBS
@ -43,7 +41,6 @@ class ProxySHA256:
}
def put(self, project, package, name, revision, file_md5, size):
if not self.mime:
self.mime = magic.Magic(mime=True)

View File

@ -1,4 +1,3 @@
from typing import Dict
from xmlrpc.client import Boolean
from lib.db_revision import DBRevision
@ -114,7 +113,7 @@ class TreeBuilder:
candidates.append(node)
if node.merged_into:
# we can't have candidates that are crossing previous merges
# see https://gitea.opensuse.org/importers/git-importer/issues/14
# see https://src.opensuse.org/importers/git-importer/issues/14
candidates = []
node = node.parent
if candidates:
@ -138,7 +137,7 @@ class TreeBuilder:
self.requests.add(node.revision.request_id)
class FindMergeWalker(AbstractWalker):
def __init__(self, builder: TreeBuilder, requests: Dict) -> None:
def __init__(self, builder: TreeBuilder, requests: dict) -> None:
super().__init__()
self.source_revisions = dict()
self.builder = builder

59
opensuse-monitor.py Executable file
View File

@ -0,0 +1,59 @@
#!/usr/bin/python3
import json
from pathlib import Path
import pika
import random
import time
MY_TASKS_DIR = Path(__file__).parent / "tasks"
def listen_events():
connection = pika.BlockingConnection(
pika.URLParameters("amqps://opensuse:opensuse@rabbit.opensuse.org")
)
channel = connection.channel()
channel.exchange_declare(
exchange="pubsub", exchange_type="topic", passive=True, durable=False
)
result = channel.queue_declare("", exclusive=True)
queue_name = result.method.queue
channel.queue_bind(
exchange="pubsub", queue=queue_name, routing_key="opensuse.obs.package.commit"
)
print(" [*] Waiting for logs. To exit press CTRL+C")
def callback(ch, method, properties, body):
if method.routing_key not in ("opensuse.obs.package.commit",):
return
body = json.loads(body)
if (
"project" in body
and "package" in body
and body["project"] == "openSUSE:Factory"
):
if "/" in body["package"]:
return
(MY_TASKS_DIR / body["package"]).touch()
print(" [x] %r:%r" % (method.routing_key, body["package"]))
channel.basic_consume(queue_name, callback, auto_ack=True)
channel.start_consuming()
def main():
while True:
try:
listen_events()
except (pika.exceptions.ConnectionClosed, pika.exceptions.AMQPHeartbeatTimeout):
time.sleep(random.randint(10, 100))
if __name__ == "__main__":
main()

1
tasks/.gitignore vendored Normal file
View File

@ -0,0 +1 @@
*

19
update-tasks.sh Executable file
View File

@ -0,0 +1,19 @@
#!/bin/bash
#
cd /space/dmueller/git-importer
source credentials.sh
while true; do
for i in $PWD/tasks/*; do
if test -f "$i"; then
echo "$(date): Importing $(basename $i)"
if ! python3 ./git-importer.py -c repos/.cache $(basename $i); then
mkdir -p $PWD/failed-tasks
mv -f $i $PWD/failed-tasks
fi
rm -f $i
fi
done
inotifywait -q -e create $PWD/tasks
done