Compare commits

...

44 Commits

Author SHA1 Message Date
Adam Majer
c8455c22dc Update Makefile 2024-08-24 17:13:35 +02:00
Adam Majer
94fdb3b442 Revert "Remove devel branch import"
This reverts commit 1318f9e0c46a0f0569fc77535703cc1b64bba57b.
2024-08-11 22:54:47 +02:00
Adam Majer
f2358446da don't create gitea repos with importer 2024-08-11 22:54:44 +02:00
9660e633af Parametrizes git import parameters 2024-08-08 17:56:41 +02:00
85b9ed5e75 disable LFS check for testing 2024-08-08 17:56:17 +02:00
86f82325d8 Stop importing/exporting scmsync packages/projects
Also, allow other-than-Factory projects
2024-08-08 10:35:53 +02:00
Dirk Mueller
39ba616226 Merge pull request 'Add ability to specify non-Factory' (#31) from adamm/git-importer:option_for_non_factory into main
Reviewed-on: importers/git-importer#31
Reviewed-by: Dirk Mueller <dirkmueller@noreply@src.opensuse.org>
2024-08-07 18:27:11 +02:00
531dbc7c1b Add ability to specify non-Factory
This is important for devel-project only imports
non-factory is still blocked by assert
2024-08-07 16:55:05 +02:00
Dirk Müller
1318f9e0c4
Remove devel branch import
For a yet undefined reason, this screws up the systemd history import
2024-08-07 09:47:54 +02:00
Dirk Müller
d563076d9e
add explicit conversion to string to fix the concatenation 2024-08-07 09:47:18 +02:00
b11b3f1adb
Add and remove literal files
Pathspecs in git contain special characters that we should not trigger.
Treat every filespec as literal
2024-08-01 16:53:46 +02:00
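The pathspec problem this commit works around can be sketched in a throwaway repository; the file names here are made-up examples of names containing pathspec magic characters:

```shell
# Sketch: without :(literal), git interprets '*' in a filespec as glob magic.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
touch 'weird*name.spec' 'weirdXname.spec'

# :(literal) matches the exact file name, never the glob expansion.
git add ':(literal)weird*name.spec'
git ls-files   # lists only weird*name.spec
```

With a plain `git add 'weird*name.spec'` both files would be staged, because the `*` is treated as a wildcard by git's pathspec matching.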
Dirk Müller
479738d4b2
ruff format run 2024-07-10 10:34:20 +02:00
Adam Majer
2d04136ca5
Make sure we create devel branch, when no diff to Factory 2024-06-13 15:36:59 +02:00
Adam Majer
40ad64ddff
Ignore .osc directory 2024-06-10 18:13:51 +02:00
Adam Majer
6bd5d72100
New branch is empty
New branches must be born empty
2024-06-10 17:06:15 +02:00
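The "born empty" requirement shows up later in the diff as `git switch --orphan`; a minimal sketch in a scratch repository (identity and file names are made up):

```shell
# Sketch: an orphan branch starts with no history and an empty index,
# regardless of what the previous branch contained.
set -e
d=$(mktemp -d)
cd "$d"
git init -q -b factory .
echo spec > demo.spec
git add demo.spec
git -c user.name=demo -c user.email=demo@example.com commit -qm seed

git switch -q --orphan devel
git ls-files   # empty: the new branch is born with nothing staged
```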
Dirk Müller
022ae5ab58
remember failed tasks in a separate directory 2024-06-10 17:04:43 +02:00
Dirk Müller
2ff8ed76d0
Reconnect to the AMQP bus when the connection breaks down 2024-06-10 17:04:25 +02:00
Dirk Müller
5f228dc046
enable robust push 2024-05-17 21:47:35 +02:00
Dirk Müller
4e07d8272e
don't loop over failed packages 2024-05-17 21:47:15 +02:00
Dirk Müller
2a3475ab6e
Create with sha256 enabled 2024-05-17 20:39:55 +02:00
Dirk Müller
574bc9aa10
Avoid guessing in switch 2024-05-17 20:07:16 +02:00
Dirk Müller
0414b33206
Fix testing for origin
The previous code path was untested and not working
2024-05-17 20:06:25 +02:00
Dirk Müller
b9670821a9
Only init the repository if it doesn't exist already
harmless, but avoids a scary warning
2024-05-17 20:05:54 +02:00
Dirk Müller
073550825c
Fixups to improve the conversion process 2024-05-17 14:41:42 +02:00
Dirk Müller
5a353c98d3
Add tasks 2024-05-17 11:46:18 +02:00
Dirk Müller
1fc466d15b
Add monitor for commits 2024-05-17 11:40:19 +02:00
Dirk Müller
39fde7744a
Code cleanup 2024-05-16 15:47:45 +02:00
Dirk Müller
f5ffc83a69
Remove double quoting of url parameters
makeurl quotes by itself, so this was messing it up
2024-05-16 11:49:14 +02:00
Dirk Müller
d0ccf83684
Revert "Try to fetch the element as deleted if initial access failed"
The OBS api has been fixed to provide an automatic fallback via
https://github.com/openSUSE/open-build-service/pull/15655

This reverts commit c9e07e536f19820c4bba1f11e2edcb23069874d7.
2024-05-16 11:49:14 +02:00
Dirk Müller
b0ffb01c59
cleanups 2024-05-16 11:49:14 +02:00
Dirk Müller
28d5c6e606
Switch to psycopg rather than psycopg2
It's a bit more modern and uses dedicated C bindings
2024-05-16 11:49:14 +02:00
Dirk Mueller
1e22c2895a Merge pull request 'Switch to sha-256 git repo and use git tools again' (#23) from adamm/git-importer:main into main
Reviewed-on: importers/git-importer#23
2024-05-16 11:48:36 +02:00
Adam Majer
5da7861c2a Switch to sha-256 git repo and use git tools again 2024-04-09 11:40:26 +02:00
Dirk Müller
c9e07e536f
Try to fetch the element as deleted if initial access failed
The referenced object might already be deleted by the time the
request fails. Also, setting deleted=0 is rejected by the API.
So try deleted=1 if and only if the previous access failed.
2023-12-07 18:30:36 +01:00
Dirk Müller
dc0f33354e
Failing to LFS register should abort the import 2023-12-07 18:29:56 +01:00
Dirk Müller
56cbe0a125
Avoid multi-threading races on import
There seem to be races when using db cursors from multiple threads, as
found via import issues after switching to a newer computer that has
performance and energy-efficient cores.

As this is not particularly performance critical, convert to
single-threaded use, which makes it work again
2023-11-28 23:36:44 +01:00
Dirk Müller
4353f015c8
Switch to localhost:9999 which is provided via a ssh tunnel
The port is no longer directly exposed, so we need to ssh tunnel it
2023-11-22 14:39:55 +01:00
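A tunnel matching this setup might look as follows; the user and host names are illustrative assumptions, only the local port 9999 comes from this commit:

```shell
# Hypothetical ops snippet: forward local port 9999 to the service,
# which is no longer exposed directly. User/host are placeholders.
ssh -N -L 9999:localhost:9999 tunneluser@gitea.opensuse.org
```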
Dirk Müller
9cbe0899bc
Remove unused import 2023-06-19 13:19:52 +02:00
Dirk Müller
9e80a64fe0
Change hostname references from gitea.opensuse.org to src.opensuse.org 2023-06-19 10:59:56 +02:00
Dirk Müller
12001b1640
Commit local changes 2023-04-18 22:31:38 +02:00
Stephan Kulow
3797ea178a Merge pull request 'Add a list of packages no longer existing' (#22) from add_gone into main
Reviewed-on: https://gitea.opensuse.org/importers/git-importer/pulls/22
2023-02-09 10:23:35 +01:00
Stephan Kulow
999dcabcfa Add a list of packages no longer existing
I made this a file and not an automatically maintained DB because I think,
for now, adding an entry there should be done manually. OBS being OBS,
packages might look gone for a brief moment and reappear the day after.
2022-12-02 11:00:31 +01:00
9962673eff Merge pull request 'Add force push for the devel branch' (#21) from add_force into main
Reviewed-on: https://gitea.opensuse.org/importers/git-importer/pulls/21
2022-12-02 09:35:40 +01:00
Stephan Kulow
7b20c03256 Add force push for the devel branch
As devel branches can change in case of Factory reverts, we need to force
push. The factory branch shouldn't be affected, so we don't force push there
2022-12-02 09:12:11 +01:00
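The effect can be sketched locally; all repository paths and commit messages below are made up:

```shell
# Sketch: after history is rewritten (as after a Factory revert), a plain
# push is rejected as non-fast-forward; only a force push updates devel.
set -e
base=$(mktemp -d)
git init -q --bare "$base/remote.git"
git init -q -b devel "$base/work"
cd "$base/work"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'import r1'
git push -q "$base/remote.git" devel
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --amend --allow-empty -m 'import r1 (reverted)'
git push -q "$base/remote.git" devel && echo unexpected || echo rejected
git push -q -f "$base/remote.git" devel   # force push succeeds
```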
17 changed files with 1701 additions and 152 deletions

View File

@ -9,5 +9,5 @@ test:
update-packages:
f=$$(mktemp) ;\
osc api /source/openSUSE:Factory?view=info | grep -v lsrcmd5 | grep srcmd5= | sed -e 's,.*package=",,; s,".*,,' | grep -v : > $$f ;\
echo _project >> $$f ;\
echo _project >> $$f;\
mv $$f packages

View File

@ -1,5 +1,18 @@
sudo zypper in python3-psycopg2
sudo su - postgres
# `createdb -O <LOCAL_USER> imported_git`
Installation
------------
sudo zypper in python3-psycopg
sudo su - postgres
createdb -O <LOCAL_USER> imported_git
To reset the database, drop the `scheme` table
Gitea parameters
----------------
* `GITEA_HOST` - default: src.opensuse.org
* `GITEA_USER` - Used to generate SSH links for push. Default: gitea
* `GITEA_ORG` - target organization to push to
* `GITEA_DEFAULT_BRANCH` - default branch
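Taken together, a hypothetical environment for an import run could look like this; the values mirror the defaults in `lib/git.py`, and `GITEA_TOKEN` is a placeholder:

```shell
# Illustrative configuration only; values are the documented defaults.
export GITEA_HOST=src.opensuse.org
export GITEA_USER=gitea
export GITEA_ORG=rpm
export GITEA_DEFAULT_BRANCH=factory
export GITEA_TOKEN=changeme   # placeholder; without it no remote is added
```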

View File

@ -42,8 +42,8 @@ PROJECTS = [
]
def export_package(package, repodir, cachedir, gc):
exporter = GitExporter(URL_OBS, "openSUSE:Factory", package, repodir, cachedir)
def export_package(project, package, repodir, cachedir, gc):
exporter = GitExporter(URL_OBS, project, package, repodir, cachedir)
exporter.set_gc_interval(gc)
exporter.export_as_git()
@ -51,6 +51,12 @@ def export_package(package, repodir, cachedir, gc):
def main():
parser = argparse.ArgumentParser(description="OBS history importer into git")
parser.add_argument("packages", help="OBS package names", nargs="*")
parser.add_argument(
"-p",
"--project",
default="openSUSE:Factory",
help="Project to import/export, default is openSUSE:Factory",
)
parser.add_argument(
"-r",
"--repodir",
@ -110,10 +116,13 @@ def main():
if not args.cachedir:
args.cachedir = pathlib.Path("~/.cache/git-import/").expanduser()
importer = Importer(URL_OBS, "openSUSE:Factory", args.packages)
importer = Importer(URL_OBS, args.project, args.packages)
importer.import_into_db()
for package in args.packages:
export_package(package, args.repodir, args.cachedir, args.gc)
if not importer.package_with_scmsync(package):
export_package(args.project, package, args.repodir, args.cachedir, args.gc)
else:
logging.debug(f"{args.project}/{package} has scmsync links - skipping export")
if __name__ == "__main__":

1355
gone-packages.txt Normal file

File diff suppressed because it is too large

View File

@ -14,8 +14,6 @@ def config(filename="database.ini", section="production"):
for param in params:
db[param[0]] = param[1]
else:
raise Exception(
"Section {0} not found in the {1} file".format(section, filename)
)
raise Exception(f"Section {section} not found in the {filename} file")
return db

View File

@ -1,7 +1,6 @@
import logging
import psycopg2
from psycopg2.extras import LoggingConnection
import psycopg
from lib.config import config
@ -17,22 +16,20 @@ class DB:
# read the connection parameters
params = config(section=self.config_section)
# connect to the PostgreSQL server
self.conn = psycopg2.connect(connection_factory=LoggingConnection, **params)
logger = logging.getLogger(__name__)
self.conn.initialize(logger)
self.conn = psycopg.connect(conninfo=f"dbname={params['database']}")
logging.getLogger("psycopg.pool").setLevel(logging.INFO)
except (Exception, psycopg2.DatabaseError) as error:
except (Exception, psycopg.DatabaseError) as error:
print(error)
raise error
def schema_version(self):
# create a cursor
with self.conn.cursor() as cur:
# execute a statement
try:
cur.execute("SELECT MAX(version) from scheme")
except psycopg2.errors.UndefinedTable as error:
except psycopg.errors.UndefinedTable:
cur.close()
self.close()
self.connect()
@ -146,9 +143,9 @@ class DB:
)
schemes[10] = (
"ALTER TABLE revisions ADD COLUMN request_id INTEGER",
"""ALTER TABLE revisions
"""ALTER TABLE revisions
ADD CONSTRAINT request_id_foreign_key
FOREIGN KEY (request_id)
FOREIGN KEY (request_id)
REFERENCES requests (id)""",
"UPDATE scheme SET version=10",
)
@ -273,7 +270,7 @@ class DB:
cur.execute(command)
# commit the changes
self.conn.commit()
except (Exception, psycopg2.DatabaseError) as error:
except (Exception, psycopg.DatabaseError) as error:
print(error)
self.close()
raise error

View File

@ -2,7 +2,6 @@ from __future__ import annotations
from hashlib import md5
from pathlib import Path
from typing import Optional
from lib.db import DB
from lib.obs_revision import OBSRevision
@ -206,7 +205,7 @@ class DBRevision:
):
continue
cur.execute(
"""INSERT INTO files (name, md5, size, mtime, revision_id)
"""INSERT INTO files (name, md5, size, mtime, revision_id)
VALUES (%s,%s,%s,%s,%s)""",
(
entry.get("name"),
@ -255,7 +254,7 @@ class DBRevision:
self._files.sort(key=lambda x: x["name"])
return self._files
def calc_delta(self, current_rev: Optional[DBRevision]):
def calc_delta(self, current_rev: DBRevision | None):
"""Calculate the list of files to download and to delete.
Param current_rev is the revision that's currently checked out.
If it's None, the repository is empty.

View File

@ -4,7 +4,6 @@ import os
import pathlib
import subprocess
import pygit2
import requests
from lib.binary import BINARY
@ -20,11 +19,6 @@ class Git:
self.committer = committer
self.committer_email = committer_email
self.repo = None
def is_open(self):
return self.repo is not None
def exists(self):
"""Check if the path is a valid git repository"""
return (self.path / ".git").exists()
@ -34,36 +28,70 @@ class Git:
self.path.mkdir(parents=True, exist_ok=True)
self.open()
def git_run(self, args, **kwargs):
"""Run a git command"""
if "env" in kwargs:
envs = kwargs["env"].copy()
del kwargs["env"]
else:
envs = os.environ.copy()
envs["GIT_LFS_SKIP_SMUDGE"] = "1"
envs["GIT_CONFIG_GLOBAL"] = "/dev/null"
return subprocess.run(
["git"] + args,
cwd=self.path,
check=True,
env=envs,
**kwargs,
)
def open(self):
# Convert the path to string, to avoid some limitations in
# older pygit2
self.repo = pygit2.init_repository(str(self.path))
if not self.exists():
self.git_run(["init", "--object-format=sha256", "-b", "factory"])
self.git_run(["config", "lfs.allowincompletepush", "true"])
def is_dirty(self):
"""Check if there is something to commit"""
assert self.is_open()
return self.repo.status()
status_str = self.git_run(
["status", "--porcelain=2"],
stdout=subprocess.PIPE,
).stdout.decode("utf-8")
return len(list(filter(None, status_str.split("\n")))) > 0
def branches(self):
return list(self.repo.branches)
br = (
self.git_run(
["for-each-ref", "--format=%(refname:short)", "refs/heads/"],
stdout=subprocess.PIPE,
)
.stdout.decode("utf-8")
.split()
)
if len(br) == 0:
br.append("factory") # unborn branch?
return br
def branch(self, branch, commit=None):
if not commit:
commit = self.repo.head
else:
commit = self.repo.get(commit)
self.repo.branches.local.create(branch, commit)
def branch(self, branch, commit="HEAD"):
commit = (
self.git_run(
["rev-parse", "--verify", "--end-of-options", commit + "^{commit}"],
stdout=subprocess.PIPE,
)
.stdout.decode("utf-8")
.strip()
)
return self.git_run(["branch", branch, commit])
def checkout(self, branch):
"""Checkout into the branch HEAD"""
new_branch = False
ref = f"refs/heads/{branch}"
if branch not in self.branches():
self.repo.references["HEAD"].set_target(ref)
self.git_run(["switch", "-q", "--orphan", branch])
new_branch = True
else:
self.repo.checkout(ref)
ref = f"refs/heads/{branch}"
if (self.path / ".git" / ref).exists():
self.git_run(["switch", "--no-guess", "-q", branch])
return new_branch
def commit(
@ -87,51 +115,79 @@ class Git:
committer_time = committer_time if committer_time else user_time
if self.is_dirty():
self.repo.index.add_all()
self.git_run(["add", "--all", "."])
self.repo.index.write()
author = pygit2.Signature(user, user_email, int(user_time.timestamp()))
committer = pygit2.Signature(
committer, committer_email, int(committer_time.timestamp())
tree_id = (
self.git_run(["write-tree"], stdout=subprocess.PIPE)
.stdout.decode("utf-8")
.strip()
)
tree = self.repo.index.write_tree()
return self.repo.create_commit(
"HEAD", author, committer, message, tree, parents
parent_array = []
if isinstance(parents, list):
for parent in filter(None, parents):
parent_array = parent_array + ["-p", parent]
elif isinstance(parents, str):
parent_array = ["-p", parents]
commit_id = (
self.git_run(
["commit-tree"] + parent_array + [tree_id],
env={
"GIT_AUTHOR_NAME": user,
"GIT_AUTHOR_EMAIL": user_email,
"GIT_AUTHOR_DATE": f"{int(user_time.timestamp())} +0000",
"GIT_COMMITTER_NAME": committer,
"GIT_COMMITTER_EMAIL": committer_email,
"GIT_COMMITTER_DATE": f"{int(committer_time.timestamp())} +0000",
},
input=message.encode("utf-8"),
stdout=subprocess.PIPE,
)
.stdout.decode("utf-8")
.rstrip()
)
self.git_run(["reset", "--soft", commit_id])
return commit_id
def last_commit(self):
try:
return self.repo.head.target
except:
return None
def branch_head(self, branch):
return self.repo.references["refs/heads/" + branch].target
def branch_head(self, branch="HEAD"):
return (
self.git_run(
["rev-parse", "--verify", "--end-of-options", branch],
stdout=subprocess.PIPE,
)
.stdout.decode("utf-8")
.strip()
)
def set_branch_head(self, branch, commit):
self.repo.references["refs/heads/" + branch].set_target(commit)
return self.git_run(["update-ref", f"refs/heads/{branch}", commit])
def gc(self):
logging.debug(f"Garbage recollect and repackage {self.path}")
subprocess.run(
["git", "gc", "--auto"],
cwd=self.path,
self.git_run(
["gc", "--auto"],
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
)
def clean(self):
for path, _ in self.repo.status().items():
logging.debug(f"Cleaning {path}")
try:
(self.path / path).unlink()
self.repo.index.remove(path)
except Exception as e:
logging.warning(f"Error removing file {path}: {e}")
# def clean(self):
# for path, _ in self.repo.status().items():
# logging.debug(f"Cleaning {path}")
# try:
# (self.path / path).unlink()
# self.repo.index.remove(path)
# except Exception as e:
# logging.warning(f"Error removing file {path}: {e}")
def add(self, filename):
self.repo.index.add(filename)
self.git_run(["add", ":(literal)" + str(filename)])
def add_default_gitignore(self):
if not (self.path / ".gitignore").exists():
with (self.path / ".gitignore").open("w") as f:
f.write(".osc\n")
self.add(".gitignore")
def add_default_lfs_gitattributes(self, force=False):
if not (self.path / ".gitattributes").exists() or force:
@ -185,9 +241,9 @@ class Git:
return any(fnmatch.fnmatch(filename, line) for line in patterns)
def remove(self, file: pathlib.Path):
self.repo.index.remove(file.name)
(self.path / file).unlink()
self.git_run(
["rm", "-q", "-f", "--ignore-unmatch", ":(literal)" + file.name],
)
patterns = self.get_specific_lfs_gitattributes()
if file.name in patterns:
patterns.remove(file.name)
@ -196,15 +252,27 @@ class Git:
def add_gitea_remote(self, package):
repo_name = package.replace("+", "_")
org_name = "rpm"
gitea_user = "gitea"
gitea_host = "src.opensuse.org"
default_branch = "factory"
if os.getenv("GITEA_HOST"):
gitea_host = os.getenv("GITEA_HOST")
if os.getenv("GITEA_USER"):
gitea_user = os.getenv("GITEA_USER")
if os.getenv("GITEA_ORG"):
org_name = os.getenv("GITEA_ORG")
if os.getenv("GITEA_DEFAULT_BRANCH"):
default_branch = os.getenv("GITEA_DEFAULT_BRANCH")
if not os.getenv("GITEA_TOKEN"):
logging.warning("Not adding a remote due to missing $GITEA_TOKEN")
return
url = f"https://gitea.opensuse.org/api/v1/org/{org_name}/repos"
url = f"https://{gitea_host}/api/v1/org/{org_name}/repos"
response = requests.post(
url,
data={"name": repo_name},
data={"name": repo_name, "object_format_name": "sha256", "default_branch": default_branch},
headers={"Authorization": f"token {os.getenv('GITEA_TOKEN')}"},
timeout=10,
)
@ -212,16 +280,21 @@ class Git:
# 201 Created
if response.status_code not in (201, 409):
print(response.text)
url = f"gitea@gitea.opensuse.org:{org_name}/{repo_name}.git"
self.repo.remotes.create("origin", url)
url = f"{gitea_user}@{gitea_host}:{org_name}/{repo_name}.git"
self.git_run(
["remote", "add", "origin", url],
)
def push(self):
remo = self.repo.remotes["origin"]
def push(self, force=False):
if "origin" not in self.git_run(
["remote"],
stdout=subprocess.PIPE,
).stdout.decode("utf-8"):
logging.warning("Not pushing to remote because no 'origin' configured")
return
keypair = pygit2.KeypairFromAgent("gitea")
callbacks = pygit2.RemoteCallbacks(credentials=keypair)
refspecs = ["refs/heads/factory"]
if "refs/heads/devel" in self.repo.references:
refspecs.append("refs/heads/devel")
remo.push(refspecs, callbacks=callbacks)
cmd = ["push"]
if force:
cmd.append("-f")
cmd += ["origin", "--all"]
self.git_run(cmd)
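The rewritten commit() above replaces pygit2 with git plumbing; the sequence can be sketched in a scratch repository (identity, date, and message are made up):

```shell
# Sketch of the write-tree / commit-tree / reset --soft sequence:
# stage everything, snapshot the index as a tree, create a commit object
# with explicit author/committer data, then move the branch to it.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b factory .
echo demo > demo.spec
git add --all .
tree=$(git write-tree)
commit=$(echo 'import r1' | GIT_AUTHOR_NAME=demo \
    GIT_AUTHOR_EMAIL=demo@example.com GIT_AUTHOR_DATE='1700000000 +0000' \
    GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com \
    GIT_COMMITTER_DATE='1700000000 +0000' git commit-tree "$tree")
git reset --soft "$commit"
git log -n1 --format='%H %s'
```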

View File

@ -29,7 +29,7 @@ class GitExporter:
self.git.open()
else:
self.git.create()
self.git.add_gitea_remote(package)
# self.git.add_gitea_remote(package)
self.state_file = os.path.join(self.git.path, ".git", "_flat_state.yaml")
self.gc_interval = 200
self.cachedir = cachedir
@ -40,9 +40,9 @@ class GitExporter:
def check_repo_state(self, flats, branch_state):
state_data = dict()
if os.path.exists(self.state_file):
with open(self.state_file, "r") as f:
with open(self.state_file) as f:
state_data = yaml.safe_load(f)
if type(state_data) != dict:
if not isinstance(state_data, dict):
state_data = {}
left_to_commit = []
for flat in reversed(flats):
@ -86,7 +86,12 @@ class GitExporter:
logging.debug(f"Committing {flat}")
self.commit_flat(flat, branch_state)
self.git.push()
# make sure that we create devel branch
if not branch_state["devel"]:
logging.debug("force creating devel")
self.git.set_branch_head("devel", self.git.branch_head("factory"))
self.git.push(force=True)
def run_gc(self):
self.gc_cnt = self.gc_interval
@ -150,6 +155,7 @@ class GitExporter:
# create the file if it does not exist
self.git.add_default_lfs_gitattributes(force=False)
self.git.add_default_gitignore()
to_download, to_delete = flat.commit.calc_delta(branch_state[flat.branch])
for file in to_delete:

View File

@ -1,5 +1,5 @@
import concurrent.futures
import logging
import pathlib
import xml.etree.ElementTree as ET
from lib.db import DB
@ -26,11 +26,15 @@ class Importer:
# Import multiple Factory packages into the database
self.packages = packages
self.project = project
self.scmsync_cache = dict()
self.packages_with_scmsync = set()
self.db = DB()
self.obs = OBS(api_url)
assert project == "openSUSE:Factory"
assert not self.has_scmsync(project)
self.refreshed_packages = set()
self.gone_packages_set = None
def import_request(self, number):
self.obs.request(number).import_into_db(self.db)
@ -105,7 +109,7 @@ class Importer:
with self.db.cursor() as cur:
cur.execute(
"""SELECT * FROM revisions WHERE id IN
(SELECT revision_id from linked_revs WHERE linked_id=%s)
(SELECT revision_id from linked_revs WHERE linked_id=%s)
AND commit_time <= %s ORDER BY commit_time""",
(prev.dbid, rev.commit_time),
)
@ -138,7 +142,7 @@ class Importer:
fake_rev = linked.rev + rev.rev / 1000.0
comment = f"Updating link to change in {rev.project}/{rev.package} revision {int(rev.rev)}"
cur.execute(
"""INSERT INTO revisions (project,package,rev,unexpanded_srcmd5,
"""INSERT INTO revisions (project,package,rev,unexpanded_srcmd5,
commit_time, userid, comment, api_url) VALUES(%s,%s,%s,%s,%s,%s,%s,%s) RETURNING id""",
(
linked.project,
@ -161,10 +165,12 @@ class Importer:
(rev.dbid, linked.dbid),
)
def revisions_without_files(self):
def revisions_without_files(self, package):
logging.debug(f"revisions_without_files({package})")
with self.db.cursor() as cur:
cur.execute(
"SELECT * FROM revisions WHERE broken=FALSE AND expanded_srcmd5 IS NULL"
"SELECT * FROM revisions WHERE package=%s AND broken=FALSE AND expanded_srcmd5 IS NULL",
(package,),
)
return [DBRevision(self.db, row) for row in cur.fetchall()]
@ -178,11 +184,11 @@ class Importer:
linked_rev = cur.fetchone()
if linked_rev:
linked_rev = linked_rev[0]
list = self.obs.list(
obs_dir_list = self.obs.list(
rev.project, rev.package, rev.unexpanded_srcmd5, linked_rev
)
if list:
rev.import_dir_list(list)
if obs_dir_list:
rev.import_dir_list(obs_dir_list)
md5 = rev.calculate_files_hash()
with self.db.cursor() as cur:
cur.execute(
@ -196,53 +202,47 @@ class Importer:
self.find_linked_revs()
self.find_fake_revisions()
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
fs = [
executor.submit(import_rev, self, rev)
for rev in self.revisions_without_files()
]
concurrent.futures.wait(fs)
for package in self.packages:
for rev in self.revisions_without_files(package):
logging.debug(f"rev {rev} is without files")
self.import_rev(rev)
def refresh_package(self, project, package):
key = f"{project}/{package}"
if key in self.refreshed_packages:
# refreshing once is good enough
return
if self.package_gone(key):
return
logging.debug(f"Refresh {project}/{package}")
self.refreshed_packages.add(key)
if self.has_scmsync(project) or self.has_scmsync(key):
self.packages_with_scmsync.add(package)
logging.debug(f"{project}/{package} already in Git - skipping")
return
self.update_db_package(project, package)
self.fetch_all_linked_packages(project, package)
def import_into_db(self):
for package in self.packages:
self.refresh_package(self.project, package)
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
fs = [
executor.submit(refresh_package, self, self.project, package)
for package in self.packages
]
concurrent.futures.wait(fs)
self.db.conn.commit()
self.db.conn.commit()
for number in DBRevision.requests_to_fetch(self.db):
self.import_request(number)
fs = [
executor.submit(import_request, self, number)
for number in DBRevision.requests_to_fetch(self.db)
]
concurrent.futures.wait(fs)
self.db.conn.commit()
self.db.conn.commit()
with self.db.cursor() as cur:
cur.execute(
"""SELECT DISTINCT source_project,source_package FROM requests
WHERE id IN (SELECT request_id FROM revisions WHERE project=%s and package = ANY(%s));""",
(self.project, self.packages),
)
for project, package in cur.fetchall():
self.refresh_package(project, package)
with self.db.cursor() as cur:
cur.execute(
"""SELECT DISTINCT source_project,source_package FROM requests
WHERE id IN (SELECT request_id FROM revisions WHERE project=%s and package = ANY(%s));""",
(self.project, self.packages),
)
fs = [
executor.submit(refresh_package, self, project, package)
for project, package in cur.fetchall()
]
concurrent.futures.wait(fs)
self.db.conn.commit()
missing_users = User.missing_users(self.db)
@ -254,3 +254,26 @@ class Importer:
self.fill_file_lists()
self.db.conn.commit()
def package_gone(self, key):
if not self.gone_packages_set:
self.gone_packages_set = set()
with open(pathlib.Path(__file__).parent.parent / "gone-packages.txt") as f:
for line in f.readlines():
self.gone_packages_set.add(line.strip())
return key in self.gone_packages_set
def has_scmsync(self, key):
if key in self.scmsync_cache:
return self.scmsync_cache[key]
root = self.obs._meta(key)
scmsync_exists = False
if root is not None:
scmsync_exists = root.find('scmsync') is not None
self.scmsync_cache[key] = scmsync_exists
return scmsync_exists
def package_with_scmsync(self, package):
return package in self.packages_with_scmsync

View File

@ -68,7 +68,7 @@ class LFSOid:
row = cur.fetchone()
lfs_oid_id = row[0]
cur.execute(
"""INSERT INTO lfs_oid_in_package (package,filename,lfs_oid_id)
"""INSERT INTO lfs_oid_in_package (package,filename,lfs_oid_id)
VALUES (%s,%s,%s)""",
(package, filename, lfs_oid_id),
)
@ -83,7 +83,8 @@ class LFSOid:
self.register()
def check(self):
url = f"http://gitea.opensuse.org:9999/check/{self.sha256}/{self.size}"
return True
url = f"http://localhost:9999/check/{self.sha256}/{self.size}"
response = requests.get(
url,
timeout=10,
@ -127,12 +128,13 @@ class LFSOid:
"size": self.size,
}
url = "http://gitea.opensuse.org:9999/register"
url = "http://localhost:9999/register"
response = requests.post(
url,
json=data,
timeout=10,
)
response.raise_for_status()
logging.info(f"Register LFS returned {response.status_code}")
@ -167,7 +169,7 @@ if __name__ == "__main__":
cur.execute(
"""
CREATE TEMPORARY TABLE lfs_oid_in_revision (
revision_id INTEGER,
revision_id INTEGER,
lfs_oid_id INTEGER NOT NULL,
name VARCHAR(255) NOT NULL
)

View File

@ -73,11 +73,11 @@ class OBS:
logging.debug(f"GET {url}")
return ET.parse(osc.core.http_GET(url)).getroot()
def _meta(self, project, package, **params):
def _meta(self, key, **params):
try:
root = self._xml(f"source/{project}/{package}/_meta", **params)
root = self._xml(f"source/{key}/_meta", **params)
except HTTPError:
logging.error(f"Package [{project}/{package} {params}] has no meta")
logging.error(f"Project/Package [{key} {params}] has no meta")
return None
return root
@ -118,13 +118,13 @@ class OBS:
return root
def exists(self, project, package):
root = self._meta(project, package)
root = self._meta(f"{project}/{package}")
if root is None:
return False
return root.get("project") == project
def devel_project(self, project, package):
root = self._meta(project, package)
root = self._meta(f"{project}/{package}")
devel = root.find("devel")
if devel is None:
return None
@ -150,7 +150,7 @@ class OBS:
def _download(self, project, package, name, revision):
url = osc.core.makeurl(
self.url,
["source", project, package, urllib.parse.quote(name)],
["source", project, package, name],
{"rev": revision, "expand": 1},
)
return osc.core.http_GET(url)
@ -165,7 +165,6 @@ class OBS:
cachedir: str,
file_md5: str,
) -> None:
cached_file = self._path_from_md5(name, cachedir, file_md5)
if not self.in_cache(name, cachedir, file_md5):
with (dirpath / name).open("wb") as f:

View File

@ -7,8 +7,6 @@ except:
print("Install python3-python-magic, not python3-magic")
raise
import requests
from lib.db import DB
from lib.lfs_oid import LFSOid
from lib.obs import OBS
@ -43,7 +41,6 @@ class ProxySHA256:
}
def put(self, project, package, name, revision, file_md5, size):
if not self.mime:
self.mime = magic.Magic(mime=True)

View File

@ -1,4 +1,3 @@
from typing import Dict
from xmlrpc.client import Boolean
from lib.db_revision import DBRevision
@ -114,7 +113,7 @@ class TreeBuilder:
candidates.append(node)
if node.merged_into:
# we can't have candidates that are crossing previous merges
# see https://gitea.opensuse.org/importers/git-importer/issues/14
# see https://src.opensuse.org/importers/git-importer/issues/14
candidates = []
node = node.parent
if candidates:
@ -138,7 +137,7 @@ class TreeBuilder:
self.requests.add(node.revision.request_id)
class FindMergeWalker(AbstractWalker):
def __init__(self, builder: TreeBuilder, requests: Dict) -> None:
def __init__(self, builder: TreeBuilder, requests: dict) -> None:
super().__init__()
self.source_revisions = dict()
self.builder = builder

59
opensuse-monitor.py Executable file
View File

@ -0,0 +1,59 @@
#!/usr/bin/python3
import json
from pathlib import Path
import pika
import random
import time
MY_TASKS_DIR = Path(__file__).parent / "tasks"
def listen_events():
connection = pika.BlockingConnection(
pika.URLParameters("amqps://opensuse:opensuse@rabbit.opensuse.org")
)
channel = connection.channel()
channel.exchange_declare(
exchange="pubsub", exchange_type="topic", passive=True, durable=False
)
result = channel.queue_declare("", exclusive=True)
queue_name = result.method.queue
channel.queue_bind(
exchange="pubsub", queue=queue_name, routing_key="opensuse.obs.package.commit"
)
print(" [*] Waiting for logs. To exit press CTRL+C")
def callback(ch, method, properties, body):
if method.routing_key not in ("opensuse.obs.package.commit",):
return
body = json.loads(body)
if (
"project" in body
and "package" in body
and body["project"] == "openSUSE:Factory"
):
if "/" in body["package"]:
return
(MY_TASKS_DIR / body["package"]).touch()
print(" [x] %r:%r" % (method.routing_key, body["package"]))
channel.basic_consume(queue_name, callback, auto_ack=True)
channel.start_consuming()
def main():
while True:
try:
listen_events()
except (pika.exceptions.ConnectionClosed, pika.exceptions.AMQPHeartbeatTimeout):
time.sleep(random.randint(10, 100))
if __name__ == "__main__":
main()

1
tasks/.gitignore vendored Normal file
View File

@ -0,0 +1 @@
*

19
update-tasks.sh Executable file
View File

@ -0,0 +1,19 @@
#!/bin/bash
#
cd /space/dmueller/git-importer
source credentials.sh
while true; do
for i in $PWD/tasks/*; do
if test -f "$i"; then
echo "$(date): Importing $(basename $i)"
if ! python3 ./git-importer.py -c repos/.cache $(basename $i); then
mkdir -p $PWD/failed-tasks
mv -f $i $PWD/failed-tasks
fi
rm -f $i
fi
done
inotifywait -q -e create $PWD/tasks
done