Compare commits

...

44 Commits

Author SHA1 Message Date
Adam Majer
c8455c22dc Update Makefile 2024-08-24 17:13:35 +02:00
Adam Majer
94fdb3b442 Revert "Remove devel branch import"
This reverts commit 1318f9e0c46a0f0569fc77535703cc1b64bba57b.
2024-08-11 22:54:47 +02:00
Adam Majer
f2358446da don't create gitea repos with importer 2024-08-11 22:54:44 +02:00
9660e633af Parametrizes git import parameters 2024-08-08 17:56:41 +02:00
85b9ed5e75 disable LFS check for testing 2024-08-08 17:56:17 +02:00
86f82325d8 Stop importing/exporting scmsync packages/projects
Also, allow other-than Factory projects
2024-08-08 10:35:53 +02:00
Dirk Mueller
39ba616226 Merge pull request 'Add ability to specify non-Factory' (#31) from adamm/git-importer:option_for_non_factory into main
Reviewed-on: importers/git-importer#31
Reviewed-by: Dirk Mueller <dirkmueller@noreply@src.opensuse.org>
2024-08-07 18:27:11 +02:00
531dbc7c1b Add ability to specify non-Factory
This is important for devel-project only imports
non-factory is still blocked by assert
2024-08-07 16:55:05 +02:00
Dirk Müller
1318f9e0c4
Remove devel branch import
this for yet undefined reason screws up systemd history import
2024-08-07 09:47:54 +02:00
Dirk Müller
d563076d9e
add explicit conversion to string to fix the concatenation 2024-08-07 09:47:18 +02:00
b11b3f1adb
Add and remove literal files
pathspec in git has special characters that we should not trigger.
Assume every filespec as literal
2024-08-01 16:53:46 +02:00
Dirk Müller
479738d4b2
ruff format run 2024-07-10 10:34:20 +02:00
Adam Majer
2d04136ca5
Make sure we create devel branch, when no diff to Factory 2024-06-13 15:36:59 +02:00
Adam Majer
40ad64ddff
Ignore .osc directory 2024-06-10 18:13:51 +02:00
Adam Majer
6bd5d72100
New branch is empty
New branches must be born empty
2024-06-10 17:06:15 +02:00
Dirk Müller
022ae5ab58
remember failed tasks in a separate directory 2024-06-10 17:04:43 +02:00
Dirk Müller
2ff8ed76d0
Reconnect to the AMQP bus when the connection breaks down 2024-06-10 17:04:25 +02:00
Dirk Müller
5f228dc046
enable robust push 2024-05-17 21:47:35 +02:00
Dirk Müller
4e07d8272e
don't loop over failed packages 2024-05-17 21:47:15 +02:00
Dirk Müller
2a3475ab6e
Create with sha256 enabled 2024-05-17 20:39:55 +02:00
Dirk Müller
574bc9aa10
Avoid guessing in switch 2024-05-17 20:07:16 +02:00
Dirk Müller
0414b33206
Fix testing for origin
The previous code path was untested and not working
2024-05-17 20:06:25 +02:00
Dirk Müller
b9670821a9
Only init the repository if it doesn't exist already
harmless, but avoids a scary warning
2024-05-17 20:05:54 +02:00
Dirk Müller
073550825c
Fixups to improve the conversion process 2024-05-17 14:41:42 +02:00
Dirk Müller
5a353c98d3
Add tasks 2024-05-17 11:46:18 +02:00
Dirk Müller
1fc466d15b
Add monitor for commits 2024-05-17 11:40:19 +02:00
Dirk Müller
39fde7744a
Code cleanup 2024-05-16 15:47:45 +02:00
Dirk Müller
f5ffc83a69
Remove double quoting of url parameters
makeurl quotes by itself, so this was messing it up
2024-05-16 11:49:14 +02:00
Dirk Müller
d0ccf83684
Revert "Try to fetch the element as deleted if initial access failed"
The OBS api has been fixed to provide an automatic fallback via
https://github.com/openSUSE/open-build-service/pull/15655

This reverts commit c9e07e536f19820c4bba1f11e2edcb23069874d7.
2024-05-16 11:49:14 +02:00
Dirk Müller
b0ffb01c59
cleanups 2024-05-16 11:49:14 +02:00
Dirk Müller
28d5c6e606
Switch to psycopg rather than psycopg2
It's a bit more modern and uses dedicated c bindings
2024-05-16 11:49:14 +02:00
Dirk Mueller
1e22c2895a Merge pull request 'Switch to sha-256 git repo and use git tools again' (#23) from adamm/git-importer:main into main
Reviewed-on: importers/git-importer#23
2024-05-16 11:48:36 +02:00
Adam Majer
5da7861c2a Switch to sha-256 git repo and use git tools again 2024-04-09 11:40:26 +02:00
Dirk Müller
c9e07e536f
Try to fetch the element as deleted if initial access failed
The reference to the object might be already deleted by when the
request is failing. plus setting deleted=0 is rejected by the API.
So try with deleted=1 if and only if the previous access failed.
2023-12-07 18:30:36 +01:00
Dirk Müller
dc0f33354e
Failing to LFS register should abort the import 2023-12-07 18:29:56 +01:00
Dirk Müller
56cbe0a125
Avoid multi-threading races on import
There seems to be races when using db cursors from multiple threads. as
found by import issues after switching to a newer computer that has
performance and energy efficient cores.

As this is not particularly performance critical, convert to single
threaded use which makes it work again
2023-11-28 23:36:44 +01:00
Dirk Müller
4353f015c8
Switch to localhost:9999 which is provided via a ssh tunnel
The port is no longer directly exposed, so we need to ssh tunnel it
2023-11-22 14:39:55 +01:00
Dirk Müller
9cbe0899bc
Remove unused import 2023-06-19 13:19:52 +02:00
Dirk Müller
9e80a64fe0
Change hostname references from gitea.opensuse.org to src.opensuse.org 2023-06-19 10:59:56 +02:00
Dirk Müller
12001b1640
Commit local changes 2023-04-18 22:31:38 +02:00
Stephan Kulow
3797ea178a Merge pull request 'Add a list of packages no longer existing' (#22) from add_gone into main
Reviewed-on: https://gitea.opensuse.org/importers/git-importer/pulls/22
2023-02-09 10:23:35 +01:00
Stephan Kulow
999dcabcfa Add a list of packages no longer existing
I made this a file and not an automatically maintained DB, as I think
for now adding an entry in there should be done manually - OBS being OBS,
packages might look gone for a brief moment and reappear the day after.
2022-12-02 11:00:31 +01:00
9962673eff Merge pull request 'Add force push for the devel branch' (#21) from add_force into main
Reviewed-on: https://gitea.opensuse.org/importers/git-importer/pulls/21
2022-12-02 09:35:40 +01:00
Stephan Kulow
7b20c03256 Add force push for the devel branch
As devel branches can change in case of factory reverts we need to force
push. Factory branch shouldn't be affected, so not force pushing there
2022-12-02 09:12:11 +01:00
17 changed files with 1701 additions and 152 deletions

Makefile

@@ -9,5 +9,5 @@ test:
 update-packages:
 	f=$$(mktemp) ;\
 	osc api /source/openSUSE:Factory?view=info | grep -v lsrcmd5 | grep srcmd5= | sed -e 's,.*package=",,; s,".*,,' | grep -v : > $$f ;\
-	echo _project >> $$f ;\
+	echo _project >> $$f;\
 	mv $$f packages

README

@@ -1,5 +1,18 @@
-sudo zypper in python3-psycopg2
+Installation
+------------
+
+sudo zypper in python3-psycopg
 sudo su - postgres
 `createdb -O <LOCAL_USER> imported_git`
 
 To reset the database, drop table scheme
+
+Gitea parameters
+----------------
+
+* `GITEA_HOST` - default: src.opensuse.org
+* `GITEA_USER` - Used to generate SSH links for push. Default: gitea
+* `GITEA_ORG` - target organization to push to
+* `GITEA_DEFAULT_BRANCH` - default branch
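These parameters are read as environment variables by `add_gitea_remote()` elsewhere in this compare. A minimal sketch of the lookup — defaults taken from the diff, URL shapes mirroring the code, repo name `example` invented here:

```python
# Sketch only: how the Gitea parameters influence the generated URLs.
import os

gitea_host = os.getenv("GITEA_HOST", "src.opensuse.org")
gitea_user = os.getenv("GITEA_USER", "gitea")
org_name = os.getenv("GITEA_ORG", "rpm")
default_branch = os.getenv("GITEA_DEFAULT_BRANCH", "factory")

# API endpoint used to create the repository, and the SSH push URL
api_url = f"https://{gitea_host}/api/v1/org/{org_name}/repos"
push_url = f"{gitea_user}@{gitea_host}:{org_name}/example.git"
print(api_url)
print(push_url)
```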

git-importer.py

@@ -42,8 +42,8 @@ PROJECTS = [
 ]
 
-def export_package(package, repodir, cachedir, gc):
-    exporter = GitExporter(URL_OBS, "openSUSE:Factory", package, repodir, cachedir)
+def export_package(project, package, repodir, cachedir, gc):
+    exporter = GitExporter(URL_OBS, project, package, repodir, cachedir)
     exporter.set_gc_interval(gc)
     exporter.export_as_git()
 
@@ -51,6 +51,12 @@ def export_package(package, repodir, cachedir, gc):
 def main():
     parser = argparse.ArgumentParser(description="OBS history importer into git")
     parser.add_argument("packages", help="OBS package names", nargs="*")
+    parser.add_argument(
+        "-p",
+        "--project",
+        default="openSUSE:Factory",
+        help="Project to import/export, default is openSUSE:Factory",
+    )
     parser.add_argument(
         "-r",
         "--repodir",
@@ -110,10 +116,13 @@ def main():
     if not args.cachedir:
         args.cachedir = pathlib.Path("~/.cache/git-import/").expanduser()
 
-    importer = Importer(URL_OBS, "openSUSE:Factory", args.packages)
+    importer = Importer(URL_OBS, args.project, args.packages)
     importer.import_into_db()
     for package in args.packages:
-        export_package(package, args.repodir, args.cachedir, args.gc)
+        if not importer.package_with_scmsync(package):
+            export_package(args.project, package, args.repodir, args.cachedir, args.gc)
+        else:
+            logging.debug(f"{args.project}/{package} has scmsync links - skipping export")
 
 if __name__ == "__main__":

gone-packages.txt (new file, 1355 lines)

File diff suppressed because it is too large

lib/config.py

@@ -14,8 +14,6 @@ def config(filename="database.ini", section="production"):
         for param in params:
             db[param[0]] = param[1]
     else:
-        raise Exception(
-            "Section {0} not found in the {1} file".format(section, filename)
-        )
+        raise Exception(f"Section {section} not found in the {filename} file")
     return db

lib/db.py

@@ -1,7 +1,6 @@
 import logging
 
-import psycopg2
-from psycopg2.extras import LoggingConnection
+import psycopg
 
 from lib.config import config
@@ -17,22 +16,20 @@ class DB:
             # read the connection parameters
             params = config(section=self.config_section)
             # connect to the PostgreSQL server
-            self.conn = psycopg2.connect(connection_factory=LoggingConnection, **params)
-            logger = logging.getLogger(__name__)
-            self.conn.initialize(logger)
+            self.conn = psycopg.connect(conninfo=f"dbname={params['database']}")
+            logging.getLogger("psycopg.pool").setLevel(logging.INFO)
-        except (Exception, psycopg2.DatabaseError) as error:
+        except (Exception, psycopg.DatabaseError) as error:
             print(error)
             raise error
 
     def schema_version(self):
         # create a cursor
         with self.conn.cursor() as cur:
             # execute a statement
             try:
                 cur.execute("SELECT MAX(version) from scheme")
-            except psycopg2.errors.UndefinedTable as error:
+            except psycopg.errors.UndefinedTable:
                 cur.close()
                 self.close()
                 self.connect()
@@ -273,7 +270,7 @@ class DB:
                 cur.execute(command)
             # commit the changes
             self.conn.commit()
-        except (Exception, psycopg2.DatabaseError) as error:
+        except (Exception, psycopg.DatabaseError) as error:
             print(error)
             self.close()
             raise error

lib/db_revision.py

@@ -2,7 +2,6 @@ from __future__ import annotations
 
 from hashlib import md5
 from pathlib import Path
-from typing import Optional
 
 from lib.db import DB
 from lib.obs_revision import OBSRevision
@@ -255,7 +254,7 @@ class DBRevision:
             self._files.sort(key=lambda x: x["name"])
         return self._files
 
-    def calc_delta(self, current_rev: Optional[DBRevision]):
+    def calc_delta(self, current_rev: DBRevision | None):
         """Calculate the list of files to download and to delete.
 
         Param current_rev is the revision that's currently checked out.
        If it's None, the repository is empty.

lib/git.py

@@ -4,7 +4,6 @@ import os
 import pathlib
 import subprocess
 
-import pygit2
 import requests
 
 from lib.binary import BINARY
@@ -20,11 +19,6 @@ class Git:
         self.committer = committer
         self.committer_email = committer_email
 
-        self.repo = None
-
-    def is_open(self):
-        return self.repo is not None
-
     def exists(self):
         """Check if the path is a valid git repository"""
         return (self.path / ".git").exists()
@@ -34,36 +28,70 @@ class Git:
         self.path.mkdir(parents=True, exist_ok=True)
         self.open()
 
+    def git_run(self, args, **kwargs):
+        """Run a git command"""
+        if "env" in kwargs:
+            envs = kwargs["env"].copy()
+            del kwargs["env"]
+        else:
+            envs = os.environ.copy()
+        envs["GIT_LFS_SKIP_SMUDGE"] = "1"
+        envs["GIT_CONFIG_GLOBAL"] = "/dev/null"
+        return subprocess.run(
+            ["git"] + args,
+            cwd=self.path,
+            check=True,
+            env=envs,
+            **kwargs,
+        )
+
     def open(self):
-        # Convert the path to string, to avoid some limitations in
-        # older pygit2
-        self.repo = pygit2.init_repository(str(self.path))
+        if not self.exists():
+            self.git_run(["init", "--object-format=sha256", "-b", "factory"])
+        self.git_run(["config", "lfs.allowincompletepush", "true"])
 
     def is_dirty(self):
         """Check if there is something to commit"""
-        assert self.is_open()
-
-        return self.repo.status()
+        status_str = self.git_run(
+            ["status", "--porcelain=2"],
+            stdout=subprocess.PIPE,
+        ).stdout.decode("utf-8")
+        return len(list(filter(None, status_str.split("\n")))) > 0
 
     def branches(self):
-        return list(self.repo.branches)
+        br = (
+            self.git_run(
+                ["for-each-ref", "--format=%(refname:short)", "refs/heads/"],
+                stdout=subprocess.PIPE,
+            )
+            .stdout.decode("utf-8")
+            .split()
+        )
+        if len(br) == 0:
+            br.append("factory")  # unborn branch?
+        return br
 
-    def branch(self, branch, commit=None):
-        if not commit:
-            commit = self.repo.head
-        else:
-            commit = self.repo.get(commit)
-        self.repo.branches.local.create(branch, commit)
+    def branch(self, branch, commit="HEAD"):
+        commit = (
+            self.git_run(
+                ["rev-parse", "--verify", "--end-of-options", commit + "^{commit}"],
+                stdout=subprocess.PIPE,
+            )
+            .stdout.decode("utf-8")
+            .strip()
+        )
+        return self.git_run(["branch", branch, commit])
 
     def checkout(self, branch):
         """Checkout into the branch HEAD"""
         new_branch = False
-        ref = f"refs/heads/{branch}"
         if branch not in self.branches():
-            self.repo.references["HEAD"].set_target(ref)
+            self.git_run(["switch", "-q", "--orphan", branch])
             new_branch = True
         else:
-            self.repo.checkout(ref)
+            ref = f"refs/heads/{branch}"
+            if (self.path / ".git" / ref).exists():
+                self.git_run(["switch", "--no-guess", "-q", branch])
         return new_branch
 
     def commit(
@@ -87,51 +115,79 @@ class Git:
         committer_time = committer_time if committer_time else user_time
 
         if self.is_dirty():
-            self.repo.index.add_all()
-            self.repo.index.write()
-
-        author = pygit2.Signature(user, user_email, int(user_time.timestamp()))
-        committer = pygit2.Signature(
-            committer, committer_email, int(committer_time.timestamp())
-        )
-        tree = self.repo.index.write_tree()
-        return self.repo.create_commit(
-            "HEAD", author, committer, message, tree, parents
-        )
-
-    def last_commit(self):
-        try:
-            return self.repo.head.target
-        except:
-            return None
-
-    def branch_head(self, branch):
-        return self.repo.references["refs/heads/" + branch].target
+            self.git_run(["add", "--all", "."])
+
+        tree_id = (
+            self.git_run(["write-tree"], stdout=subprocess.PIPE)
+            .stdout.decode("utf-8")
+            .strip()
+        )
+        parent_array = []
+        if isinstance(parents, list):
+            for parent in filter(None, parents):
+                parent_array = parent_array + ["-p", parent]
+        elif isinstance(parents, str):
+            parent_array = ["-p", parents]
+        commit_id = (
+            self.git_run(
+                ["commit-tree"] + parent_array + [tree_id],
+                env={
+                    "GIT_AUTHOR_NAME": user,
+                    "GIT_AUTHOR_EMAIL": user_email,
+                    "GIT_AUTHOR_DATE": f"{int(user_time.timestamp())} +0000",
+                    "GIT_COMMITTER_NAME": committer,
+                    "GIT_COMMITTER_EMAIL": committer_email,
+                    "GIT_COMMITTER_DATE": f"{int(committer_time.timestamp())} +0000",
                },
                input=message.encode("utf-8"),
                stdout=subprocess.PIPE,
            )
            .stdout.decode("utf-8")
            .rstrip()
        )
+        self.git_run(["reset", "--soft", commit_id])
+        return commit_id
+
+    def branch_head(self, branch="HEAD"):
+        return (
+            self.git_run(
+                ["rev-parse", "--verify", "--end-of-options", branch],
+                stdout=subprocess.PIPE,
+            )
+            .stdout.decode("utf-8")
+            .strip()
+        )
 
     def set_branch_head(self, branch, commit):
-        self.repo.references["refs/heads/" + branch].set_target(commit)
+        return self.git_run(["update-ref", f"refs/heads/{branch}", commit])
 
     def gc(self):
         logging.debug(f"Garbage recollect and repackage {self.path}")
-        subprocess.run(
-            ["git", "gc", "--auto"],
-            cwd=self.path,
+        self.git_run(
+            ["gc", "--auto"],
             stdout=subprocess.PIPE,
             stderr=subprocess.STDOUT,
         )
 
-    def clean(self):
-        for path, _ in self.repo.status().items():
-            logging.debug(f"Cleaning {path}")
-            try:
-                (self.path / path).unlink()
-                self.repo.index.remove(path)
-            except Exception as e:
-                logging.warning(f"Error removing file {path}: {e}")
+    # def clean(self):
+    #     for path, _ in self.repo.status().items():
+    #         logging.debug(f"Cleaning {path}")
+    #         try:
+    #             (self.path / path).unlink()
+    #             self.repo.index.remove(path)
+    #         except Exception as e:
+    #             logging.warning(f"Error removing file {path}: {e}")
 
     def add(self, filename):
-        self.repo.index.add(filename)
+        self.git_run(["add", ":(literal)" + str(filename)])
+
+    def add_default_gitignore(self):
+        if not (self.path / ".gitignore").exists():
+            with (self.path / ".gitignore").open("w") as f:
+                f.write(".osc\n")
+            self.add(".gitignore")
 
     def add_default_lfs_gitattributes(self, force=False):
         if not (self.path / ".gitattributes").exists() or force:
@@ -185,9 +241,9 @@ class Git:
         return any(fnmatch.fnmatch(filename, line) for line in patterns)
 
     def remove(self, file: pathlib.Path):
-        self.repo.index.remove(file.name)
-        (self.path / file).unlink()
+        self.git_run(
+            ["rm", "-q", "-f", "--ignore-unmatch", ":(literal)" + file.name],
+        )
         patterns = self.get_specific_lfs_gitattributes()
         if file.name in patterns:
             patterns.remove(file.name)
@@ -196,15 +252,27 @@ class Git:
     def add_gitea_remote(self, package):
         repo_name = package.replace("+", "_")
         org_name = "rpm"
+        gitea_user = "gitea"
+        gitea_host = "src.opensuse.org"
+        default_branch = "factory"
+        if os.getenv("GITEA_HOST"):
+            gitea_host = getenv("GITEA_HOST")
+        if os.getenv("GITEA_USER"):
+            gitea_user = getenv("GITEA_USER")
+        if os.getenv("GITEA_ORG"):
+            org_name = getenv("GITEA_ORG")
+        if os.getenv("GITEA_DEFAULT_BRANCH"):
+            default_branch = getenv("GITEA_DEFAULT_BRANCH")
         if not os.getenv("GITEA_TOKEN"):
             logging.warning("Not adding a remote due to missing $GITEA_TOKEN")
             return
 
-        url = f"https://gitea.opensuse.org/api/v1/org/{org_name}/repos"
+        url = f"https://{gitea_host}/api/v1/org/{org_name}/repos"
         response = requests.post(
             url,
-            data={"name": repo_name},
+            data={"name": repo_name, "object_format_name": "sha256", "default_branch": default_branch},
             headers={"Authorization": f"token {os.getenv('GITEA_TOKEN')}"},
             timeout=10,
         )
@@ -212,16 +280,21 @@ class Git:
         # 201 Created
         if response.status_code not in (201, 409):
             print(response.data)
 
-        url = f"gitea@gitea.opensuse.org:{org_name}/{repo_name}.git"
-        self.repo.remotes.create("origin", url)
+        url = f"{gitea_user}@{gitea_host}:{org_name}/{repo_name}.git"
+        self.git_run(
+            ["remote", "add", "origin", url],
+        )
 
-    def push(self):
-        remo = self.repo.remotes["origin"]
-
-        keypair = pygit2.KeypairFromAgent("gitea")
-        callbacks = pygit2.RemoteCallbacks(credentials=keypair)
-
-        refspecs = ["refs/heads/factory"]
-        if "refs/heads/devel" in self.repo.references:
-            refspecs.append("refs/heads/devel")
-
-        remo.push(refspecs, callbacks=callbacks)
+    def push(self, force=False):
+        if "origin" not in self.git_run(
+            ["remote"],
+            stdout=subprocess.PIPE,
+        ).stdout.decode("utf-8"):
+            logging.warning("Not pushing to remote because no 'origin' configured")
+            return
+
+        cmd = ["push"]
+        if force:
+            cmd.append("-f")
+        cmd += ["origin", "--all"]
+        self.git_run(cmd)
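The rewritten `commit()` above drives git plumbing directly. A condensed standalone version of the same write-tree / commit-tree / reset sequence — throwaway repository; the file name and commit message here are invented:

```python
# Condensed sketch of the plumbing sequence used by Git.commit() above.
import subprocess
import tempfile

repo = tempfile.mkdtemp()

def git(*args, **kwargs):
    # minimal stand-in for git_run(): run git inside the work tree
    return subprocess.run(["git", *args], cwd=repo, check=True,
                          stdout=subprocess.PIPE, **kwargs)

git("init", "-q", "-b", "factory")  # the importer also passes --object-format=sha256
git("config", "user.email", "importer@example.com")
git("config", "user.name", "importer")
with open(f"{repo}/example.spec", "w") as f:
    f.write("Version: 1\n")
git("add", "--all", ".")
tree = git("write-tree").stdout.decode().strip()            # index -> tree object
commit = git("commit-tree", tree, input=b"osc import").stdout.decode().strip()
git("reset", "--soft", commit)                              # point the branch at it
subject = git("log", "-1", "--format=%s").stdout.decode().strip()
print(subject)  # → osc import
```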

lib/git_exporter.py

@@ -29,7 +29,7 @@ class GitExporter:
             self.git.open()
         else:
             self.git.create()
-            self.git.add_gitea_remote(package)
+            # self.git.add_gitea_remote(package)
         self.state_file = os.path.join(self.git.path, ".git", "_flat_state.yaml")
         self.gc_interval = 200
         self.cachedir = cachedir
@@ -40,9 +40,9 @@ class GitExporter:
     def check_repo_state(self, flats, branch_state):
         state_data = dict()
         if os.path.exists(self.state_file):
-            with open(self.state_file, "r") as f:
+            with open(self.state_file) as f:
                 state_data = yaml.safe_load(f)
-            if type(state_data) != dict:
+            if not isinstance(state_data, dict):
                 state_data = {}
         left_to_commit = []
         for flat in reversed(flats):
@@ -86,7 +86,12 @@ class GitExporter:
             logging.debug(f"Committing {flat}")
             self.commit_flat(flat, branch_state)
 
-        self.git.push()
+        # make sure that we create devel branch
+        if not branch_state["devel"]:
+            logging.debug("force creating devel")
+            self.git.set_branch_head("devel", self.git.branch_head("factory"))
+
+        self.git.push(force=True)
 
     def run_gc(self):
         self.gc_cnt = self.gc_interval
@@ -150,6 +155,7 @@ class GitExporter:
 
         # create file if not existant
         self.git.add_default_lfs_gitattributes(force=False)
+        self.git.add_default_gitignore()
 
         to_download, to_delete = flat.commit.calc_delta(branch_state[flat.branch])
         for file in to_delete:

lib/importer.py

@@ -1,5 +1,5 @@
-import concurrent.futures
 import logging
+import pathlib
 import xml.etree.ElementTree as ET
 
 from lib.db import DB
@@ -26,11 +26,15 @@ class Importer:
         # Import multiple Factory packages into the database
         self.packages = packages
         self.project = project
+        self.scmsync_cache = dict()
+        self.packages_with_scmsync = set()
 
         self.db = DB()
         self.obs = OBS(api_url)
-        assert project == "openSUSE:Factory"
+        assert not self.has_scmsync(project)
         self.refreshed_packages = set()
+        self.gone_packages_set = None
 
     def import_request(self, number):
         self.obs.request(number).import_into_db(self.db)
@@ -161,10 +165,12 @@ class Importer:
                     (rev.dbid, linked.dbid),
                 )
 
-    def revisions_without_files(self):
+    def revisions_without_files(self, package):
+        logging.debug(f"revisions_without_files({package})")
         with self.db.cursor() as cur:
             cur.execute(
-                "SELECT * FROM revisions WHERE broken=FALSE AND expanded_srcmd5 IS NULL"
+                "SELECT * FROM revisions WHERE package=%s AND broken=FALSE AND expanded_srcmd5 IS NULL",
+                (package,),
             )
             return [DBRevision(self.db, row) for row in cur.fetchall()]
@@ -178,11 +184,11 @@ class Importer:
                 linked_rev = cur.fetchone()
                 if linked_rev:
                     linked_rev = linked_rev[0]
-            list = self.obs.list(
+            obs_dir_list = self.obs.list(
                 rev.project, rev.package, rev.unexpanded_srcmd5, linked_rev
             )
-            if list:
-                rev.import_dir_list(list)
+            if obs_dir_list:
+                rev.import_dir_list(obs_dir_list)
                 md5 = rev.calculate_files_hash()
                 with self.db.cursor() as cur:
                     cur.execute(
@@ -196,53 +202,47 @@ class Importer:
         self.find_linked_revs()
 
         self.find_fake_revisions()
-        with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
-            fs = [
-                executor.submit(import_rev, self, rev)
-                for rev in self.revisions_without_files()
-            ]
-            concurrent.futures.wait(fs)
+        for package in self.packages:
+            for rev in self.revisions_without_files(package):
+                print(f"rev {rev} is without files")
+                self.import_rev(rev)
 
     def refresh_package(self, project, package):
         key = f"{project}/{package}"
         if key in self.refreshed_packages:
             # refreshing once is good enough
             return
+        if self.package_gone(key):
+            return
         logging.debug(f"Refresh {project}/{package}")
         self.refreshed_packages.add(key)
+        if self.has_scmsync(project) or self.has_scmsync(key):
+            self.packages_with_scmsync.add(package)
+            logging.debug(f"{project}/{package} already in Git - skipping")
+            return
         self.update_db_package(project, package)
         self.fetch_all_linked_packages(project, package)
 
     def import_into_db(self):
-        with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
-            fs = [
-                executor.submit(refresh_package, self, self.project, package)
-                for package in self.packages
-            ]
-            concurrent.futures.wait(fs)
-
-            self.db.conn.commit()
-
-            fs = [
-                executor.submit(import_request, self, number)
-                for number in DBRevision.requests_to_fetch(self.db)
-            ]
-            concurrent.futures.wait(fs)
-
-            self.db.conn.commit()
-
-            with self.db.cursor() as cur:
-                cur.execute(
-                    """SELECT DISTINCT source_project,source_package FROM requests
-                    WHERE id IN (SELECT request_id FROM revisions WHERE project=%s and package = ANY(%s));""",
-                    (self.project, self.packages),
-                )
-                fs = [
-                    executor.submit(refresh_package, self, project, package)
-                    for project, package in cur.fetchall()
-                ]
-                concurrent.futures.wait(fs)
-
-        self.db.conn.commit()
+        for package in self.packages:
+            refresh_package(self, self.project, package)
+
+        self.db.conn.commit()
+
+        for number in DBRevision.requests_to_fetch(self.db):
+            self.import_request(number)
+        self.db.conn.commit()
+
+        with self.db.cursor() as cur:
+            cur.execute(
+                """SELECT DISTINCT source_project,source_package FROM requests
+                WHERE id IN (SELECT request_id FROM revisions WHERE project=%s and package = ANY(%s));""",
+                (self.project, self.packages),
+            )
+            for project, package in cur.fetchall():
+                self.refresh_package(project, package)
+        self.db.conn.commit()
 
         missing_users = User.missing_users(self.db)
@@ -254,3 +254,26 @@ class Importer:
         self.fill_file_lists()
         self.db.conn.commit()
+
+    def package_gone(self, key):
+        if not self.gone_packages_set:
+            self.gone_packages_set = set()
+            with open(pathlib.Path(__file__).parent.parent / "gone-packages.txt") as f:
+                for line in f.readlines():
+                    self.gone_packages_set.add(line.strip())
+        return key in self.gone_packages_set
+
+    def has_scmsync(self, key):
+        if key in self.scmsync_cache:
+            return self.scmsync_cache[key]
+
+        root = self.obs._meta(key)
+        scmsync_exists = False
+        if root is not None:
+            scmsync_exists = root.find('scmsync') is not None
+        self.scmsync_cache[key] = scmsync_exists
+        return scmsync_exists
+
+    def package_with_scmsync(self, package):
+        return package in self.packages_with_scmsync
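The new `has_scmsync()` check looks for an `<scmsync>` element in the package or project `_meta` document. A self-contained sketch of that test — the XML samples are invented, not real OBS responses:

```python
# Sketch of the scmsync detection: _meta contains <scmsync> when the
# package is already maintained in Git. Sample XML is illustrative only.
import xml.etree.ElementTree as ET

meta_with = """<package name="example" project="openSUSE:Factory">
  <scmsync>https://src.opensuse.org/rpm/example#factory</scmsync>
</package>"""
meta_without = """<package name="other" project="openSUSE:Factory"/>"""

for meta in (meta_with, meta_without):
    root = ET.fromstring(meta)
    print(root.get("name"), root.find("scmsync") is not None)
# → example True
# → other False
```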

lib/lfs_oid.py

@@ -83,7 +83,8 @@ class LFSOid:
         self.register()
 
     def check(self):
-        url = f"http://gitea.opensuse.org:9999/check/{self.sha256}/{self.size}"
+        return True
+        url = f"http://localhost:9999/check/{self.sha256}/{self.size}"
         response = requests.get(
             url,
             timeout=10,
@@ -127,12 +128,13 @@ class LFSOid:
             "size": self.size,
         }
 
-        url = "http://gitea.opensuse.org:9999/register"
+        url = "http://localhost:9999/register"
         response = requests.post(
             url,
             json=data,
             timeout=10,
         )
+        response.raise_for_status()
         logging.info(f"Register LFS returned {response.status_code}")

lib/obs.py

@@ -73,11 +73,11 @@ class OBS:
         logging.debug(f"GET {url}")
         return ET.parse(osc.core.http_GET(url)).getroot()
 
-    def _meta(self, project, package, **params):
+    def _meta(self, key, **params):
         try:
-            root = self._xml(f"source/{project}/{package}/_meta", **params)
+            root = self._xml(f"source/{key}/_meta", **params)
         except HTTPError:
-            logging.error(f"Package [{project}/{package} {params}] has no meta")
+            logging.error(f"Project/Package [{key} {params}] has no meta")
             return None
         return root
 
@@ -118,13 +118,13 @@ class OBS:
         return root
 
     def exists(self, project, package):
-        root = self._meta(project, package)
+        root = self._meta(f"{project}/{package}")
         if root is None:
             return False
         return root.get("project") == project
 
     def devel_project(self, project, package):
-        root = self._meta(project, package)
+        root = self._meta(f"{project}/{package}")
         devel = root.find("devel")
         if devel is None:
             return None
@@ -150,7 +150,7 @@ class OBS:
     def _download(self, project, package, name, revision):
         url = osc.core.makeurl(
             self.url,
-            ["source", project, package, urllib.parse.quote(name)],
+            ["source", project, package, name],
             {"rev": revision, "expand": 1},
         )
         return osc.core.http_GET(url)
@@ -165,7 +165,6 @@ class OBS:
         cachedir: str,
         file_md5: str,
     ) -> None:
-
         cached_file = self._path_from_md5(name, cachedir, file_md5)
         if not self.in_cache(name, cachedir, file_md5):
             with (dirpath / name).open("wb") as f:
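The `_download` change above drops the explicit `urllib.parse.quote()` because, per the commit message, osc's `makeurl` quotes by itself; quoting twice corrupts names containing reserved characters. A standalone demonstration (stdlib only; the package name is just an example):

```python
# Double quoting corrupts names: '%' from the first pass gets re-escaped.
from urllib.parse import quote

name = "libsigc++"
once = quote(name)    # what a single quoting pass produces
twice = quote(once)   # what quoting an already-quoted name produces
print(once)   # → libsigc%2B%2B
print(twice)  # → libsigc%252B%252B
```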

lib/proxy_sha256.py

@@ -7,8 +7,6 @@ except:
     print("Install python3-python-magic, not python3-magic")
     raise
 
-import requests
-
 from lib.db import DB
 from lib.lfs_oid import LFSOid
 from lib.obs import OBS
@@ -43,7 +41,6 @@ class ProxySHA256:
         }
 
     def put(self, project, package, name, revision, file_md5, size):
-
         if not self.mime:
             self.mime = magic.Magic(mime=True)

lib/tree_builder.py

@@ -1,4 +1,3 @@
-from typing import Dict
 from xmlrpc.client import Boolean
 
 from lib.db_revision import DBRevision
@@ -114,7 +113,7 @@ class TreeBuilder:
                     candidates.append(node)
                 if node.merged_into:
                     # we can't have candidates that are crossing previous merges
-                    # see https://gitea.opensuse.org/importers/git-importer/issues/14
+                    # see https://src.opensuse.org/importers/git-importer/issues/14
                     candidates = []
             node = node.parent
         if candidates:
@@ -138,7 +137,7 @@ class TreeBuilder:
             self.requests.add(node.revision.request_id)
 
 class FindMergeWalker(AbstractWalker):
-    def __init__(self, builder: TreeBuilder, requests: Dict) -> None:
+    def __init__(self, builder: TreeBuilder, requests: dict) -> None:
         super().__init__()
         self.source_revisions = dict()
         self.builder = builder

opensuse-monitor.py (new executable file, 59 lines)

@@ -0,0 +1,59 @@
+#!/usr/bin/python3
+
+import json
+from pathlib import Path
+
+import pika
+import random
+import time
+
+MY_TASKS_DIR = Path(__file__).parent / "tasks"
+
+
+def listen_events():
+    connection = pika.BlockingConnection(
+        pika.URLParameters("amqps://opensuse:opensuse@rabbit.opensuse.org")
+    )
+    channel = connection.channel()
+
+    channel.exchange_declare(
+        exchange="pubsub", exchange_type="topic", passive=True, durable=False
+    )
+
+    result = channel.queue_declare("", exclusive=True)
+    queue_name = result.method.queue
+
+    channel.queue_bind(
+        exchange="pubsub", queue=queue_name, routing_key="opensuse.obs.package.commit"
+    )
+
+    print(" [*] Waiting for logs. To exit press CTRL+C")
+
+    def callback(ch, method, properties, body):
+        if method.routing_key not in ("opensuse.obs.package.commit",):
+            return
+        body = json.loads(body)
+        if (
+            "project" in body
+            and "package" in body
+            and body["project"] == "openSUSE:Factory"
+        ):
+            if "/" in body["package"]:
+                return
+            (MY_TASKS_DIR / body["package"]).touch()
+            print(" [x] %r:%r" % (method.routing_key, body["package"]))
+
+    channel.basic_consume(queue_name, callback, auto_ack=True)
+    channel.start_consuming()
+
+
+def main():
+    while True:
+        try:
+            listen_events()
+        except (pika.exceptions.ConnectionClosed, pika.exceptions.AMQPHeartbeatTimeout):
+            time.sleep(random.randint(10, 100))
+
+
+if __name__ == "__main__":
+    main()

tasks/.gitignore (new vendored file, 1 line)

@@ -0,0 +1 @@
+*

update-tasks.sh (new executable file, 19 lines)

@@ -0,0 +1,19 @@
+#!/bin/bash
+
+cd /space/dmueller/git-importer
+
+source credentials.sh
+
+while true; do
+    for i in $PWD/tasks/*; do
+        if test -f "$i"; then
+            echo "$(date): Importing $(basename $i)"
+            if ! python3 ./git-importer.py -c repos/.cache $(basename $i); then
+                mkdir -p $PWD/failed-tasks
+                mv -f $i $PWD/failed-tasks
+            fi
+            rm -f $i
+        fi
+    done
+    inotifywait -q -e create $PWD/tasks
+done