Commit Graph

92 Commits

Author SHA1 Message Date
Stephan Kulow
4d1ca8d882 Also treat some more mimetypes as text 2022-11-11 16:22:18 +01:00
Stephan Kulow
7861a7e9b0 Fix LFS register (it needs json not data)
Refactored the LFS Oid handling into a class of its own and
added a way to recheck all LFS handles (or re-register)
2022-11-09 08:32:18 +01:00
coolo
f5b29886ae Merge pull request 'No longer rely on external service for LFS tracking' (#18) from add_lfs into main
Reviewed-on: https://gitea.opensuse.org/importers/git-importer/pulls/18
2022-11-08 11:00:34 +01:00
Stephan Kulow
9f6c8f62e7 Push to the remote when the repo changed 2022-11-08 09:32:03 +01:00
Stephan Kulow
3e1fbaa1c3 Migrate the ProxySHA256 data into postgresql DB
As a result, the sha256 and the mimetype are now calculated locally
2022-11-07 21:50:31 +01:00
Stephan Kulow
e1b32999f0 Fix confusion about User constructor 2022-11-07 16:04:44 +01:00
Stephan Kulow
be8fb2ab94 Fix fake revision creation 2022-11-06 12:27:36 +01:00
Stephan Kulow
9e895e34b6 Adding a gitea remote when creating the git repo 2022-11-06 12:18:16 +01:00
Stephan Kulow
5e495dbd95 Fancy up the git commit message 2022-11-06 11:46:04 +01:00
Stephan Kulow
5ae02a413d Store API URL in the revision table
Will be important once we get into SLE
2022-11-06 10:57:32 +01:00
Stephan Kulow
f1457e8f8e Move git commit message creation into class 2022-11-06 10:16:42 +01:00
Stephan Kulow
9114c2fff8 Change debug output for downloading files 2022-11-06 10:16:42 +01:00
Stephan Kulow
834cf61634 Use proper user info in commits 2022-11-06 09:53:52 +01:00
Stephan Kulow
a294c0f670 Readd the skipping of _staging_workflow file
A repository with 150k commits is just very hard to work with -
especially if 99% of them are worthless
2022-11-06 08:29:17 +01:00
Stephan Kulow
7bc4d6c8b1 Make downloading a little more careful for races
As we're downloading packages in parallel, it could happen that we
copy a file that isn't fully copied yet
2022-11-06 08:24:11 +01:00
Stephan Kulow
bd5bd5a444 Don't reset the .gitattributes file
Just change it if it existed before
2022-11-04 21:02:18 +01:00
Stephan Kulow
4e1d5b42ca Only validate the MD5 if we downloaded - trust the file system 2022-11-04 21:02:18 +01:00
Stephan Kulow
0bcc0183c9 Load the proxy data for is_text as well
Otherwise the text state changes over time
2022-11-04 21:02:18 +01:00
Stephan Kulow
a457a16a50 Limit the workers to 8
This is hard coding the limit, we may want to make this configurable
but for now the machines supposed to run this code are very similar
2022-11-04 10:00:28 +01:00
Stephan Kulow
33a5733cb9 Create the git repos in multiple processes
Threads appear to be too dangerous for this
2022-11-04 07:48:17 +01:00
Stephan Kulow
d21ce571f5 Refresh the packages in multiple threads 2022-11-03 22:04:45 +01:00
Stephan Kulow
ab38332642 Allow importing multiple packages in one go
This way we avoid duplicating all startup and SQL queries
2022-11-03 20:14:56 +01:00
Stephan Kulow
dd5e26b779 Clarify which of the candidates is the right one - removing assert 2022-11-03 15:29:58 +01:00
Stephan Kulow
f2019db8ff Ignore merge point candidates that create crosses
In OBS you can create submit requests for revisions that are behind
the last merge point, in git you can't - so we ignore them.

Fixes #14
2022-11-03 15:19:51 +01:00
Stephan Kulow
ed4b7367eb Reset branch if the devel branch is based on Factory
This happens in packages that change their devel project over time. Then
the commit in the devel project no longer has the parent in the devel branch
but is based on factory
2022-11-03 15:12:07 +01:00
Stephan Kulow
f5b3e42165 Add a test case that switches devel project during its lifetime 2022-11-03 15:06:12 +01:00
8aed76e52a
change cached file naming pattern 2022-11-03 14:22:19 +01:00
639096b548
optimize cached file locations and add option for cache directory 2022-11-03 14:12:32 +01:00
7678967ae0
implement file caching
to prevent having to download files multiple times
2022-11-03 14:05:11 +01:00
Stephan Kulow
1c54a74ecd Download the full revision 2022-11-02 20:55:09 +01:00
Stephan Kulow
c2294d6200 Add a default LFS .gitattributes for now
Otherwise some packages will fail to import
2022-11-02 18:27:17 +01:00
Stephan Kulow
ba7436f10c Keep a reference to the database in DBRevision
To avoid passing the db to all actions
2022-11-02 18:27:09 +01:00
Stephan Kulow
172242891d Fix up some code after aplanas' continued review 2022-11-02 15:22:24 +01:00
Stephan Kulow
05cf792b26 Add the file_md5 to the download function so it can cache and verify 2022-11-02 13:35:45 +01:00
Stephan Kulow
a1ead29734 Extend documentation and use some more Pythonic loops 2022-11-02 13:29:18 +01:00
Stephan Kulow
05a5e6aea7 Don't refresh packages we're already looking at 2022-11-02 10:52:53 +01:00
Stephan Kulow
68ded48be1 Don't crash on user with missing realname either 2022-11-02 10:52:37 +01:00
Stephan Kulow
fce8aac001 Import creator users as well 2022-11-02 08:59:25 +01:00
Stephan Kulow
bbf1bc2fda Fetch source projects of requests
We do not care for current devel project, but for the projects we saw
requests from
2022-11-02 08:50:54 +01:00
Stephan Kulow
c4654dd896 Split GitExporter out of Importer class 2022-11-02 07:59:25 +01:00
Stephan Kulow
9de0d6e6c5 Rename Exporter to TestExporter to make it more obvious 2022-11-02 07:39:04 +01:00
Stephan Kulow
4ff9b9771a Split out Flat generator to be able to test it 2022-11-02 07:20:53 +01:00
Stephan Kulow
c94d13d74e Don't crash on packages without merges ever (very few packages) 2022-11-01 19:30:41 +01:00
Stephan Kulow
ab8120ca53 Don't crash on last_node 2022-11-01 19:02:29 +01:00
Stephan Kulow
b2cadb8c01 Don't crash on packages that didn't get updates in devel 2022-11-01 18:44:59 +01:00
Stephan Kulow
578fb2a30a Change tree pruning algorithm
The first merge we see in Factory determines if we keep the devel
commits in the factory chain or cut that branch.
2022-11-01 13:52:15 +01:00
Stephan Kulow
e6a401d8ac Remove old history handling 2022-11-01 11:37:30 +01:00
Stephan Kulow
9ed8abad2b Make database usage the default
Some cleanup of no longer used functions
2022-11-01 11:23:40 +01:00
Stephan Kulow
9554fea7e1 Reuse the repository directory by storing a state yaml
Not using the database for that so that removing the repository directory will
automatically recreate it
2022-11-01 11:22:58 +01:00
Stephan Kulow
2168c898a2 Add users not known to the FAKE_ACCOUNTS
This is technically incorrect but we need to handle them all the same anyway
2022-11-01 09:12:42 +01:00