This reverts commit 3fa61c8b8780b8de14f821fd0768bdd0b82ce6da.
Remove this garbage from the repository, as it does not belong here. It
creates corrupt records and prevents deployment to metrics.o.o.
After the change from dropping the entire database to dropping individual
measurements, this one was overlooked; new points were simply written each
time, resulting in duplicates.
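A minimal sketch of the intended behaviour, assuming the influxdb-python
client (database and measurement names are placeholders): drop the
measurement before rewriting its points so a rerun replaces them instead of
duplicating them.

    from influxdb import InfluxDBClient

    client = InfluxDBClient(database='osrt')  # placeholder database name

    def rewrite_measurement(measurement, points):
        # Drop the previously written series first so a rerun does not
        # append a second copy of every point.
        client.drop_measurement(measurement)
        client.write_points(points, time_precision='s')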
This allows non-ephemeral queries to be cached indefinitely in the future
(such as source queries that include the revision). For development, one
may use --heavy-cache to iterate quickly.
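Roughly the idea as a sketch, with illustrative names and TTLs (the URL
pattern and values are assumptions, not the actual cache configuration): a
source query pinned to a revision can never change, so it may be cached
forever, while --heavy-cache simply stretches the lifetime of everything
else during development.

    import re

    TTL_INDEFINITE = None          # never expire
    TTL_DEFAULT = 60 * 60          # one hour, illustrative
    TTL_HEAVY = 30 * 24 * 60 * 60  # a month, for --heavy-cache iteration

    def cache_ttl(url, heavy_cache=False):
        # Source queries that include the revision are immutable.
        if re.search(r'/source/.*[?&]rev=', url):
            return TTL_INDEFINITE
        return TTL_HEAVY if heavy_cache else TTL_DEFAULT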
Savings of around 400MB per 10,000 requests. A named tuple was the original
approach for this reason, but the influxdb interface requires dict()s and
it seemed silly to spend time converting them. Additionally, the influxdb
client already does batching.
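For reference, the client expects each point as a dict() in roughly this
shape and can slice the list into batches itself (measurement, tag, and
field names below are made up):

    from influxdb import InfluxDBClient

    client = InfluxDBClient(database='osrt')  # placeholder database name

    point = {
        'measurement': 'release_request',
        'tags': {'state': 'accepted'},
        'fields': {'open_for': 86400},
        'time': 1500000000,
    }

    # write_points() splits the list into chunks of batch_size internally,
    # so the dict()s can be handed over as-is without a namedtuple detour.
    client.write_points([point], time_precision='s', batch_size=1000)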
Unfortunately, with the amount of data processed for Factory, which will
only continue to grow, this approach is necessary. The final dict()
structures are buffered up to ~1000 points before being written and
released. Another benefit of the batching is that influxdb does not
allocate memory for the entire incoming batch.
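A sketch of that buffering, with hypothetical names; points pile up per
measurement and are written and released once roughly 1000 have
accumulated.

    from collections import defaultdict

    BATCH_SIZE = 1000  # flush threshold described above

    class PointBuffer:
        def __init__(self, client):
            self.client = client
            self.points = defaultdict(list)

        def add(self, measurement, point):
            self.points[measurement].append(point)
            if len(self.points[measurement]) >= BATCH_SIZE:
                self.flush(measurement)

        def flush(self, measurement):
            # Write the buffered dict()s and drop the reference so they can
            # be garbage collected instead of accumulating for the whole run.
            self.client.write_points(self.points.pop(measurement),
                                     time_precision='s')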
This keeps the processing from needing enough memory to hold all parsed
request element trees at once. Instead, only one page of requests is loaded
at a time and the memory is freed after it is processed. The end result is
memory consumption reduced by just over 20% (current Factory drops by
around 2.5GB).
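A sketch of the paging, with hypothetical helper names; only one page of
parsed request element trees is alive at a time, and it becomes collectable
as soon as the loop moves on.

    from lxml import etree

    PAGE_SIZE = 1000  # illustrative page size

    def request_pages(search, page_size=PAGE_SIZE):
        # search() stands in for the OBS request search call.
        offset = 0
        while True:
            root = etree.fromstring(search(limit=page_size, offset=offset))
            requests = root.findall('request')
            if not requests:
                return
            yield requests
            offset += page_size

    # for page in request_pages(search):
    #     for request in page:
    #         process(request)  # hypothetical per-request handling
    #     # the parsed trees for this page are released here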
There is a lot of room for improvement and additional metrics that could be
extracted. Including non-final-state requests would allow analyzing the
current staging state instead of only the historical state. Additionally,
the current state could be used to present an activity log.
Handling incremental updates is non-trivial given that the deltas are
evaluated and stored in summed form. There are a few possible approaches,
but they are likely not worth the hassle given the relatively short
processing time and the infrequent need to update the data (at most once a
day).
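To illustrate the hassle (a hypothetical sketch, not an implemented
approach): since the written values are summed, an incremental run would
first have to recover the last stored sum per series before any new deltas
could be applied on top.

    from influxdb import InfluxDBClient

    client = InfluxDBClient(database='osrt')  # placeholder database name

    # Measurement and field names are made up for illustration.
    result = client.query('SELECT LAST("total") FROM "staging_backlog"')
    last = next(result.get_points(), None)
    running_total = last['last'] if last else 0
    # ...re-evaluate only the new deltas and add them to running_total...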