Migrating Devnet/Mainnet Archive to Berkeley Archive
Before you start the process to migrate your archive database from the current Mainnet or Devnet format to Berkeley, be sure that you:
- Understand the Archive Migration
- Meet the foundational requirements in Archive migration prerequisites
- Have successfully installed the archive migration package
Migration process
The Devnet/Mainnet migration can take up to a couple of days. Therefore, you can achieve a successful migration by using three stages:
Stage 1: Initial migration
Stage 2: Incremental migration
Stage 3: Remainder migration
Each stage has three migration phases:
Phase 1: Copying data and precomputed blocks from Devnet/Mainnet database using the berkeley_migration app.
Phase 2: Populating new Berkeley tables using the replayer app in migration mode
Phase 3: Additional validation for migrated database
Review these phases and stages before you start the migration.
Simplified approach
For convenience, use the mina-berkeley-migration-script
app if you do not need to delve into the details of migration or if your environment does not require a special approach to migration.
Stage 1: Initial migration
mina-berkeley-migration-script \
initial \
--genesis-ledger ledger.json \
--source-db postgres://postgres:postgres@localhost:5432/source \
--target-db postgres://postgres:postgres@localhost:5432/migrated \
--blocks-bucket mina_network_block_data \
--blocks-batch-size 500 \
--checkpoint-interval 10000 \
--checkpoint-output-path . \
--precomputed-blocks-local-path . \
--network NETWORK
where:
-g | --genesis-ledger
: path to the genesis ledger file
-s | --source-db
: connection string to the database to be migrated
-t | --target-db
: connection string to the database that will hold the migrated data
-b | --blocks-bucket
: name of the precomputed blocks bucket. Precomputed blocks are assumed to be named with format: {network}-{height}-{state_hash}.json
-bs | --blocks-batch-size
: number of precomputed blocks to be fetched at one time from Google Cloud. A larger number, like 1000, can help speed up the migration process.
-n | --network
: network name (devnet
or mainnet
) when determining precomputed blocks. Precomputed blocks are assumed to be named with format: {network}-{height}-{state_hash}.json
.
-c | --checkpoint-output-path
: path to folder for replayer checkpoint files
-i | --checkpoint-interval
: frequency of dumping checkpoint expressed in blocks count
-l | --precomputed-blocks-local-path
: path to folder for on-disk precomputed blocks location
The command output is the migration-replayer-XXX.json
file required for the next run.
Stage 2: Incremental migration
mina-berkeley-migration-script \
incremental \
--genesis-ledger ledger.json \
--source-db postgres://postgres:postgres@localhost:5432/source \
--target-db postgres://postgres:postgres@localhost:5432/migrated \
--blocks-bucket mina_network_block_data \
--blocks-batch-size 500 \
--network NETWORK \
--checkpoint-output-path . \
--checkpoint-interval 10000 \
--precomputed-blocks-local-path . \
--replayer-checkpoint migration-checkpoint-XXX.json
where:
-g | --genesis-ledger
: path to the genesis ledger file
-s | --source-db
: connection string to the database to be migrated
-t | --target-db
: connection string to the database that will hold the migrated data
-b | --blocks-bucket
: name of the precomputed blocks bucket. Precomputed blocks are assumed to be named with format: {network}-{height}-{state_hash}.json
-bs | --blocks-batch-size
: number of precomputed blocks to be fetched at one time from Google Cloud. A larger number, like 1000, can help speed up migration process.
-n | --network
: network name (devnet
or mainnet
) when determining precomputed blocks. Precomputed blocks are assumed to be named with format: {network}-{height}-{state_hash}.json
.
-r | --replayer-checkpoint
: path to the latest checkpoint file migration-checkpoint-XXX.json
-c | --checkpoint-output-path
: path to folder for replayer checkpoint files
-i | --checkpoint-interval
: frequency of dumping checkpoint expressed in blocks count
-l | --precomputed-blocks-local-path
: path to folder for on-disk precomputed blocks location
Stage 3: Remainder migration
mina-berkeley-migration-script \
final \
--genesis-ledger ledger.json \
--source-db postgres://postgres:postgres@localhost:5432/source \
--target-db postgres://postgres:postgres@localhost:5432/migrated \
--blocks-bucket mina_network_block_data \
--blocks-batch-size 500 \
--network NETWORK \
--checkpoint-output-path . \
--checkpoint-interval 10000 \
--precomputed-blocks-local-path . \
--replayer-checkpoint migration-checkpoint-XXX.json \
-fc fork-genesis-config.json
where:
-g | --genesis-ledger
: path to the genesis ledger file
-s | --source-db
: connection string to the database to be migrated
-t | --target-db
: connection string to the database that will hold the migrated data
-b | --blocks-bucket
: name of the precomputed blocks bucket. Precomputed blocks are assumed to be named with format: {network}-{height}-{state_hash}.json
-bs | --blocks-batch-size
: number of precomputed blocks to be fetched at one time from Google Cloud. A larger number, like 1000, can help speed up the migration process.
-n | --network
: network name (devnet
or mainnet
) when determining precomputed blocks. Precomputed blocks are assumed to be named with format: {network}-{height}-{state_hash}.json
.
-r | --replayer-checkpoint
: path to the latest checkpoint file migration-checkpoint-XXX.json
-c | --checkpoint-output-path
: path to folder for replayer checkpoint files
-i | --checkpoint-interval
: frequency of dumping checkpoint expressed in blocks count
-l | --precomputed-blocks-local-path
: path to folder for on-disk precomputed blocks location
-fc | --fork-config
: fork genesis config file is the new genesis config that is distributed with the new daemon and is published after the fork block is announced
Advanced approach
If the simplified berkeley migration script is, for some reason, not suitable for you, it is possible to run the migration using the berkeley_migration and replayer apps without an interface the script provides.
Stage 1: Initial migration
This first stage requires only the initial Berkeley schema, which is the foundation for the next migration stage. This schema populates the migrated database and creates an initial checkpoint for further incremental migration.
Inputs
- Unmigrated Devnet/Mainnet database
- Devnet/Mainnet genesis ledger
- Empty target Berkeley database with the schema created, but without any content
Outputs
- Migrated Devnet/Mainnet database to the Berkeley format from genesis up to the last canonical block in the original database
- Replayer checkpoint that can be used for incremental migration
Phase 1: Berkeley migration app run
mina-berkeley-migration \
--batch-size 1000 \
--config-file ledger.json \
--mainnet-archive-uri postgres://postgres:postgres@localhost:5432/source \
--migrated-archive-uri postgres://postgres:postgres@localhost:5432/migrated \
--blocks-bucket mina_network_block_data \
--precomputed-blocks-local-path . \
--keep-precomputed-blocks \
--network NETWORK
where:
--batch-size
: number of precomputed blocks to be fetched at one time from Google Cloud. A larger number, like 1000, can help speed up migration process.
--config-file
: path to the genesis ledger file
--mainnet-archive-uri
: connection string to the database to be migrated
--migrated-archive-uri
: connection string to the database that will hold the migrated data
--blocks-bucket
: name of the precomputed blocks bucket. Precomputed blocks are assumed to be named with format: {network}-{height}-{state_hash}.json
--precomputed-blocks-local-path
: path to folder for on-disk precomputed blocks location
--keep-precomputed-blocks
: keep the precomputed blocks on-disk after the migration is complete
--network
: the network name (devnet
or mainnet
) when determining precomputed blocks. Precomputed blocks are assumed to be named with format: {network}-{height}-{state_hash}.json
Phase 2: Replayer in migration mode run
Replayer config must contain the Devnet/Mainnet ledger as the starting point. So first, you must prepare the replayer config file:
jq '.ledger.accounts' genesis_ledger.json | jq '{genesis_ledger: {accounts: .}}' > replayer_input_config.json
where:
genesis_ledger.json
is the genesis file from a daemon bootstrap on a particular network
Then:
mina-migration-replayer \
--migration-mode \
--archive-uri postgres://postgres:postgres@localhost:5432/migrated \
--input-file replayer_input_config.json \
--checkpoint-interval 10000 \
--checkpoint-output-folder .
where:
--migration-mode
: flag for migration
--archive-uri
: connection string to the database that will hold the migrated data
--input-file
: path to the replayer input file, see below on how's created
replayer_input_config.json
: is a file constructed out of network genesis ledger:
jq '.ledger.accounts' genesis_ledger.json | jq '{genesis_ledger: {accounts: .}}' > replayer_input_config.json
--checkpoint-interval
: frequency of checkpoints file expressed in blocks count
--checkpoint-output-folder
: path to folder for replayer checkpoint files
Phase 3: Validations
Use the berkeley_migration_verifier app to perform checks for both the fully migrated and partially migrated databases.
mina-berkeley-migration-verifier \
pre-fork \
--mainnet-archive-uri postgres://postgres:postgres@localhost:5432/source \
--migrated-archive-uri postgres://postgres:postgres@localhost:5432/migrated
where:
--mainnet-archive-uri
: connection string to the database to be migrated
--migrated-archive-uri
: connection string to the database that will hold the migrated data
Stage 2: Incremental migration
After the initial migration, the data is migrated data up to the last canonical block. However, Devnet/Mainnet data is progressing with new blocks that must also be migrated again and again until the fork block is announced.
Incremental migration can, and probably must, be repeated a couple of times until the fork block is announced by Mina Foundation. Run the incremental migration multiple times with the latest Devnet/Mainnet database and the latest replayer checkpoint file.
Inputs
- Latest Devnet/Mainnet database
- Devnet/Mainnet genesis ledger
- Replayer checkpoint from last run
- Migrated berkeley database from initial migration
Outputs
- Migrated Devnet/Mainnet database to the Berkeley format up to the last canonical block
- Replayer checkpoint which can be used for the next incremental migration
Phase 1: Berkeley migration app run
mina-berkeley-migration \
--batch-size 1000 \
--config-file ledger.json \
--mainnet-archive-uri postgres://postgres:postgres@localhost:5432/source \
--migrated-archive-uri postgres://postgres:postgres@localhost:5432/migrated \
--blocks-bucket mina_network_block_data \
--precomputed-blocks-local-path . \
--keep-precomputed-blocks \
--network NETWORK
where:
--batch-size
: number of precomputed blocks to be fetched at one time from Google Cloud. A larger number, like 1000, can help speed up migration process.
--config-file
: path to the genesis ledger file
--mainnet-archive-uri
: connection string to the database to be migrated
--migrated-archive-uri
: connection string to the database that will hold the migrated data
--blocks-bucket
: name of the precomputed blocks bucket. Precomputed blocks are assumed to be named with format: {network}-{height}-{state_hash}.json
--precomputed-blocks-local-path
: path to folder for on-disk precomputed blocks location
--keep-precomputed-blocks
: keep the precomputed blocks on-disk after the migration is complete
--network
: the network name (devnet
or mainnet
) when determining precomputed blocks. Precomputed blocks are assumed to be named with format: {network}-{height}-{state_hash}.json
Phase 2: Replayer in migration mode run
mina-migration-replayer \
--migration-mode \
--archive-uri postgres://postgres:postgres@localhost:5432/migrated \
--input-file replayer-checkpoint-XXX.json \
--checkpoint-interval 10000 \
--checkpoint-output-folder .
where:
--migration-mode
: flag for migration
--archive-uri
: connection string to the database that will hold the migrated data
--input-file
: path to the latest checkpoint file replayer-checkpoint-XXX.json
replayer-checkpoint-XXX.json
: the latest checkpoint generated from the previous migration
--checkpoint-interval
: frequency of checkpoints file expressed in blocks count
--checkpoint-output-folder
: path to folder for replayer checkpoint files
Incremental migration can be run continuously on top of the initial migration or last incremental until the fork block is announced.
Phase 3: Validations
Use the berkeley_migration_verifier app to perform checks for both the fully migrated and partially migrated database.
mina-berkeley-migration-verifier \
pre-fork \
--mainnet-archive-uri postgres://postgres:postgres@localhost:5432/source \
--migrated-archive-uri postgres://postgres:postgres@localhost:5432/migrated
where:
--mainnet-archive-uri
: connection string to the database to be migrated
--migrated-archive-uri
: connection string to the database that will hold the migrated data
Note that: you can run incremental migration continuously on top of the initial migration or the last incremental until the fork block is announced.
Stage 3: Remainder migration
When the fork block is announced, you must tackle the remainder migration. This is the last migration run
you need to perform. In this stage, you close the migration cycle with the last migration of the remainder blocks between the current last canonical block and the fork block (which can be pending, so you don't need to wait 290 blocks until it would become canonical).
You must use --fork-state-hash
as an additional parameter to the berkeley-migration app.
Inputs
- Latest Devnet/Mainnet database
- Devnet/Mainnet genesis ledger
- Replayer checkpoint from last run
- Migrated Berkeley database from last run
- Fork block state hash
Outputs
- Migrated devnet/mainnet database to berkeley up to fork point
- Replayer checkpoint which can be used for the next incremental migration
The migrated database output from this stage of the final migration is required to initialize your archive nodes on the upgraded network.
Phase 1: Berkeley migration app run
mina-berkeley-migration \
--batch-size 1000 \
--config-file ledger.json \
--mainnet-archive-uri postgres://postgres:postgres@localhost:5432/source \
--migrated-archive-uri postgres://postgres:postgres@localhost:5432/migrated \
--blocks-bucket mina_network_block_data \
--precomputed-blocks-local-path \
--keep-precomputed-blocks \
--network NETWORK \
--fork-state-hash {fork-state-hash}
where:
--batch-size
: number of precomputed blocks to be fetched at one time from Google Cloud. A larger number, like 1000, can help speed up migration process.
--config-file
: path to the genesis ledger file
--mainnet-archive-uri
: connection string to the database to be migrated
--migrated-archive-uri
: connection string to the database that will hold the migrated data
--blocks-bucket
: name of the precomputed blocks bucket. Precomputed blocks are assumed to be named with format: {network}-{height}-{state_hash}.json
--precomputed-blocks-local-path
: path to folder for on-disk precomputed blocks location
--keep-precomputed-blocks
: keep the precomputed blocks on-disk after the migration is complete
--network
: the network name (devnet
or mainnet
) when determining precomputed blocks. Precomputed blocks are assumed to be named with format: {network}-{height}-{state_hash}.json
--fork-state-hash
: fork state hash
When you run the berkeley-migration app with fork-state-hash, there is no requirement for the fork state block to be canonical. The tool automatically converts all pending blocks in the subchain, including the fork block, to canonical blocks.
Phase 2: Replayer in migration mode run
mina-migration-replayer \
--migration-mode \
--archive-uri postgres://postgres:postgres@localhost:5432/migrated \
--input-file replayer-checkpoint-XXX.json \
--checkpoint-interval 10000 \
--checkpoint-output-folder .
where:
--migration-mode
: flag for migration
--archive-uri
: connection string to the database that will hold the migrated data
--input-file
: path to the latest checkpoint file replayer-checkpoint-XXX.json
from stage 1
replayer-checkpoint-XXX.json
: the latest checkpoint generated from the previous migration
--checkpoint-interval
: frequency of checkpoints file expressed in blocks count
--checkpoint-output-folder
: path to folder for replayer checkpoint files
Phase 3: Validations
Use the berkeley_migration_verifier app to perform checks for both the fully migrated and partially migrated databases.
mina-berkeley-migration-verifier \
post-fork \
--mainnet-archive-uri postgres://postgres:postgres@localhost:5432/source \
--migrated-archive-uri postgres://postgres:postgres@localhost:5432/migrated \
--fork-config-file fork_genesis_config.json \
--migrated-replayer-output replayer-checkpoint-XXXX.json
where:
--mainnet-archive-uri
: connection string to the database to be migrated
--migrated-archive-uri
: connection string to the database that will hold the migrated data
--migrated-replayer-output
: path to the latest checkpoint file replayer-checkpoint-XXX.json
--fork-config
: fork genesis config file is the new genesis config that is distributed with the new daemon and is published after the fork block is announced
Example migration steps using Mina Foundation data for Devnet using Debian
See: Worked example using March 22 data
Example migration steps using Mina Foundation data for Mainnet using Docker
See: Worked example using March 22 data
How to verify a successful migration
o1Labs and Mina Foundation make every effort to provide reliable tools of high quality. However, it is not possible to eliminate all errors and test all possible Mainnet archive variations.
All important checks are implemented in the mina-berkeley-migration-verifier
application.
However, you can use the following checklist if you want to perform the checks manually:
All transaction (user command and internal command) hashes are left intact.
Verify that the
user_command
andinternal_command
tables have the Devnet/Mainnet format of hashes. For example,CkpZirFuoLVV...
.Parent-child block relationship is preserved
Verify that a given block in the migrated archive has the same parent in the Devnet/Mainnet archive (
state_hash
andparent_hash
columns) that was used as input.Account balances remain the same
Verify the same balance exists for a given block in Mainnet and the migrated databases.
Tips and tricks
We are aware that the migration process can be very long (a couple of days). Therefore, we encourage you to use cron jobs that migrate data incrementally. The cron job requires access to Google Cloud buckets (or other storage):
- A bucket to store migrated-so-far database dumps
- A bucket to store checkpoint files
We are tightly coupled with Google Cloud infrastructure due to the precomputed block upload mechanism. This is why we are using also buckets for storing dumps and checkpoint. However, you do not have to use Google Cloud for other things than precomputed blocks. With configuration, you can use any gsutil-compatible storage backend (for example, S3).
Before running the cron job, upload an initial database dump and an initial checkpoint file.
To create the files, run these steps locally:
- Download a Devnet/Mainnet archive dump and load it into PostgreSQL.
- Create an empty database using the new archive schema.
- Run the berkeley-migration app against the Devnet/Mainnet and new databases.
- Run the replayer app in migration mode with the
--checkpoint-interval
set to a suitable value (perhaps 100) and start with the original Devnet/Mainnet ledger in the input file. - Use pg_dump to dump the migrated database and upload it.
- Upload the most recent checkpoint file.
The cron job performs the same steps in an automated fashion:
- Pulls the latest Devnet/Mainnet archive dump and loads it into PostgresQL.
- Pulls the latest migrated database and loads it into PostgreSQL.
- Pulls the latest checkpoint file.
- Runs the berkeley-migration app against the two databases.
- Runs the replayer app in migration mode using the downloaded checkpoint file; set the checkpoint interval to be smaller (perhaps 50) because there are typically only 200 or so blocks in a day.
- Uploads the migrated database.
- Uploads the most recent checkpoint file.
Be sure to monitor the cron job for errors.
Just before the Berkeley upgrade, migrate the last few blocks by running locally:
- Download the Devnet/Mainnet archive data directly from the k8s PostgreSQL node (not from the archive dump), and load it into PostgreSQL.
- Download the most recent migrated database and load it into PostgresQL.
- Download the most recent checkpoint file.
- Run the berkeley-migration app against the two databases.
- Run the replayer app in migration mode using the most recent checkpoint file.
It is worthwhile to perform these last steps as a dry run to make sure all goes well. You can run these steps as many times as needed.
Known migration problems
Please remember that rerunning after crash is always possible. After solving any of below issues you can rerun process and migration will continue form last position
Async was unable to add a file descriptor to its table of open file descriptors
For example:
("Async was unable to add a file descriptor to its table of open file descriptors"
(file_descr 18)
(error
"Attempt to register a file descriptor with Async that Async believes it is already managing.")
(backtrace
......
A remedy is to lower --block-batch-size
parameter to values up to 500.
Map.find_exn: not found
For example:
(monitor.ml.Error
(Not_found_s
("Map.find_exn: not found"
....
Usually this error means that there is a gap in canonical chain. In order to fix it please ensure that missing-block-auditor run is successful
Yojson.Json_error .. Unexpected end of input
For example:
(monitor.ml.Error
("Yojson.Json_error(\"Line 1, bytes 1003519-1003520:\\nUnexpected end of input\")")
("Raised at Yojson.json_error in file \"common.ml\", line 5, characters 19-39"
"Called from Yojson.Safe.__ocaml_lex_read_json_rec in file \"lib/read.mll\", line 215, characters 28-52"
...
This issue is caused by invalid precomputed block. Deleting the downloaded precomputed blocks should resolve this issue.
Error querying db, error: Request to ... failed: ERROR: column \"type\" does not exist
You provided the migrated schema as source one when invoking script or berkeley-migration app
Poor performance of migration when accessing remote database
We conducted migration tests with both a local database and a distant database (RDS). The migration using the local database appears to process significantly faster. We strongly suggest to use offline database installed locally
ERROR: out of shared memory
(monitor.ml.Error (Failure "Error querying for user commands with id 1686617, error Request to postgresql://user:pwd@host:port/db failed: ERROR: out of shared memory
\nHINT: You might need to increase max_pred_locks_per_transaction
Solution is either to increase max_pred_locks_per_transaction
setting in postgres database.
Alternative is to isolate database from mainnet traffic (for example by exporting dump from live database and import it on isolated environment)
Berkeley migration app is consuming all of my resources
When running a full migration, you can stumble on memory leaks that prevent you from cleanly performing the migration in one pass. A machine with 64 GB of RAM can be frozen after ~40k migrated blocks. Each 200 blocks inserted into the database increases the memory leak by 4-10 MB.
A potential workaround is to split the migration into smaller parts using cron jobs or automation scripts.
FAQ
Migrated database is missing orphaned blocks
By design, Berkeley migration omits orphaned blocks and, by default, migrates only canonical (and pending, if setup correctly) blocks.
Replayer in migration mode overrides my old checkpoints
By default, the replayer dumps the checkpoint to the current folder. All checkpoint files have a similar format:
replayer-checkpoint-{number}.json.
To prevent override of old checkpoints, use the --checkpoint-output-folder
and --checkpoint-file-prefix
parameters to modify the output folder and prefix.