DB-SYNC doesnt move #1879

Open
NanuIjaz opened this issue Oct 21, 2024 · 12 comments
Labels
bug Something isn't working

Comments

@NanuIjaz

db-sync 13.5.0.2 with pg 14 is stuck here and not moving. Eventually it fails and goes back to the same stage. A higher work_mem is also set in pg.

[db-sync-node:Warning:81] [2024-10-18 11:25:58.77 UTC] Creating Indexes. This may require an extended period of time to perform. Setting a higher maintenance_work_mem from Postgres usually speeds up this process. These indexes are not used by db-sync but are meant for clients. If you want to skip some of these indexes, you can stop db-sync, delete or modify any migration-4-* files in the schema directory and restart it

error

[db-sync-node:Error:81] [2024-10-18 12:58:12.34 UTC] runDBThread: SqlError {sqlState = "", sqlExecStatus = FatalError, sqlErrorMsg = "", sqlErrorDetail = "", sqlErrorHint = ""}

Please help with this.

@NanuIjaz NanuIjaz added the bug Something isn't working label Oct 21, 2024
@sgillespie
Contributor

After you get the error, are the indices still being created? Try running this query:

select * from pg_stat_progress_create_index

Also, can you check if postgresql and cardano-node are still running at this point? Also if you could give more information about your environment, that would be helpful:

  • Are you using db-sync in docker or natively?
  • How are you running postgresql (OS package? command line? systemd?)

@NanuIjaz
Author

Sorry, I should have given more details earlier.

I am pretty confused by the nature of the issues we have.

We are running both db-sync and pg in Docker.

I ran the select query you gave; it didn't return anything.

I notice some strange behaviour. Sometimes it gives this error after waiting at this point:

[db-sync-node:Info:6] [2024-10-21 11:05:09.83 UTC] Found maintenance_work_mem=2GB, max_parallel_maintenance_workers=4
ExitFailure 2

Errors in file: /tmp/migrate-2024-10-21T11:05:09.835467022Z.log

Sometimes it gives the error that I mentioned earlier.

After throwing this error, the container restarts and starts syncing again. I can see it's waiting here now:

[db-sync-node:Info:81] [2024-10-21 12:41:50.32 UTC] Received block which is not in the db with HeaderFields {headerFieldSlot = SlotNo 137938707, headerFieldBlockNo = BlockNo 10990914, headerFieldHash = a544cd2f7bf24902ac5d9b0f674f67b02f46254b82fe8a6fafa58758f7956fba}. Time to restore consistency.
[db-sync-node:Info:81] [2024-10-21 12:41:50.32 UTC] Starting at epoch 516

I think it will error out after this; I am watching it while it waits.

@sgillespie
Contributor

This message:

Errors in file: /tmp/migrate-2024-10-21T11:05:09.835467022Z.log

Indicates there is a problem running a migration, which will cause db-sync to exit. Can you post the contents of that file?

@NanuIjaz
Author

I keep losing that file when the container restarts. I am tailing the file right now, and it doesn't show any messages yet.

@NanuIjaz
Author

Just now it crashed like this:

[db-sync-node:Info:81] [2024-10-21 12:41:50.32 UTC] Starting at epoch 516

[db-sync-node:Error:81] [2024-10-21 14:41:53.13 UTC] runDBThread: libpq: failed (no connection to the server
)
[db-sync-node:Error:111] [2024-10-21 14:41:53.13 UTC] recvMsgRollForward: AsyncCancelled
[db-sync-node:Error:106] [2024-10-21 14:41:53.13 UTC] ChainSyncWithBlocksPtcl: AsyncCancelled
[db-sync-node.Subscription:Error:102] [2024-10-21 14:41:53.13 UTC] Identity Application Exception: LocalAddress "/home/cardano/ipc/node.socket" SubscriberError {seType = SubscriberWorkerCancelled, seMessage = "SubscriptionWorker exiting", seStack = []}
cardano-db-sync: libpq: failed (no connection to the server
)

@NanuIjaz
Author

This is from the logs:

Running : migration-1-0000-20190730.sql
init

(1 row)

Running : migration-1-0001-20190730.sql
migrate

(1 row)

Running : migration-1-0002-20190912.sql
psql:/home/cardano/cardano-db-sync/schema/migration-1-0002-20190912.sql:32: NOTICE: Dropping view : "utxo_byron_view"
psql:/home/cardano/cardano-db-sync/schema/migration-1-0002-20190912.sql:32: NOTICE: Dropping view : "utxo_view"
drop_cexplorer_views

(1 row)

Running : migration-1-0003-20200211.sql
migrate

(1 row)

Running : migration-1-0004-20201026.sql
migrate

(1 row)

Running : migration-1-0005-20210311.sql
migrate

(1 row)

Running : migration-1-0006-20210531.sql
migrate

(1 row)

Running : migration-1-0007-20210611.sql
migrate

(1 row)

Running : migration-1-0008-20210727.sql
migrate

(1 row)

Running : migration-1-0009-20210727.sql
migrate

(1 row)

Running : migration-1-0010-20230612.sql
migrate

(1 row)

Running : migration-1-0011-20230814.sql
migrate

(1 row)

Running : migration-1-0012-20240211.sql
migrate

(1 row)

Running : migration-1-0013-20240318.sql
migrate

(1 row)

Running : migration-1-0014-20240411.sql
migrate

(1 row)

Running : migration-1-0015-20240724.sql
migrate

(1 row)

Running : migration-2-0001-20211003.sql
migrate

(1 row)

Running : migration-2-0002-20211007.sql
migrate

(1 row)

Running : migration-2-0003-20211013.sql
migrate

(1 row)

Running : migration-2-0004-20211014.sql
migrate

(1 row)

Running : migration-2-0005-20211018.sql
migrate

(1 row)

Running : migration-2-0006-20220105.sql
migrate

(1 row)

Running : migration-2-0007-20220118.sql
migrate

(1 row)

Running : migration-2-0008-20220126.sql
migrate

(1 row)

Running : migration-2-0009-20220207.sql
migrate

(1 row)

Running : migration-2-0010-20220225.sql
migrate

(1 row)

Running : migration-2-0011-20220318.sql
migrate

(1 row)

Running : migration-2-0012-20220502.sql
migrate

(1 row)

Running : migration-2-0013-20220505.sql
migrate

(1 row)

Running : migration-2-0014-20220505.sql
migrate

(1 row)

Running : migration-2-0015-20220505.sql
migrate

(1 row)

Running : migration-2-0016-20220524.sql
migrate

(1 row)

Running : migration-2-0017-20220526.sql
migrate

(1 row)

Running : migration-2-0018-20220604.sql
migrate

(1 row)

Running : migration-2-0019-20220615.sql
migrate

(1 row)

Running : migration-2-0020-20220919.sql
migrate

(1 row)

Running : migration-2-0021-20221019.sql
migrate

(1 row)

Running : migration-2-0022-20221020.sql
migrate

(1 row)

Running : migration-2-0023-20221019.sql
migrate

(1 row)

Running : migration-2-0024-20221020.sql
migrate

(1 row)

Running : migration-2-0025-20221020.sql
migrate

(1 row)

Running : migration-2-0026-20231017.sql
migrate

(1 row)

Running : migration-2-0027-20230713.sql
migrate

(1 row)

Running : migration-2-0028-20240117.sql
migrate

(1 row)

Running : migration-2-0029-20240117.sql
migrate

(1 row)

Running : migration-2-0030-20240108.sql
migrate

(1 row)

Running : migration-2-0031-20240117.sql
migrate

(1 row)

Running : migration-2-0032-20230815.sql
migrate

(1 row)

Running : migration-2-0033-20231009.sql
migrate

(1 row)

Running : migration-2-0034-20240301.sql
migrate

(1 row)

Running : migration-2-0035-20240308.sql
migrate

(1 row)

Running : migration-2-0036-20240318.sql
migrate

(1 row)

Running : migration-2-0037-20240403.sql
migrate

(1 row)

Running : migration-2-0038-20240603.sql
migrate

(1 row)

Running : migration-2-0039-20240703.sql
migrate

(1 row)

Running : migration-2-0040-20240626.sql
migrate

(1 row)

Running : migration-2-0041-20240711.sql
migrate

(1 row)

Running : migration-2-0042-20240808.sql
migrate

(1 row)

Running : migration-2-0043-20240828.sql
migrate

(1 row)

Running : migration-3-0001-20190816.sql
Running : migration-3-0002-20200521.sql
psql:/home/cardano/cardano-db-sync/schema/migration-3-0002-20200521.sql:4: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:/home/cardano/cardano-db-sync/schema/migration-3-0002-20200521.sql:4: error: connection to server was lost
ExitFailure 2
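
Read from the bottom, the log above shows the stage 1 and stage 2 migrations completing and the connection dropping at line 4 of migration-3-0002-20200521.sql. A minimal sketch of pulling that out of a long migrate log automatically follows. The function name and the two error substrings it looks for are assumptions chosen to match the lines pasted here, not anything db-sync provides.

```python
import re

def last_migration_before_error(log_lines):
    """Return the last migration announced before the first psql error line.

    Returns None if no error line is found.
    """
    current = None
    for line in log_lines:
        started = re.match(r"Running\s*:\s*(\S+)", line)
        if started:
            current = started.group(1)
        elif "server closed the connection" in line or "error:" in line:
            return current
    return None

# Lines copied from the log pasted above.
sample = [
    "Running : migration-2-0043-20240828.sql",
    "migrate",
    "Running : migration-3-0001-20190816.sql",
    "Running : migration-3-0002-20200521.sql",
    "psql:/home/cardano/cardano-db-sync/schema/migration-3-0002-20200521.sql:4: "
    "server closed the connection unexpectedly",
    "ExitFailure 2",
]

print(last_migration_before_error(sample))
```

Feeding it the contents of the /tmp/migrate-*.log file (for example via `open(path).read().splitlines()`) should name the same file on each failed run if the crash is always triggered by the same statement.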

@sgillespie
Contributor

Is it possible you're running out of memory? It seems clear from the logs that you're losing connection to the pg server

@NanuIjaz
Author

listen_addresses = '*'
port = '5432'
max_connections = '600'
shared_buffers = '32GB'
effective_cache_size = '96GB'
maintenance_work_mem = '2GB'
checkpoint_completion_target = '0.9'
wal_buffers = '16MB'
default_statistics_target = '100'
random_page_cost = '1.0'
effective_io_concurrency = '200'
work_mem = '8GB'
min_wal_size = '1GB'
max_wal_size = '4GB'
max_worker_processes = '128'
max_parallel_workers_per_gather = '16'
max_parallel_workers = '64'
max_parallel_maintenance_workers = '4'
log_min_duration_statement = '2000'

This is our postgres.conf file. I do see high memory consumption, but it's not 100%. Do you suggest any changes to the above?
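
As a rough check on the memory question raised above, the settings in this config can be turned into a worst-case figure. The sketch below assumes the documented PostgreSQL behaviour that work_mem is a per-operation limit applied separately to the leader and to each parallel worker. It is an upper bound for a single sort or hash step, not a measurement: the planner may use fewer workers, and the host's actual RAM and any container memory limit are not stated in this thread.

```python
# PostgreSQL memory units are powers of 1024.
UNITS = {"kB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}
GIB = 1024**3

def parse_size(value):
    """Convert a PostgreSQL memory setting such as '8GB' to bytes."""
    for unit, factor in UNITS.items():
        if value.endswith(unit):
            return int(value[: -len(unit)]) * factor
    raise ValueError(f"unrecognised size: {value!r}")

# Values copied from the config posted above.
settings = {
    "shared_buffers": "32GB",
    "work_mem": "8GB",
    "max_parallel_workers_per_gather": 16,
}

shared = parse_size(settings["shared_buffers"])
# Leader process plus every parallel worker may each use up to work_mem.
processes = settings["max_parallel_workers_per_gather"] + 1
per_step = parse_size(settings["work_mem"]) * processes

print(f"one parallel sort/hash step, worst case: {per_step // GIB} GiB")
print(f"same step plus shared_buffers: {(shared + per_step) // GIB} GiB")
```

With these values the first figure is 136 GiB and the second is 168 GiB, so a single heavy parallel query could in principle ask for more memory than effective_cache_size assumes is available. Whether that is what kills the server here would still need to be confirmed from the PostgreSQL and system logs.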

@sgillespie
Contributor

You might want to check out this tool: https://pgtune.leopard.in.ua/. This is what I used to generate my configuration. For my config, I chose "online transaction processing system"

@NanuIjaz
Author

Here is the error; I was able to drill down to this.

2024-10-23 14:57:27.050 GMT [176] LOG: could not receive data from client: Connection reset by peer
2024-10-23 14:57:27.050 GMT [176] LOG: unexpected EOF on client connection with an open transaction

@rdlrt

rdlrt commented Oct 23, 2024

That error simply says a client connection was terminated.

You would need to look at the reason your postgres DB crashed (if needed, look at it outside of Docker first). There could be a myriad of reasons [e.g. running out of infrastructure memory (check for OOM messages in the system logs), ulimits, corrupted DB WAL markers if you haven't cleared the existing DB before, etc.].

IMO, GitHub is not the right medium to help you troubleshoot system/infra issues. Discord, the forum, or Stack Exchange might be better places to search for an existing thread or start a new one with a better synopsis than what's presented here.
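
For the OOM check mentioned here, one option is to scan the kernel log for OOM-killer lines. The sketch below is illustrative only: the sample lines are made up rather than taken from the reporter's host, and the exact kernel wording varies between versions, so the pattern is deliberately loose (case-insensitive, accepting both "Kill process" and "Killed process").

```python
import re

# Matches both host-level and cgroup-level OOM kills, e.g.
# "Out of memory: Killed process ..." and
# "Memory cgroup out of memory: Killed process ...".
OOM_PATTERN = re.compile(
    r"out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)", re.IGNORECASE
)

def find_oom_kills(lines):
    """Return a (pid, process_name) tuple for every OOM-kill line found."""
    hits = []
    for line in lines:
        match = OOM_PATTERN.search(line)
        if match:
            hits.append((int(match.group(1)), match.group(2)))
    return hits

# Made-up sample lines, for illustration only.
sample = [
    "[1234.5] postgres invoked oom-killer: gfp_mask=0x100cca, order=0",
    "[1234.6] Out of memory: Killed process 4321 (postgres) total-vm:1000kB",
    "[1300.1] Memory cgroup out of memory: Killed process 4400 (postgres)",
    "[1310.0] eth0: link up",
]

print(find_oom_kills(sample))
```

In practice the input would be the output of `dmesg` or `journalctl -k` on the Docker host, split into lines. A hit naming postgres around the time of the crash would point towards memory rather than db-sync itself.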

@Cmdv
Contributor

Cmdv commented Dec 5, 2024

@NanuIjaz did you manage to take a look at your postgres instance as advised?
