Neptune Core Overview
neptune-core uses the tokio async framework and tokio's multi-threaded executor, which assigns tasks to threads in a thread pool and requires the use of thread-synchronization primitives. We refer to spawned tokio tasks as tasks, but you can think of them as threads if that fits your mental model better. Note that a tokio task may (or may not) run on a separate operating system thread from the task that spawned it, at tokio's discretion.
neptune-core connects to other clients through TCP/IP and accepts calls to its RPC server via tarpc, using JSON serialization over tarpc's serde_transport. The project also includes neptune-cli, a command-line client, and neptune-dashboard, a CLI/TUI wallet tool. Both interact with neptune-core via the tarpc RPC protocol.
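To illustrate the wiring, here is a hedged sketch of a client talking to a tarpc server over TCP with JSON serialization, the same transport/codec combination described above. The ExampleRpc trait and its single method are placeholders; neptune-core's real RPC trait exposes many more methods, and port 9799 is simply the one used in the RPC example commands later in this document.

```rust
use tarpc::{client, context, tokio_serde::formats::Json};

// Hypothetical service definition, not neptune-core's real RPC trait.
#[tarpc::service]
trait ExampleRpc {
    async fn block_height() -> u64;
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Connect with tarpc's serde transport and a JSON codec.
    let transport = tarpc::serde_transport::tcp::connect("127.0.0.1:9799", Json::default).await?;
    let client = ExampleRpcClient::new(client::Config::default(), transport).spawn();

    // Every generated client method takes a tarpc context as its first argument.
    let height = client.block_height(context::current()).await?;
    println!("tip height: {height}");
    Ok(())
}
```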
Long-lived async tasks of the neptune-core binary
There are four classes of tasks:
- main: handles init and main_loop
- peer[]: handles connect_to_peers and peer_loop
- mining: runs miner_loop; has a worker and a monitor task
- rpc_server[]: handles rpc_server for incoming RPC requests
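Below is a minimal, hypothetical sketch of how such long-lived tasks can be spawned with tokio and report back to a main loop over a channel. The message type and task bodies are invented for illustration and do not reflect neptune-core's actual code.

```rust
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    // Channel for the other task classes to reach the main task.
    let (to_main_tx, mut to_main_rx) = mpsc::channel::<String>(32);

    // One task per connected peer would be spawned like this.
    let peer_tx = to_main_tx.clone();
    tokio::spawn(async move {
        peer_tx.send("peer: connected".to_string()).await.ok();
    });

    // The miner monitor task.
    tokio::spawn(async move {
        to_main_tx.send("miner: new block found".to_string()).await.ok();
    });

    // main_loop: react to messages from the other task classes.
    while let Some(msg) = to_main_rx.recv().await {
        println!("{msg}");
    }
}
```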
Channels
Long-lived tasks communicate with each other through channels provided by the tokio framework. All communication goes through the main task; for example, there is no way for the miner task to communicate directly with peer tasks.
The channels are:
- peer to main: mpsc, "multiple producer, single consumer".
- main to peer: broadcast. Messages can only be sent to all peer tasks; if you only want one peer task to act, the message must include an IP that identifies the peer for which the action is intended.
- miner to main: mpsc. Only one miner task (the monitor/master task) sends messages to main. Used to tell the main loop about newly found blocks.
- main to miner: watch. Used to tell the miner to mine on top of a new block, to shut down, or that the mempool has been updated and it is therefore safe to mine on the next block.
- rpc server to main: mpsc. Used e.g. to send a transaction object built from client-controlled UTXOs to the main task, where it can be added to the mempool. This channel is also used to shut down the program when the shutdown command is called.
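The following is a minimal sketch of the three tokio channel primitives listed above (mpsc, broadcast, watch). The message payloads are plain strings for illustration; neptune-core's real channels carry dedicated message enums.

```rust
use tokio::sync::{broadcast, mpsc, watch};

#[tokio::main]
async fn main() {
    // peer -> main, miner -> main, rpc -> main: mpsc (many senders, one receiver).
    let (_peer_to_main_tx, _peer_to_main_rx) = mpsc::channel::<String>(128);

    // main -> peer: broadcast, every subscribed peer task gets every message.
    let (main_to_peer_tx, _) = broadcast::channel::<String>(16);
    let mut peer_rx_1 = main_to_peer_tx.subscribe();
    let mut peer_rx_2 = main_to_peer_tx.subscribe();
    main_to_peer_tx.send("new tip".to_string()).unwrap();
    assert_eq!(peer_rx_1.recv().await.unwrap(), "new tip");
    assert_eq!(peer_rx_2.recv().await.unwrap(), "new tip");

    // main -> miner: watch, the miner only cares about the latest value.
    let (main_to_miner_tx, mut main_to_miner_rx) = watch::channel("genesis".to_string());
    main_to_miner_tx.send("block 1".to_string()).unwrap();
    main_to_miner_rx.changed().await.unwrap();
    assert_eq!(*main_to_miner_rx.borrow(), "block 1");
}
```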
Global State
All tasks that are part of Neptune Core have access to the global state, and they can all read from it. Each type of task can also have its own local state that is not shared across tasks; that local state is not what is discussed here.
The global state has five fields, and each of them follows some rules:
- wallet_state: contains the information necessary to generate new transactions and print the user's balance.
- chain: blockchain state. Contains information about the state of the blockchain: block height, digest of the latest block, etc. Only the main task may update chain. chain consists of two fields:
  - light_state: ephemeral, contains only the latest block.
  - archival_state: persistent. archival_state consists of data stored both in a database and on disk. The blocks themselves are stored on disk, and meta-information about the blocks is stored in the block_index database. archival_state also contains the archival_mutator_set, which can be used to recover unsynced membership proofs for the mutator set.
- network: network state. Consists of peer_map for storing in-memory info about all connected peers and peer_databases for persisting info about banned peers. Both of these can be written to by main or by peer tasks. network also contains a syncing value (which only main may write to) and instance_id, which is read-only.
- cli: CLI arguments. The state carries around the command-line arguments; these are read-only.
- mempool: in-memory data structure holding the set of transactions that have not yet been mined in a block. The miner reads from the mempool to find the most valuable transactions to mine. Only the main task may write to mempool. mempool comes with a notion of ordering such that only the transactions that pay the highest fee per size are kept, and it enforces a max size so that its memory consumption is bounded.
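For orientation, here is a heavily simplified sketch of the shape of the global state with its five fields. The concrete types are placeholders; the real definitions in neptune-core are far richer and live behind a single lock (see Central Primitives below).

```rust
// Placeholder types; each stands in for a much richer neptune-core type.
struct WalletState;     // keys, UTXOs, membership proofs
struct LightState;      // latest block only (ephemeral)
struct ArchivalState;   // block_index db, blocks on disk, archival mutator set
struct NetworkingState; // peer_map, peer_databases, syncing flag, instance_id
struct CliArgs;         // read-only command-line arguments
struct Mempool;         // fee-ordered, size-capped set of unmined transactions

struct BlockchainState {
    light_state: LightState,
    archival_state: ArchivalState,
}

struct GlobalState {
    wallet_state: WalletState,
    chain: BlockchainState,
    network: NetworkingState,
    cli: CliArgs,
    mempool: Mempool,
}

fn main() {
    // Constructing the sketch just to show the overall shape.
    let _gs = GlobalState {
        wallet_state: WalletState,
        chain: BlockchainState { light_state: LightState, archival_state: ArchivalState },
        network: NetworkingState,
        cli: CliArgs,
        mempool: Mempool,
    };
}
```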
Receiving a New Block
When a new block is received from a peer, it is first validated by the peer task. If the block is valid and more canonical than the current tip, it is sent to the main task. The main task is responsible for updating the GlobalState data structure to reflect the new block. This is done by write-acquiring the single GlobalStateLock and then calling the respective helper functions with the lock held throughout the updating process.
There are two pieces of code in the main loop that update the state with a new block: one for when a block is received from a peer, and one for when a block is found locally by the miner task. These two code paths are similar. In both, all databases are flushed to ensure that the changes are persisted on disk. The individual steps of updating the global state with a new block are:
- If the block was found locally: send it to all peers before updating state.
- If the block was received from a peer: check if sync mode is activated and whether we can leave sync mode (see below for an explanation of synchronization).
- write_block: write the block to disk and update the block_index database with the block's meta information.
- update_mutator_set: update the archival mutator set with the transactions (input and output UTXOs) from this block by applying all addition records and removal records contained in the block.
- update_wallet_state_with_new_block: check if this block contains UTXOs spent by or sent to us. Also update membership proofs for unspent UTXOs that are managed by, relevant to, or spendable by this client's wallet.
- mempool.update_with_block: remove transactions that were included in this block and update all mutator set data associated with the remaining transactions in the mempool.
- Update light_state with the latest block.
- Flush all databases.
- Tell the miner:
  - If the block was found locally: tell the miner that it can start working on the next block, since the mempool has now been updated with the latest block.
  - If the block was received from a peer: tell the miner to start building on top of the new chain tip.
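The sketch below is a heavily simplified illustration of the locking pattern described above: the write lock on the global state is acquired once and held for the whole update sequence, so no other task can observe a half-applied block. The types and the set_new_tip helper are hypothetical stand-ins, not neptune-core's real API.

```rust
use std::sync::Arc;
use tokio::sync::RwLock;

// Hypothetical, heavily simplified stand-in for neptune-core's global state.
#[derive(Default)]
struct GlobalState {
    tip_height: u64,
}

type SharedState = Arc<RwLock<GlobalState>>;

// Hold the write lock for the whole update so no reader sees a half-applied block.
async fn set_new_tip(state: &SharedState, new_height: u64) {
    let mut gs = state.write().await; // write-acquire once ...
    // ... write_block, update_mutator_set, update_wallet_state_with_new_block,
    // mempool.update_with_block and the database flushes would happen here ...
    gs.tip_height = new_height; // ... and light_state is updated last.
} // lock released when `gs` is dropped

#[tokio::main]
async fn main() {
    let state: SharedState = Arc::new(RwLock::new(GlobalState::default()));
    set_new_tip(&state, 1).await;
    assert_eq!(state.read().await.tip_height, 1);
}
```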
Spending UTXOs
A transaction that spends UTXOs managed by the client can be made by calling the create_transaction method on the GlobalState instance. This function needs a synced wallet_db and a chain tip in light_state to produce a valid transaction.
For a working example, see the implementation of the send_to_many() RPC method.
Scheduled Tasks in Main Loop
Different tasks are scheduled in the main loop every N seconds. These currently handle: peer discovery, block (batch) synchronization, and mempool cleanup.
- Peer discovery: This is used to find new peers to connect to. The logic attempts to find peers that have a distance bigger than 2 in the network, where distance 0 is defined as yourself; distance 1 is the peers you connect to at startup plus all incoming connections; distance 2 is your peers' peers; and so on.
- Synchronization: Synchronization is intended for nodes to catch up if they are more than N blocks behind the longest reported chain. When a client is in synchronization mode, it will batch-download blocks in sequential order to catch up with the longest reported chain.
- Mempool cleanup: Remove from the mempool transactions that are more than 72 hours old.
A task for recovering unsynced membership proofs would fit well here.
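A common way to express such scheduled tasks in a tokio main loop is to race periodic timers against the message channels inside select!. The sketch below uses made-up intervals and message types purely for illustration; it is not neptune-core's main loop.

```rust
use std::time::Duration;
use tokio::time::{interval, MissedTickBehavior};

#[tokio::main]
async fn main() {
    // Stand-in for the channels from peer/miner/rpc tasks.
    let (tx, mut from_tasks) = tokio::sync::mpsc::channel::<String>(32);
    tokio::spawn(async move {
        tx.send("peer: new block".into()).await.ok();
    });

    // Illustrative intervals only.
    let mut peer_discovery = interval(Duration::from_secs(120));
    let mut mempool_cleanup = interval(Duration::from_secs(3600));
    peer_discovery.set_missed_tick_behavior(MissedTickBehavior::Delay);
    mempool_cleanup.set_missed_tick_behavior(MissedTickBehavior::Delay);

    loop {
        tokio::select! {
            maybe_msg = from_tasks.recv() => match maybe_msg {
                Some(msg) => println!("handling {msg}"),
                None => break, // all senders gone: shut down
            },
            _ = peer_discovery.tick() => println!("running peer discovery"),
            _ = mempool_cleanup.tick() => println!("pruning transactions older than 72h"),
        }
    }
}
```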
Design Philosophies
- Avoid state-through-instruction-pointer. This means that a request/response exchange should be handled without nesting of, e.g., matched messages from another peer. So when a peer task requests a block from another peer, the peer task must return to the point where it can receive any message from that peer, not only code that works if the requested block happens to be the next message. The reasoning is that a peer task must be able to respond to, e.g., a peer-discovery request from the same peer before that peer responds with the requested block.
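The hypothetical sketch below illustrates the idea: the peer loop always returns to a single match over any incoming message, instead of awaiting one specific response. The message enum is invented; neptune-core's real peer messages differ.

```rust
use tokio::sync::mpsc;

// Hypothetical message type for illustration only.
enum PeerMessage {
    BlockRequest(u64),
    Block(Vec<u8>),
    PeerDiscoveryRequest,
}

// The loop always returns to one match over *any* message, so a peer can
// interleave e.g. a discovery request before delivering a requested block.
async fn peer_loop(mut incoming: mpsc::Receiver<PeerMessage>) {
    while let Some(msg) = incoming.recv().await {
        match msg {
            PeerMessage::Block(_bytes) => { /* validate, then forward to main */ }
            PeerMessage::BlockRequest(_height) => { /* look up and send the block */ }
            PeerMessage::PeerDiscoveryRequest => { /* answer with known peers */ }
        }
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(8);
    let handle = tokio::spawn(peer_loop(rx));

    // Messages arrive in whatever order the peer chooses.
    tx.send(PeerMessage::PeerDiscoveryRequest).await.unwrap();
    tx.send(PeerMessage::Block(Vec::new())).await.unwrap();
    drop(tx); // closing the channel ends the peer loop
    handle.await.unwrap();
}
```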
Central Primitives
From tokio:
- spawn
- select!
- tokio::sync::RwLock

From the std lib:
- Arc

From neptune-core:
- neptune_core::locks::tokio::AtomicRw (wraps Arc<tokio::sync::RwLock>)
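The following sketch shows the underlying pattern that AtomicRw wraps: a tokio::sync::RwLock shared between tasks via Arc, with many readers and one writer. It uses a bare u64 instead of the real global state.

```rust
use std::sync::Arc;
use tokio::sync::RwLock;

#[tokio::main]
async fn main() {
    let shared = Arc::new(RwLock::new(0u64));

    let reader = {
        let shared = Arc::clone(&shared);
        tokio::spawn(async move {
            let value = *shared.read().await; // read-acquire
            println!("reader saw {value}");
        })
    };

    {
        let mut guard = shared.write().await; // write-acquire
        *guard += 1;
    } // guard dropped: lock released

    reader.await.unwrap();
}
```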
Persistent Memory
We use leveldb for our database layer, with custom wrappers that make it more async-friendly and type-safe and that emulate multi-table transactions.
neptune_core::database::NeptuneLevelDb provides async wrappers for the leveldb APIs to avoid blocking async tasks.
leveldb is a simple key/value store, meaning it only allows manipulating individual strings. It does, however, provide a batch-update facility. neptune_core::database::storage::storage_schema::DbSchema leverages these batch updates to provide vector and singleton types that can be manipulated in Rust code and then atomically written to leveldb as a single batch update (aka transaction).
Blocks are stored on disk, and their positions on disk are stored in the block_index database. Blocks are read from and written to disk using mmap. We wrap all file-system calls with tokio's spawn_blocking() so they will not block other async tasks.
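The spawn_blocking() pattern looks roughly like this; the file path and contents here are invented for the example, and plain std::fs calls stand in for the real block I/O.

```rust
use tokio::task::spawn_blocking;

// Hand blocking file-system work to a dedicated thread so the async executor
// is not stalled while the call is in flight.
#[tokio::main]
async fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("neptune_block_example.bin");

    let contents = vec![0u8; 1024];
    let write_path = path.clone();
    spawn_blocking(move || std::fs::write(&write_path, &contents)).await.unwrap()?;

    let bytes = spawn_blocking(move || std::fs::read(&path)).await.unwrap()?;
    println!("read {} bytes without blocking the executor", bytes.len());
    Ok(())
}
```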
Challenges
- Deadlocks. We only have a single RwLock over the GlobalState, encapsulated in the struct GlobalStateLock. This makes deadlocks pretty easy to avoid by following some simple rules (see the sketch after this list):
  - Avoid deadlocking yourself. If a function has read-acquired the global lock, the lock must be released before write-acquiring it. Likewise, never attempt to write-acquire the lock twice.
  - Avoid deadlocking others. Always be certain that the global lock will be released in a timely fashion. In other words, if you have some kind of long-running task with an event loop that needs to acquire the global lock, ensure that it gets acquired and released inside the loop rather than outside.
- Atomic writing to databases: neptune-core presently writes to the following databases: wallet_db, block_index_db, archival_mutator_set, and peer_state. If one of the databases is updated but another is not, this can leave data in an invalid state. We could fix this by storing all state in a single transactional database, but that might make the code base less modular.
  Note: we should also add logic to rebuild the archival state from the block_index_db and the blocks stored on disk, since it can be derived from the blocks. This functionality could be contained in a separate binary, or a check could be performed at startup.
Tracing
A structured way of inspecting a program when designing the RPC API is to use tracing, a logger suited to programs with asynchronous control flow.
- Get a feeling for the core concepts.
- Read tokio's short tutorial.
- View the 3 different formatters.
- See what we can have eventually: https://tokio.rs/tokio/topics/tracing-next-steps
The main value proposition of tracing is that you can add an #[instrument] attribute over the function you are currently working on. This will print the nested trace!("") statements. You can also do something more advanced:
```rust
#[instrument(ret, skip_all, fields(particular_arg = inputarg1*2), level="debug")]
fn my_func(&self, inputarg1: u32, inputarg2: u32) -> u32 {
    debug!("This will be visible from `stdout`");
    info!("This prints");
    trace!("This does not print {:#?}", inputarg2);
    inputarg1 * 42 + inputarg2
}
```
This prints the return value but none of the arguments (the default behaviour is to print all arguments with the std::fmt::Debug formatter). It creates a new key whose value is double that of inputarg1 and prints that.
It then prints everything that is at debug level or above, where trace < debug < info < warn < error, so here the trace!() statement is omitted. You configure the lowest level you want to see with the environment variable RUST_LOG=debug.
RPC
To develop a new RPC, it can be productive to view two terminals simultaneously and run one of the following commands in each:
```bash
XDG_DATA_HOME=~/.local/share/neptune-integration-test/0/ RUST_LOG=debug cargo run -- --compose --guess --network regtest # Window1 RPC-server
XDG_DATA_HOME=~/.local/share/neptune-integration-test/0/ RUST_LOG=trace cargo run --bin rpc_cli -- --server-addr 127.0.0.1:9799 send '[{"public_key": "0399bb06fa556962201e1647a7c5b231af6ff6dd6d1c1a8599309caa126526422e", "amount":{"values":[11,0,0,0]}}]' # Window2 RPC-client
```
Note that the client exits quickly, so here the .pretty() tracing subscriber is suitable, while .compact() is perhaps better for the server.
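Choosing between the two formatters amounts to how the tracing subscriber is initialized. The snippet below is a generic tracing-subscriber sketch (it assumes the crate's env-filter feature), not neptune-core's actual initialization code.

```rust
// Honor RUST_LOG via an EnvFilter and pick a formatter per binary.
fn init_tracing_for_short_lived_client() {
    tracing_subscriber::fmt()
        .with_env_filter(tracing_subscriber::EnvFilter::from_default_env())
        .pretty() // verbose, multi-line: nice for a client that exits quickly
        .init();
}

fn init_tracing_for_server() {
    tracing_subscriber::fmt()
        .with_env_filter(tracing_subscriber::EnvFilter::from_default_env())
        .compact() // one line per event: easier to follow a long-running log
        .init();
}

fn main() {
    init_tracing_for_server();
    tracing::info!("tracing initialized");
    // Calling both initializers in one process would panic: `init` sets the
    // global default subscriber and may only run once.
    let _ = init_tracing_for_short_lived_client; // referenced only, not called
}
```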
neptune-cli client
neptune-cli is a separate program with a separate address space. This means the state object (see the Global State section above) is not available, and all data from Neptune Core must be received via RPC.
neptune-cli does not have any long-lived tasks; rather, it receives individual commands via the CLI, sends a query to neptune-core, presents the response, and exits.