Abstract:
Snapshot algorithms compute on the fly global states of underlying distributed computations for diagnosis and maintenance purposes. By a partial snapshot algorithm we understand a snapshot algorithm which only collects checkpoints (local states) of some, rather than of all processes. We propose an algorithm which computes a partial snapshot covering the direct and indirect causal dependencies (the Z-dependencies) of the initiating process without affecting the remaining, causally independent part of the underlying computation. This makes the algorithm suitable for large-scale and conceptually unbounded distributed systems. We show that it is undecidable whether this task can be accomplished not exceeding a given number of checkpoints. To control termination, two variants of the algorithm are proposed which bound the depth of Z-dependencies to be taken into account. We prove the correctness of the algorithm and its variants by establishing (1) progressive consistency which allows one to reset at any time of the snapshot computation only the processes having taken a checkpoint, (2) consistency as for incremental snapshot algorithms aiming at updating a previously computed snapshot for the unmodified variant, and (3) that the spanning trees obtained with the termination of the algorithms reflect Z-dependencies to the initiator. The main correctness proofs use an inductive argument formulated over the operational semantics of the algorithm. We further prove the number of checkpoints taken to be minimal (wrt Z-dependencies to be covered) and provide a complexity analysis.)