Mpi speculative Diff #1

AutonomicPerfectionist · 2023-12-10T21:16:06Z

This PR is not meant to be merged and is only for easily seeing the changes made by the speculative MPI branch.

Adds support for per-node options and a per-node number of layers. The per-node options are implemented by parsing command-line options from a file instead of from the command line itself, which allows each node to have its own version of this options file. The per-node layer count is implemented as a new option, `mpi-layer-split`, that takes a list of percentages. These percentages are used to calculate the range of layers to delegate to each node. The ranges are calculated on the head node and then scattered to the other nodes to maintain a single source of truth.
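As a rough illustration of how a list of percentages could be turned into per-node layer ranges on the head node, here is a minimal sketch. The function name `split_layers_weighted`, the use of fractions in the range 0 to 1, the rounding rule, and the half-open range convention are all assumptions made for this example and are not taken from the branch; the MPI scatter step is omitted.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper, not the branch's actual function: converts per-node
// fractions (e.g. 0.25, 0.75) into half-open layer ranges [begin, end).
// Assumes every fraction is non-negative.
std::vector<std::pair<uint32_t, uint32_t>> split_layers_weighted(
        uint32_t n_layers, const std::vector<float> & fractions) {
    std::vector<std::pair<uint32_t, uint32_t>> ranges;
    ranges.reserve(fractions.size());

    uint32_t begin = 0;
    for (size_t i = 0; i < fractions.size(); ++i) {
        uint32_t end;
        if (i + 1 == fractions.size()) {
            // The last node takes whatever is left, so every layer is assigned
            // even when rounding makes the earlier shares fall short.
            end = n_layers;
        } else {
            // Round this node's share to a whole number of layers and
            // never run past the end of the model.
            const uint32_t count = (uint32_t) std::lround(fractions[i] * n_layers);
            end = std::min(n_layers, begin + count);
        }
        ranges.emplace_back(begin, end);
        begin = end;
    }
    return ranges;
}
```

Under these assumptions, calling `split_layers_weighted(32, {0.25f, 0.5f, 0.25f})` yields the ranges [0, 8), [8, 24), and [24, 32). Computing the ranges in one place and distributing the result, as the description says the head node does, avoids each node rounding differently.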


* [example] batched-bench "segmentation fault"

  When `llama-batched-bench` is invoked _without_ setting `-npl`, "number of parallel prompts", it segfaults.

  The segfault is caused by invoking `max_element()` on a zero-length vector, `n_pl`.

  This commit addresses that by first checking to see if the number of parallel prompts is zero, and if so sets the maximum sequence size to 1; otherwise, sets it to the original, the result of `max_element()`.

  Fixes, when running `lldb build/bin/llama-batched-bench -- -m models/Meta-Llama-3-8B.gguf`

  ```
  * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
      frame #0: 0x000000010000366c llama-batched-bench`main(argc=3, argv=0x000000016fdff268) at batched-bench.cpp:72:28
     69       llama_context_params ctx_params = llama_context_params_from_gpt_params(params);
     70
     71       // ensure enough sequences are available
  -> 72       ctx_params.n_seq_max = *std::max_element(n_pl.begin(), n_pl.end());
  ```

* Update examples/batched-bench/batched-bench.cpp

  Co-authored-by: compilade <git@compilade.net>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: compilade <git@compilade.net>
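The guard described in the batched-bench commit message can be sketched as follows. The helper name `max_parallel_sequences` is hypothetical and exists only for this example; the actual commit may simply inline the check where `n_seq_max` is assigned.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper illustrating the guard from the commit message.
// std::max_element returns end() for an empty range, and dereferencing
// that iterator is undefined behavior (the segfault in the lldb trace).
int max_parallel_sequences(const std::vector<int> & n_pl) {
    // Fall back to a single sequence when no -npl values were given.
    return n_pl.empty() ? 1 : *std::max_element(n_pl.begin(), n_pl.end());
}
```

With this check, an empty `n_pl` gives a maximum sequence count of 1, and a non-empty one gives its largest element, matching the behavior the commit message describes.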

AutonomicPerfectionist added 30 commits November 26, 2023 13:38

Add documentation for ggml-mpi functions

1f3febc

Add code comments in MPI

d70f26c

Remove mtest (ggml-org#3177)

4bd95ae

Revert accidental removal of ggml_mpi_backend_init

f691b61

Disable warmup under MPI

d7dbb6b

Update MPI example to follow main changes

1ff69c4

Remove fprintf logs from mpi main

33185ed

Remove unrelated sections from mpi readme

f67fcbc

Replace vector with C-style array and length in llama_split_layers_weighted

907f807

Fix minor rebase errors

6b1c471

Fix MPI compilation errors

afc2cc4

Synchronize batch sequence info, fixing MPI for llama_decode()

efd73fe

Update MPI code to new KV seq rm and bos/eos model APIs

3fa2527

Fix some mpi mem leaks, add mpi-layer-split to help when using mpi

33b88d6

Fix missing layer_inp_i names

da37edc

Allow per-node threads to be set in command-line args, add mpi support to main

51f3f8f

Support running speculation with two processes

4cf1c76

Support setting layer splits per comm/model

8ccaf96

Fix incorrect layer split parsing

5f21688

Split orig comm to only contain root nodes of the two subnets

2ddf0fe

Fix main layer split and fix speculative prompt tokenization

2166a12

Fix kv desync

fbc3d4d

Propagate exit to worker nodes

4dc25d3

Add async decoding

ba31377

Fix draft nodes accidentally running target

1b6f75d

Re-enable async tensor send

71c6947

Begin work on decoupling tgt and dft pipelines

d73f944

Only sync required token data

4aa9b6c

Working additional run w/ reset

9b67f73

AutonomicPerfectionist added 24 commits November 26, 2023 14:20

Re-enable wait recv

e47fd5c

Don't store send requests, immediately free them

3a58fef

Switch pipeline sync back to synced send

8c44ee6

Move tensor transmissions to tag 7 and re-enable async pipeline sync

c4b8362

Switch isend to buffered send

d5b7512

Add assertions to prevent buffer overflow

a0272a1

Add additional logging

7fb2630

Correct async tgt, but break drafts

cd10f89

Mostly working async

cbe6e2c

Another partially working version

6933af6

Non-async working

b005ee1

Rearchitect MPI so head is first

1ac4484

Fix segfault and working async w/ no np

a9685bb

Fix np >= 2 by using sequence offsets

7081a7a

Fix draft model KV cache synchronization w/ double buffering

67838da

Add cancellation and multiple simultaneous speculative seqs

615e666

Mostly fix cache sync issues and simul spec runs

9f65428

Cancel after sampling

73b92c7

Force at least 2 spec runs

e797f1a

Take secondary draft sequences into account for cancellation

7674bde

Add p_recovery and move dump_kv_cache_view_seqs out of common

9606d38

Enforce message ordering with transactions, abort GGML compute if canceled

c6ac680

Kinda fix no shutdown issue, add more tag definitions

a76859e

Refactor speculation for better readability, check for cancellations and run continuous speculation if not waiting

6830313

AutonomicPerfectionist force-pushed the mpi-speculative branch from fffde9b to 6830313 Compare January 15, 2024 18:39

AutonomicPerfectionist added 4 commits January 15, 2024 13:36

Fix main, kinda

cfe3120

Both main and speculative mostly working, add latencies to speculative, change timing to thread cycles

d3baaf7

Add latencies to main

d23b996

Add p_decay

d6a70a9
