.circleci/config.yml · e687e96761b08ee29b9f94e1eaa8e376fee12400 · Administrator / metaseq

Added test for model-parallel mp1 and mp2 (#597) · a6ef598c

Nikolay authored Jan 14, 2023

* Added test for model-parallel mp1 and mp2

* lint reformat

* small fix

* small fix

* test smaller test size

* test empty tests to go through the gpu_tests

* include 1 test

* test all tests removed

* enable second test only

* update the config.yml to get more details

* add all possible logging

* try -log_cli=true in the config file

* try config export PYTHONUNBUFFERED=1

* test simple Popen(ls -la)

* try different args for Popen

* try add --full-azure-upload-path to args

* try calling cli_main() directly

* try wrapping the run in a multiprocessing

* revert to the original test to share

* reformat

* try vanilla test

* try exact copy of another test

* try small modification to another test

* try to re-introduce a patch to local_run

* try introduce another patch

* try revert back to the original test

* try experiment 1

* try smaller batch size

* try add patch batch size and env

* try remove num_workers parameter

* try remove num_workers parameter update

* try add back max_update and num_workers

* check that the test baseline works

* try refactored tests

* try with 1 test

* try with one test mp1

* try one test

* try with 1 test

* try with refactored code

* lint the file

* split into two models

* try remove one test file

* try remove one test

* try mp1 test with all the suite of tests

* try reduce batch size and cuda.empty_cache()

* try reduce batch size further

* fix bug in patch

* try increase batch to match atten_head

* try both tests together to reuse code
