Nikolay authored
* Added test for model-parallel mp1 and mp2
* lint reformat
* small fix
* small fix
* test smaller test size
* test empty tests to go through the gpu_tests
* include 1 test
* test all tests removed
* enable second test only
* update the config.yml to get more details
* add all possible logging
* try -log_cli=true in the config file
* try config export PYTHONUNBUFFERED=1
* test simple Popen(ls -la)
* try different args for Popen
* try add --full-azure-upload-path to args
* try calling cli_main() directly
* try wrapping the run in a multiprocessing
* revert to the original test to share
* reformat
* try vanilla test
* try exact copy of another test
* try small modification to another test
* try to re-introduce a patch to local_run
* try introduce another patch
* try revert to the original test
* try experiment 1
* try smaller batch size
* try add patch batch size and env
* try remove num_workers parameter
* try remove num_workers parameter update
* try add back max_update and num_workers
* check that the test baseline works
* try refactored tests
* try with 1 test
* try with one test mp1
* try one test
* try with 1 test
* try with refactored code
* lint the file
* split into two models
* try remove one test file
* try remove one test
* try mp1 test with the full suite of tests
* try reduce batch size and cuda.empty_cache()
* try reduce batch size further
* fix bug in patch
* try increase batch to match atten_head
* try both tests together to reuse code
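The fix this history converges on is shrinking the test batch size and freeing cached GPU memory between runs. As a rough illustration only, a pytest teardown covering the `cuda.empty_cache()` part might look like the sketch below; the fixture name is hypothetical and this is not the actual patch:

```python
# Hypothetical sketch, not the actual patch: release cached CUDA memory
# after each test so consecutive model-parallel runs don't OOM on a
# shared CI GPU.
import gc

import pytest
import torch


@pytest.fixture(autouse=True)
def free_gpu_memory():
    yield  # run the test body first
    gc.collect()  # drop lingering Python references to tensors
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return cached blocks to the allocator
```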
a6ef598c