    Added test for model-parallel mp1 and mp2 (#597) · a6ef598c
    Nikolay authored
    * Added test for model-parallel mp1 and mp2
    
    * lint reformat
    
    * small fix
    
    * small fix
    
    * test smaller test size
    
    * test empty tests to go through the gpu_tests
    
    * include 1 test
    
    * test all tests removed
    
    * enable second test only
    
    * update the config.yml to get more details
    
    * add all possible logging
    
    * try -log_cli=true in the config file
    
    * try config export PYTHONUNBUFFERED=1
    
    * test simple Popen(ls -la)
    
    * try different args for Popen
    
    * try add --full-azure-upload-path to args
    
    * try calling cli_main() directly
    
    * try wrapping the run in a multiprocessing process
    
    * revert to the original test to share
    
    * reformat
    
    * try vanilla test
    
    * try exact copy of another test
    
    * try small modification to another test
    
    * try to re-introduce a patch to local_run
    
    * try introduce another patch
    
    * try revert back to the original test
    
    * try experiment 1
    
    * try smaller batch size
    
    * try add patch batch size and env
    
    * try remove num_workers parameter
    
    * try remove num_workers parameter update
    
    * try add back max_update and num_workers
    
    * check that the test baseline works
    
    * try refactored tests
    
    * try with 1 test
    
    * try with one test mp1
    
    * try one test
    
    * try with 1 test
    
    * try with refactored code
    
    * lint the file
    
    * split into two models
    
    * try remove one test file
    
    * try remove one test
    
    * try mp1 test with all the suite of tests
    
    * try reduce batch size and cuda.empty_cache()
    
    * try reduce batch size further
    
    * fix bug in patch
    
    * try increase batch to match atten_head
    
    * try both tests together to reuse code