diff --git a/projects/OPT/chronicles/10_percent_update.md b/projects/OPT/chronicles/10_percent_update.md
index f67ac37da17605dc4a8018031a1602b832978c8b..ac9d68092a32b9b6495ab0b1d5a76983fc400629 100644
--- a/projects/OPT/chronicles/10_percent_update.md
+++ b/projects/OPT/chronicles/10_percent_update.md
@@ -3,7 +3,7 @@

 **Posted on:** November 17, 2021

-Over the past couple of weeks, the team has been working on launching the full 175B training run with tensor parallelism (TP), on an updated dataset taken from the Pile, CCNEWS, and Reddit. Since the last post, the following has happened:
+Over the past couple of weeks, the team has been working on launching the full 175B training run with tensor parallelism (TP), on an updated dataset taken from the Pile, CCNEWS, and Pushift.io. Since the last post, the following has happened:

 * Kicked off 1.3B parameter "kitchen-sink" runs (experiments 14-19) with TP to test stability of TP and various configurations of new deduplicated datasets, new BPEs, learned positional embeddings (LPE), Normformer, removing embedding dropout, and ReLU <> GeLU.
 * Found issues with the new dataset where perplexity was unreasonably low, which ended up being a combination of:
diff --git a/projects/OPT/chronicles/OPT175B_Logbook.pdf b/projects/OPT/chronicles/OPT175B_Logbook.pdf
index 62ad3628cd346ec9c6e12b6ca0f2dddf19bfaa05..4fe0df5f5e34d9c7522de1dd3c7b6b43a023dea8 100644
Binary files a/projects/OPT/chronicles/OPT175B_Logbook.pdf and b/projects/OPT/chronicles/OPT175B_Logbook.pdf differ