metaseq merge request !365

[jsonl] Re-use caches for symlinks.

Merged Administrator requested to merge reuse_indexes into main Oct 01, 2022

Created by: stephenroller

Patch Description

We've been training on many different corpora, using symbolic links to assemble each mixture. Because of the way the caching system works, our line indexes get rebuilt every time we create a new mix of corpora.

This patch builds the indexes next to the original data, not the symlink. This helps re-use indexes across distinct runs.
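As a rough illustration of the idea (not the actual code in this patch), the sketch below resolves a path through any symlinks before deciding where its index lives. The helper name `index_cache_path` and the `.idx.npy` suffix are hypothetical, chosen only for this example.

```python
import os


def index_cache_path(data_path: str, suffix: str = ".idx.npy") -> str:
    """Return the location where a line index for `data_path` would be stored.

    Hypothetical helper for illustration; the name and suffix are assumptions,
    not metaseq's real API.
    """
    # os.path.realpath follows symlinks, so every link that points at the
    # same underlying file resolves to the same index location.
    resolved = os.path.realpath(data_path)
    return resolved + suffix
```

Under this scheme, two mixture directories that each symlink to the same `.jsonl` file would share one index file stored beside the original data, so a new mixture does not trigger a rebuild.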

Testing steps

I've been using this for over a week.

Source branch: reuse_indexes