yujiosaka / headless-chrome-crawler / Merge requests / !315

feat(hccrawler): pass previous url to skip request

Closed Administrator requested to merge github/fork/slonka/master into master 6 years ago

Created by: slonka

Hi,

I've found it useful to pass previousUrl to _skipRequest. Would love this to be included.

Best regards

Compare master (base) and latest version e9b373af (2 commits, 2 years ago)

2 files changed, +7 -2
lib/crawler.js (+6 -1), view file @ e9b373af

@@ -277,7 +277,7 @@ class Crawler {
    * @private
    */
   async _collectLinks(baseUrl) {
-    const links = [];
+    let links = [];
     await this._page.exposeFunction('pushToLinks', link => {
       const _link = resolveUrl(link, baseUrl);
       if (_link) links.push(_link);
@@ -302,6 +302,11 @@ class Crawler {
       }
       findLinks(window.document);
     });
+    if (links.length === 0) {
+      links = (await this._page.$$eval('a[href]', el => [...el].map(a => a.href))).map(l => resolveUrl(l, baseUrl)).filter(l => l);
+    }
     return uniq(links);
   }
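
The fallback added to `_collectLinks` can be illustrated outside the browser. The following is a minimal sketch, not the project's code: `pickLinks` is a hypothetical helper, `resolveUrl` here is a stand-in built on the standard `URL` constructor (the real helper may behave differently), and a `Set` takes the place of `uniq`.

```javascript
// Hypothetical stand-in for the project's resolveUrl helper: returns an
// absolute URL string, or null when the input cannot be parsed.
function resolveUrl(link, baseUrl) {
  try {
    return new URL(link, baseUrl).href;
  } catch (err) {
    return null;
  }
}

// Hypothetical helper: keep the primary results when there are any,
// otherwise fall back to the raw href values, resolved and filtered.
function pickLinks(primaryLinks, fallbackHrefs, baseUrl) {
  let links = primaryLinks;
  if (links.length === 0) {
    links = fallbackHrefs
      .map(href => resolveUrl(href, baseUrl))
      .filter(link => link); // drop unparseable entries
  }
  return [...new Set(links)]; // de-duplicate, standing in for uniq()
}

// The primary collector found nothing, so the fallback hrefs are used.
const links = pickLinks(
  [],
  ['/a', '/a', 'b.html', 'http://['],
  'https://example.com/dir/'
);
console.log(links);
```

The fallback only runs when the primary collector returns nothing, so pages where the primary collector already finds links are unaffected.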
lib/hccrawler.js (+1 -1), view file @ e9b373af

@@ -289,7 +289,7 @@ class HCCrawler extends EventEmitter {
    * @private
    */
   async _startRequest(options, depth, previousUrl) {
-    const skip = await this._skipRequest(options);
+    const skip = await this._skipRequest({ ...options, previousUrl });
     if (skip) {
       this.emit(HCCrawler.Events.RequestSkipped, options);
       await this._markRequested(options);
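
To show why having `previousUrl` available during the skip check might be useful, here is a rough, self-contained sketch. It does not use the library: `skipRequest`, `startRequest`, and the `shouldSkip` hook are hypothetical names, and the same-origin policy is only an assumed example of what a caller could do with the referring URL.

```javascript
// Hypothetical skip check standing in for _skipRequest: it forwards its
// options, including previousUrl, to an optional user-supplied hook.
async function skipRequest(options) {
  if (!options.shouldSkip) return false;
  return Boolean(await options.shouldSkip(options));
}

// Mirrors the changed call site: previousUrl is merged into a copy of
// options, so the caller's original object is left untouched.
async function startRequest(options, previousUrl) {
  const skip = await skipRequest({ ...options, previousUrl });
  return skip ? 'skipped' : 'requested';
}

// Example policy (an assumption): skip links that leave the origin of the
// page they were found on. Without a previousUrl nothing is skipped.
const options = {
  url: 'https://other.example/page',
  shouldSkip: ({ url, previousUrl }) =>
    Boolean(previousUrl) &&
    new URL(url).origin !== new URL(previousUrl).origin,
};

startRequest(options, 'https://example.com/').then(result => console.log(result));
```

Because the merged object is a shallow copy, any later code that receives the original `options` (as the `RequestSkipped` event and `_markRequested` do in the diff) does not see `previousUrl`.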
Reference: firstcontributions/first-contributions!55400
Source branch: github/fork/slonka/master
