Vinta Chen / awesome-python / Merge requests / !836

Add weboob to Web Crawling section

Closed Administrator requested to merge github/fork/hydrargyrum/patch-1 into master Feb 21, 2017

Created by: hydrargyrum

What is this Python project?

It's a framework for scraping HTML sites and aggregating data from multiple sites in the same category (e.g. banking sites, news sites, video sites, etc.). There are ready-made modules for popular websites and ready-made applications to interact with them. Think youtube-dl applied to domains other than video!

What's the difference between this Python project and similar ones?

  • New websites can be scraped with declarative-style extraction rules
  • It provides a standardized API per category of site, each for a dedicated task (e.g. banking, web forums, video sites, news sites, music lyrics sites, etc.)
    • Scraped websites are grouped into those categories
  • The project comes with many existing backends for real-life websites
  • It has an internal upgrade system
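
To give a rough idea of what "declarative-style extraction rules" can mean in general, here is a minimal sketch using only the Python standard library. It is not weboob's actual API: the names `ARTICLE_RULES`, `extract`, and `SAMPLE` are hypothetical, and the sample markup is assumed to be well-formed so that `xml.etree.ElementTree` can parse it.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table (not weboob's API): each output field is
# described by an ElementTree path instead of hand-written parsing code.
ARTICLE_RULES = {
    "title": ".//h1",
    "author": ".//span[@class='author']",
}


def extract(markup, rules):
    """Apply every rule to the parsed markup and return a dict of fields.

    Fields whose path matches nothing come back as empty strings.
    """
    root = ET.fromstring(markup)
    return {
        field: (root.findtext(path) or "").strip()
        for field, path in rules.items()
    }


# Assumed sample page; real-world HTML is often not well-formed XML.
SAMPLE = "<html><body><h1> Hello </h1><span class='author'>Ada</span></body></html>"

result = extract(SAMPLE, ARTICLE_RULES)
print(result)
```

In this sketch, supporting a new site means writing a new rule table, and returning the same field names for every site in a category is one simple way to approximate a standardized per-category API.
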