Skip to content
GitLab
Projects Groups Snippets
  • /
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Sign in / Register
  • C csvkit
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
  • Issues 61
    • Issues 61
    • List
    • Boards
    • Service Desk
    • Milestones
  • Merge requests 4
    • Merge requests 4
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Schedules
  • Deployments
    • Deployments
    • Environments
    • Releases
  • Packages and registries
    • Packages and registries
    • Package Registry
    • Infrastructure Registry
  • Monitor
    • Monitor
    • Incidents
  • Analytics
    • Analytics
    • Value stream
    • CI/CD
    • Repository
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • wireservice
  • csvkit
  • Merge requests
  • !276

Execute SQL queries directly on one or more CSV files

  • Review changes

  • Download
  • Email patches
  • Plain diff
Merged Administrator requested to merge github/fork/jeroenjanssens/master into master May 25, 2014
  • Overview 1
  • Commits 6
  • Pipelines 0
  • Changes 4

Created by: jeroenjanssens

In #259, with the new tool sql2csv, it was already suggested that you could execute SQL queries directly on CSV files using a hacky bash script. Now, with these small changes to csvsql, you can truly execute SQL queries on one or more CSV files.

This functionality is similar to TextQL and Q. However, thanks to the solid codebase of csvkit, csvsql handles both multiple tables and proper escaping of output. The following example, which joins two data sets that can be found in the examples directory, illustrates the changes:

 < iris.csv csvsql --query 'select m.usda_id, avg(i.sepal_length) as mean_sepal_length from stdin as i join irismeta as m on (i.species = m.species) group by m.species' irismeta.csv | csvlook
|----------+--------------------|
|  usda_id | mean_sepal_length  |
|----------+--------------------|
|  IRSE    | 5.006              |
|  IRVE2   | 5.936              |
|  IRVI    | 6.588              |
|----------+--------------------|

Note that you can now mix standard input and filenames. It is no longer necessary to specify --table when standard input is provided. Tables based on standard input are named "stdin" by default. Table names can still be overridden by specifying a (comma delimited) string to --tables. When --query is specified and --db is not, sqlite:///:memory: is used as the connection string and --insert is set to True. Multiple SQL queries can be specified by using ";" as a delimiter. Only the output of the last query is outputted as CSV. This allows you to execute SQL queries in memory.

No functionality has been removed. Appropriate tests have been added. I'll write the corresponding documentation once these changes have been accepted.

Assignee
Assign to
Reviewers
Request review from
Time tracking
Source branch: github/fork/jeroenjanssens/master