DQL
Web Scraping with Deno: DOM + GraphQL
DQL is a web scraping module for Deno and Deno Deploy that integrates GraphQL queries with the DOM tree of a remote webpage or an HTML document fragment. It is a fork of DenoQL, heavily refactored and extended with several additional features:
- Compatibility with the Deno Deploy architecture
- Ability to pass variables alongside all queries
- New state-management class with additional methods
- Modular project structure (as opposed to a mostly single-file design)
- Improved types and schema structure
Note: this is a work in progress, and there is still a lot to be done.
useQuery
The primary function exported by the module is the workhorse named useQuery:
import { useQuery } from "https://deno.land/x/dql/mod.ts";
const data = await useQuery(`query { ... }`);
QueryOptions
You can also pass a QueryOptions object as the second argument of useQuery to further control the behavior of your query requests. All properties are optional.
const data = await useQuery(`query { ... }`, {
  concurrency: 8, // passed directly to the PQueue initializer
  fetch_options: { // passed directly to Fetch API requests
    headers: {
      "Authorization": "Bearer <YOUR_TOKEN>", // placeholder: never commit real tokens
    },
  },
  variables: {}, // values for the $variables defined in your query
  operationName: "", // which operation to run when the document defines several
});
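Since every property is optional, a caller may pass any subset of them. The sketch below mirrors the documented option names in a TypeScript interface and shows one way to fill in omitted values. The exact types, the `resolveOptions` helper, and the default concurrency of 4 are hypothetical, not DQL's actual API.

```typescript
// Sketch of the QueryOptions shape described above. Property names come
// from this README; the types and default values are assumptions.
interface QueryOptions {
  concurrency?: number; // forwarded to the PQueue initializer
  fetch_options?: { headers?: Record<string, string> }; // subset of RequestInit
  variables?: Record<string, unknown>; // values for the query's $variables
  operationName?: string; // selects one operation from a multi-operation document
}

// Hypothetical default, for illustration only (not DQL's actual value).
const DEFAULT_CONCURRENCY = 4;

// Fill in omitted properties. Nested objects are copied so that later
// changes to the resolved options never mutate the caller's input.
function resolveOptions(opts: QueryOptions = {}) {
  return {
    concurrency: opts.concurrency ?? DEFAULT_CONCURRENCY,
    fetch_options: {
      ...opts.fetch_options,
      headers: { ...opts.fetch_options?.headers },
    },
    variables: { ...opts.variables },
    operationName: opts.operationName,
  };
}

const resolved = resolveOptions({ variables: { url: "https://example.com" } });
console.log(resolved.concurrency); // 4
```

Copying rather than reusing the caller's objects is a design choice of this sketch; it keeps one set of options safe to reuse across several queries.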
createServer
With Deno Deploy, you can deploy DQL with a GraphQL Playground in only two lines of code:
import { createServer } from "https://deno.land/x/dql/mod.ts";
createServer(80, { endpoint: "https://dql.deno.dev" });
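Once deployed, the endpoint can be queried over HTTP by any GraphQL client. As a hedged sketch, assuming the server follows the common GraphQL-over-HTTP convention for GET requests (this README does not say which HTTP methods DQL's server accepts), a request URL carries `query`, a JSON-encoded `variables`, and an optional `operationName` as search parameters. `paramsFromUrl` is a hypothetical helper, not part of DQL.

```typescript
// Parsed form of a GraphQL request: the conventional query, variables and
// operationName parameters.
interface GraphQLParams {
  query: string | null;
  variables: Record<string, unknown> | null;
  operationName: string | null;
}

// Hypothetical helper: read GraphQL parameters from a GET request URL.
// Under the GraphQL-over-HTTP convention, `variables` is JSON-encoded.
function paramsFromUrl(url: string): GraphQLParams {
  const search = new URL(url).searchParams;
  const rawVariables = search.get("variables");
  return {
    query: search.get("query"),
    variables: rawVariables ? JSON.parse(rawVariables) : null,
    operationName: search.get("operationName"),
  };
}

// Build a GET URL for a query against the endpoint used above; no request is sent.
const endpoint = new URL("https://dql.deno.dev");
endpoint.searchParams.set("query", "query Title($url: String) { page(url: $url) { title } }");
endpoint.searchParams.set("variables", JSON.stringify({ url: "https://example.com" }));
console.log(paramsFromUrl(endpoint.href).variables);
```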
Try the GraphQL Playground at dql.deno.dev
View the source code in the Deno Playground
Command Line Usage (CLI)
deno run -A --unstable https://deno.land/x/dql/serve.ts
Custom port (default is 8080):
deno run -A https://deno.land/x/dql/serve.ts --port 3000
Warning: you need to have the Deno CLI installed first.
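The CLI listens on port 8080 unless `--port` is given. A minimal sketch of that fallback logic, accepting both the `--port 3000` and `--port=3000` spellings; `parsePort` is hypothetical and not taken from `serve.ts`, which may parse its flags differently.

```typescript
// Hypothetical sketch: return the value given with --port (or --port=N),
// otherwise the documented default of 8080.
function parsePort(args: string[], fallback = 8080): number {
  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    let value: string | undefined;
    if (arg === "--port") value = args[i + 1];
    else if (arg.startsWith("--port=")) value = arg.slice("--port=".length);
    if (value !== undefined) {
      const port = Number(value);
      // Reject non-integers and values outside the TCP port range.
      if (Number.isInteger(port) && port > 0 && port <= 65535) return port;
      throw new Error(`Invalid port: ${value}`);
    }
  }
  return fallback;
}

console.log(parsePort(["--port", "3000"])); // 3000
```

In a Deno script the argument list would come from `Deno.args`, for example `parsePort(Deno.args)`.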
Examples
Junkyard Scraper · Deno Playground
import { useQuery } from "https://deno.land/x/dql/mod.ts";
import { serve } from "https://deno.land/std@0.147.0/http/server.ts";
// Every request runs the same query, so the incoming Request is unused.
serve(async (_req: Request) =>
  await useQuery(
    `
    query Junkyard (
      $url: String
      $itemSelector: String = "table > tbody > tr"
    ) {
      vehicles: page(url: $url) {
        totalCount: count(selector: $itemSelector)
        nodes: queryAll(selector: $itemSelector) {
          id: index
          vin: text(selector: "td:nth-child(7)", trim: true)
          sku: text(selector: "td:nth-child(6)", trim: true)
          year: text(selector: "td:nth-child(1)", trim: true)
          model: text(selector: "td:nth-child(2) > .notranslate", trim: true)
          aisle: text(selector: "td:nth-child(3)", trim: true)
          store: text(selector: "td:nth-child(4)", trim: true)
          color: text(selector: "td:nth-child(5)", trim: true)
          date: attr(selector: "td:nth-child(8)", name: "data-value")
          image: src(selector: "td > a > img")
        }
      }
    }`,
    {
      variables: {
        "url": "http://nvpap.deno.dev/action=getVehicles&makes=BMW",
      },
    },
  )
    .then((data) => JSON.stringify(data, null, 2))
    .then((json) =>
      new Response(json, {
        headers: { "content-type": "application/json;charset=utf-8" },
      })
    )
);
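The aliases in the query (`vehicles`, `totalCount`, `nodes`, `vin`, and so on) become the keys of the returned JSON. Assuming `useQuery` resolves to an object keyed by those aliases, the `vehicles` field could be typed as follows; the value types and both helpers are illustrative assumptions. Because `totalCount` and `nodes` use the same `$itemSelector`, their sizes should agree, which makes a cheap sanity check.

```typescript
// Assumed shape of one row selected by the Junkyard query. Keys mirror the
// query's aliases; the value types (and nullability) are assumptions.
interface Vehicle {
  id: number;
  vin: string | null;
  sku: string | null;
  year: string | null;
  model: string | null;
  aisle: string | null;
  store: string | null;
  color: string | null;
  date: string | null;
  image: string | null;
}

// Assumed shape of the `vehicles` field as a whole.
interface VehicleList {
  totalCount: number;
  nodes: Vehicle[];
}

// totalCount and nodes come from the same selector, so their sizes should agree.
function isConsistent(list: { totalCount: number; nodes: unknown[] }): boolean {
  return list.totalCount === list.nodes.length;
}

// Tally rows per model; rows with no model text are grouped as "(unknown)".
function countByModel(list: { nodes: Array<Pick<Vehicle, "model">> }): Map<string, number> {
  const counts = new Map<string, number>();
  for (const { model } of list.nodes) {
    const key = model ?? "(unknown)";
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Hypothetical sample, not real scraped data.
const sample = { totalCount: 3, nodes: [{ model: "X3" }, { model: "X3" }, { model: "X5" }] };
console.log(isConsistent(sample)); // true
console.log(countByModel(sample).get("X3")); // 2
```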
HackerNews Scraper · Deno Playground
import { useQuery } from "https://deno.land/x/dql/mod.ts";
import { serve } from "https://deno.land/std@0.147.0/http/server.ts";
// Every request runs the same query, so the incoming Request is unused.
serve(async (_req: Request) =>
  await useQuery(`
    query HackerNews (
      $url: String = "http://news.ycombinator.com"
      $rowSelector: String = "tr.athing"
    ) {
      page(url: $url) {
        title
        totalCount: count(selector: $rowSelector)
        nodes: queryAll(selector: $rowSelector) {
          rank: text(selector: "td span.rank", trim: true)
          title: text(selector: "td.title a", trim: true)
          site: text(selector: "span.sitestr", trim: true)
          url: href(selector: "td.title a")
          attrs: next {
            score: text(selector: "span.score", trim: true)
            user: text(selector: "a.hnuser", trim: true)
            date: attr(selector: "span.age", name: "title")
          }
        }
      }
    }`)
    .then((data) => JSON.stringify(data, null, 2))
    .then((json) =>
      new Response(json, {
        headers: { "content-type": "application/json;charset=utf-8" },
      })
    )
);
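In HackerNews markup, the score, user and age of a story sit in the table row that follows each `tr.athing`, which is presumably why the query reads them through `next`. Assuming the returned nodes are keyed by the query's aliases, a small hypothetical helper can turn the scraped strings into numbers. The formats it expects (`"1."` for a rank, `"123 points"` for a score) are assumptions about HN's current markup, and rows without a score, such as job postings, yield `null`.

```typescript
// Assumed shape of one node returned by the HackerNews query; keys mirror
// the query's aliases, and the nullable string types are assumptions.
interface HNNode {
  rank: string | null;
  title: string | null;
  site: string | null;
  url: string | null;
  attrs: { score: string | null; user: string | null; date: string | null } | null;
}

// Cleaned-up record produced by the hypothetical toStory helper.
interface Story {
  rank: number | null;
  title: string;
  url: string | null;
  points: number | null;
  user: string | null;
}

// Extract the leading integer from strings such as "1." or "123 points";
// returns null when the text is missing or does not start with a digit.
function leadingInt(text: string | null): number | null {
  const match = text?.match(/^\d+/);
  return match ? Number(match[0]) : null;
}

// Convert one scraped node into a Story, tolerating missing fields.
function toStory(node: HNNode): Story {
  return {
    rank: leadingInt(node.rank),
    title: node.title ?? "",
    url: node.url,
    points: leadingInt(node.attrs?.score ?? null),
    user: node.attrs?.user ?? null,
  };
}

// Hypothetical input shaped like the query output (not real HN data).
const story = toStory({
  rank: "1.",
  title: "Example story",
  site: "example.com",
  url: "https://example.com/",
  attrs: { score: "123 points", user: "someone", date: null },
});
console.log(story.rank, story.points); // 1 123
```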
License
MIT © Nicholas Berlette, based on DenoQL.