openapi: 3.0.0 info: title: Dataflow Kit Web Scraper description: "Render Javascript driven pages, while we internally manage Headless\ \ Chrome and proxies for you. \n\n- Build a custom web scraper with our Visual\ \ point-and-click toolkit.\n- Scrape the most popular Search engines result pages\ \ (SERP).\n- Convert web pages to PDF and capture screenshots.\n***\n### Authentication\n\ Dataflow Kit API require you to sign up for an API key in order to use the API.\ \ \n\nThe API key can be found in the [DFK Dashboard](https://account.dataflowkit.com)\ \ after _free registration_.\n\nPass a secret API Key to all API requests to the\ \ server as the `api_key` query parameter. \n" termsOfService: https://dataflowkit.com/terms contact: url: https://dataflowkit.com/ version: "1.3" externalDocs: description: swagger-ui documentation url: https://dataflowkit.com/open-api servers: - url: https://api.dataflowkit.com/v1 description: Production server security: - ApiKeyAuth: [] tags: - name: fetch - name: serp - name: parse - name: url-to-pdf - name: url-to-screenshot paths: /fetch: post: tags: - fetch summary: Download web page content description: Use fetch endpoint to download web pages. operationId: fetch requestBody: description: | - _Base fetcher type_ is the right choice for fetching server-side rendered pages. It takes fewer resources and works faster than rendering HTML with _Chrome fetcher_ - But for rendering Angular, React, and Vue.js web sites, you should always specify _Chrome fetcher type_. In this case, headless chrome fetcher renders dynamic Javascript content in the same way as real web browsers would do it. Generate ready-to-run code for your favorite language at [https://dataflowkit.com/render-web](https://dataflowkit.com/render-web) content: application/json: schema: $ref: '#/components/schemas/fetchrequest' required: true responses: "200": description: Returns utf8 encoded web page content. content: text/html; charset=utf-8: example: |-
{ "ip": "178.171.21.156", "city": "Singapore", "region": null, "region_code": null, "country": "SG", "country_code": "SG", "country_code_iso3": "SGP", "country_capital": "Singapore", "country_tld": ".sg", "country_name": "Singapore", "continent_code": "AS", "in_eu": false, "postal": "18", "latitude": 1.2929, "longitude": 103.8547, "timezone": "Asia/Singapore", "utc_offset": "+0800", "country_calling_code": "+65", "currency": "SGD", "currency_name": "Dollar", "languages": "cmn,en-SG,ms-SG,ta-SG,zh-SG", "country_area": 692.7, "country_population": 4701069.0, "asn": "AS9009", "org": "M247 Ltd" } text/plain; charset=utf-8: example: https://dfk-storage-ny3.nyc3.digitaloceanspaces.com/5e5d2864ebb755000188c2c5/ipapi.co_2020-05-06_19%3A32.html?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=KMGZH6JMEM75FTB4EEVL%2F20200506%2Fnyc3%2Fs3%2Faws4_request&X-Amz-Date=20200506T193209Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=86675afdbabc18027cb1d78cb6faf35df2137940b6b8e9e526006bb66afbfae7 "401": description: Unauthorized. `api_key` parameter is missed or incorrect content: text/plain; charset=utf-8: examples: invalid: summary: Invalid API key value: Invalid API key No_api: summary: No API Key provided value: No API Key provided "400": description: Bad Request. Invalid payload specified. content: text/plain; charset=utf-8: examples: noFields: summary: No fields to scrape value: No fields to scrape invalidURL: summary: Invalid request URL value: Invalid request URL "500": description: Internal Server Error is a very general HTTP status code that means something has gone wrong on the web site's server. content: text/plain; charset=utf-8: examples: fetchFailed: summary: Process fetch failed. value: Process fetch failed. singleProcessFailed: summary: Create single process failed value: Create single process failed deprecated: false /serp: post: tags: - serp summary: Collect search results from search engines description: |- To crawl search engine result pages, you can use `/serp` endpoint. SERP collection service extracts a list of organic results, news, images, and more. Specify configuration parameters, such as country or languages, to customize output SERP data. The following search engines are supported - google - google-image - google-news - google-shopping - bing - duckduckgo - baidu - yandex Generate ready-to-run code for your favorite language at [https://dataflowkit.com/serp](https://dataflowkit.com/serp) operationId: serp requestBody: description: |link:dataflowkit.com
,site:twitter.com Bratislava
,inurl:view/view.shtml
, etc.)q
parameter is used by google, Bing, DuckDuckGo.text
is used as query holder by Yandex SE.wd
for this purpose.tbm
is a special Google parameter used to differentiate between search types| tbm=isch
- Google Images,tbm=nws
- Google Newstbm=shop
- Google Shoppinglang_{two-letter lang code}
to specify languages and |
as a delimiter. (e.g., lang_sk|lang_de
will only search Slovak and German pages). See the full list of possible values for Google. setLang=en
parameter.lang=ca
parametersk
for Slovakia, or us
for the United States).| For Google see the Country Codes page for a list of valid values. For Bing cc=at
parameter is used.|
content:
application/json:
schema:
$ref: '#/components/schemas/serprequest'
required: true
responses:
"200":
description: Returns data in the one of the follwing formats - JSON, JSON
Lines, CSV, MS Excel, XML
content:
application/json:
schema:
type: object
example:
- description_text: Dataflow kit visits the web on your behalf, processes
Javascript driven pages in the cloud, return rendered HTML, capture
screenshot or save as PDF. Dataflow Kit services. Headless Chrome
as a service. We automate dynamic web content download using the
Headless Chrome browser. ...
link_href: https://dataflowkit.com/
link_text: Turn Websites into structured data /Dataflow kit
- description_text: Dataflow kit ("DFK") is a Web Scraping framework
for Gophers. It extracts data from web pages, following the specified
CSS Selectors. You can use it in many ways for data mining, data
processing or archiving.
link_href: https://github.com/slotix/dataflowkit
link_text: 'GitHub - slotix/dataflowkit: Extract structured data from
...'
- description_text: The Department of Health - Abu Dhabi (DoH - Abu
Dhabi) leverages the DataFlow Group's specialized Primary Source
Verification (PSV) solutions to screen the credentials of professionals
working within Abu Dhabi's healthcare sector. As the regulative
body of the healthcare sector in Abu Dhabi, DoH - Abu Dhabi ensures
excellence for the ...
link_href: https://corp.dataflowgroup.com/verification-services/start-your-verification/healthcare/department-of-health-abu-dhabi/
link_text: Department of Health - Abu Dhabi - Dataflow Group
- description_text: Dataflow Kit was added by slotix in Apr 2020 and
the latest update was made in May 2020. The list of alternatives
was updated Apr 2020. It's possible to update the information on
Dataflow Kit or report it as discontinued, duplicated or spam.
link_href: https://alternativeto.net/software/dataflow-kit/
link_text: Dataflow Kit Alternatives and Similar Websites and Apps
...
- description_text: Dataflow Kit Reloaded. We are so excited to introduce
a new, completely re-implemented Dataflow Kit. In particular, we
supplement our legacy custom web scraper with more focused and more
understandable web services for our users.
link_href: https://blog.dataflowkit.com/
link_text: Dataflow Kit Blog
- description_text: The Dubai Health Authority (DHA) leverages the DataFlow
Group's specialized Primary Source Verification (PSV) solutions
to screen the credentials of professionals working within Dubai's
healthcare sector. The DHA is led by a mission to ensure access
to health services, maintain and enhance the quality of these services,
improve the health ...
link_href: https://corp.dataflowgroup.com/verification-services/start-your-verification/healthcare/dubai-health-authority/
link_text: Dubai Health Authority - Dataflow Group
- description_text: Dataflow Kit Reloaded. We are so excited to introduce
a new, completely re-implemented Dataflow Kit. In particular, we
supplement our legacy custom web scraper with more focused and more
understandable web services for our users.
link_href: https://blog.dataflowkit.com/reloaded/
link_text: Dataflow Kit Reloaded.
- description_text: The Dataflow Kit API allows embedding free COVID-19
live statistics web widget into sites. Methods provide data for
the USA, Spain, or the World. Developers can access live statistics
data through the DFK COVID-19 API for free. They can build widgets,
mobile apps, or integrate them into other applications.
link_href: https://www.programmableweb.com/api/dataflow-kit-rest-api-v1
link_text: Dataflow Kit REST API v1 | ProgrammableWeb
- description_text: Description. Dataflow kit is a Scraping framework
for Gophers. DFK extracts structured data from web pages, following
the specified extractors. It can be used in many ways for data mining,
data processing or archiving.
link_href: https://go.libhunt.com/dataflowkit-alternatives
link_text: Dataflow kit Alternatives - Go Text Processing | LibHunt
- description_text: Point, click and extract. Work on any interactive
site Scrape a website behind a login form Extract data from multiple
pages. Scrape infinite scrolled pages. Crawl details; Extract and
follow lin
link_href: https://www.startupranking.com/dataflow-kit
link_text: Dataflow Kit - Fast extraction of structured data from
...
application/x-ndjson:
example: |
{"description_text":"We offer Dataflow kit Proxies service to get around content download restrictions from specific websites or send requests through proxies to obtain country-specific versions of target websites. Just specify the target country from 100+ supported global locations to send your web/ SERPs scraping API requests.","link_href":"https://dataflowkit.com/","link_text":"Turn Websites into structured data /Dataflow kit"}
{"description_text":"Dataflow kit. Dataflow kit (\"DFK\") is a Web Scraping framework for Gophers. It extracts data from web pages, following the specified CSS Selectors.","link_href":"https://github.com/slotix/dataflowkit","link_text":"GitHub - slotix/dataflowkit: Extract structured data from ..."}
{"description_text":"The Department of Health - Abu Dhabi (DoH - Abu Dhabi) leverages the DataFlow Group's specialized Primary Source Verification (PSV) solutions to screen the credentials of professionals working within Abu Dhabi's healthcare sector. As the regulative body of the healthcare sector in Abu Dhabi, DoH - Abu Dhabi ensures excellence for the ...","link_href":"https://corp.dataflowgroup.com/verification-services/start-your-verification/healthcare/department-of-health-abu-dhabi/","link_text":"Department of Health - Abu Dhabi - Dataflow Group"}
{"description_text":"Dataflow Kit Reloaded. We are so excited to introduce a new, completely re-implemented Dataflow Kit. In particular, we supplement our legacy custom web scraper with more focused and more understandable web services for our users.","link_href":"https://blog.dataflowkit.com/","link_text":"Dataflow Kit Blog"}
{"description_text":"Coronavirus Developer Resource Center. COVID-19 APIs, SDKs, coverage, open source code and other related dev resources »","link_href":"https://www.programmableweb.com/api/dataflow-kit-covid-19-tracking","link_text":"Dataflow Kit COVID-19 Tracking API | ProgrammableWeb"}
{"description_text":"Dataflow Kit Reloaded. We are so excited to introduce a new, completely re-implemented Dataflow Kit. In particular, we supplement our legacy custom web scraper with more focused and more understandable web services for our users.","link_href":"https://blog.dataflowkit.com/reloaded/","link_text":"Dataflow Kit Reloaded."}
{"description_text":"The Dubai Health Authority (DHA) leverages the DataFlow Group's specialized Primary Source Verification (PSV) solutions to screen the credentials of professionals working within Dubai's healthcare sector. The DHA is led by a mission to ensure access to health services, maintain and enhance the quality of these services, improve the health ...","link_href":"https://corp.dataflowgroup.com/verification-services/start-your-verification/healthcare/dubai-health-authority/","link_text":"Dubai Health Authority - Dataflow Group"}
{"description_text":"Dataflow Kit Extract information from web sites with a visual point-and-click toolkit. Turn websites into useful data. Automate data workflows on the web, process, and transform data at any scale.","link_href":"https://alternativeto.net/software/dataflow-kit/","link_text":"Dataflow Kit Alternatives and Similar Websites and Apps ..."}
{"description_text":"The Dataflow Kit API allows embedding free COVID-19 live statistics web widget into sites. Developers can access live statistics data through the DFK COVID-19 API for free. They can build widgets, mobile apps, or integrate them into other applications.","link_href":"https://www.programmableweb.com/api/dataflow-kit-0","link_text":"Dataflow Kit API | ProgrammableWeb"}
{"description_text":"Coronavirus info widgets. Embed free COVID-19 live statistics web widget into your site.","link_href":"https://covid-19.dataflowkit.com/","link_text":"COVID-19 Coronavirus live statistics"}
text/csv:
example: |
link_href,link_text,description_text
https://dataflowkit.com/,Turn Websites into structured data /Dataflow kit,"Dataflow kit visits the web on your behalf, processes Javascript driven pages in the cloud, return rendered HTML, capture screenshot or save as PDF. Dataflow Kit services. Headless Chrome as a service. We automate dynamic web content download using the Headless Chrome browser. ..."
https://github.com/slotix/dataflowkit,GitHub - slotix/dataflowkit: Extract structured data from ...,"Dataflow kit (""DFK"") is a Web Scraping framework for Gophers. It extracts data from web pages, following the specified CSS Selectors. You can use it in many ways for data mining, data processing or archiving."
https://corp.dataflowgroup.com/verification-services/start-your-verification/healthcare/department-of-health-abu-dhabi/,Department of Health - Abu Dhabi - Dataflow Group,"The Department of Health - Abu Dhabi (DoH - Abu Dhabi) leverages the DataFlow Group's specialized Primary Source Verification (PSV) solutions to screen the credentials of professionals working within Abu Dhabi's healthcare sector. As the regulative body of the healthcare sector in Abu Dhabi, DoH - Abu Dhabi ensures excellence for the ..."
https://blog.dataflowkit.com/,Dataflow Kit Blog,"Dataflow Kit Reloaded. We are so excited to introduce a new, completely re-implemented Dataflow Kit. In particular, we supplement our legacy custom web scraper with more focused and more understandable web services for our users."
https://www.programmableweb.com/api/dataflow-kit-covid-19-tracking,Dataflow Kit COVID-19 Tracking API | ProgrammableWeb,"Coronavirus Developer Resource Center. COVID-19 APIs, SDKs, coverage, open source code and other related dev resources »"
https://blog.dataflowkit.com/reloaded/,Dataflow Kit Reloaded.,"Dataflow Kit Reloaded. We are so excited to introduce a new, completely re-implemented Dataflow Kit. In particular, we supplement our legacy custom web scraper with more focused and more understandable web services for our users."
https://corp.dataflowgroup.com/verification-services/start-your-verification/healthcare/dubai-health-authority/,Dubai Health Authority - Dataflow Group,"The Dubai Health Authority (DHA) leverages the DataFlow Group's specialized Primary Source Verification (PSV) solutions to screen the credentials of professionals working within Dubai's healthcare sector. The DHA is led by a mission to ensure access to health services, maintain and enhance the quality of these services, improve the health ..."
https://alternativeto.net/software/dataflow-kit/,Dataflow Kit Alternatives and Similar Websites and Apps ...,"Popular Alternatives to Dataflow Kit for Web, Windows, Mac, Linux, Software as a Service (SaaS) and more. Explore 25 websites and apps like Dataflow Kit, all suggested and ranked by the AlternativeTo user community."
https://www.programmableweb.com/api/dataflow-kit-0,Dataflow Kit API | ProgrammableWeb,"The Dataflow Kit API allows embedding free COVID-19 live statistics web widget into sites. Developers can access live statistics data through the DFK COVID-19 API for free. They can build widgets, mobile apps, or integrate them into other applications."
https://covid-19.dataflowkit.com/,COVID-19 Coronavirus live statistics,Coronavirus info widgets. Embed free COVID-19 live statistics web widget into your site.
"401":
description: Unauthorized. `api_key` parameter is missed or incorrect
content:
text/plain; charset=utf-8:
examples:
invalid:
summary: Invalid API key
value: Invalid API key
No_api:
summary: No API Key provided
value: No API Key provided
"400":
description: Bad Request. Invalid payload specified.
content:
text/plain; charset=utf-8:
examples:
noFields:
summary: No fields to scrape
value: No fields to scrape
invalidURL:
summary: Invalid request URL
value: Invalid request URL
"500":
description: Internal Server Error is a very general HTTP status code that
means something has gone wrong on the web site's server.
content:
text/plain; charset=utf-8:
examples:
fetchFailed:
summary: Process fetch failed.
value: Process fetch failed.
singleProcessFailed:
summary: Create single process failed
value: Create single process failed
deprecated: false
/parse:
post:
tags:
- parse
summary: Extract structured data from web pages
description: "Dataflow kit uses CSS selectors to find HTML elements in web pages\
\ for later data extraction.\n\nOpen [visual point-and-click toolkit](https://dataflowkit.com/dfk)\
\ and click desired elements on a page to specify extracting data. \n\n\n\
\ Then you can send generated payload to `/parse` endpoint. We crawl web pages\
\ and extract data like text, links, or images for you following the specified\
\ rules. \n\n\nExtracted data is returned in CSV, MS Excel, JSON, JSON(Lines)\
\ or XML format.\n"
operationId: parse
requestBody:
description: "### Field types and attributes\n \n- **Text**. Extract human-readable\
\ text from the selected element and all its child elements. HTML tags are\
\ stripped, and only text returned.\n \n- **Link**. Capture link `href`\
\ attribute and link text. Or specify a special _Path_ option for website\
\ navigation. When Path option is true, all other selectors ignored, and\
\ no results from the current page returned.\n \n- **Image**. Image type\
\ extracts `src` (URL) and `alt` attributes of an image\n\n\n***\n### Filters\n\
Filters are used to manipulate text data when extracting.\n\nHere is the\
\ list of available filters\n\n\n- **Trim** removes leading and trailing\
\ white spaces from the _field text or attribute_\n\n- **Normal** leaves\
\ the case and capitalization of text/ attribute exactly as is.\n\n- **UPPERCASE**\
\ makes all of the letters in the Field's text/ attribute uppercase.\n\n\
- **lowercase** makes all of the letters in the Field's text/ attribute\
\ lowercase.\n\n- **Capitalize** capitalizes the first letter of each word\
\ in the Field's text/ attribute\n\n- **Concatinate** joins text array element\
\ into a single string\n\n***\n### Regular Expressions\n\nFor more advanced\
\ text formatting regular expression can be used. Some useful examples are\
\ listed below\n\n\n| Input text | Regex | Result |\n| ---------- | -----\
\ | ------ |\n| price- 10.99€ | [0-9]+\\.[0-9]+
| 10.99 |\n\
| phone- 0 (944) 244-18-22 | \\w+
| 09442441822 |\n\n\n***\n\
### Details. Chaining.\nThe Link field type serves as a navigation link\
\ to a details page containing more data.\nA special _Path_ option is used\
\ for navigation only. When the Path option specified, no results from the\
\ current page returned. But grouped results from details pages will be\
\ pulled instead. You can use chaining functionality of Dataflow Kit scraper\
\ to retrieve all the detail page data at the same time.\n"
content:
application/json:
schema:
$ref: '#/components/schemas/parserequest'
required: true
responses:
"200":
description: Returns data in the one of the follwing formats - JSON, JSON
Lines, CSV, MS Excel, XML
content:
application/json:
schema:
type: object
example:
- Name_href: https://test.dataflowkit.com/persons/1
Name_text: Ethan Aguirre
Number_text: "1"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-1.svg
- Name_href: https://test.dataflowkit.com/persons/2
Name_text: Melodie Holder
Number_text: "2"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-2.svg
- Name_href: https://test.dataflowkit.com/persons/3
Name_text: Meghan Reyes
Number_text: "3"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-3.svg
- Name_href: https://test.dataflowkit.com/persons/4
Name_text: Lane Vinson
Number_text: "4"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-4.svg
- Name_href: https://test.dataflowkit.com/persons/5
Name_text: Philip Tillman
Number_text: "5"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-5.svg
- Name_href: https://test.dataflowkit.com/persons/6
Name_text: Theodore Mcclain
Number_text: "6"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-6.svg
- Name_href: https://test.dataflowkit.com/persons/7
Name_text: Neville Kane
Number_text: "7"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-7.svg
- Name_href: https://test.dataflowkit.com/persons/8
Name_text: Lila Vazquez
Number_text: "8"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-8.svg
- Name_href: https://test.dataflowkit.com/persons/9
Name_text: Ulysses Peters
Number_text: "9"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-9.svg
- Name_href: https://test.dataflowkit.com/persons/10
Name_text: Camden Young
Number_text: "10"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-10.svg
- Name_href: https://test.dataflowkit.com/persons/11
Name_text: Solomon Petty
Number_text: "11"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-11.svg
- Name_href: https://test.dataflowkit.com/persons/12
Name_text: Ahmed Robbins
Number_text: "12"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-12.svg
- Name_href: https://test.dataflowkit.com/persons/13
Name_text: William Olsen
Number_text: "13"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-13.svg
- Name_href: https://test.dataflowkit.com/persons/14
Name_text: Ahmed Vaughan
Number_text: "14"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-14.svg
- Name_href: https://test.dataflowkit.com/persons/15
Name_text: Howard Kemp
Number_text: "15"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-15.svg
- Name_href: https://test.dataflowkit.com/persons/16
Name_text: Channing Flores
Number_text: "16"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-16.svg
- Name_href: https://test.dataflowkit.com/persons/17
Name_text: Brandon Bauer
Number_text: "17"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-17.svg
- Name_href: https://test.dataflowkit.com/persons/18
Name_text: Colt Morrow
Number_text: "18"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-18.svg
- Name_href: https://test.dataflowkit.com/persons/19
Name_text: Kaye Garner
Number_text: "19"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-19.svg
- Name_href: https://test.dataflowkit.com/persons/20
Name_text: Clayton Justice
Number_text: "20"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-20.svg
- Name_href: https://test.dataflowkit.com/persons/21
Name_text: Hiroko Mills
Number_text: "21"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-21.svg
- Name_href: https://test.dataflowkit.com/persons/22
Name_text: Melvin Lloyd
Number_text: "22"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-22.svg
- Name_href: https://test.dataflowkit.com/persons/23
Name_text: Marshall Mayo
Number_text: "23"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-23.svg
- Name_href: https://test.dataflowkit.com/persons/24
Name_text: Rae Casey
Number_text: "24"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-24.svg
- Name_href: https://test.dataflowkit.com/persons/25
Name_text: Astra Snyder
Number_text: "25"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-25.svg
- Name_href: https://test.dataflowkit.com/persons/26
Name_text: Simon Mckinney
Number_text: "26"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-26.svg
- Name_href: https://test.dataflowkit.com/persons/27
Name_text: Graiden Riggs
Number_text: "27"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-27.svg
- Name_href: https://test.dataflowkit.com/persons/28
Name_text: Jaden Stewart
Number_text: "28"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-28.svg
- Name_href: https://test.dataflowkit.com/persons/29
Name_text: Christian Galloway
Number_text: "29"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-29.svg
- Name_href: https://test.dataflowkit.com/persons/30
Name_text: Signe Sykes
Number_text: "30"
Picture_alt: ""
Picture_src: https://test.dataflowkit.com/static/img/avataaars-30.svg
"401":
description: Unauthorized. `api_key` parameter is missed or incorrect
content:
text/plain; charset=utf-8:
examples:
invalid:
summary: Invalid API key
value: Invalid API key
No_api:
summary: No API Key provided
value: No API Key provided
"400":
description: Bad Request. Invalid payload specified.
content:
text/plain; charset=utf-8:
examples:
noFields:
summary: No fields to scrape
value: No fields to scrape
invalidURL:
summary: Invalid request URL
value: Invalid request URL
"500":
description: Internal Server Error is a very general HTTP status code that
means something has gone wrong on the web site's server.
content:
text/plain; charset=utf-8:
examples:
fetchFailed:
summary: Process fetch failed.
value: Process fetch failed.
singleProcessFailed:
summary: Create single process failed
value: Create single process failed
deprecated: false
/convert/url/pdf:
post:
tags:
- url-to-pdf
summary: Save web page as PDF
description: |-
Automate URL to PDF Conversion right in your application.
Specify request parameters like URL, Proxy, and actions to render web pages to PDF using Headless Chrome.
Get resulted PDF even from websites blocked in your area for some reason utilizing our worldwide pool of proxies.
Simulate real-world human interaction with the page. For example, before saving a web page to PDF, you may need to scroll it.
Generate ready-to-run code for your favorite language at [https://dataflowkit.com/url-to-pdf](https://dataflowkit.com/url-to-pdf)
operationId: url-to-pdf
requestBody:
content:
application/json:
schema:
$ref: '#/components/schemas/url2pdfrequest'
required: true
responses:
"200":
description: A PDF file.
content:
application/pdf:
schema:
type: string
format: binary
text/plain; charset=utf-8:
example: https://dfk-storage-ny3.nyc3.digitaloceanspaces.com/5e5d2864ebb755000188c2c5/url_pdf2020-05-06_20%3A00.pdf?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=KMGZH6JMEM75FTB4EEVL%2F20200506%2Fnyc3%2Fs3%2Faws4_request&X-Amz-Date=20200506T200046Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=737740ad471acf45120709a07de7440287e1daa17599a44b8668f6586030e6be
"401":
description: Unauthorized. `api_key` parameter is missed or incorrect
content:
text/plain; charset=utf-8:
examples:
invalid:
summary: Invalid API key
value: Invalid API key
No_api:
summary: No API Key provided
value: No API Key provided
"400":
description: Bad Request. Invalid payload specified.
content:
text/plain; charset=utf-8:
examples:
noFields:
summary: No fields to scrape
value: No fields to scrape
invalidURL:
summary: Invalid request URL
value: Invalid request URL
"500":
description: Internal Server Error is a very general HTTP status code that
means something has gone wrong on the web site's server.
content:
text/plain; charset=utf-8:
examples:
fetchFailed:
summary: Process fetch failed.
value: Process fetch failed.
singleProcessFailed:
summary: Create single process failed
value: Create single process failed
deprecated: false
/convert/url/screenshot:
post:
tags:
- url-to-screenshot
summary: Capture web page Screenshots.
description: |-
Automate URL to Screenshot Conversion right in your application.
Specify request parameters like URL, Proxy, and actions to convert web pages to screenshots using Headless Chrome.
Get resulted pictures in JPG or PNG formats even from websites blocked in your area for some reason utilizing our worldwide pool of proxies.
Simulate real-world human interaction with the page. For example, before capturing a web page, you may need to scroll it.
Generate ready-to-run code for your favorite language at [https://dataflowkit.com/url-to-screenshot](https://dataflowkit.com/url-to-screenshot)
operationId: url-to-screenshot
requestBody:
content:
application/json:
schema:
$ref: '#/components/schemas/url2screenshotrequest'
required: true
responses:
"200":
description: Returns jpg or png file.
content:
image/png:
schema:
type: string
format: binary
image/jpeg:
schema:
type: string
format: binary
text/plain; charset=utf-8:
example: https://dfk-storage-ny3.nyc3.digitaloceanspaces.com/5e5d2864ebb755000188c2c5/url_screenshot2020-05-06_20%3A02.jpeg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=KMGZH6JMEM75FTB4EEVL%2F20200506%2Fnyc3%2Fs3%2Faws4_request&X-Amz-Date=20200506T200305Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=1c1c39fab3e9b0a8806fc4fd6690b335758d12a926a635c08c7301f87ff9279b
"401":
description: Unauthorized. `api_key` parameter is missed or incorrect
content:
text/plain; charset=utf-8:
examples:
invalid:
summary: Invalid API key
value: Invalid API key
No_api:
summary: No API Key provided
value: No API Key provided
"400":
description: Bad Request. Invalid payload specified.
content:
text/plain; charset=utf-8:
examples:
noFields:
summary: No fields to scrape
value: No fields to scrape
invalidURL:
summary: Invalid request URL
value: Invalid request URL
"500":
description: Internal Server Error is a very general HTTP status code that
means something has gone wrong on the web site's server.
content:
text/plain; charset=utf-8:
examples:
fetchFailed:
summary: Process fetch failed.
value: Process fetch failed.
singleProcessFailed:
summary: Create single process failed
value: Create single process failed
deprecated: false
components:
schemas:
fetchrequest:
title: Fetch request
required:
- type
- url
type: object
properties:
url:
type: string
description: Specify URL to download.
type:
type: string
description: If set to `base`, the Base fetcher is used for downloading
web page content. Use `chrome` for fetching content with a Headless chrome
browser. If omitted `base` fetcher is used by default.
enum:
- base
- chrome
proxy:
type: string
description: Specify proxy by adding [country ISO code](https://en.wikipedia.org/wiki/ISO_3166-2)
to `country-` value to send requests through a proxy in the specified
country. Use `country-any` to use random geo-targets.
example: country-sk
waitDelay:
type: number
description: Specify a wait delay (in seconds). This may be useful if certain
elements of the web site need to be rendered after the initial page load.
_(Chrome fetcher type only)_
initialCookies:
type: array
description: The "Initial Cookies" option is useful for crawling websites
that require a login. The simplest solution to get an array of cookies
for specific websites is to use a web browser "EditThisCookie" extension.
Copy a cookie array with "EditThisCookie" and paste it into the "Initial
cookie" field.
items:
$ref: '#/components/schemas/initialCookie'
default: []
ignoreHTTPStatusErrCodes:
type: boolean
description: The HTTP 200 OK success status response code indicates that
the request has succeeded. Sometimes a server returns normal HTML content
even with an erroneous Non-200 HTTP response status code. The IgnoreHTTPStatusCode
option is useful when you need to force the return of HTML content. Defaults
to "false."
actions:
type: array
description: Use actions to automate manual workflows while rendering web
pages. They simulate real-world human interaction with pages. _(Chrome
fetcher type only)_
items:
$ref: '#/components/schemas/action'
default: []
output:
type: string
description: If set to _file_, the content of downloaded HTML is uploaded
to Dataflow Kit Storage first. Then the link to this file is returned.
Overwise, downloaded content is returned in the response body.
default: buffer
enum:
- buffer
- file
example:
proxy: country-any
type: base
url: https://ipapi.co/json/
actions:
- waitFor:
waitForSelector: :root
output: buffer
serprequest:
title: SERP request
required:
- format
- name
- proxy
- type
- url
type: object
properties:
name:
type: string
description: Collection name.
url:
type: string
description: url holds the link to a Search Engine to use, and other optional
parameters like languages or country.
type:
type: string
description: For SERP requests you should _always_ use `chrome` type to
fetch content with a Headless chrome browser
example: chrome
proxy:
type: string
description: Always specify proxy for sending SERP requests. Add choosen
[country ISO code](https://en.wikipedia.org/wiki/ISO_3166-2) to `country-`
value to send requests through a proxy in the specified country. Use `country-any`
to use random geo-targets.
example: country-any
fields:
type: array
description: |
Specify CSS selectors (patterns) used to gather data from Search Engine Result Pages.
Ready-to-use payloads for collecting search results from the most popular Search Engines are available. These payloads are customizable, though.
items:
$ref: '#/components/schemas/field'
pageNum:
type: integer
description: Specify number of pages to crawl.
default: 1
format:
title: Format
type: string
description: Extracted data is returned either in CSV, MS Excel, JSON, JSON(Lines)
or XML format.
enum:
- csv
- json
- jsonl
- excel
- xml
example:
name: duckduckgo
request:
url: https://duckduckgo.com/?q=Dataflow+kit&ia=web
proxy: country-any
type: chrome
fields:
- name: link
selector: .result__a
attrs:
- href
- text
type: 2
filters:
- name: trim
- name: description
selector: .js-result-snippet
attrs:
- text
type: 1
filters:
- name: trim
commonParent: div[id].result
format: json
parserequest:
title: Parse request
required:
- fields
- format
- name
- proxy
- type
- url
type: object
properties:
name:
type: string
description: Collection name.
request:
$ref: '#/components/schemas/fetchrequest'
commonParent:
type: string
description: Specifies common ancestor block for a set of fields used to
extract data from a web page. _(CSS Selector)_
example: .common-block
fields:
type: array
description: |
Define a set of fields used to extract data from a web page. A Field represents a given chunk of extracted data from every block on each page.
items:
$ref: '#/components/schemas/field'
paginator:
$ref: '#/components/schemas/Paginator'
path:
title: Path
type: boolean
description: Path is a special parameter specifying navigation pages only.
It collects information from detailed pages. No results from the current
page return. Defaults to false.
default: false
format:
title: Format
type: string
description: Extracted data is returned either in CSV, MS Excel, JSON, JSON(Lines)
or XML format.
enum:
- csv
- json
- jsonl
- excel
- xml
example:
name: test.dataflowkit.com
request:
url: https://test.dataflowkit.com/persons/page-0
type: chrome
fields:
- name: Number
selector: .badge-primary
attrs:
- text
type: 1
filters:
- name: trim
- name: Name
selector: '#cards a'
attrs:
- href
- text
type: 2
filters:
- name: trim
- name: Picture
selector: .card-img-top
attrs:
- src
- alt
type: 0
filters:
- name: trim
paginator:
nextPageSelector: .page-item:nth-child(2) .page-link
pageNum: 2
path: false
format: json
url2pdfrequest:
title: URL to PDF request
required:
- url
type: object
properties:
url:
type: string
description: The full URL address (including HTTP/HTTPS) of a web page that
you want to save as PDF
proxy:
type: string
description: Specify proxy by adding [country ISO code](https://en.wikipedia.org/wiki/ISO_3166-2)
to `country-` value to send requests through a proxy in the specified
country. Use `country-any` to use random geo-targets.
example: country-any
landscape:
type: boolean
description: Paper orientation. Parameter landscape = false means portrait
orientation. Set landscape to true for landscape page oriantation.
default: false
paperSize:
type: string
description: Page size parameter consists of the most popular page formats.
default: A4
enum:
- A3
- A4
- A5
- A6
- Letter
- Legal
- Tabloid
printBackground:
type: boolean
description: Print background graphics in the PDF.
default: false
pageRanges:
type: string
description: Specify page ranges to convert. Defaults to the empty value,
which means convert all pages.
example: 1-4, 6, 10-12
scale:
type: number
description: By default, PDF document content is generated according to
dimensions of the original web page content. Using the `scale` parameter,
you can specify a custom zoom factor from 0.1 to 5.0 of the webpage rendering.
default: 1
printHeaderFooter:
type: boolean
description: printHeaderFooter parameter consists of the date, name of
the web page, the page URL, and how many pages the document you are printing.
default: false
marginTop:
type: number
description: Top Margin of the PDF (in inches)
default: 0.4
marginLeft:
type: number
description: Left Margin of the PDF (in inches)
default: 0.4
marginRight:
type: number
description: Right Margin of the PDF (in inches)
default: 0.4
marginBottom:
type: number
description: Bottom Margin of the PDF (in inches)
default: 0.4
waitDelay:
type: number
description: Specify a wait delay (in seconds). This may be useful if certain
elements of the web site need to be rendered after the initial page load.
default: 0.5
initialCookies:
type: array
description: The "Initial Cookies" option is useful for crawling websites
that require a login. The simplest solution to get an array of cookies
for specific websites is to use a web browser "EditThisCookie" extension.
Copy a cookie array with "EditThisCookie" and paste it into the "Initial
cookie" field.
items:
$ref: '#/components/schemas/initialCookie'
default: []
ignoreHTTPStatusErrCodes:
type: boolean
description: The HTTP 200 OK success status response code indicates that
the request has succeeded. Sometimes a server returns normal HTML content
even with an erroneous Non-200 HTTP response status code. The IgnoreHTTPStatusCode
option is useful when you need to force the return of HTML content. Defaults
to "false."
actions:
type: array
description: Use actions to automate manual workflows while rendering web
pages. They simulate real-world human interaction with pages.
items:
$ref: '#/components/schemas/action'
default: []
output:
type: string
description: If set to _file_, the resulted PDF is uploaded to Dataflow
Kit Storage first. Then the link to this file is returned. Overwise, PDF
content is returned in the response body.
default: buffer
enum:
- buffer
- file
example:
url: https://dataflowkit.com
paperSize: A4
landscape: false
printBackground: false
printHeaderFooter: false
scale: 1
pageRanges: ""
marginTop: 0.4
marginLeft: 0.4
marginRight: 0.4
marginBottom: 0.4
waitDelay: 0.5
url2screenshotrequest:
title: URL to Screenshot request
required:
- url
type: object
properties:
url:
type: string
description: The full URL address (including HTTP/HTTPS) of a web page that
you want to capture
proxy:
type: string
description: Specify proxy by adding [country ISO code](https://en.wikipedia.org/wiki/ISO_3166-2)
to `country-` value to send requests through a proxy in the specified
country. Use `country-any` to use random geo-targets.
example: country-any
fullPage:
type: boolean
description: takes a screenshot of a full web page. It ignores offsetX,
offsety, width and height argument values.
default: false
width:
type: integer
description: Rectangle width in device independent pixels (dip).
default: 800
height:
type: integer
description: Rectangle height in device independent pixels (dip).
default: 600
offsetx:
type: integer
description: X offset in device independent pixels (dip).
default: 0
offsety:
type: integer
description: Y offset in device independent pixels (dip).
default: 0
printBackground:
type: boolean
description: Print background graphics in the PDF.
default: false
clipSelector:
type: string
description: Captures a screenshot of specified CSS element on a web page.
example: '#css-element'
format:
type: string
description: Sets the Format of output image
default: png
enum:
- png
- jpeg
quality:
type: integer
description: Sets the Quality of output image. Compression quality from
range [0..100] (jpeg only).
default: 80
scale:
type: number
description: Image scale factor. range [0.1 .. 3]
default: 1
waitDelay:
type: number
description: Specify a wait delay (in seconds). This may be useful if certain
elements of the web site need to be rendered after the initial page load.
default: 0.5
initialCookies:
type: array
description: The "Initial Cookies" option is useful for crawling websites
that require a login. The simplest solution to get an array of cookies
for specific websites is to use a web browser "EditThisCookie" extension.
Copy a cookie array with "EditThisCookie" and paste it into the "Initial
cookie" field.
items:
$ref: '#/components/schemas/initialCookie'
default: []
ignoreHTTPStatusErrCodes:
type: boolean
description: The HTTP 200 OK success status response code indicates that
the request has succeeded. Sometimes a server returns normal HTML content
even with an erroneous Non-200 HTTP response status code. The IgnoreHTTPStatusCode
option is useful when you need to force the return of HTML content. Defaults
to "false."
actions:
type: array
description: Use actions to automate manual workflows while rendering web
pages. They simulate real-world human interaction with pages.
items:
$ref: '#/components/schemas/action'
default: []
output:
type: string
description: If set to _file_, the resulted screenshot is uploaded to Dataflow
Kit Storage first. Then the link to this file is returned. Overwise, web
site screenshot is returned in the response body.
default: buffer
enum:
- buffer
- file
example:
url: https://dataflowkit.com
width: 1920
height: 1080
scale: 1
format: jpeg
quality: 80
waitDelay: 0.5
field:
title: Field
required:
- attrs
- name
- selector
- type
type: object
properties:
name:
type: string
description: Field name is used to aggregate results.
selector:
type: string
description: Selector represents a CSS selector for data extraction within
the given block.
example: '#cards a'
type:
type: integer
description: Selector type. ( 0 - image, 1 - text, 2 - link)
enum:
- 0
- 1
- 2
attrs:
type: array
description: A set of attributes to extract from a Field. Find more information
about attributes
items:
type: string
enum:
- text
- href
- src
- alt
filters:
type: array
description: Filters are used to pre-processing of text data when extracting.
items:
anyOf:
- type: object
properties:
name:
type: string
enum:
- trim
- normal
- uppercase
- lowercase
- capitalize
- concatinate
- type: object
properties:
name:
type: string
example: regex
param:
type: string
example: '[\\d.]+'
details:
description: Details themself represent independent Parse request that extracts
data from linked pages.
allOf:
- $ref: '#/components/schemas/parserequest'
initialCookie:
title: initialCookie
type: object
properties:
domain:
type: string
example: .twitter.com
expirationDate:
type: number
example: 1762900726.409761
hostOnly:
type: boolean
example: false
httpOnly:
type: boolean
example: false
name:
type: string
example: auth_token
path:
type: string
example: /
sameSite:
type: string
enum:
- unspecified
- strict
- lax
- no_restriction
secure:
type: boolean
example: true
session:
type: boolean
example: true
storeID:
type: string
example: "1"
value:
type: string
example: 46fd9fed1ab8b0b0e231ac3f
id:
type: number
example: 1
description: InitialCookie structure keep cookies that optionally can be passed
to the new fetcher crawl a website that requires a login. Generate Cookies
array with EditThisCookie chrome extension.
action:
title: Action
type: object
anyOf:
- title: input
type: object
properties:
selector:
type: string
description: Must be a valid CSS Selector
example: '#search-form-editbox'
value:
type: string
description: The value to input.
example: web scraper
ignoreIfNotPresent:
type: boolean
example: false
description: Sets the value of an input field as if you had typed it in. You
can also set the value of combo boxes, checkboxes, etc., using this action.
In these cases, the value must be the value of the selected option, not
visible text.
- title: sendKeys
type: object
properties:
selector:
type: string
description: Must be a valid CSS Selector
example: '#search-form-editbox'
value:
type: string
description: Sequence of keys to send. Keys can include keystrokes such
as ALT+A, ENTER, BACKSPACE, etc.
example: web scraper
ignoreIfNotPresent:
type: boolean
example: false
description: The Send Keys action simulates real user input of key by key
into a given string. It mimics real user behavior, such as the inability
to type into invisible or read-only DOM elements. This action is useful
for cases where explicit keystroke events are required, like auto-completing
combo boxes. Unlike a similar 'input' action, which forces a specified value
directly into an input selector, this action does not overwrite existing
content.
- title: click
type: object
properties:
selector:
type: string
description: Must be a valid CSS Selector
example: .click-me
ignoreIfNotPresent:
type: boolean
description: This optional parameter is useful when the target element
occasionally may not be present in the DOM.
example: false
skipLastIteration:
type: boolean
description: It is only used for click action inside a loop only. Skips
the last iteration.
example: true
description: Clicks on a target element (such as a link, button, checkbox,
or radio button) with specified CSS Selector.
- title: doubleClick
type: object
properties:
selector:
type: string
description: Must be a valid CSS Selector
example: .double-click-me
ignoreIfNotPresent:
type: boolean
description: This optional parameter is useful when the target element
occasionally may not be present in the DOM.
example: false
skipLastIteration:
type: boolean
description: It is only used for click action inside a loop only. Skips
the last iteration.
example: true
description: Double clicks on a target element (such as a link, button, checkbox,
or radio button) with specified CSS Selector.
- title: jsclick
type: object
properties:
selector:
type: string
description: Must be a valid CSS Selector for the target element.
example: .js-click-me
ignoreIfNotPresent:
type: boolean
description: This optional parameter is useful when the target element
occasionally may not be present in the DOM.
example: false
skipLastIteration:
type: boolean
description: It is only used for click action inside a loop only. Skips
the last iteration.
example: true
description: Click on an element with the specified CSS Selector. JS Click
internally invokes a script (Javascript) that clicks the element.
- title: submit
type: object
properties:
selector:
type: string
description: Must be an any valid CSS Selector inside the parent form
to submit.
example: .some-element-inside-form
description: Submit the specified form. This action is useful for forms without
explicit submit buttons, such as single-input Search forms.
- title: waitVisible
type: object
properties:
selector:
type: string
description: Must be a valid CSS Selector for the target element.
example: :root
description: Wait for the target element to become visible on the page.
- title: waitNotVisible
type: object
properties:
selector:
type: string
description: Must be a valid CSS Selector for the target element.
example: '#some-element'
description: Wait for the target element to become invisible on the page.
- title: pause
type: object
properties:
waitDelay:
type: string
description: Wait time (in milliseconds).
example: "5000"
description: Wait for the specified amount of time.
- title: execute
type: object
properties:
script:
type: string
description: The JavaScript snippet to run
example: console.log("It works!")
description: Executes the Javascript passes as 'script' parameter
- title: loop .. times
type: object
properties:
actions:
type: array
description: list of actions combined in the loop are executed step-by-step
items:
$ref: '#/components/schemas/action'
default: []
times:
type: number
description: the number of times to execute the wrapped actions within
the 'loop .. times' construction.
example: 5
description: Loop action combines a set of actions and executes it as many
times as specified in the "times" parameter.
- title: getcontent
type: object
properties:
skipLastIteration:
type: boolean
description: It is only used for loop actions only. Skips the last iteration.
example: true
description: Sometimes it is necessary to retrieve the HTML content of a web
page multiple times in a single request. This action is for that.
- title: scroll
type: object
properties:
times:
type: integer
description: The number of times to scroll down a web page.
example: 3
selector:
type: string
description: Some websites require clicking 'More' button while scrolling
a page. Put here 'More' button valid CSS Selector.
example: .more-button
scrollByPixels:
type: number
description: Scrolls a web page by the number of pixels specified by 'scrollByPixels'
parameter.
example: 650
scrollingElementSelector:
type: string
description: Optionally specify here a valid CSS Selector of scrolling
element.
example: '#scroll-panel'
description: Scroll a page down to load more content, simulating user interaction
with infinite scrolled pages. Or specify the element's CSS Selector to click
for loading more content.
Paginator:
type: object
properties:
nextPageSelector:
type: string
example: .page-link
pageNum:
type: integer
example: 10
description: Specify _Next link_ paginator on pages containing a link pointing
to the next page. The next page link is extracted from a document by querying
href attribute of a given element's CSS selector.
responses:
NotFound:
description: The specified resource was not found
content:
text/plain; charset=utf-8:
example: 404. Not found
Unauthorized:
description: Unauthorized. `api_key` parameter is missed or incorrect
content:
text/plain; charset=utf-8:
examples:
invalid:
summary: Invalid API key
value: Invalid API key
No_api:
summary: No API Key provided
value: No API Key provided
BadRequest:
description: Bad Request. Invalid payload specified.
content:
text/plain; charset=utf-8:
examples:
noFields:
summary: No fields to scrape
value: No fields to scrape
invalidURL:
summary: Invalid request URL
value: Invalid request URL
InternalServerError:
description: Internal Server Error is a very general HTTP status code that means
something has gone wrong on the web site's server.
content:
text/plain; charset=utf-8:
examples:
fetchFailed:
summary: Process fetch failed.
value: Process fetch failed.
singleProcessFailed:
summary: Create single process failed
value: Create single process failed
securitySchemes:
ApiKeyAuth:
type: apiKey
name: api_key
in: query