SDKs overview

import.io provides Software Development Kits (SDKs) and libraries that let developers interact programmatically with its web data extraction platform. The SDKs are used to access, retrieve, and manage data extracted through the platform or its managed services, and to integrate that data into proprietary applications, databases, or analytics systems. Programmatic access is particularly useful for enterprises that need automated, large-scale data ingestion for competitive intelligence, market research, and price monitoring. Although import.io primarily offers a managed data service and a platform solution, the SDKs let technical teams automate their data workflows: exporting data, querying specific datasets, and initiating re-extraction tasks without manual intervention. This supports continuous data feeds for business intelligence applications.

SDKs are available for multiple programming languages, so teams can integrate using the language they already work in. Each SDK wraps the underlying RESTful API, turning HTTP requests and responses into native objects and methods. This reduces development effort and lets developers focus on using the data rather than on the details of API communication. The SDKs are designed to handle authentication, error handling, and data parsing, which for most use cases makes integration more robust than calling the API directly. For example, a Python SDK might provide methods to fetch a list of extractors, retrieve data from a specific extractor, or check the status of a data crawl, all through simple function calls.
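To make the idea of wrapping a REST API concrete, the following is a minimal sketch of a thin client layer that handles authentication, error handling, and JSON parsing. It is not the import.io SDK: the base URL, the paths, the bearer-token header, and the names SketchClient, ApiError, and http_get are all assumptions made for illustration.

```python
import json
from typing import Any, Callable, Dict, List
from urllib.request import Request, urlopen

# Hypothetical base URL and paths, used only for illustration.
BASE_URL = "https://api.example.com/v1"

# A transport takes a URL and headers and returns the raw response body.
Transport = Callable[[str, Dict[str, str]], bytes]


class ApiError(Exception):
    """Raised when a request fails or the response is not valid JSON."""


def http_get(url: str, headers: Dict[str, str]) -> bytes:
    """Default transport: an HTTP GET using only the standard library."""
    with urlopen(Request(url, headers=headers), timeout=30) as response:
        return response.read()


class SketchClient:
    """Thin wrapper hiding authentication, error handling, and parsing."""

    def __init__(self, api_key: str, transport: Transport = http_get) -> None:
        self._api_key = api_key
        self._transport = transport

    def _get_json(self, path: str) -> Any:
        # Assumed auth scheme: a bearer token in the Authorization header.
        headers = {"Authorization": f"Bearer {self._api_key}"}
        try:
            raw = self._transport(f"{BASE_URL}{path}", headers)
            return json.loads(raw)
        except (OSError, ValueError) as exc:
            # URLError/HTTPError derive from OSError; JSON errors from ValueError.
            raise ApiError(f"request to {path} failed: {exc}") from exc

    def list_extractors(self) -> List[Dict[str, Any]]:
        return self._get_json("/extractors")

    def get_data(self, extractor_id: str) -> List[Dict[str, Any]]:
        return self._get_json(f"/extractors/{extractor_id}/data")
```

Passing the transport in as a parameter keeps the wrapper testable: a stub function that returns canned bytes can stand in for the network during unit tests.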

Official SDKs by language

import.io maintains official SDKs for several popular programming languages, covering those commonly used in enterprise data engineering and application development. They are intended to provide stable, supported, and documented interfaces to the import.io API. With them, users can typically list available extractors, retrieve extracted data, manage data exports, and monitor job statuses. For detailed documentation and the most current versions, consult the official import.io developer documentation.

Language  | Package/Library   | Install Command Example         | Maturity
----------|-------------------|---------------------------------|---------
Python    | importio          | pip install importio            | Stable
Java      | com.import.io     | Maven/Gradle dependency         | Stable
Node.js   | @import.io/client | npm install @import.io/client   | Stable
.NET (C#) | Importio.Client   | Install-Package Importio.Client | Stable

Each official SDK provides a consistent set of functionality, so developers can perform common data extraction tasks regardless of language. For instance, every SDK provides mechanisms to authenticate with the import.io platform using an API key, submit data extraction jobs, poll for job completion, and retrieve the resulting structured data. The design generally follows the idioms of each language, so the SDK feels natural to developers familiar with that ecosystem. For example, the Python SDK might use iterators for data retrieval, while the Java SDK might employ futures or reactive programming patterns for asynchronous operations.
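The submit-then-poll workflow mentioned above can be sketched independently of any particular SDK. The helper below is an illustration only: wait_for_job, the fetch_status callable, and the status names FINISHED and FAILED are assumptions, not part of a documented import.io interface. It polls with exponential backoff until the job reaches a terminal state or a timeout elapses.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], str],
    timeout_s: float = 300.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reaches a terminal status or the timeout elapses."""
    terminal = {"FINISHED", "FAILED"}  # assumed status names
    waited = 0.0
    delay = initial_delay_s
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"job still {status!r} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        # Double the delay after each attempt, up to the configured ceiling.
        delay = min(delay * 2, max_delay_s)
```

In practice fetch_status would wrap whatever status call the SDK exposes. Backing off between polls avoids sending a steady stream of requests to the API while a long crawl is still running.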

Installation

Installation of import.io SDKs typically follows standard package management practices for each respective programming language. Below are common installation methods for the primary official SDKs. Ensure that you have the appropriate package manager installed and configured for your development environment.

Python

The Python SDK is distributed via the Python Package Index (PyPI). Use pip to install it:

pip install importio

It is often recommended to install Python packages within a Python virtual environment to manage dependencies effectively and avoid conflicts with other projects.

Java

For Java projects, the import.io client library is typically managed through build automation tools like Maven or Gradle. Add the appropriate dependency declaration to your pom.xml (for Maven) or build.gradle (for Gradle) file. Specific version numbers should be obtained from the official import.io developer documentation.

Maven Example

<dependencies>
    <dependency>
        <groupId>com.import.io</groupId>
        <artifactId>importio-client</artifactId>
        <version>[latest_version]</version>
    </dependency>
</dependencies>

Gradle Example

dependencies {
    implementation 'com.import.io:importio-client:[latest_version]'
}

Node.js

The Node.js SDK is available via npm (Node Package Manager). Install it using the npm install command:

npm install @import.io/client

Ensure Node.js and npm are installed on your system. You can verify their installation by running node -v and npm -v in your terminal.

.NET (C#)

For .NET applications, the SDK is distributed as a NuGet package. Use the NuGet Package Manager Console in Visual Studio or the .NET CLI:

Package Manager Console

Install-Package Importio.Client

.NET CLI

dotnet add package Importio.Client

After installation, you can include the necessary namespaces in your C# code to access the SDK functionalities.

Quickstart example

This quickstart example demonstrates how to retrieve data using a hypothetical Python SDK for import.io. The specific API calls and object structures may vary slightly across different SDK versions and languages, but the general workflow of authentication, specifying an extractor, and retrieving data remains consistent. For this example, assume you have an active import.io account, an API key, and an existing data extractor configured on the platform.

To begin, ensure you have the Python SDK installed as described in the installation section. Replace YOUR_API_KEY and YOUR_EXTRACTOR_ID with your actual credentials and the ID of your target extractor.

import importio

# Initialize the client with your API key
# You can find your API key in your import.io account settings.
client = importio.Client(api_key="YOUR_API_KEY")

# Define the ID of the extractor you want to query.
# This ID is available in the import.io platform for each extractor.
extractor_id = "YOUR_EXTRACTOR_ID"

try:
    # Fetch data from the specified extractor.
    # The 'get_data' method typically returns a generator or an iterable list of records.
    print(f"Fetching data from extractor: {extractor_id}...")
    data_iterator = client.get_data(extractor_id)

    # Iterate through the retrieved data and print each record.
    # Each 'record' will usually be a dictionary or a custom object representing a row of extracted data.
    record_count = 0
    for record in data_iterator:
        print(record)
        record_count += 1
        if record_count >= 5: # Limit to 5 records for demonstration purposes
            break

    if record_count == 0:
        print("No data found for this extractor or the query returned no results.")
    else:
        print(f"Successfully retrieved {record_count} records.")

except importio.exceptions.ImportIOError as e:
    print(f"An import.io specific error occurred: {e}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

# For more advanced scenarios, such as passing input parameters to an extractor (e.g., start URLs, search terms),
# you would typically pass these as a dictionary to a method like 'run_extractor' or 'schedule_extractor'.
# Example (conceptual):
# job = client.run_extractor(
#     extractor_id,
#     inputs=[
#         {"url": "https://example.com/page1"},
#         {"url": "https://example.com/page2"}
#     ]
# )
# print(f"Extractor job initiated: {job.id}")
# # Then, you would poll the job status and retrieve results once complete.

This example initializes a client, targets a specific extractor by its ID, and retrieves a limited number of records. In a production environment, you would typically process all records, possibly filtering them or storing them in a database. The error handling catches problems with API communication or data retrieval. For comprehensive details on all available methods, parameters, and error codes, refer to the official import.io documentation.
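As one possible shape for the production step described above, the sketch below filters records and stores them in a SQLite database using only the Python standard library. The store_records function, the table layout, and the keep predicate are hypothetical choices for illustration; records are assumed to be JSON-serializable dictionaries like those printed in the quickstart.

```python
import json
import sqlite3
from typing import Any, Callable, Dict, Iterable


def store_records(
    conn: sqlite3.Connection,
    extractor_id: str,
    records: Iterable[Dict[str, Any]],
    keep: Callable[[Dict[str, Any]], bool] = lambda record: True,
) -> int:
    """Store records that pass the filter as JSON text; return the count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "extractor_id TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    rows = [
        (extractor_id, json.dumps(record, sort_keys=True))
        for record in records
        if keep(record)
    ]
    # The connection context manager commits on success and rolls back on error.
    with conn:
        conn.executemany(
            "INSERT INTO records (extractor_id, payload) VALUES (?, ?)", rows
        )
    return len(rows)
```

A call such as store_records(conn, extractor_id, data_iterator, keep=lambda r: "price" in r) would store only the records that contain a price field. The sketch builds the full list of rows in memory, so very large result sets would be better inserted in batches.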

Community libraries

In addition to the official SDKs, the developer community may contribute unofficial libraries or wrappers that extend functionality or offer alternative interfaces. These projects can fill gaps for specific use cases, support languages not covered by the official SDKs, or offer a different level of abstraction. However, their support, documentation, and maintenance vary and are generally less predictable than those of the official offerings.

When considering a community library, evaluate how actively it is maintained, the size and responsiveness of its community, and its compatibility with the latest import.io API versions. GitHub, Stack Overflow, and technical blogs are common places to discover such projects. For example, a developer might create a wrapper in a less common language, or a specialized tool for integrating import.io data with an analytics platform that no official SDK supports directly. Such contributions are possible because the underlying import.io API can be accessed with standard HTTP requests, as defined in the IETF HTTP specifications (RFC 9110 and RFC 9112), and its responses can be parsed in any language.

Before integrating any third-party library, review its source code for security vulnerabilities and confirm that it follows good practice for API interaction and credential handling. Although import.io focuses primarily on enterprise solutions with extensive managed services, its API leaves room for these community extensions. Developers who need an integration that the official SDKs do not provide often call the API directly or write a custom wrapper for their environment, which they can then share with the community. Always check a community library's claims against the official import.io API documentation.
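One credential-handling practice worth checking for in any wrapper, official or not, is that API keys are read from the environment rather than hard-coded, and that they never appear in full in logs. The sketch below shows that pattern; the environment variable name IMPORTIO_API_KEY and the helper names load_api_key and mask are assumptions chosen for this example, not documented conventions.

```python
import os


def load_api_key(env_var: str = "IMPORTIO_API_KEY") -> str:
    """Read the API key from the environment instead of source code."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"set the {env_var} environment variable")
    return key


def mask(secret: str, visible: int = 4) -> str:
    """Return a log-safe form of a secret showing only its last characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

With these helpers, a log line can include mask(load_api_key()) to identify which key is in use without exposing it.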