
Salesforce Connector (Technical Reference)

Written by Philippe Trussart
Updated over a week ago

Generated from connector source on 2025-09-04

Overview

The Salesforce connector pulls data from the Salesforce API into your Bytespree data lake.

Key characteristics:

  • API base: resolved from instance_url after login.

  • API version: v50.0.

  • Authentication: OAuth 2.0 (refresh token grant). Initial authorization code exchange supported.

  • Discovery: dynamic; enumerates all sObjects that are both queryable and replicateable, excluding a small set (see below).

  • Schema: dynamic per sObject using the Describe endpoint; field names are lowercased. Field lists are emitted in alphabetical order.

  • Uniqueness: per-stream unique_keys: ['id'] (Salesforce record Id).

  • Query engine: SOQL via /queryAll, ordered by Id.

  • Wide-object handling: columns split across two queries if >450 fields (well below Salesforce’s 900-column SELECT limit) and then merged by Id.

Setup & Configuration

Provide the following settings (populated automatically during auth where possible):

{
  "access_token": "<ephemeral; set post-login>",
  "client_id": "<Connected App Consumer Key>",
  "client_secret": "<Connected App Consumer Secret>",
  "environment_url": "login.salesforce.com | test.salesforce.com | <custom hostname>",
  "environment_url_custom": "<optional https://your-domain.my.salesforce.com>",
  "instance_url": "<set post-login>",
  "oauth_callback": "<URL used during initial authorization>",
  "refresh_token": "<OAuth refresh token>"
}

Environment / OAuth host resolution

environment_url (or environment_url_custom) is used to build the token URL: https://{hostname}/services/oauth2/token.
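A minimal Python sketch of the host resolution and the refresh token request, assuming that a non-empty environment_url_custom takes precedence over environment_url and may be given as a full https:// URL. The function names are hypothetical; the form fields (grant_type, client_id, client_secret, refresh_token) are the standard OAuth 2.0 refresh token grant parameters.

```python
from urllib.parse import urlencode, urlparse


def resolve_token_url(settings: dict) -> str:
    """Build the OAuth token URL from environment_url / environment_url_custom.

    Assumption (hypothetical helper): a non-empty environment_url_custom wins,
    and only its hostname is kept.
    """
    custom = (settings.get("environment_url_custom") or "").strip()
    if custom:
        # urlparse only fills netloc when a scheme is present, so add one if missing.
        parsed = urlparse(custom if "://" in custom else "https://" + custom)
        host = parsed.netloc
    else:
        host = settings["environment_url"].strip()
    return f"https://{host}/services/oauth2/token"


def refresh_token_request(settings: dict) -> tuple[str, bytes]:
    """Return (token URL, form-encoded body) for the refresh token grant."""
    body = urlencode({
        "grant_type": "refresh_token",
        "client_id": settings["client_id"],
        "client_secret": settings["client_secret"],
        "refresh_token": settings["refresh_token"],
    }).encode("ascii")
    return resolve_token_url(settings), body
```

The body would be POSTed with Content-Type application/x-www-form-urlencoded; a successful response carries the access_token and instance_url that the connector stores in its settings.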

Discovery & Object Selection

  • Lists sObjects from /services/data/v50.0/sobjects/, then includes only those with:

    • queryable == true and replicateable == true, and

    • not in EXCLUDED_TABLES = ['IdeaComment', 'Vote'].

  • For each included object, the connector calls Describe: /sobjects/{object}/describe/ to derive the schema and types.

  • The connector also records which fields are indexed (via FieldDefinition) for reference.
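The object filter above can be sketched in Python as follows. It assumes the parsed JSON body of the /sobjects/ call, whose "sobjects" entries carry "name", "queryable" and "replicateable"; the function name is hypothetical.

```python
EXCLUDED_TABLES = ["IdeaComment", "Vote"]


def select_objects(describe_global: dict) -> list[str]:
    """Return the sObject names eligible for sync from a /sobjects/ response."""
    return [
        obj["name"]
        for obj in describe_global.get("sobjects", [])
        # Both capabilities are required, and a few objects are skipped outright.
        if obj.get("queryable") and obj.get("replicateable")
        and obj["name"] not in EXCLUDED_TABLES
    ]
```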

Schema Handling & Types

  • Field names are lowercased; record keys are lowercased before emission.

  • Field ordering in emitted schemas is alphabetical.

  • Type mapping (Salesforce type → connector type):

    • int → integer

    • boolean → boolean

    • currency, percent, double → number (float)

    • datetime, date → string (ISO-8601 text)

    • address, junctionidlist → object (JSON)

    • anyType and all others → string

  • A column_length hint is set using the field’s precision/length (defaults to 256 if length is 0).
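A short Python sketch of the schema rules above: lowercased names, the type mapping, alphabetical ordering and the column_length hint. It takes the "fields" list from a Describe response. Which of precision and length wins is not specified here, so the sketch assumes precision is used when non-zero and length otherwise; the function names are hypothetical.

```python
TYPE_MAP = {
    "int": "integer",
    "boolean": "boolean",
    "currency": "number",
    "percent": "number",
    "double": "number",
    "datetime": "string",       # emitted as ISO-8601 text
    "date": "string",
    "address": "object",        # emitted as JSON
    "junctionidlist": "object",
}


def map_field(field: dict) -> dict:
    """Translate one Describe field entry into a schema column."""
    sf_type = field["type"].lower()
    # Assumption: precision when non-zero, else length; 0 falls back to 256.
    size = field.get("precision") or field.get("length") or 0
    return {
        "name": field["name"].lower(),
        "type": TYPE_MAP.get(sf_type, "string"),  # anyType and all others
        "column_length": size if size else 256,
    }


def build_schema(describe: dict) -> list[dict]:
    """Columns for one sObject, sorted alphabetically by lowercased name."""
    return sorted((map_field(f) for f in describe["fields"]), key=lambda c: c["name"])
```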

Sync Behavior

  • Incremental filter: If the object has a usable date field, the connector picks the first one available, in this order: lastmodifieddate, createddate, systemmodstamp.

    • Query 1 filters with WHERE {filter_column} >= :timestamp.

    • Query 2 (the "second half" of columns) always filters on lastmodifieddate.

  • Timestamp source: The per-object bookmark is stored in connector state at state.bookmarks.<object>.last_started (ISO 8601). On completion, the connector writes the current run’s timestamp back to that bookmark.

  • Ordering: Both queries ORDER BY id for deterministic pagination.

  • Pagination: Uses REST query pagination via nextRecordsUrl; continues until both queries return no nextRecordsUrl.

  • Query target: Uses /queryAll to include records that may be archived or soft-deleted in Salesforce.

  • Merging wide objects: If an object has more than 450 fields, the connector runs two SELECTs splitting the column list, merges the two result sets by Id, and emits combined records.
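The filter selection, column split and merge described above can be sketched in Python as below. Assumptions, all labeled in the code: the helper names are hypothetical, the WHERE clause is omitted when no recognized date field exists, record keys have already been lowercased, and `since` is an ISO 8601 UTC timestamp (SOQL datetime literals are written unquoted). The second query's hardcoded lastmodifieddate filter mirrors the documented behavior, including its limitation.

```python
DATE_CANDIDATES = ["lastmodifieddate", "createddate", "systemmodstamp"]
MAX_FIELDS_PER_QUERY = 450


def pick_filter_column(fields: list[str]):
    """First recognized date field present on the object, or None."""
    for name in DATE_CANDIDATES:
        if name in fields:
            return name
    return None


def build_queries(obj: str, fields: list[str], since: str) -> list[str]:
    """SOQL statements for one sObject; two of them when it has >450 fields."""
    filter_col = pick_filter_column(fields)
    if len(fields) <= MAX_FIELDS_PER_QUERY:
        chunks = [fields]
    else:
        chunks = [fields[:MAX_FIELDS_PER_QUERY], fields[MAX_FIELDS_PER_QUERY:]]
    queries = []
    for i, chunk in enumerate(chunks):
        cols = chunk if "id" in chunk else ["id"] + chunk  # Id is needed to merge
        # Second query always filters on lastmodifieddate (documented behavior).
        col = filter_col if i == 0 else "lastmodifieddate"
        # Assumption: no WHERE clause when there is no usable date column.
        where = f" WHERE {col} >= {since}" if col else ""
        queries.append(f"SELECT {', '.join(cols)} FROM {obj}{where} ORDER BY id")
    return queries


def merge_by_id(first: list[dict], second: list[dict]) -> list[dict]:
    """Combine the two column halves of each record, keyed by lowercased id."""
    merged = {rec["id"]: dict(rec) for rec in first}
    for rec in second:
        merged.setdefault(rec["id"], {}).update(rec)
    return list(merged.values())
```

For example, `build_queries("Foo", ["id", "name"], "2024-01-01T00:00:00Z")` has no date column to filter on and yields the single statement `SELECT id, name FROM Foo ORDER BY id`.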

Rate Limits & Usage Guardrail

  • The connector can check org limits at /limits. Every 500 API calls it checks DailyApiRequests and aborts if usage exceeds 70% of the daily quota.
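A minimal Python sketch of this guardrail. It assumes the /limits response shape in which DailyApiRequests carries "Max" and "Remaining"; the class name, exception and the injected fetch_limits callable are hypothetical stand-ins for the connector's HTTP layer.

```python
CHECK_EVERY = 500   # check the org limits every 500 API calls
MAX_USAGE = 0.70    # abort above 70% of the daily quota


class ApiUsageExceeded(RuntimeError):
    """Raised when daily API usage crosses the guardrail."""


class LimitGuard:
    """Counts API calls and checks DailyApiRequests every CHECK_EVERY calls.

    fetch_limits is a hypothetical callable returning the parsed JSON of
    GET /services/data/v50.0/limits/.
    """

    def __init__(self, fetch_limits):
        self.fetch_limits = fetch_limits
        self.calls = 0

    def record_call(self) -> None:
        self.calls += 1
        if self.calls % CHECK_EVERY == 0:
            daily = self.fetch_limits()["DailyApiRequests"]
            used = daily["Max"] - daily["Remaining"]
            if used / daily["Max"] > MAX_USAGE:
                raise ApiUsageExceeded(
                    f"{used}/{daily['Max']} daily API requests used (limit {MAX_USAGE:.0%})"
                )
```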

Error Handling & Retries

  • By default, each HTTP request is attempted up to 5 times, with a 15 s delay between attempts.

  • For token exchange:

    • Uses OAuth refresh token grant to obtain access_token and instance_url.

    • If the refresh token is invalid/expired, the connector surfaces a clear error requiring re-authorization.

  • Responses are validated: unhandled status codes are retried up to the maximum number of attempts, and errors are reported using Salesforce's standard error messages (e.g., for 400, 401, 403, 414, 431, and 500 responses).
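A simplified Python sketch of the retry policy (5 attempts, 15 s apart by default). As an assumption, it treats every non-2xx status as retryable and does not handle network exceptions; `send` is a hypothetical zero-argument callable returning (status, body), and `sleep` is injectable so the delay can be skipped in tests.

```python
import time


def with_retries(send, attempts: int = 5, delay: float = 15.0, sleep=time.sleep):
    """Call send() until it returns a 2xx status or the attempts run out."""
    last_status = None
    for attempt in range(1, attempts + 1):
        status, body = send()
        if 200 <= status < 300:
            return body
        last_status = status
        if attempt < attempts:
            sleep(delay)  # wait only between attempts, not after the last one
    raise RuntimeError(f"giving up after {attempts} attempts; last status {last_status}")
```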

Known Limitations / Notes

  • Dynamic schema: New fields added in Salesforce appear automatically on next Discovery; field types are inferred from Describe.

  • Date filter in split queries: The second query is hardcoded to filter on lastmodifieddate. If an object lacks this field but has createddate/systemmodstamp, the second query may not filter as expected.

  • Initial bookmark: The connector expects a last_started bookmark per object. If unset, the computed timestamp may evaluate to the current time, yielding an empty incremental window. Initialize bookmarks appropriately.

  • URL length / column limits: To avoid the 16,384‑byte URI limit and Salesforce’s 900‑column SELECT limit, the connector caps each column chunk at 450 fields.

  • Object eligibility: Only queryable + replicateable sObjects will be synced; objects lacking either capability are skipped by design.

  • Lowercasing: All emitted field names are lowercase; consumers should not expect Salesforce’s mixed‑case API names.

  • Deleted/archived records: Because /queryAll is used, records in the Recycle Bin or archived activities may be included unless filtered downstream.

Endpoints Used (representative)

  • Token: https://{host}/services/oauth2/token

  • List objects: /services/data/v50.0/sobjects/

  • Describe object: /services/data/v50.0/sobjects/{object}/describe/

  • Indexed fields: /services/data/v50.0/query/?q=SELECT+QualifiedApiName+FROM+FieldDefinition+WHERE+IsIndexed+=+true+AND+EntityDefinition.QualifiedApiName+=+'{object}'

  • Limits: /services/data/v50.0/limits/

  • Query (paged): /services/data/v50.0/queryAll/?q=SELECT+<fields>+FROM+{object}+WHERE+<filter>+ORDER+BY+Id
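The paged queryAll call can be sketched in Python as a generator that URL-encodes the SOQL (quote_plus produces the "+"-separated form shown above) and follows nextRecordsUrl, which Salesforce returns as a path relative to the instance. The function name and the injected get_json callable (an authenticated GET returning parsed JSON) are hypothetical.

```python
from urllib.parse import quote_plus

API_PATH = "/services/data/v50.0"


def query_all(instance_url: str, soql: str, get_json):
    """Yield every record for one SOQL statement, following nextRecordsUrl."""
    url = f"{instance_url}{API_PATH}/queryAll/?q={quote_plus(soql)}"
    while url:
        page = get_json(url)
        yield from page.get("records", [])
        # The last page has no nextRecordsUrl, which ends the loop.
        next_path = page.get("nextRecordsUrl")
        url = f"{instance_url}{next_path}" if next_path else None
```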

State & Metrics

  • State write‑back: state.bookmarks.<object>.last_started = <run start timestamp> at the end of each object sync.

  • Metrics: Emits a metric record_count per object with the number of records written.
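A minimal Python sketch of reading and writing the per-object bookmark at state.bookmarks.<object>.last_started. The helper names and the explicit fallback value are hypothetical; supplying an older fallback (e.g., your go-live date) avoids the empty first-run window described under Known Limitations.

```python
from datetime import datetime, timezone


def get_since(state: dict, obj: str, default: str) -> str:
    """Read state.bookmarks.<object>.last_started, or the operator-chosen default."""
    return state.get("bookmarks", {}).get(obj, {}).get("last_started", default)


def set_bookmark(state: dict, obj: str, run_started: datetime) -> None:
    """Store the run start time as an ISO 8601 UTC timestamp."""
    stamp = run_started.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    state.setdefault("bookmarks", {}).setdefault(obj, {})["last_started"] = stamp
```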

Notable Behaviors

  • Merges multi‑query results by Id across pages before writing.

  • Logs object/index/column diagnostics during Discovery.

  • Refreshes the access token before each run if needed, and persists instance_url and access_token to settings for reuse.


Quick Operator Tips

  • The limits check is a guardrail against exhausting your org's daily API quota; consider turning it off only if you fully control concurrency and quotas.

  • If you see empty increments on first run, seed bookmarks to an older date (e.g., your go‑live date) before kicking off a backfill.

  • If a specific object repeatedly fails, verify it is both queryable and replicateable, and that it includes one of the recognized date fields.
