Generated from connector source on 2025-09-04
Overview
The Salesforce connector pulls data from the Salesforce API into your Bytespree data lake.
Key characteristics:
API base: resolved from instance_url after login.
API version: v50.0.
Authentication: OAuth 2.0 (refresh token grant). Initial authorization code exchange supported.
Discovery: dynamic; enumerates all sObjects that are both queryable and replicateable, excluding a small set (see below).
Schema: dynamic per sObject using the Describe endpoint; field names are lowercased. Field lists are emitted in alphabetical order.
Uniqueness: per-stream unique_keys: ['id'] (Salesforce record Id).
Query engine: SOQL via /queryAll, ordered by Id.
Wide-object handling: columns split across two queries if >450 fields (well below Salesforce’s 900-column SELECT limit) and then merged by Id.
Setup & Configuration
Provide the following settings (populated automatically during auth where possible):
{
  "access_token": "<ephemeral; set post-login>",
  "client_id": "<Connected App Consumer Key>",
  "client_secret": "<Connected App Consumer Secret>",
  "environment_url": "login.salesforce.com | test.salesforce.com | <custom hostname>",
  "environment_url_custom": "<optional https://your-domain.my.salesforce.com>",
  "instance_url": "<set post-login>",
  "oauth_callback": "<URL used during initial authorization>",
  "refresh_token": "<OAuth refresh token>"
}
Environment / OAuth host resolution
environment_url (or environment_url_custom) is used to build the token URL: https://{hostname}/services/oauth2/token.
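The host resolution and refresh token grant can be sketched as below. This is a minimal illustration, not the connector's actual code: the function names resolve_token_url and refresh_payload are hypothetical, and the rule that a non-empty environment_url_custom takes precedence over environment_url is an assumption.

```python
from urllib.parse import urlparse


def resolve_token_url(settings: dict) -> str:
    """Build the OAuth token URL from connector settings."""
    # Assumption: a non-empty environment_url_custom wins over environment_url.
    raw = settings.get("environment_url_custom") or settings["environment_url"]
    # Accept either a bare hostname or a full https:// URL.
    parsed = urlparse(raw if "://" in raw else f"https://{raw}")
    return f"https://{parsed.netloc}/services/oauth2/token"


def refresh_payload(settings: dict) -> dict:
    """Form fields for the standard OAuth 2.0 refresh token grant."""
    return {
        "grant_type": "refresh_token",
        "client_id": settings["client_id"],
        "client_secret": settings["client_secret"],
        "refresh_token": settings["refresh_token"],
    }
```

The payload would be sent as a form-encoded POST to the resolved URL; the response is expected to carry access_token and instance_url.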
Discovery & Object Selection
Lists sObjects from /services/data/v50.0/sobjects/, then includes only those with queryable == true and replicateable == true that are not in EXCLUDED_TABLES = ['IdeaComment', 'Vote'].
For each included object, the connector calls Describe (/sobjects/{object}/describe/) to derive the schema and types.
The connector also records which fields are indexed (via FieldDefinition) for reference.
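The eligibility filter described above might look like the following sketch. The function name select_sobjects is hypothetical; the input is assumed to be the "sobjects" array from the list-objects response, where each entry carries name, queryable, and replicateable.

```python
EXCLUDED_TABLES = ["IdeaComment", "Vote"]


def select_sobjects(sobjects: list[dict]) -> list[str]:
    """Return names of sObjects that are queryable, replicateable, and not excluded."""
    return [
        obj["name"]
        for obj in sobjects
        if obj.get("queryable")
        and obj.get("replicateable")
        and obj["name"] not in EXCLUDED_TABLES
    ]
```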
Schema Handling & Types
Field names are lowercased; record keys are lowercased before emission.
Field ordering in emitted schemas is alphabetical.
Type mapping:
Salesforce type | Connector type
(mapping rows are not present in this copy)
A column_length hint is set using the field’s precision/length (defaults to 256 if length is 0).
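A rough sketch of the schema rules in this section follows. The names build_schema and lowercase_keys are hypothetical, the input is assumed to be the "fields" array of a Describe response, and preferring a non-zero precision over length is an assumption. The Salesforce type is carried through unmapped, since the mapping itself is not reproduced here.

```python
def build_schema(describe_fields: list[dict]) -> list[dict]:
    """Lowercase field names, attach a column_length hint, sort alphabetically."""
    columns = []
    for field in describe_fields:
        # Assumption: use precision when non-zero, else length, else 256.
        size = field.get("precision") or field.get("length") or 256
        columns.append(
            {
                "name": field["name"].lower(),
                "sf_type": field["type"],  # raw Salesforce type, not yet mapped
                "column_length": size,
            }
        )
    return sorted(columns, key=lambda col: col["name"])


def lowercase_keys(record: dict) -> dict:
    """Lowercase record keys so they match the emitted schema."""
    return {key.lower(): value for key, value in record.items()}
```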
Sync Behavior
Incremental filter: If a usable date field exists on the object, the connector picks the first available from: 1) lastmodifieddate, 2) createddate, 3) systemmodstamp.
Query 1 filters with WHERE {filter_column} >= :timestamp.
Query 2 (the "second half" of columns) always filters on lastmodifieddate.
Timestamp source: The per-object bookmark is stored in connector state at state.bookmarks.<object>.last_started (ISO 8601). On completion, the connector writes the current run’s timestamp back to that bookmark.
Ordering: Both queries ORDER BY id for deterministic pagination.
Pagination: Uses REST query pagination via nextRecordsUrl; continues until both queries return no nextRecordsUrl.
Query target: Uses /queryAll to include records that may be archived or soft-deleted in Salesforce.
Merging wide objects: If an object has more than 450 fields, the connector runs two SELECTs splitting the column list, merges the two result sets by Id, and emits combined records.
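One way to picture the wide-object split and merge is the sketch below. The helper names are hypothetical, and two details are assumptions: that the key column is repeated in both chunks so rows can be matched, and that the cut point is placed so the first chunk holds exactly 450 columns.

```python
MAX_COLUMNS = 450


def split_columns(fields: list[str], key: str = "id") -> list[list[str]]:
    """Return one chunk for narrow objects, two key-prefixed chunks for wide ones."""
    if len(fields) <= MAX_COLUMNS:
        return [list(fields)]
    others = [name for name in fields if name != key]
    cut = MAX_COLUMNS - 1  # leave room for the key in the first chunk
    return [[key] + others[:cut], [key] + others[cut:]]


def merge_by_id(first_rows: list[dict], second_rows: list[dict], key: str = "id") -> list[dict]:
    """Combine the two result sets into one record per key value."""
    merged = {row[key]: dict(row) for row in first_rows}
    for row in second_rows:
        merged.setdefault(row[key], {}).update(row)
    return list(merged.values())
```

Because both queries are ordered by Id, the merged output keeps the order of the first result set.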
Rate Limits & Usage Guardrail
The connector can check org limits at /limits. Every 500 API calls it checks DailyApiRequests and aborts if usage exceeds 70% of the daily quota.
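The guardrail could be approximated as in the sketch below. The names usage_ratio, guard, and ApiQuotaExceeded are hypothetical, and fetch_limits stands in for whatever performs the GET against /limits. The response is assumed to expose DailyApiRequests with Max and Remaining counters.

```python
CHECK_EVERY = 500       # API calls between limit checks
MAX_USAGE_RATIO = 0.70  # abort above 70% of the daily quota


class ApiQuotaExceeded(RuntimeError):
    """Raised when daily API usage passes the guardrail threshold."""


def usage_ratio(limits: dict) -> float:
    """Fraction of the daily API quota already consumed."""
    daily = limits["DailyApiRequests"]
    return (daily["Max"] - daily["Remaining"]) / daily["Max"]


def guard(call_count: int, fetch_limits) -> None:
    """Check org limits on every CHECK_EVERY-th call and abort if over threshold."""
    if call_count == 0 or call_count % CHECK_EVERY != 0:
        return
    ratio = usage_ratio(fetch_limits())
    if ratio > MAX_USAGE_RATIO:
        raise ApiQuotaExceeded(f"Daily API usage at {ratio:.0%}, above {MAX_USAGE_RATIO:.0%}")
```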
Error Handling & Retries
HTTP requests are retried up to 5 attempts with a 15s delay by default.
For token exchange:
Uses OAuth refresh token grant to obtain access_token and instance_url.
If the refresh token is invalid/expired, the connector surfaces a clear error requiring re-authorization.
Responses are validated; unhandled status codes trigger retry up to the max attempts; messages are surfaced using Salesforce’s standard error texts (e.g., 400/401/403/414/431/500).
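The default retry policy (5 attempts, 15s delay) can be illustrated with a small wrapper. This is a sketch only: with_retries is a hypothetical name, and retrying on any exception is a simplification, since the connector's rules for which responses are retryable are not spelled out here.

```python
import time


def with_retries(func, max_attempts: int = 5, delay_seconds: float = 15.0, sleep=time.sleep):
    """Call func until it succeeds or max_attempts is reached, pausing between tries."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:  # simplification: every failure is treated as retryable
            last_error = exc
            if attempt < max_attempts:
                sleep(delay_seconds)
    raise last_error
```

The sleep parameter is injectable so the wrapper can be exercised in tests without waiting.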
Known Limitations / Notes
Dynamic schema: New fields added in Salesforce appear automatically on next Discovery; field types are inferred from Describe.
Date filter in split queries: The second query is hardcoded to filter on lastmodifieddate. If an object lacks this field but has createddate/systemmodstamp, the second query may not filter as expected.
Initial bookmark: The connector expects a last_started bookmark per object. If unset, the computed timestamp may evaluate to the current time, yielding an empty incremental window. Initialize bookmarks appropriately.
URL length / column limits: To avoid the 16,384-byte URI limit and Salesforce’s 900-column SELECT limit, the connector caps each column chunk at 450 fields.
Object eligibility: Only queryable + replicateable sObjects will be synced; objects lacking either capability are skipped by design.
Lowercasing: All emitted field names are lowercase; consumers should not expect Salesforce’s mixed‑case API names.
Deleted/archived records: Because /queryAll is used, records in the Recycle Bin or archived activities may be included unless filtered downstream.
Endpoints Used (representative)
Token: https://{host}/services/oauth2/token
List objects: /services/data/v50.0/sobjects/
Describe object: /services/data/v50.0/sobjects/{object}/describe/
Indexed fields: /services/data/v50.0/query/?q=SELECT+QualifiedApiName+FROM+FieldDefinition+WHERE+IsIndexed+=+true+AND+EntityDefinition.QualifiedApiName+=+'{object}'
Limits: /services/data/v50.0/limits/
Query (paged): /services/data/v50.0/queryAll/?q=SELECT+<fields>+FROM+{object}+WHERE+<filter>+ORDER+BY+Id
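For the paged query endpoint, a sketch of URL construction and nextRecordsUrl paging is shown below. The names query_all_url and iter_records are hypothetical, and fetch_json is an injected placeholder for an authenticated GET that returns parsed JSON. The response is assumed to contain a "records" list and, while more pages remain, a "nextRecordsUrl" path relative to the instance URL.

```python
from urllib.parse import quote_plus

API_PREFIX = "/services/data/v50.0"


def query_all_url(instance_url: str, soql: str) -> str:
    """URL-encode the SOQL and attach it to the queryAll endpoint."""
    return f"{instance_url.rstrip('/')}{API_PREFIX}/queryAll/?q={quote_plus(soql)}"


def iter_records(instance_url: str, soql: str, fetch_json):
    """Yield records page by page until no nextRecordsUrl is returned."""
    url = query_all_url(instance_url, soql)
    while url:
        page = fetch_json(url)
        yield from page.get("records", [])
        next_path = page.get("nextRecordsUrl")
        url = f"{instance_url.rstrip('/')}{next_path}" if next_path else None
```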
State & Metrics
State write-back: state.bookmarks.<object>.last_started = <run start timestamp> at the end of each object sync.
Metrics: Emits a metric record_count per object with the number of records written.
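The write-back and metric can be pictured with the sketch below. The function name finish_object_sync and the shape of the returned metric dictionary are hypothetical; only the bookmark path and the record_count metric name come from the description above.

```python
from datetime import datetime, timezone


def finish_object_sync(state: dict, object_name: str, run_started: datetime, record_count: int) -> dict:
    """Store the run start time as the object's bookmark and build its metric."""
    bookmarks = state.setdefault("bookmarks", {})
    # ISO 8601 in UTC, matching the documented bookmark format.
    bookmarks.setdefault(object_name, {})["last_started"] = (
        run_started.astimezone(timezone.utc).isoformat()
    )
    # Hypothetical metric shape, for illustration only.
    return {"metric": "record_count", "object": object_name, "value": record_count}
```

The same function shape could be used to seed a bookmark to an older date before a first backfill.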
Notable Behaviors
Merges multi-query results by Id across pages before writing.
Logs object/index/column diagnostics during Discovery.
Performs an access-token refresh prior to run if needed and persists instance_url + access_token to settings for reuse.
Quick Operator Tips
Consider turning off the limits check to save API calls only if you fully control concurrency and quotas.
If you see empty increments on first run, seed bookmarks to an older date (e.g., your go‑live date) before kicking off a backfill.
If a specific object repeatedly fails, verify it is both queryable and replicateable, and that it includes one of the recognized date fields.