How to Extract Structured Data from Text via API

You paste in an invoice, a resume, or a raw email. You want JSON back — the same schema, every time, with fields you can rely on. Here's why that's harder than it looks, and how to do it in one API call.

The problem with raw LLM calls

Calling an LLM directly and asking for JSON works maybe 80% of the time. The other 20%, you get markdown fences wrapped around the output, trailing commentary, a missing field, or a string where you expected a number.

So you write a sanitization layer. Then a retry loop. Then a validation step. Then you discover the model hallucinated a field name. Now you have 15 lines of glue code around a one-sentence task.

The Slopshop approach

POST to /v1/tasks/run with task: "extract_structured_data". You get back a response with a guarantees object that tells you exactly what was validated. No sanitization layer. No retry logic. No schema drift.

Invoice extraction (curl)
# Extract structured data from an invoice
curl -X POST https://slopshop.gg/v1/tasks/run \
  -H "Authorization: Bearer demo_key_slopshop" \
  -H "Content-Type: application/json" \
  -d '{
    "task": "extract_structured_data",
    "text": "Invoice #4521 — Billed to: John Doe (john@acme.com). Services: API consulting 8h @ $150/h = $1,200. Tax 10% = $120. Total due: $1,320. Due date: 2026-04-15.",
    "schema": {
      "invoice_id": "string",
      "client_name": "string",
      "client_email": "string",
      "subtotal": "number",
      "tax": "number",
      "total": "number",
      "due_date": "string"
    }
  }'
Response (JSON)
{
  "ok": true,
  "output": {
    "invoice_id": "4521",
    "client_name": "John Doe",
    "client_email": "john@acme.com",
    "subtotal": 1200,
    "tax": 120,
    "total": 1320,
    "due_date": "2026-04-15"
  },
  "guarantees": {
    "schema_valid": true,
    "validated": true,
    "no_hallucinated_fields": true
  },
  "credits_used": 10,
  "_engine": "real"
}
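Before trusting the result, check the guarantees in code. A minimal sketch in Node; `assertExtraction` is a hypothetical client-side helper (not part of any SDK), applied here to the sample response above:

```javascript
// Throw unless the response carries the guarantees we rely on.
// `assertExtraction` is a hypothetical helper, not part of the API.
function assertExtraction(response) {
  if (!response.ok) throw new Error("task failed");
  const g = response.guarantees ?? {};
  if (!g.schema_valid || !g.no_hallucinated_fields) {
    throw new Error("guarantees not met");
  }
  return response.output;
}

// The sample response from the invoice call above:
const sample = {
  ok: true,
  output: { invoice_id: "4521", client_name: "John Doe", total: 1320 },
  guarantees: { schema_valid: true, validated: true, no_hallucinated_fields: true },
  credits_used: 10
};

const invoice = assertExtraction(sample);
console.log(invoice.total); // 1320
```

With Node 18+ you would feed `assertExtraction` the parsed body of a `fetch` call to the endpoint; anything that fails the check throws before bad data reaches your pipeline.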

Resume extraction

Resume / CV parsing (curl)
curl -X POST https://slopshop.gg/v1/tasks/run \
  -H "Authorization: Bearer demo_key_slopshop" \
  -H "Content-Type: application/json" \
  -d '{
    "task": "extract_structured_data",
    "text": "Sarah Chen — Senior Engineer at Stripe (2021-present). Previously Cloudflare (2018-2021). MIT CS 2018. Skills: Rust, Go, distributed systems, Kafka. GitHub: github.com/schen.",
    "schema": {
      "name": "string",
      "current_company": "string",
      "current_role": "string",
      "years_experience": "number",
      "education": "string",
      "skills": "array",
      "github": "string"
    }
  }'
Email extraction (curl)
curl -X POST https://slopshop.gg/v1/tasks/run \
  -H "Authorization: Bearer demo_key_slopshop" \
  -H "Content-Type: application/json" \
  -d '{
    "task": "extract_structured_data",
    "text": "From: marcus@vendor.io Subject: PO #8810 ready. Hi team, the purchase order for 500 units of SKU-220 at $4.50 each is approved. Delivery by March 30. Please confirm receipt.",
    "schema": {
      "sender_email": "string",
      "po_number": "string",
      "sku": "string",
      "quantity": "number",
      "unit_price": "number",
      "total_value": "number",
      "delivery_date": "string",
      "action_required": "string"
    }
  }'
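All three calls share the same shape, so a thin wrapper covers every schema. A sketch; `buildExtractionRequest` is an assumed helper name, and the key comes from wherever you store secrets:

```javascript
// Build the fetch arguments for an extract_structured_data call.
// Hypothetical helper — the API itself is just the POST shown above.
function buildExtractionRequest(text, schema, apiKey) {
  return {
    url: "https://slopshop.gg/v1/tasks/run",
    options: {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${apiKey}`,
        "Content-Type": "application/json"
      },
      body: JSON.stringify({ task: "extract_structured_data", text, schema })
    }
  };
}

const req = buildExtractionRequest(
  "Invoice #4521 — Total due: $1,320.",
  { invoice_id: "string", total: "number" },
  "demo_key_slopshop"
);
console.log(JSON.parse(req.options.body).task); // "extract_structured_data"
```

On Node 18+ the actual call is then just `fetch(req.url, req.options)`.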

Without vs. with Slopshop

Without Slopshop — 15+ lines
// Call the LLM
const res = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{role:"user", content:
    `Extract JSON from: ${text}. Schema: ${schema}`}]
})

// Strip markdown fences
let raw = res.choices[0].message.content
  .replace(/```json\n?/g, '')
  .replace(/```/g, '').trim()

// Try to parse, retry on failure
let parsed
try { parsed = JSON.parse(raw) }
catch { /* retry logic here... */ }

// Validate schema manually
if (!parsed.invoice_id || !parsed.total) {
  throw new Error('Schema mismatch')
}
With Slopshop — 1 curl
# That's it.
curl -X POST https://slopshop.gg/v1/tasks/run \
  -H "Authorization: Bearer $KEY" \
  -d '{
    "task": "extract_structured_data",
    "text": "'"$TEXT"'",
    "schema": '"$SCHEMA"'
  }'

# Response always:
# { ok: true,
#   output: { ...your fields },
#   guarantees: { schema_valid: true } }

# No sanitization.
# No retry loop.
# No schema drift.

What the guarantees field means

schema_valid: true

Every field in your schema is present. Types match. No extras.

validated: true

Output was parsed and re-validated before being returned to you.

no_hallucinated_fields: true

Only fields you defined in schema appear in output. Nothing invented.
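In other words, the output's keys are always a subset of your schema's keys. If you want to re-verify that locally (a belt-and-braces sketch, not something the API requires), the check is one line:

```javascript
// Returns true when every key in `output` was declared in `schema`.
function noHallucinatedFields(output, schema) {
  return Object.keys(output).every((k) => k in schema);
}

const schema = { invoice_id: "string", total: "number" };
console.log(noHallucinatedFields({ invoice_id: "4521", total: 1320 }, schema)); // true
console.log(noHallucinatedFields({ invoice_id: "4521", vendor: "?" }, schema)); // false
```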

Pricing

Structured data extraction costs 10 credits per call. You get 500 free credits on signup — that's 50 extractions before you spend a cent. After that, credits start at $9 for 10,000. See full pricing.
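The arithmetic, spelled out:

```javascript
// Credits-to-extractions arithmetic at 10 credits per call.
const creditsPerCall = 10;
console.log(500 / creditsPerCall);   // 50 free extractions on signup
console.log(10000 / creditsPerCall); // 1000 extractions per $9 pack
```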

Try it now

500 free credits. No credit card required. Works in 60 seconds.

$ npm install -g slopshop
$ slopshop signup
✓ API key created. 500 credits added.