Commit Graph

3 Commits

Author SHA1 Message Date
spouliot da2bb46d5a Tighten Prismatic scrape parsing after live smoke test
Validated against live product pages; fixed three edge cases (also present in
the original JS scraper) surfaced by specialty AkzoNobel products:

- Sample image: only accept real product images on the NIC CDN
  (images.nicindustries.com/prismatic/products), preferring full-size over
  thumbnail. Dropped the loose "prismatic|powder|color" fallback that grabbed
  the site logo on products with no image.
- SDS/TDS/app-guide links: require the href to be an actual document (NIC CDN
  or a .pdf) so a generic /documents nav link isn't captured as the SDS.
- Description: also stop at PRODUCT SUPPORT / PRODUCT COLLECTIONS / CUSTOMER
  SERVICE so less of the page footer is captured (app-side StripBoilerplate
  cleans the rest).
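
A minimal Python sketch of the three rules above, for illustration only (the
real tool is .NET). The function names are hypothetical. Treating "thumb" in
the URL as the thumbnail marker, and using the same CDN host for documents as
for images, are both assumptions; the commit does not state either.

```python
from urllib.parse import urlparse

# From the commit message: real product images live under this CDN path.
CDN_PRODUCT_PREFIX = "images.nicindustries.com/prismatic/products"
# Assumption: documents are served from the same CDN host as images.
CDN_HOST = "images.nicindustries.com"
# The three stop markers added by this commit (earlier markers not shown).
STOP_MARKERS = ("PRODUCT SUPPORT", "PRODUCT COLLECTIONS", "CUSTOMER SERVICE")


def pick_sample_image(urls):
    """Return a CDN product image, preferring full-size, else None."""
    candidates = [u for u in urls if CDN_PRODUCT_PREFIX in u]
    if not candidates:
        return None  # no loose fallback, so the site logo is never picked
    # Assumption: thumbnail URLs contain "thumb" somewhere in the URL.
    full_size = [u for u in candidates if "thumb" not in u.lower()]
    return (full_size or candidates)[0]


def is_document_link(href):
    """Accept only hrefs that point at an actual document."""
    parsed = urlparse(href)
    return parsed.netloc.lower() == CDN_HOST or parsed.path.lower().endswith(".pdf")


def trim_description(text):
    """Cut the description at the earliest stop marker, if any."""
    cut = len(text)
    for marker in STOP_MARKERS:
        index = text.find(marker)
        if index != -1:
            cut = min(cut, index)
    return text[:cut].rstrip()
```

With these rules a page that only exposes the site logo yields no sample image,
and a bare /documents nav link is rejected as an SDS candidate.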

Structural fields (sku, color, price tiers) verified correct on live data.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-18 12:41:47 -04:00
spouliot 843d1c3c51 Add token-authenticated catalog import API endpoint
POST /PowderCatalog/ImportApi accepts the JSON scrape format in the request
body, authenticated by a shared secret in the X-Import-Token header (matched
constant-time against CatalogImport:Token), with the vendor in X-Vendor-Name.
Runs through the same ImportJsonAsync -> shared upsert as the manual upload, so
the offline PrismaticSync tool can push unattended.
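
The token check described above might look roughly like this in Python (the
endpoint itself is ASP.NET; this is only a sketch of the logic). The function
name and the bare status-code return are hypothetical. It also reflects the
inert-by-default behavior: with no configured token, every request gets 401.

```python
import hmac


def import_status(header_token, configured_token):
    """Return the HTTP status for an import request (hypothetical helper).

    header_token: value of the X-Import-Token header, or None if absent.
    configured_token: value of CatalogImport:Token, or None/"" if unset.
    """
    if not configured_token:
        return 401  # endpoint stays inert until a token is configured
    if header_token is None:
        return 401
    # compare_digest runs in time independent of where the inputs differ,
    # which avoids leaking the token through response timing.
    matches = hmac.compare_digest(
        header_token.encode("utf-8"), configured_token.encode("utf-8")
    )
    return 200 if matches else 401
```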

ImportJsonAsync refactored to take a Stream (the form upload now passes
file.OpenReadStream()). Endpoint is AllowAnonymous + IgnoreAntiforgeryToken
(it's token-gated, not cookie-auth) and returns 401 until a token is configured,
so it's inert by default. README updated with the route + token wiring.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-18 11:35:30 -04:00
spouliot c59d55529f Add PrismaticSync console tool for unattended Prismatic catalog sync
Standalone .NET 8 console app (not part of the main solution) that scrapes the
Prismatic Powders catalog via Playwright and pushes it into the app's catalog
import. Prismatic has no API, so this runs on a workstation (Task Scheduler),
never the deployed server.

- Discovery: incremental newest-first via ?category=created_at (stops once it
  reaches already-known URLs, so it is cheap and finds new colors) and a full
  all-colors crawl for occasional reconciliation.
- Scraper: resumable product-page scrape (sku/color/description/price tiers/
  SDS/TDS/app-guide/image), with --refresh-older-than to re-scrape stale
  products and catch price changes. Output matches the app import format so it
  flows through the same shared upsert as the Columbia sync.
- Resilience: brisk randomized base delay, escalating 403 cooldown-and-retry to
  avoid hard bans, periodic rest. All configurable.
- Visibility: streams every product + the inter-product wait to the console
  (colored) and a log file, with an up-front ETA.
- Push: token-authenticated POST to the app import endpoint (falls back to
  manual upload when unconfigured).
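
Two of the behaviors listed above, sketched in Python for illustration (the
tool itself is a .NET console app): the incremental discovery stop condition
and the escalating 403 cooldown. All names, the 60-second base, the doubling
factor, the 30-minute cap, and the retry limit are assumptions; the commit only
says these values are configurable.

```python
import time


def discover_new(pages, known_urls):
    """Collect product URLs from newest-first listing pages.

    pages: iterable of lists of URLs, newest first.
    Stops at the first already-known URL, since everything after it is older.
    """
    new_urls = []
    for page in pages:
        for url in page:
            if url in known_urls:
                return new_urls
            new_urls.append(url)
    return new_urls


def cooldown_seconds(consecutive_403s, base=60.0, factor=2.0, cap=1800.0):
    """Wait time that grows with each consecutive 403, up to a cap."""
    return min(cap, base * factor ** (consecutive_403s - 1))


def fetch_with_cooldown(fetch, url, max_retries=4, sleep=time.sleep):
    """Call fetch(url) -> (status, body); cool down and retry on 403."""
    blocked = 0
    while True:
        status, body = fetch(url)
        if status != 403:
            return status, body
        blocked += 1
        if blocked > max_retries:
            raise RuntimeError(f"still blocked after {max_retries} retries: {url}")
        sleep(cooldown_seconds(blocked))
```

Passing `sleep` as a parameter keeps the retry loop testable without real
waits; a scheduled run would use the default `time.sleep`.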

The app-side token import endpoint is a separate follow-up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-18 11:30:47 -04:00