Validated against live product pages; fixed three edge cases (also present in the original JS scraper) surfaced by specialty AkzoNobel products:
- Sample image: only accept real product images on the NIC CDN (images.nicindustries.com/prismatic/products), preferring full-size over thumbnail. Dropped the loose "prismatic|powder|color" fallback that grabbed the site logo on products with no image.
- SDS/TDS/app-guide links: require the href to be an actual document (NIC CDN or a .pdf) so a generic /documents nav link isn't captured as the SDS.
- Description: also stop at PRODUCT SUPPORT / PRODUCT COLLECTIONS / CUSTOMER SERVICE so less page footer is captured (app-side StripBoilerplate cleans the rest).
Structural fields (sku, color, price tiers) verified correct on live data.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
PrismaticSync
A standalone .NET console tool that scrapes the Prismatic Powders catalog and pushes it into the Powder Coating Logix catalog import endpoint. Unlike Columbia Coatings, Prismatic has no API, so the data has to be scraped via browser automation.
Run it on a workstation you control, never on the deployed app server. Scraping from the cloud app's IP would get blocked and isn't appropriate. This tool is deliberately not part of PowderCoating.sln; build and run it independently.
First-time setup (per machine)
cd "scripts/Prismatic Data Scraper"
dotnet build
pwsh bin/Debug/net8.0/playwright.ps1 install chromium # one-time browser download
Commands
dotnet run -- run # default: discover-new + scrape (new + stale >30d) + push
dotnet run -- discover-new # cheap: find newly-added colors (newest-first, stops at known)
dotnet run -- discover-full # heavy: crawl all color filters (reconcile whole set / removals)
dotnet run -- scrape # scrape product pages from product-urls.txt (resumable)
dotnet run -- scrape --refresh-older-than=30 # also re-scrape products older than 30 days (price changes)
dotnet run -- push # push prismatic_powders.json to the import endpoint
Options: --max-products=N, --retry-errors, --headed (show the browser for debugging).
Everything streams to the console live (warnings/errors in color) and to prismatic-sync.log.
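Why discover-new is cheap can be sketched in a few lines. This is an illustration in Python, not the tool's .NET code; the function name is hypothetical, and the exact stop rule (stop at the first already-known URL) is an assumption based on the "newest-first, stops at known" description above.

```python
def discover_new(newest_first_urls, known_urls):
    """Collect URLs from a newest-first listing until a known one appears.

    Assumption: the listing is ordered newest-first, so everything after
    the first known URL has already been seen on an earlier run.
    """
    new_urls = []
    for url in newest_first_urls:
        if url in known_urls:
            break  # everything older is already in product-urls.txt
        new_urls.append(url)
    return new_urls
```

A pass like this never reaches old listing pages, which is also why it cannot notice removed colors; that is what discover-full is for.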
Operating model (suggested cadence)
| Run | Command | Cadence | Why |
|---|---|---|---|
| Find new colors | run (does discover-new + scrape-new) | Weekly | Cheap; Prismatic adds colors often |
| Price refresh | scrape --refresh-older-than=30 then push | Monthly | Re-scrapes stale products to catch price changes (slow, ~hours) |
| Full reconcile | discover-full then scrape | Quarterly | Catches removed/discontinued colors |
A full scrape of ~5,000 products takes hours because of the polite delays between requests. It saves after every product and is fully resumable, so you can stop and restart at any time.
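The save-after-every-product behaviour and the stale-product refresh can be sketched as follows. This is a Python illustration, not the tool's .NET code: the record layout (a dict keyed by URL with a `scraped_at` timestamp) and all function names are hypothetical, and the real prismatic_powders.json may be shaped differently.

```python
import json
import os
from datetime import datetime, timedelta, timezone


def load_records(path):
    """Load previously scraped records, or start empty on the first run."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_records(path, records):
    """Write to a temp file, then swap it in, so a crash never leaves half a file."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(records, f)
    os.replace(tmp, path)


def pending_urls(urls, records, now, refresh_older_than_days=None):
    """URLs never scraped, plus (optionally) those scraped too long ago."""
    todo = []
    for url in urls:
        rec = records.get(url)
        if rec is None:
            todo.append(url)
        elif refresh_older_than_days is not None:
            age = now - datetime.fromisoformat(rec["scraped_at"])
            if age > timedelta(days=refresh_older_than_days):
                todo.append(url)
    return todo


def scrape_all(urls, path, scrape_one, now, refresh_older_than_days=None):
    """Scrape what is pending, saving after every product so a restart resumes."""
    records = load_records(path)
    for url in pending_urls(urls, records, now, refresh_older_than_days):
        records[url] = {"scraped_at": now.isoformat(), "data": scrape_one(url)}
        save_records(path, records)
    return records
```

Because progress is persisted after each product, an interrupted run loses at most the product that was in flight.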
Politeness / anti-block
Configurable in appsettings.json: randomized 6–14s base delay, an escalating cooldown + retry on
403 (so a temporary block doesn't get you hard-banned mid-run), and a periodic long rest. Leave
these conservative — getting blocked is worse than being slow, and Prismatic is a partner.
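The three mechanisms above (randomized delay, escalating cooldown on 403, periodic rest) might look roughly like this. It is a Python sketch for illustration only, not the tool's .NET code; apart from the 6 to 14 second base delay, every number and name here (60 s first cooldown, doubling, 30-minute cap, rest every 50 pages, four retries) is a made-up placeholder rather than the tool's actual configuration.

```python
import random


def next_delay(rng, base_min=6.0, base_max=14.0):
    """Randomized base delay between product pages, in seconds."""
    return rng.uniform(base_min, base_max)


def cooldown_after_403(consecutive_403s, first_cooldown=60.0, factor=2.0, cap=1800.0):
    """Escalating cooldown: grows with each consecutive 403, up to a cap."""
    return min(cap, first_cooldown * factor ** (consecutive_403s - 1))


def long_rest_due(pages_done, rest_every=50):
    """True when it is time for the periodic long rest."""
    return pages_done > 0 and pages_done % rest_every == 0


def fetch_politely(fetch, url, rng, sleep, max_retries=4):
    """Fetch one page; on 403, cool down and retry instead of hammering the site.

    `fetch(url)` returns (status, body); `sleep(seconds)` is injected so the
    sketch can be exercised without actually waiting.
    """
    for attempt in range(1, max_retries + 1):
        status, body = fetch(url)
        if status != 403:
            sleep(next_delay(rng))  # normal pause before the next page
            return status, body
        sleep(cooldown_after_403(attempt))  # back off harder each time
    return 403, None  # still blocked: give up on this page for now
```

As a rough sanity check on run time: if the delay averages about 10 s (the midpoint of the range), 5,000 pages comes to roughly 50,000 s, or about 14 hours, before any cooldowns or rests.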
Pushing into the app
Set in appsettings.json:
- Sync.Import.EndpointUrl → https://<your-app>/PowderCatalog/ImportApi
- Sync.Import.Token → the same secret as the app's CatalogImport:Token config
The tool POSTs the JSON with an X-Import-Token header (and X-Vendor-Name: Prismatic Powders) to
that endpoint, which authenticates the token and runs the records through the same upsert as the
Columbia sync. If the endpoint/token isn't configured here, push is skipped and you upload
prismatic_powders.json manually via the Powder Catalog admin page instead.
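For debugging the endpoint by hand, the request the tool sends can be approximated as below. This is a Python sketch, not the tool's .NET code: the two header names come from the description above, while the `Content-Type` value, the function names, and the skip-when-unconfigured return value are assumptions.

```python
import urllib.request


def build_push_request(endpoint_url, token, payload):
    """Build the POST: raw JSON bytes plus the two custom headers."""
    return urllib.request.Request(
        endpoint_url,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",  # assumed, not confirmed
            "X-Import-Token": token,
            "X-Vendor-Name": "Prismatic Powders",
        },
    )


def push(endpoint_url, token, payload):
    """Send the catalog, or skip (return None) when push is not configured."""
    if not endpoint_url or not token:
        return None  # mirrors "push is skipped" when settings are missing
    request = build_push_request(endpoint_url, token, payload)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx, e.g. 401 for a bad token.
    with urllib.request.urlopen(request) as response:
        return response.status
```

Here `payload` would be the bytes of prismatic_powders.json, read with `open(path, "rb").read()`.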
App side: set CatalogImport:Token in the web app's config (Azure App Setting in prod). The endpoint returns 401 until a token is set, so it's inert by default.
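The "inert by default" behaviour amounts to a check like the one below. This is a Python illustration of the logic only; the real endpoint lives in the web app, and the function name, the constant-time comparison, and the 200 success status are assumptions.

```python
import hmac


def authorize_import(configured_token, presented_token):
    """Return the HTTP status the import endpoint would answer with."""
    if not configured_token:
        return 401  # no token configured: endpoint stays inert
    if not presented_token:
        return 401  # caller sent no X-Import-Token header
    same = hmac.compare_digest(
        configured_token.encode("utf-8"), presented_token.encode("utf-8")
    )
    return 200 if same else 401
```

The key property is the first branch: an empty or missing configured token rejects every request, including one that presents an empty token.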
Scheduling (Windows Task Scheduler)
Point a scheduled task at the published exe (or dotnet run). Example weekly task command:
Program/script: C:\Tools\PrismaticSync\PrismaticSync.exe
Arguments: run
Start in: C:\Tools\PrismaticSync
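Publish a framework-dependent build to drop on the workstation (the command below passes --self-contained false, so the workstation needs the .NET 8 runtime installed):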
dotnet publish -c Release -r win-x64 --self-contained false -o C:\Tools\PrismaticSync
pwsh C:\Tools\PrismaticSync\playwright.ps1 install chromium
The long game
This is the interim path. The durable endgame is a real Prismatic API (the partnership), at which point this tool is replaced by a clean in-app sync like Columbia's — reusing the same upsert, propagation, and discontinued handling.