Add PrismaticSync console tool for unattended Prismatic catalog sync

Standalone .NET 8 console app (not part of the main solution) that scrapes the Prismatic Powders catalog via Playwright and pushes it into the app's catalog import. Prismatic has no API, so this runs on a workstation (Task Scheduler), never the deployed server.

- Discovery: incremental newest-first via ?category=created_at (stops once it reaches already-known URLs; cheap, finds new colors) and a full all-colors crawl for an occasional reconcile.
- Scraper: resumable product-page scrape (sku/color/description/price tiers/SDS/TDS/app-guide/image), with --refresh-older-than to re-scrape stale products and catch price changes. Output matches the app import format, so it flows through the same shared upsert as the Columbia sync.
- Resilience: brisk randomized base delay, escalating 403 cooldown-and-retry to avoid hard bans, periodic rest. All configurable.
- Visibility: streams every product and the inter-product wait to the console (colored) and a log file, with an up-front ETA.
- Push: token-authenticated POST to the app import endpoint (falls back to manual upload when unconfigured). The app-side token import endpoint is a separate follow-up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
# PrismaticSync

A standalone .NET 8 console tool that scrapes the Prismatic Powders catalog and pushes it into the
Powder Coating Logix catalog import endpoint. It exists because Prismatic has **no API** (unlike
Columbia Coatings), so the data has to be scraped with browser automation (Playwright).

> **Runs on a workstation you control, never on the deployed app server.** Scraping from the cloud
> app's IP would get that IP blocked and isn't appropriate. This tool is deliberately *not* part of
> `PowderCoating.sln`; build and run it independently.

## First-time setup (per machine)

```powershell
cd "scripts/Prismatic Data Scraper"
dotnet build
pwsh bin/Debug/net8.0/playwright.ps1 install chromium   # one-time browser download
```

## Commands

```powershell
dotnet run -- run                               # default: discover-new + scrape (new + stale >30d) + push
dotnet run -- discover-new                      # cheap: find newly added colors (newest-first, stops at known URLs)
dotnet run -- discover-full                     # heavy: crawl all color filters (reconcile the whole set / removals)
dotnet run -- scrape                            # scrape product pages listed in product-urls.txt (resumable)
dotnet run -- scrape --refresh-older-than=30    # also re-scrape products older than 30 days (price changes)
dotnet run -- push                              # push prismatic_powders.json to the import endpoint
```

Options: `--max-products=N`, `--retry-errors`, `--headed` (show the browser for debugging).

Everything streams to the console live (warnings/errors in color) **and** to `prismatic-sync.log`.
## Operating model (suggested cadence)

| Run | Command | Cadence | Why |
|-----|---------|---------|-----|
| Find new colors | `run` (discover-new + scrape + push) | Weekly | Cheap; Prismatic adds colors often |
| Price refresh | `scrape --refresh-older-than=30`, then `push` | Monthly | Re-scrapes stale products to catch price changes (slow; takes hours) |
| Full reconcile | `discover-full`, then `scrape` | Quarterly | Catches removed/discontinued colors |

A full scrape of ~5,000 products takes many hours because of the polite delays: at the default
6–14 s randomized delay (about 10 s on average), the waits alone come to roughly 14 hours. The tool
saves after every product and is fully resumable, so you can stop and restart it at any time.
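The tool prints its own up-front ETA. For planning a run (for example, sizing a `--max-products=N` batch), you can sketch the delay-only part of that estimate in PowerShell.

This is a back-of-envelope sketch, not the tool's actual calculation. `Get-DelayOnlyEta` is a hypothetical helper. It assumes the delay is drawn uniformly between the configured bounds, so the mean delay is their midpoint. Page loads, 403 cooldowns, and periodic rests all come on top of this figure.

```powershell
# Hypothetical helper (not part of PrismaticSync): estimates only the time spent
# in the randomized inter-product delay, assuming it is uniform in [Min, Max].
function Get-DelayOnlyEta {
    param(
        [Parameter(Mandatory)][int]$ProductCount,
        [double]$MinDelaySeconds = 6,
        [double]$MaxDelaySeconds = 14
    )
    # The mean of a uniform distribution is the midpoint of its bounds.
    $meanDelay = ($MinDelaySeconds + $MaxDelaySeconds) / 2
    [TimeSpan]::FromSeconds($ProductCount * $meanDelay)
}

# Usage: delay-only estimate for a full ~5,000-product scrape with the default bounds.
$eta = Get-DelayOnlyEta -ProductCount 5000
"{0:N1} hours of delay alone" -f $eta.TotalHours
```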
## Politeness / anti-block

Configurable in `appsettings.json`: a randomized 6–14 s base delay, an escalating **cooldown + retry on
403** (so a temporary block doesn't get you hard-banned mid-run), and a periodic long rest. Leave
these conservative: getting blocked is worse than being slow, and Prismatic is a partner.
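The escalating 403 cooldown is essentially capped exponential backoff layered on top of a randomized per-product delay. The sketch below shows that idea in PowerShell.

Everything in it is an illustrative assumption, not PrismaticSync's actual code or `appsettings.json` keys: the function names, the 5-minute starting cooldown, the doubling factor, and the 60-minute cap.

```powershell
# Illustrative only: hypothetical helpers showing the shape of the politeness logic,
# not PrismaticSync's implementation or its configuration keys.

# Randomized base delay: a fresh value in [MinSeconds, MaxSeconds) before each product.
function Get-PoliteDelaySeconds {
    param([double]$MinSeconds = 6, [double]$MaxSeconds = 14)
    # With double bounds, Get-Random returns a double; the maximum is exclusive.
    Get-Random -Minimum $MinSeconds -Maximum $MaxSeconds
}

# Escalating cooldown after consecutive 403s: doubles each time, up to a cap.
function Get-CooldownMinutes {
    param(
        [Parameter(Mandatory)][ValidateRange(1, 100)][int]$Consecutive403s,
        [double]$BaseMinutes = 5,
        [double]$MaxMinutes = 60
    )
    [math]::Min($BaseMinutes * [math]::Pow(2, $Consecutive403s - 1), $MaxMinutes)
}

# Cooldowns for the 1st through 6th consecutive 403: 5, 10, 20, 40, 60, 60 minutes.
1..6 | ForEach-Object { Get-CooldownMinutes -Consecutive403s $_ }
```

The cap matters as much as the escalation. Without it, a long run of 403s would push the wait past the point where resuming later is simply the better option.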
## Pushing into the app

Set `Sync.Import.EndpointUrl` + `Sync.Import.Token` in `appsettings.json`. The tool POSTs the JSON
with an `X-Import-Token` header to the app's token-authenticated import endpoint, which runs it
through the same upsert as the Columbia sync. If the endpoint isn't configured, `push` is skipped and
you upload `prismatic_powders.json` manually via the Powder Catalog admin page.

> **App-side dependency:** the token-authenticated import endpoint must exist in the web app for
> unattended push to work. Until then, use the manual upload.
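Once the app-side endpoint exists, you can reproduce the request by hand to smoke-test the URL and token.

This is a minimal sketch under an assumption: the endpoint takes the JSON file as the raw request body. That matches the description above but is not confirmed against the app code. `New-ImportRequest` is a hypothetical helper. It reads the same `Sync.Import` settings from `appsettings.json`, and it returns nothing when the endpoint isn't configured, mirroring the tool's skip.

```powershell
# Hypothetical helper: builds Invoke-RestMethod parameters from appsettings.json,
# or returns $null (with a warning) when the endpoint isn't configured.
function New-ImportRequest {
    param(
        [string]$SettingsPath = 'appsettings.json',
        [string]$JsonPath = 'prismatic_powders.json'
    )
    if (-not (Test-Path -Path $SettingsPath)) {
        Write-Warning "Settings file '$SettingsPath' not found."
        return $null
    }
    $import = (Get-Content -Raw -Path $SettingsPath | ConvertFrom-Json).Sync.Import
    if (-not $import -or [string]::IsNullOrWhiteSpace($import.EndpointUrl)) {
        Write-Warning 'Sync.Import.EndpointUrl is not set; upload the JSON manually instead.'
        return $null
    }
    @{
        Uri         = $import.EndpointUrl
        Method      = 'Post'
        ContentType = 'application/json'
        Headers     = @{ 'X-Import-Token' = $import.Token }
        InFile      = $JsonPath   # assumption: file contents are sent as the raw body
    }
}

# Usage: splat the parameters into Invoke-RestMethod only when configured.
$request = New-ImportRequest
if ($request) { Invoke-RestMethod @request }
```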
## Scheduling (Windows Task Scheduler)

Point a scheduled task at the published exe (or at `dotnet run`). Example settings for a weekly task:

```
Program/script: C:\Tools\PrismaticSync\PrismaticSync.exe
Arguments:      run
Start in:       C:\Tools\PrismaticSync
```

Publish a Release build to drop on the workstation. The command below is framework-dependent
(`--self-contained false`), so the .NET 8 runtime must be installed on that machine:

```powershell
dotnet publish -c Release -r win-x64 --self-contained false -o C:\Tools\PrismaticSync
pwsh C:\Tools\PrismaticSync\playwright.ps1 install chromium
```
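The weekly task can also be created from an elevated PowerShell prompt instead of the Task Scheduler UI. This sketch uses the built-in ScheduledTasks module, so it is Windows only.

The following are illustrative choices, not requirements of the tool: the `Register-PrismaticSyncTask` wrapper name, the task name, and the Sunday 3 AM trigger.

```powershell
# Illustrative wrapper (hypothetical name): registers the weekly `run` task using the
# Windows ScheduledTasks module. Run from an elevated prompt on the workstation.
function Register-PrismaticSyncTask {
    param(
        [string]$InstallDir = 'C:\Tools\PrismaticSync',
        [string]$TaskName = 'PrismaticSync weekly run'
    )
    # Same three fields as the Task Scheduler example: program, arguments, start-in folder.
    $action = New-ScheduledTaskAction -Execute (Join-Path $InstallDir 'PrismaticSync.exe') `
        -Argument 'run' -WorkingDirectory $InstallDir
    # Off-hours weekly trigger; the day and time are assumptions, adjust to taste.
    $trigger = New-ScheduledTaskTrigger -Weekly -DaysOfWeek Sunday -At 3am
    Register-ScheduledTask -TaskName $TaskName -Action $action -Trigger $trigger `
        -Description 'Discover new Prismatic colors, scrape, and push to the catalog import'
}
```

Define the function, then call `Register-PrismaticSyncTask` once.

By default, a task registered this way runs only while you are logged on. If it must run while logged off, adjust the security options in the Task Scheduler UI or pass `-User`/`-Password` to `Register-ScheduledTask`.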
## The long game

This is the interim path. The durable endgame is a real Prismatic **API** (through the partnership), at which
point this tool would be replaced by a clean in-app sync like Columbia's, reusing the same upsert,
propagation, and discontinued-color handling.