# PrismaticSync

A standalone .NET console tool that scrapes the Prismatic Powders catalog and pushes it into the
Powder Coating Logix catalog import endpoint. It exists because Prismatic has **no API** (unlike
Columbia Coatings), so the data has to be scraped via browser automation.

> **Runs on a workstation you control, never on the deployed app server.** Scraping from the cloud
> app's IP would get blocked and isn't appropriate. This tool is deliberately *not* part of
> `PowderCoating.sln`; build and run it independently.

## First-time setup (per machine)

```powershell
cd "scripts/Prismatic Data Scraper"
dotnet build
pwsh bin/Debug/net8.0/playwright.ps1 install chromium # one-time browser download
```

## Commands

```powershell
dotnet run -- run # default: discover-new + scrape (new + stale >30d) + push
dotnet run -- discover-new # cheap: find newly added colors (newest-first, stops at the first known one)
dotnet run -- discover-full # heavy: crawl all color filters (reconcile whole set / removals)
dotnet run -- scrape # scrape product pages from product-urls.txt (resumable)
dotnet run -- scrape --refresh-older-than=30 # also re-scrape products older than 30 days (price changes)
dotnet run -- push # push prismatic_powders.json to the import endpoint
```

Options: `--max-products=N`, `--retry-errors`, `--headed` (show the browser for debugging).

Everything streams to the console live (warnings/errors in color) **and** to `prismatic-sync.log`.
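The `discover-new` command is described as an early-stop walk: read the catalog newest-first and stop at the first color already on file. A minimal Python sketch of that idea (the tool itself is .NET; this is not its code), assuming the listing yields product URLs newest-first and that "known" means the URLs already in `product-urls.txt`. `discover_new` is a hypothetical name:

```python
from typing import Iterable, List, Set


def discover_new(listing: Iterable[str], known: Set[str]) -> List[str]:
    """Collect URLs from a newest-first listing until the first already-known one.

    Hypothetical sketch: `listing` stands in for the paged catalog results,
    `known` for the URLs already recorded in product-urls.txt.
    """
    new_urls: List[str] = []
    for url in listing:
        if url in known:
            # Newest-first ordering means everything after this point is older,
            # so (under that assumption) it is already known: stop paging here.
            break
        new_urls.append(url)
    return new_urls


if __name__ == "__main__":
    known = {"known-1", "known-2"}
    listing = ["new-1", "new-2", "known-1", "known-2", "older-1"]
    print(discover_new(listing, known))  # ['new-1', 'new-2']
```

Because `listing` is consumed lazily, breaking out also stops further page loads when it is a generator. Stopping early is what keeps the command cheap, and it is also why `discover-full` still exists: colors that were removed, or that never appear at the head of the listing, are never revisited by this walk.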

## Operating model (suggested cadence)

| Run | Command | Cadence | Why |
|-----|---------|---------|-----|
| Find new colors | `run` (does discover-new + scrape-new) | Weekly | Cheap; Prismatic adds colors often |
| Price refresh | `scrape --refresh-older-than=30` then `push` | Monthly | Re-scrapes stale products to catch price changes (slow, ~hours) |
| Full reconcile | `discover-full` then `scrape` | Quarterly | Catches removed/discontinued colors |

A full scrape of ~5,000 products takes hours because of the polite delays: at the default 6–14s base
delay (roughly 10s on average), that is about 5,000 × 10s ≈ 50,000s, or ~14 hours, before page-load
time, cooldowns, or long rests. It saves after every product and is fully resumable, so you can stop
and restart it at any time.
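Resumability plus `--refresh-older-than` comes down to deciding, on each run, which URLs still need work, and persisting after every product. A minimal Python sketch under stated assumptions: the record shape (`{"scraped_at": <ISO-8601>}` keyed by URL), the function names `select_urls`/`scrape_all`, and the write-temp-then-rename save are illustrative, not taken from the tool:

```python
import json
import os
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, List, Optional


def select_urls(urls: List[str], records: Dict[str, dict], now: datetime,
                refresh_older_than_days: Optional[int] = None) -> List[str]:
    """Return the URLs a (resumed) run still needs to scrape.

    Assumed record shape: records[url] = {"scraped_at": "<ISO-8601 with offset>", ...}.
    """
    todo: List[str] = []
    for url in urls:
        rec = records.get(url)
        if rec is None:
            todo.append(url)  # never scraped, or interrupted before it was saved
        elif refresh_older_than_days is not None:
            age = now - datetime.fromisoformat(rec["scraped_at"])
            if age > timedelta(days=refresh_older_than_days):
                todo.append(url)  # stale: re-scrape to pick up price changes
    return todo


def scrape_all(urls: List[str], records: Dict[str, dict], path: str,
               scrape_one: Callable[[str], dict], now: Optional[datetime] = None,
               refresh_older_than_days: Optional[int] = None) -> None:
    """Scrape whatever select_urls picks, saving the whole record set after each product."""
    now = now or datetime.now(timezone.utc)
    for url in select_urls(urls, records, now, refresh_older_than_days):
        record = scrape_one(url)
        record["scraped_at"] = datetime.now(timezone.utc).isoformat()
        records[url] = record
        # Write a temp file, then swap it in with os.replace, so stopping mid-run
        # loses at most the product in flight and never leaves a half-written file.
        tmp = path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(records, f)
        os.replace(tmp, path)


if __name__ == "__main__":
    now = datetime(2025, 6, 1, tzinfo=timezone.utc)
    records = {
        "a": {"scraped_at": "2025-05-25T00:00:00+00:00"},  # 7 days old
        "b": {"scraped_at": "2025-04-01T00:00:00+00:00"},  # 61 days old
    }
    print(select_urls(["a", "b", "c"], records, now))  # ['c']
    print(select_urls(["a", "b", "c"], records, now, refresh_older_than_days=30))  # ['b', 'c']
```

A plain run only picks up never-scraped URLs; adding the refresh window also re-queues anything older than N days, which is what the monthly price-refresh row relies on.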

## Politeness / anti-block

Configurable in `appsettings.json`: randomized 6–14s base delay, an escalating **cooldown + retry on
403** (so a temporary block doesn't get you hard-banned mid-run), and a periodic long rest. Leave
these conservative: getting blocked is worse than being slow, and Prismatic is a partner.
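The pacing rules can be sketched in Python as follows. Only the 6–14s range comes from this README; the cooldown ladder (60s, 5 min, 15 min), the rest interval and length, the hypothetical `Blocked` exception, and the choice to abort once the ladder is exhausted (relying on resumability) are assumptions for illustration, not the tool's actual settings or behavior:

```python
import random
import time
from typing import Callable, Dict, Iterable, Optional, Sequence, Tuple


class Blocked(Exception):
    """Raised by the fetch callable when the site answers HTTP 403 (hypothetical)."""


def polite_fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    *,
    base_delay: Tuple[float, float] = (6.0, 14.0),      # from this README
    cooldowns: Sequence[float] = (60.0, 300.0, 900.0),  # assumed escalation ladder
    rest_every: int = 100,                              # assumed: long rest every N products
    rest_seconds: float = 600.0,                        # assumed rest length
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> Dict[str, str]:
    rng = rng or random.Random()
    results: Dict[str, str] = {}
    for i, url in enumerate(urls, start=1):
        for attempt in range(len(cooldowns) + 1):
            try:
                results[url] = fetch(url)
                break
            except Blocked:
                if attempt == len(cooldowns):
                    # Still blocked after the longest cooldown: stop instead of
                    # hammering the site; a resumable run can pick up later.
                    raise
                # Each consecutive 403 waits longer than the last before retrying.
                sleep(cooldowns[attempt])
        # Randomized gap between products so requests don't arrive on a fixed beat.
        sleep(rng.uniform(*base_delay))
        if rest_every and i % rest_every == 0:
            sleep(rest_seconds)  # periodic long rest
    return results
```

Passing `sleep` and `rng` in keeps the pacing testable without real waiting; for example, `sleep=delays.append` records each delay instead of sleeping.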

## Pushing into the app

Set in `appsettings.json`:

- `Sync.Import.EndpointUrl` → `https://<your-app>/PowderCatalog/ImportApi`
- `Sync.Import.Token` → the same secret as the app's `CatalogImport:Token` config

The tool POSTs the JSON with an `X-Import-Token` header (and `X-Vendor-Name: Prismatic Powders`) to
that endpoint, which authenticates the token and runs the records through the same upsert as the
Columbia sync. If the endpoint/token isn't configured here, `push` is skipped and you upload
`prismatic_powders.json` manually via the Powder Catalog admin page instead.
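A minimal Python sketch of the request `push` is described as sending, using only the standard library. The header names and route come from this README; the `Content-Type`, the timeout, and the function names `build_push_request`/`push` are assumptions, not the tool's code:

```python
import urllib.request
from pathlib import Path
from typing import Optional


def build_push_request(endpoint_url: str, token: str, json_path: str,
                       vendor: str = "Prismatic Powders") -> urllib.request.Request:
    """Build (without sending) the POST that uploads the scrape file as the request body."""
    return urllib.request.Request(
        endpoint_url,
        data=Path(json_path).read_bytes(),       # the JSON scrape output, as-is
        method="POST",
        headers={
            "Content-Type": "application/json",  # assumed; the README doesn't say
            "X-Import-Token": token,             # must equal the app's CatalogImport:Token
            "X-Vendor-Name": vendor,
        },
    )


def push(endpoint_url: Optional[str], token: Optional[str],
         json_path: str = "prismatic_powders.json") -> Optional[int]:
    """Send the file, or skip (return None) when the endpoint or token isn't configured."""
    if not endpoint_url or not token:
        print("Import endpoint/token not configured; skipping push (upload the JSON manually).")
        return None
    req = build_push_request(endpoint_url, token, json_path)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx, so a rejected or unset
    # token on the app side (401) surfaces as an exception, not a return value.
    # The generous timeout is an assumption: the server may take a while to import.
    with urllib.request.urlopen(req, timeout=300) as resp:
        return resp.status
```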

> **App side:** set `CatalogImport:Token` in the web app's config (an Azure App Setting in prod; on a
> Linux App Service, name it `CatalogImport__Token`, since `:` isn't allowed in setting names there).
> The endpoint returns 401 until a token is set, so it's inert by default.
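On the app side, the gate in front of the upsert amounts to: no configured token means 401 for everyone; otherwise the `X-Import-Token` header must match `CatalogImport:Token`, compared in constant time. The real endpoint lives in the ASP.NET Core app; this Python sketch with a hypothetical `authorize_import` helper only illustrates the logic:

```python
import hmac
from typing import Optional


def authorize_import(configured_token: Optional[str], header_token: Optional[str]) -> int:
    """Return 200 if the import may proceed, else 401 (hypothetical helper)."""
    if not configured_token:
        # No CatalogImport:Token set: reject everything, so the endpoint is inert.
        return 401
    if header_token is None:
        return 401  # X-Import-Token header missing
    # Constant-time comparison, so response timing doesn't reveal how many
    # leading characters of a guessed token were right.
    if not hmac.compare_digest(configured_token.encode("utf-8"), header_token.encode("utf-8")):
        return 401
    return 200
```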

## Scheduling (Windows Task Scheduler)

Point a scheduled task at the published exe (or `dotnet run`). Example settings for a weekly task:

```
Program/script: C:\Tools\PrismaticSync\PrismaticSync.exe
Arguments: run
Start in: C:\Tools\PrismaticSync
```
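
Publish a Release build to drop on the workstation. As written this is framework-dependent
(`--self-contained false`), so the workstation needs the .NET 8 runtime installed; pass
`--self-contained true` instead to bundle the runtime with the exe.

```powershell
dotnet publish -c Release -r win-x64 --self-contained false -o C:\Tools\PrismaticSync
pwsh C:\Tools\PrismaticSync\playwright.ps1 install chromium
```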

## The long game

This is the interim path. The durable endgame is a real Prismatic **API** (the partnership), at which
point this tool would be replaced by a clean in-app sync like Columbia's, reusing the same upsert,
propagation, and discontinued handling.