The system runs every morning and the sourcing research is just there. We don't touch it.
- Australian Importer, Alibaba Intelligence Platform
Most importers are still doing product research the same way they did ten years ago: manually, slowly, and reactively.
Our client is an Australian business importing and reselling products sourced from Chinese suppliers on Alibaba. Their sourcing workflow was entirely manual. Team members visited individual supplier storefronts one by one, copied product names, prices, and MOQs into spreadsheets by hand, and periodically checked Alibaba's new listings to spot emerging products.
New product launches slipped through the gaps. Price changes were only caught during the next manual check. As the business grew, the problem compounded - more suppliers to monitor, more categories to track, more opportunities to miss.
What they needed was not more staff hours. They needed the whole process to run itself - every day, automatically - and deliver clean, structured data they could act on immediately.
Not a generic scraping tool. Built function by function against Alibaba's actual behaviour - evolving from 10 fields per function at initial build to 47 unique data fields across the full module.
Automatically tracks the full product catalogues of the client's key suppliers - spotting new additions and capturing complete product data every day without anyone visiting a single page.
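Spotting new additions comes down to comparing today's snapshot of a supplier's catalogue with the previous run's. The sketch below is a minimal illustration of that comparison, not the platform's code. It assumes each snapshot is a dict mapping `product_id` to `price` (field names taken from the table below). The snapshot shape, `CatalogueDiff`, and `diff_catalogue` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueDiff:
    added: frozenset[str]          # product_ids present today but not yesterday
    removed: frozenset[str]        # product_ids that disappeared since the last run
    price_changed: frozenset[str]  # product_ids in both runs whose price moved


def diff_catalogue(previous: dict[str, float], current: dict[str, float]) -> CatalogueDiff:
    """Compare two hypothetical store snapshots keyed by product_id."""
    prev_ids, curr_ids = previous.keys(), current.keys()
    added = frozenset(curr_ids - prev_ids)
    removed = frozenset(prev_ids - curr_ids)
    # Only products present in both snapshots can have a price change.
    price_changed = frozenset(
        pid for pid in curr_ids & prev_ids if current[pid] != previous[pid]
    )
    return CatalogueDiff(added, removed, price_changed)
```

When run against the stored snapshot from the previous day, `added` surfaces new listings and `price_changed` catches price movements as soon as they happen, rather than at the next manual check.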
Every day the system monitors Alibaba's new listings by product category, capturing 60+ new products per run. The client now knows what is entering the market before most competitors have noticed.
Automated searches run across Alibaba based on the client's sourcing criteria. This is the most data-rich function in the module, returning 27 structured fields per product.
For products of interest, the system pulls complete details in a single automated request - title, full description, all product images, quantity-based pricing tiers, full specifications, and supplier information. Everything needed to make a sourcing decision, without opening a browser tab.
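Quantity-based pricing tiers are easiest to act on once each tier is reduced to a minimum quantity and a unit price. The sketch below is a minimal illustration under that assumption. It supposes the scraped price tiers have already been parsed into `(min_quantity, unit_price)` pairs. That pair shape and the `unit_price` helper are hypothetical, not the platform's actual schema.

```python
from bisect import bisect_right


def unit_price(tiers: list[tuple[int, float]], quantity: int) -> float:
    """Return the unit price for `quantity` from (min_quantity, unit_price) tiers.

    Tiers need not arrive sorted; the lowest min_quantity is treated as the MOQ.
    """
    if not tiers:
        raise ValueError("no price tiers")
    ordered = sorted(tiers)
    thresholds = [min_qty for min_qty, _ in ordered]
    # bisect_right counts how many tier thresholds the quantity meets or exceeds;
    # the last such tier is the one that applies.
    index = bisect_right(thresholds, quantity) - 1
    if index < 0:
        raise ValueError(f"quantity {quantity} is below the MOQ of {thresholds[0]}")
    return ordered[index][1]
```

For example, `unit_price([(100, 2.5), (500, 2.2)], 750)` returns `2.2`, the price for the 500+ tier.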
Powered by reverse-engineered Alibaba visual search API
The fifth function enables product discovery through visual search rather than keyword queries. Instead of a text search, the function accepts any Alibaba CDN image URL and returns the products Alibaba considers visually similar - up to 1,584 results per query across 33 pages.
The implementation required reverse-engineering two undocumented internal Alibaba endpoints. Both operate via pure curl_cffi with no Playwright dependency - making this the fastest and most lightweight function in the module.
The key advantage: visual search bypasses keyword ambiguity entirely. A product image returns exact or near-exact matches regardless of how the supplier titled the listing.
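The figures above work out to 48 results per page (1,584 / 33), assuming every page is full. The two endpoints are undocumented and are not described here, so the sketch below only shows the pagination pattern around them. `fetch_page` is a hypothetical stand-in for whatever function calls those endpoints. The 33-page cap comes from the text, and `iter_similar_products` is an illustrative name.

```python
from collections.abc import Callable, Iterator

MAX_PAGES = 33  # page cap quoted in the text; treated as an assumption here


def iter_similar_products(
    fetch_page: Callable[[int], list[dict]],
    max_pages: int = MAX_PAGES,
) -> Iterator[dict]:
    """Yield visual-search results page by page, stopping at the first empty page.

    `fetch_page` is hypothetical: it takes a 1-based page number and returns
    that page's items (an empty list once results run out).
    """
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        if not items:
            break  # fewer matches than the cap: nothing left to fetch
        yield from items
```

Because this is a generator, results can be stored or deduplicated as each page arrives rather than after all pages have been fetched.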
Bot detection, CAPTCHAs, and IP-based blocking stop most tools immediately. Off-the-shelf scraping libraries fail on Alibaba without significant customisation.
We built a custom anti-detection layer across all five functions - the reason the platform runs reliably every day where generic solutions do not.
Playwright-based automation configured to mimic real browser signatures: headers, viewport, user agent, and timing patterns.
IP rotation via residential proxies ensures requests originate from real addresses across geographies, avoiding IP-based blocks.
Automated CAPTCHA solving via CapSolver handles challenge pages without human intervention - keeping the daily schedule uninterrupted.
Exponential backoff retry logic, per-run deduplication, and full summary reporting ensure data integrity across every scheduled run.
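The retry and deduplication steps can be sketched with the standard library alone. The sketch below is hypothetical and is not the platform's code. `retry_with_backoff` and `dedupe` are illustrative names, and the delay defaults are arbitrary. Deduplication assumes `product_id` is the record key. For brevity the sketch retries on any `Exception`, whereas a production scraper would narrow that to network and HTTP errors.

```python
import random
import time
from collections.abc import Callable, Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying failures with exponentially growing, jittered delays."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            # The delay ceiling doubles each attempt (1s, 2s, 4s, ...) up to max_delay;
            # "full jitter" sleeps a random time in [0, ceiling] to spread out retries.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise ValueError("attempts must be at least 1")


def dedupe(records: Iterable[dict], key: str = "product_id") -> Iterator[dict]:
    """Yield each record once per run, keyed on `key` (first occurrence wins)."""
    seen: set = set()
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            yield record
```

Injecting `sleep` keeps the backoff testable without real waiting. The `seen` set lives only for the duration of one `dedupe` call, which matches the per-run scope described above.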
The module grew from ~10 fields per function at initial build to 47 unique fields - the result of QA cycles, client feedback, and proactive additions identified during production use.
| Function | Fields | Key Fields |
|---|---|---|
| scrape_store | 10 + price_tiers | product_id, title, price, currency, moq, image_url, product_url, store_url, company_name, page_position |
| scrape_new_launches | 10 + price_tiers | Same structure as scrape_store, enabling direct comparison |
| scrape_search | 27 + price_tiers | soldOrder, reviewCount, reviewScore, discount, badges, certifications, goldSupplierYears, shippingScore, supplierServiceScore, productScore |
| scrape_product_detail | 11 + price_tiers | Full description, all_images, specifications, multi-tier quantity pricing |
| scrape_image_search | 21 + summary | company_id, countryCode, goldSupplierYears, reviewScore, supplierService, shippingScore, certifications, isPaid, sellingPoints |
| Total unique fields | 47 | |
Functions 1-4: Playwright for JS-rendered pages.
Function 5: curl_cffi only, no browser dependency.
Since deployment in May 2026, the platform has replaced 8-10 hours of manual research per week. The client now receives a daily automated feed covering monitored supplier stores, 60+ new product launches per category run, and hundreds of structured product records. What used to be a weekly research block is now 15 minutes of daily review, delivered before the work day begins.
The relationship did not end at delivery. Since going live, the client has commissioned additional features, including the visual search function: a clear signal of the trust built and the ongoing value the platform delivers.
For agencies, this is the model. Your client gets daily automated data. You get the credit. We stay invisible. Whether you need one scraper or ten, the same pipeline approach applies: scheduled runs, accuracy checks, and a fix within 48 hours if anything breaks.
If you work with clients who need structured product or competitor data and you are still doing it manually, we would like to show you what this looks like for your agency.
Or email us directly at hello@xpersivelabs.com