Case Study · Automation & Web Scraping

Automated 8 to 10 hours of weekly research for an e-commerce importer

Australian Importer · E-Commerce Automation · Live in Production · May 2026

8-10 hrs/wk · Manual research replaced

60+ · Products captured per run

47 · Unique data fields

5 · Automated functions

“The system runs every morning and the sourcing research is just there. We don't touch it.”

— Australian Importer, Alibaba Intelligence Platform

The Problem

Hours wasted. Opportunities missed. Every week.

Most importers are still doing product research the same way they did ten years ago — manually, slowly, and reactively.

Our client is an Australian business importing and reselling products sourced from Chinese suppliers on Alibaba. Their sourcing workflow was entirely manual. Team members visited individual supplier storefronts one by one, copied product names, prices, and MOQs into spreadsheets by hand, and periodically checked Alibaba's new listings to spot emerging products.

New product launches slipped through the gaps. Price changes were only caught during the next manual check. As the business grew, the problem compounded - more suppliers to monitor, more categories to track, more opportunities to miss.

What they needed was not more staff hours. They needed the whole process to run itself - every day, automatically - and deliver clean, structured data they could act on immediately.

The Solution

A five-function supplier intelligence platform

Not a generic scraping tool. Built function by function against Alibaba's actual behaviour - evolving from 10 fields per function at initial build to 47 unique data fields across the full module.

Supplier Store Monitoring

Automatically tracks the full product catalogues of the client's key suppliers - spotting new additions and capturing complete product data every day without anyone visiting a single page.

product_id, title, price, currency, moq, image_url, product_url, store_url, company_name, page_position

10 fields + price_tiers
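As a rough illustration of how "spotting new additions" can work, the sketch below compares today's catalogue against the product IDs seen in the previous run. The function name `find_new_products` and the sample records are hypothetical; only the `product_id` and `title` field names come from the field list above, and the production logic may differ.

```python
from typing import Iterable


def find_new_products(previous_ids: Iterable[str], todays_products: list[dict]) -> list[dict]:
    """Return products whose product_id was not seen in the previous run."""
    seen = set(previous_ids)
    new_items = []
    for product in todays_products:
        pid = product["product_id"]
        if pid not in seen:
            seen.add(pid)  # also skips repeats within today's run
            new_items.append(product)
    return new_items


# Hypothetical sample data for illustration only.
yesterday_ids = {"1001", "1002"}
today = [
    {"product_id": "1001", "title": "Example widget A"},
    {"product_id": "1003", "title": "Example widget B"},
    {"product_id": "1003", "title": "Example widget B"},
]
new_today = find_new_products(yesterday_ids, today)
print([p["product_id"] for p in new_today])
```

Keeping only the set of previously seen IDs between runs is enough for this comparison; the full records do not need to be stored to detect additions.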

New Product Launch Detection

60+ products per run

Every day the system monitors Alibaba's new listings by product category, capturing 60+ new products per run. The client now knows what is entering the market before most competitors have noticed.

product_id, title, price, currency, moq, image_url, product_url, store_url, company_name, page_position

10 fields + price_tiers

Keyword-Based Product Search

Automated searches run across Alibaba based on the client's sourcing criteria. This is the most data-rich function in the module, returning 27 structured fields per product.

product_id, title, price, currency, moq, image_url, product_url, store_url, company_name, page_position, soldOrder, reviewCount, reviewScore, discount, badges, certifications, goldSupplierYears, shippingScore, supplierServiceScore, productScore

27 fields + price_tiers

Full Product Detail Extraction

For products of interest, the system pulls complete details in a single automated request - title, full description, all product images, quantity-based pricing tiers, full specifications, and supplier information. Everything needed to make a sourcing decision, without opening a browser tab.

title, description, all_images, specifications, price, moq, company_name, store_url, product_url, product_id

11 fields + multi-tier pricing
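To show how quantity-based pricing tiers can feed a sourcing decision, here is a minimal sketch that looks up the unit price for a given order quantity. The tier shape (`min_qty`, `unit_price`), the function name `unit_price_for`, and the sample numbers are all assumptions made for illustration; the case study does not specify the actual structure of `price_tiers`.

```python
from typing import Optional


def unit_price_for(quantity: int, price_tiers: list[dict]) -> Optional[float]:
    """Return the unit price of the highest tier whose min_qty is <= quantity.

    Returns None when the quantity is below the smallest tier (under MOQ).
    """
    eligible = [tier for tier in price_tiers if tier["min_qty"] <= quantity]
    if not eligible:
        return None
    best = max(eligible, key=lambda tier: tier["min_qty"])
    return best["unit_price"]


# Hypothetical tiers for illustration only.
tiers = [
    {"min_qty": 100, "unit_price": 4.20},
    {"min_qty": 500, "unit_price": 3.80},
    {"min_qty": 2000, "unit_price": 3.10},
]
print(unit_price_for(750, tiers))
```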

New

Visual Product Search

The fifth function enables product discovery through visual search rather than keyword queries. Instead of a text search, the function accepts any Alibaba CDN image URL and returns the products Alibaba considers visually similar - up to 1,584 results per query across 33 pages.

The implementation required reverse-engineering two undocumented internal Alibaba endpoints. Both operate via pure curl_cffi with no Playwright dependency - making this the fastest and most lightweight function in the module.

The key advantage: visual search bypasses keyword ambiguity entirely. A product image returns exact or near-exact matches regardless of how the supplier titled the listing.

product_id, product_url, title, price, currency, moq, company_name, company_id, country_code, gold_supplier_years, review_score, review_count, supplier_service, shipping_score, product_score, images, certifications, is_paid, selling_points, page_number, page_position

21 fields + summary
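The stated maximums (1,584 results across 33 pages) imply 48 results per page. Under the assumptions that every page is full and that `page_number` and `page_position` are both 1-based, the two fields can be combined into a single overall rank, as in this small sketch. The helper name `overall_rank` is hypothetical.

```python
# 1,584 results / 33 pages = 48 per page (inferred from the stated maximums).
RESULTS_PER_PAGE = 1584 // 33


def overall_rank(page_number: int, page_position: int) -> int:
    """Convert 1-based page_number and page_position into a 1-based overall rank."""
    return (page_number - 1) * RESULTS_PER_PAGE + page_position


print(RESULTS_PER_PAGE, overall_rank(33, 48))
```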

The Technical Challenge

Alibaba actively blocks automated access

Bot detection, CAPTCHAs, and IP-based blocking stop most tools immediately. Off-the-shelf scraping libraries fail on Alibaba without significant customisation.

We built a custom anti-detection layer across all five functions - the reason the platform runs reliably every day where generic solutions do not.

Browser Fingerprint Spoofing

Playwright-based automation configured to mimic real browser signatures — headers, viewport, user agent, and timing patterns.

Residential Proxy Rotation

IP rotation via residential proxies ensures requests originate from real addresses across geographies, avoiding IP-based blocks.

CAPTCHA Resolution

Automated CAPTCHA solving via CapSolver handles challenge pages without human intervention - keeping the daily schedule uninterrupted.

Retry Logic & Deduplication

Exponential backoff retry logic, per-run deduplication, and full summary reporting ensure data integrity across every scheduled run.
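The retry and deduplication ideas can be sketched in a few lines. This is an illustration under stated assumptions, not the platform's code: `with_backoff`, `dedupe_by_product_id`, the attempt count, the delays, and the simulated `flaky_fetch` are all hypothetical, and records are assumed to be keyed by `product_id`.

```python
import random
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def with_backoff(fetch: Callable[[], T], max_attempts: int = 4,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(), doubling the wait after each failure (plus small jitter)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, 0.1 * delay))


def dedupe_by_product_id(records: Iterable[dict]) -> list[dict]:
    """Keep the first record seen for each product_id within a run."""
    seen: set[str] = set()
    unique = []
    for record in records:
        if record["product_id"] not in seen:
            seen.add(record["product_id"])
            unique.append(record)
    return unique


# Simulated fetch that fails twice, then returns records with one duplicate.
calls = {"count": 0}


def flaky_fetch() -> list[dict]:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated block")
    return [{"product_id": "A1"}, {"product_id": "A1"}, {"product_id": "B2"}]


delays: list[float] = []  # record waits instead of actually sleeping
records = with_backoff(flaky_fetch, sleep=delays.append)
unique = dedupe_by_product_id(records)
print(len(delays), [r["product_id"] for r in unique])
```

Passing the sleep function as a parameter keeps the demo instant and makes the backoff schedule easy to check in tests.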

Data Architecture

47 unique fields across the module

The module grew from ~10 fields per function at initial build to 47 unique fields — the result of QA cycles, client feedback, and proactive additions identified during production use.

Function	Fields	Key Additional Fields
`scrape_store`	10 + price_tiers	product_id, title, price, currency, moq, image_url, product_url, store_url, company_name, page_position
`scrape_new_launches`	10 + price_tiers	Same structure as store — enables direct comparison
`scrape_search`	27 + price_tiers	soldOrder, reviewCount, reviewScore, discount, badges, certifications, goldSupplierYears, shippingScore, supplierServiceScore, productScore
`scrape_product_detail`	11 + price_tiers	Full description, all_images, specifications, multi-tier quantity pricing
`scrape_image_search`	21 + summary	company_id, countryCode, goldSupplierYears, reviewScore, supplierService, shippingScore, certifications, isPaid, sellingPoints
Total unique fields: 47
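For readers who want a concrete picture of the base record, the sketch below models the ten fields shared by `scrape_store` and `scrape_new_launches`, plus `price_tiers`, as a Python dataclass. The field names come from the table above; the class name `StoreProduct`, the types, and the sample values are assumptions.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class StoreProduct:
    """Base record: 10 fields plus price_tiers (types are assumed)."""
    product_id: str
    title: str
    price: float
    currency: str
    moq: int
    image_url: str
    product_url: str
    store_url: str
    company_name: str
    page_position: int
    price_tiers: list[dict] = field(default_factory=list)


# Hypothetical sample record for illustration only.
record = StoreProduct(
    product_id="1003",
    title="Example widget B",
    price=3.80,
    currency="USD",
    moq=100,
    image_url="https://example.com/image.jpg",
    product_url="https://example.com/product/1003",
    store_url="https://example.com/store",
    company_name="Example Supplier Co.",
    page_position=1,
)
print(sorted(asdict(record)))
```

Using the same record shape for both functions is what makes the "direct comparison" noted in the table straightforward: rows from either source can be joined on `product_id`.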

Python · Playwright · curl_cffi · Parsel · BeautifulSoup · DataImpulse Residential Proxies · CapSolver

Functions 1-4: Playwright for JS-rendered pages.
Function 5: curl_cffi only, no browser dependency.

The Outcome

Research that used to take hours now happens before 8am.


8-10 hrs of manual research per week replaced

60+ products detected per category run

v1.3.0 live since May 2026

Since deployment in May 2026, the platform has replaced 8-10 hours of manual weekly research. The client now receives a daily automated feed covering monitored supplier stores, 60+ new product launches per category run, and hundreds of structured product records. What used to be a weekly research block is now 15 minutes of daily review, delivered before the work day begins.

The relationship did not end at delivery. Since going live, the client has commissioned additional features - including the visual search function - a clear signal of the trust built and the ongoing value the platform delivers.

For agencies, this is the model. Your client gets daily automated data. You get the credit. We stay invisible. Whether you need one scraper or ten, the same pipeline approach applies — scheduled runs, accuracy checks, and a fix within 48 hours if anything breaks.

If you work with clients who need structured product or competitor data and you are still doing it manually, we would like to show you what this looks like for your agency.

Get in Touch

Or email us directly at hello@xpersivelabs.com
