# Desired corpus content

Product-centric. For a given product:

- Reviews
- Images. Limit resolution and angles somehow? Thumbnails are definitely not needed.
- Description(s)
- Ratings
- Info for all styles

# Desired corpus layout

For researchers, something simple, like Tamara Berg's attribute discovery dataset: a hierarchical directory structure (keeping a manageable number of files in each leaf directory), with task-specific items for each product organized by file name.
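One way to make the layout idea concrete is the minimal sketch below. It assumes each product has a unique string ID, hashes that ID with MD5 to choose two levels of two-hex-character shard directories (so products spread evenly and no directory grows too large), and stores each task-specific item as a separately named file inside a per-product directory. The function names `product_dir`, `item_path`, and `image_path`, the file names in `ITEM_FILES`, and the image naming pattern are all hypothetical. This is an illustration of the idea only, not a description of how Berg's dataset is actually organized.

```python
import hashlib
from pathlib import PurePosixPath

# Hypothetical file names: one file per task-specific item.
# This naming scheme is an assumption, not an established format.
ITEM_FILES = {
    "reviews": "reviews.txt",
    "description": "description.txt",
    "ratings": "ratings.json",
    "styles": "styles.json",
}


def product_dir(product_id: str, root: str = "corpus",
                levels: int = 2, width: int = 2) -> PurePosixPath:
    """Map a product ID to its directory inside a sharded tree.

    The ID is hashed with MD5 so products spread evenly across shards.
    Each level uses `width` hex characters of the digest, giving
    16**width subdirectories per level (256 when width=2).
    """
    digest = hashlib.md5(product_id.encode("utf-8")).hexdigest()
    shards = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(root, *shards, product_id)


def item_path(product_id: str, item: str, **kwargs) -> PurePosixPath:
    """Path of one task-specific item (e.g. "reviews") for a product."""
    return product_dir(product_id, **kwargs) / ITEM_FILES[item]


def image_path(product_id: str, style: str, angle: int,
               **kwargs) -> PurePosixPath:
    """Path of one image; the style and angle index live in the file name."""
    return product_dir(product_id, **kwargs) / f"image_{style}_{angle:02d}.jpg"
```

For example, `item_path("B000123", "reviews")` yields a path of the form `corpus/<2 hex>/<2 hex>/B000123/reviews.txt`. Hashing rather than using ID prefixes avoids lopsided shards when IDs share common prefixes; `levels` and `width` can be tuned to the expected number of products.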