unserialize Yahoo! search results

Share this article

Via John Cox, Yahoo! have opened up a PHP Development Center for their search APIs and, more interestingly, have started exposing their search data as serialized PHP strings. That’s “serialized” as in the serialize function.

This is very cool but think a little caution is needed when using it, given that it wasn’t designed to be a wire format but rather for local storage of PHP data, within a trusted environment.

Is this format safe?

First there’s a problem of trust and potentially a security issue. I guess we can trust Yahoo! OK but they need to make very sure that they’re escaping the data they publish this way correctly – make sure no-one they’ve gathered search results for is able to inject anything in there. Why?

Well for primitive data types (strings, int, PHP arrays) there’s more or less no problem – you can unserialize the result you get back without issues. That said, perhaps the Hardened-PHP Project needs to look at this – are there issues like deeply nested arrays, infinite recursion or very large data structures?

The potential (but low risk) security issue though is if any objects are being serialized this way. When you unserialize, PHP is going to attempt to create instances of those objects (assuming it can find the corresponding class), which is going to result in the constuctor being fired the __wakeup function if one exists in the class definition, and the destructor in PHP5. Given that PHP5 is growing more and more built-in classes, there’s a chance this could become a significant problem.

I’ve dealt with this before in JPSpan, which also used this format across the wire, and have mentioned it to Josh here. The solution I took was to screen the serialized string first with some regexes and I’m 99% sure this approach works – implementation here (JPSpan_Unserializer_PHP::getClasses) and unit tests here – see TestOfJPSpan_Unserializer_PHP_getClasses.

For Yahoo!, it the following function would do it;


function hasObjects($string) {
    
    // Stip any string representations (which might contain object syntax)
    $string = preg_replace('/s:[0-9]+:".*"/Us','',$string);
    
    // Pull out the class named
    preg_match_all('/O:[0-9]+:"(.*)"/U',$string,$matches,PREG_PATTERN_ORDER);
    
    return count($matches[1]) > 0;
}

$serializedString = file_get_contents('http://api.search.yahoo.com ... (etc.) ... &output=php');

if ( hasObjects($serializedString) ) {
    die ("Its got objects. I object!");
}

$result = unserialize($serializedString);

Given that most people will be fetching this data from Yahoo! in cleartext, there’s a chance an attacker might be able to modify it in transit, so calling that function to check for objects is probably a smart move.

Character Encoding?

Now Yahoo! have been smart in encoding the data as UTF-8. Here’s the HTTP response headers for a request I made (side note: perhaps Yahoo! might consider using some HTTP caching a little, given this is REST?);


HTTP/1.x 200 OK
Date: Thu, 23 Feb 2006 08:14:38 GMT
P3p: policyref="http://p3p.yahoo.com/w3c/p3p.xml", CP="[snip]"
Connection: close
Content-Type: text/php; charset="utf-8"

Checked with some multibyte results and Yahoo are correctly counting the multi string lengths in terms of bytes in the PHP results they return, so no problem there. You probably do need to be aware of what encoding you are using on the site displaying the results though. John, for example, is using ISO-8859-1 (which is very common), so you’ll want to convert UTF-8 to ISO-8859-1 using utf8_decode (which is purely for use with ISO-8859-1!) on the data elements, after the call to unserialize(). People with more exotic character encodings will need to turn to iconv. In Firefox, open a page on your site, right click and select “Page Info” – this will tell you your character encoding.

Example with PEAR::HTTP_Request

Yahoo! provide examples with curl and file_get_contents(). Here’s an alternative example using PEAR::HTTP_Request;


< ?php
// Include PEAR::HTTP_Request
require_once 'HTTP/Request.php';

// Something to search for
$searchword = 'Zürich';

// Build Yahoo! web search URL
$yahoo_url = 'http://api.search.yahoo.com/WebSearchService'.
    '/V1/webSearch?appid=YahooDemo&results=10&output=php'.
    '&query='.$searchword;

// Create the HTTP_Request object, specifying the URL
$Request = &new HTTP_Request($yahoo_url);

// Set proxy server as necessary
// $Request->setProxy('proxy.myisp.com', '8080', 'harryf', 'secret');

// Send the request for the feed to the remote server
$status = $Request->sendRequest();

// Check for errors
if (PEAR::isError($status)) {
    // Do something friendlier than die...
    die("Connection problem: " . $status->toString()."<br />");
}

// Check we got an HTTP 200 status code (if not there's a problem)
if ($Request->getResponseCode() != '200') {
    // Do something friendlier than die...
    die("Request failed: " . $Request->getResponseCode()."<br />");
}

// Get the PHP serialized string from Yahoo!
$phps = $Request->getResponseBody();

// Function to test that the serialized string doesn't contain any objects
function hasObjects($string) {
    
    // Stip any string representations (which might contain object syntax)
    $string = preg_replace('/s:[0-9]+:".*"/Us','',$string);
    
    // Pull out the class named
    preg_match_all('/O:[0-9]+:"(.*)"/U',$string,$matches,PREG_PATTERN_ORDER);
    
    return count($matches[1]) > 0;
}

// Check this before unserializing
if ( hasObjects($phps) ) {
    // Do something friendlier than die...
    die("Serialized string contains objects<br />");
}

// Unserialize...
$data = unserialize($phps);

// Note the charset!
header('Content-type: text/html; charset=utf-8');

echo '<pre>';
foreach ( $data['ResultSet']['Result'] as $row ) {
    print_r($row);
}
echo '</pre>';

Not just PHP

Encoders and decoders for this format have actually been done in quite a few different languages. Some that I’m aware of;

Move over encoded SOAPtext/php is taking over :-)

Harry FuecksHarry Fuecks
View Author

Harry Fuecks is the Engineering Project Lead at Tamedia and formerly the Head of Engineering at Squirro. He is a data-driven facilitator, leader, coach and specializes in line management, hiring software engineers, analytics, mobile, and marketing. Harry also enjoys writing and you can read his articles on SitePoint and Medium.

Read Next
Quick Tip: How to Animate Text Gradients and Patterns in CSS
Quick Tip: How to Animate Text Gradients and Patterns in CSS
Ralph Mason
Sending Email Using Node.js
Sending Email Using Node.js
Craig Buckler
Creating a Navbar in React
Creating a Navbar in React
Vidura Senevirathne
A Complete Guide to CSS Logical Properties, with Cheat Sheet
A Complete Guide to CSS Logical Properties, with Cheat Sheet
Ralph Mason
Using JSON Web Tokens with Node.js
Using JSON Web Tokens with Node.js
Lakindu Hewawasam
How to Build a Simple Web Server with Node.js
How to Build a Simple Web Server with Node.js
Chameera Dulanga
Building a Digital Fortress: How to Strengthen DNS Against DDoS Attacks?
Building a Digital Fortress: How to Strengthen DNS Against DDoS Attacks?
Beloslava Petrova
Crafting Interactive Scatter Plots with Plotly
Crafting Interactive Scatter Plots with Plotly
Binara Prabhanga
GenAI: How to Reduce Cost with Prompt Compression Techniques
GenAI: How to Reduce Cost with Prompt Compression Techniques
Suvoraj Biswas
How to Use jQuery’s ajax() Function for Asynchronous HTTP Requests
How to Use jQuery’s ajax() Function for Asynchronous HTTP Requests
Aurelio De RosaMaria Antonietta Perna
Quick Tip: How to Align Column Rows with CSS Subgrid
Quick Tip: How to Align Column Rows with CSS Subgrid
Ralph Mason
15 Top Web Design Tools & Resources To Try in 2024
15 Top Web Design Tools & Resources To Try in 2024
SitePoint Sponsors
7 Simple Rules for Better Data Visualization
7 Simple Rules for Better Data Visualization
Mariia Merkulova
Cloudways Autonomous: Fully-Managed Scalable WordPress Hosting
Cloudways Autonomous: Fully-Managed Scalable WordPress Hosting
SitePoint Team
Best Programming Language for AI
Best Programming Language for AI
Lucero del Alba
Quick Tip: How to Add Gradient Effects and Patterns to Text
Quick Tip: How to Add Gradient Effects and Patterns to Text
Ralph Mason
Logging Made Easy: A Beginner’s Guide to Winston in Node.js
Logging Made Easy: A Beginner’s Guide to Winston in Node.js
Vultr
How to Optimize Website Content for Featured Snippets
How to Optimize Website Content for Featured Snippets
Dipen Visavadiya
Psychology and UX: Decoding the Science Behind User Clicks
Psychology and UX: Decoding the Science Behind User Clicks
Tanya Kumari
Build a Full-stack App with Node.js and htmx
Build a Full-stack App with Node.js and htmx
James Hibbard
Digital Transformation with AI: The Benefits and Challenges
Digital Transformation with AI: The Benefits and Challenges
Priyanka Prajapat
Quick Tip: Creating a Date Picker in React
Quick Tip: Creating a Date Picker in React
Dianne Pena
How to Create Interactive Animations Using React Spring
How to Create Interactive Animations Using React Spring
Yemi Ojedapo
10 Reasons to Love Google Docs
10 Reasons to Love Google Docs
Joshua KrausZain Zaidi
How to Use Magento 2 for International Ecommerce Success
How to Use Magento 2 for International Ecommerce Success
Mitul Patel
5 Exciting New JavaScript Features in 2024
5 Exciting New JavaScript Features in 2024
Olivia GibsonDarren Jones
Tools and Strategies for Efficient Web Project Management
Tools and Strategies for Efficient Web Project Management
Juliet Ofoegbu
Choosing the Best WordPress CRM Plugin for Your Business
Choosing the Best WordPress CRM Plugin for Your Business
Neve Wilkinson
ChatGPT Plugins for Marketing Success
ChatGPT Plugins for Marketing Success
Neil Jordan
Managing Static Files in Django: A Comprehensive Guide
Managing Static Files in Django: A Comprehensive Guide
Kabaki Antony
The Ultimate Guide to Choosing the Best React Website Builder
The Ultimate Guide to Choosing the Best React Website Builder
Dianne Pena
Exploring the Creative Power of CSS Filters and Blending
Exploring the Creative Power of CSS Filters and Blending
Joan Ayebola
How to Use WebSockets in Node.js to Create Real-time Apps
How to Use WebSockets in Node.js to Create Real-time Apps
Craig Buckler
Best Node.js Framework Choices for Modern App Development
Best Node.js Framework Choices for Modern App Development
Dianne Pena
SaaS Boilerplates: What They Are, And 10 of the Best
SaaS Boilerplates: What They Are, And 10 of the Best
Zain Zaidi
Understanding Cookies and Sessions in React
Understanding Cookies and Sessions in React
Blessing Ene Anyebe
Enhanced Internationalization (i18n) in Next.js 14
Enhanced Internationalization (i18n) in Next.js 14
Emmanuel Onyeyaforo
Essential React Native Performance Tips and Tricks
Essential React Native Performance Tips and Tricks
Shaik Mukthahar
How to Use Server-sent Events in Node.js
How to Use Server-sent Events in Node.js
Craig Buckler
Five Simple Ways to Boost a WooCommerce Site’s Performance
Five Simple Ways to Boost a WooCommerce Site’s Performance
Palash Ghosh
Elevate Your Online Store with Top WooCommerce Plugins
Elevate Your Online Store with Top WooCommerce Plugins
Dianne Pena
Unleash Your Website’s Potential: Top 5 SEO Tools of 2024
Unleash Your Website’s Potential: Top 5 SEO Tools of 2024
Dianne Pena
How to Build a Chat Interface using Gradio & Vultr Cloud GPU
How to Build a Chat Interface using Gradio & Vultr Cloud GPU
Vultr
Enhance Your React Apps with ShadCn Utilities and Components
Enhance Your React Apps with ShadCn Utilities and Components
David Jaja
10 Best Create React App Alternatives for Different Use Cases
10 Best Create React App Alternatives for Different Use Cases
Zain Zaidi
Control Lazy Load, Infinite Scroll and Animations in React
Control Lazy Load, Infinite Scroll and Animations in React
Blessing Ene Anyebe
Building a Research Assistant Tool with AI and JavaScript
Building a Research Assistant Tool with AI and JavaScript
Mahmud Adeleye
Understanding React useEffect
Understanding React useEffect
Dianne Pena
Web Design Trends to Watch in 2024
Web Design Trends to Watch in 2024
Juliet Ofoegbu
Building a 3D Card Flip Animation with CSS Houdini
Building a 3D Card Flip Animation with CSS Houdini
Fred Zugs
How to Use ChatGPT in an Unavailable Country
How to Use ChatGPT in an Unavailable Country
Dianne Pena
An Introduction to Node.js Multithreading
An Introduction to Node.js Multithreading
Craig Buckler
How to Boost WordPress Security and Protect Your SEO Ranking
How to Boost WordPress Security and Protect Your SEO Ranking
Jaya Iyer
Understanding How ChatGPT Maintains Context
Understanding How ChatGPT Maintains Context
Dianne Pena
Building Interactive Data Visualizations with D3.js and React
Building Interactive Data Visualizations with D3.js and React
Oluwabusayo Jacobs
JavaScript vs Python: Which One Should You Learn First?
JavaScript vs Python: Which One Should You Learn First?
Olivia GibsonDarren Jones
13 Best Books, Courses and Communities for Learning React
13 Best Books, Courses and Communities for Learning React
Zain Zaidi
5 jQuery.each() Function Examples
5 jQuery.each() Function Examples
Florian RapplJames Hibbard
Implementing User Authentication in React Apps with Appwrite
Implementing User Authentication in React Apps with Appwrite
Yemi Ojedapo
AI-Powered Search Engine With Milvus Vector Database on Vultr
AI-Powered Search Engine With Milvus Vector Database on Vultr
Vultr
Understanding Signals in Django
Understanding Signals in Django
Kabaki Antony
Why React Icons May Be the Only Icon Library You Need
Why React Icons May Be the Only Icon Library You Need
Zain Zaidi
View Transitions in Astro
View Transitions in Astro
Tamas Piros
Getting Started with Content Collections in Astro
Getting Started with Content Collections in Astro
Tamas Piros
What Does the Java Virtual Machine Do All Day?
What Does the Java Virtual Machine Do All Day?
Peter Kessler
Become a Freelance Web Developer on Fiverr: Ultimate Guide
Become a Freelance Web Developer on Fiverr: Ultimate Guide
Mayank Singh
Layouts in Astro
Layouts in Astro
Tamas Piros
.NET 8: Blazor Render Modes Explained
.NET 8: Blazor Render Modes Explained
Peter De Tender
Mastering Node CSV
Mastering Node CSV
Dianne Pena
A Beginner’s Guide to SvelteKit
A Beginner’s Guide to SvelteKit
Erik KückelheimSimon Holthausen
Brighten Up Your Astro Site with KwesForms and Rive
Brighten Up Your Astro Site with KwesForms and Rive
Paul Scanlon
Which Programming Language Should I Learn First in 2024?
Which Programming Language Should I Learn First in 2024?
Joel Falconer
Managing PHP Versions with Laravel Herd
Managing PHP Versions with Laravel Herd
Dianne Pena
Accelerating the Cloud: The Final Steps
Accelerating the Cloud: The Final Steps
Dave Neary
An Alphebetized List of MIME Types
An Alphebetized List of MIME Types
Dianne Pena
The Best PHP Frameworks for 2024
The Best PHP Frameworks for 2024
Claudio Ribeiro
11 Best WordPress Themes for Developers & Designers in 2024
11 Best WordPress Themes for Developers & Designers in 2024
SitePoint Sponsors
Top 10 Best WordPress AI Plugins of 2024
Top 10 Best WordPress AI Plugins of 2024
Dianne Pena
20+ Tools for Node.js Development in 2024
20+ Tools for Node.js Development in 2024
Dianne Pena
The Best Figma Plugins to Enhance Your Design Workflow in 2024
The Best Figma Plugins to Enhance Your Design Workflow in 2024
Dianne Pena
Harnessing the Power of Zenserp for Advanced Search Engine Parsing
Harnessing the Power of Zenserp for Advanced Search Engine Parsing
Christopher Collins
Build Your Own AI Tools in Python Using the OpenAI API
Build Your Own AI Tools in Python Using the OpenAI API
Zain Zaidi
The Best React Chart Libraries for Data Visualization in 2024
The Best React Chart Libraries for Data Visualization in 2024
Dianne Pena
7 Free AI Logo Generators to Get Started
7 Free AI Logo Generators to Get Started
Zain Zaidi
Turn Your Vue App into an Offline-ready Progressive Web App
Turn Your Vue App into an Offline-ready Progressive Web App
Imran Alam
Clean Architecture: Theming with Tailwind and CSS Variables
Clean Architecture: Theming with Tailwind and CSS Variables
Emmanuel Onyeyaforo
How to Analyze Large Text Datasets with LangChain and Python
How to Analyze Large Text Datasets with LangChain and Python
Matt Nikonorov
6 Techniques for Conditional Rendering in React, with Examples
6 Techniques for Conditional Rendering in React, with Examples
Yemi Ojedapo
Introducing STRICH: Barcode Scanning for Web Apps
Introducing STRICH: Barcode Scanning for Web Apps
Alex Suzuki
Using Nodemon and Watch in Node.js for Live Restarts
Using Nodemon and Watch in Node.js for Live Restarts
Craig Buckler
Task Automation and Debugging with AI-Powered Tools
Task Automation and Debugging with AI-Powered Tools
Timi Omoyeni
Quick Tip: Understanding React Tooltip
Quick Tip: Understanding React Tooltip
Dianne Pena
12 Outstanding AI Tools that Enhance Efficiency & Productivity
12 Outstanding AI Tools that Enhance Efficiency & Productivity
Ilija Sekulov
React Performance Optimization
React Performance Optimization
Blessing Ene Anyebe
Introducing Chatbots and Large Language Models (LLMs)
Introducing Chatbots and Large Language Models (LLMs)
Timi Omoyeni
Migrate to Ampere on OCI with Heterogeneous Kubernetes Clusters
Migrate to Ampere on OCI with Heterogeneous Kubernetes Clusters
Ampere Computing
Scale Your React App with Storybook and Chromatic
Scale Your React App with Storybook and Chromatic
Daine Mawer
10 Tips for Implementing Webflow On-page SEO
10 Tips for Implementing Webflow On-page SEO
Milan Vracar
Create Dynamic Web Experiences with Interactive SVG Animations
Create Dynamic Web Experiences with Interactive SVG Animations
Patricia Egyed
5 React Architecture Best Practices for 2024
5 React Architecture Best Practices for 2024
Sebastian Deutsch
Get the freshest news and resources for developers, designers and digital creators in your inbox each week