Manipulating JPEG EXIF Headers

Reading and writing EXIF headers in JPEGs is not something that PHP offers a lot of help with. There is one helper function for reading, exif_read_data, which returns a wide variety of metadata about the image (including basic, non-EXIF properties of the file), but beyond that there is no easy way to read or edit the headers. To make matters worse, one of the more accessible image libraries (PHP GD) drops all of the original EXIF headers. If you want to keep any information from the raw photo through to a GD-resized image, you're going to need to do some work.

First I wanted to take a look at just how much information some of my raw photos had. The amount was a bit staggering.

    Make: Canon
    Model: Canon PowerShot G9
    Orientation: 1
    XResolution: 180/1
    YResolution: 180/1
    DateTime: 2013:05:18 06:45:22 (note the annoying lack of timezone here)
    YCbCrPositioning: 1
    Thumbnail: (raw photo data)
    ... and then like 40 more fields ...
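For reference, a listing like that can come straight from exif_read_data(). This is only a minimal sketch, assuming the exif extension is loaded; the helper name list_exif_fields and the filename in the usage comment are made up for illustration.

```php
<?php
// Hypothetical helper: turns an array like the one exif_read_data() returns
// into "Key: value" lines. Nested arrays and binary blobs (such as an
// embedded thumbnail) are summarized instead of printed.
function list_exif_fields(array $fields)
{
    $lines = array();
    foreach($fields as $key => $value)
    {
        if(is_array($value))
            $value = '(' . count($value) . ' nested values)';
        else if(is_string($value) && preg_match('/[^\x20-\x7E]/', $value))
            $value = '(' . strlen($value) . ' bytes of binary data)';
        $lines[] = $key . ': ' . $value;
    }
    return $lines;
}

// Usage, assuming photo.jpg exists (hypothetical filename):
// $fields = exif_read_data('photo.jpg');
// if($fields !== false)
//     echo implode("\n", list_exif_fields($fields)), "\n";
```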

The data I lost by using GD for resizing was depressing. Every single EXIF header was gone, as well as any IPTC or XMP that may have snuck in at some point. Not all of the metadata was important to me - like XResolution and YResolution - but there was enough to get me motivated to recover it. Also, I wanted to add some new fields, like copyright information and such. The first step was chunking up a raw image into separate pieces so I could view the raw headers and work with them.

    // chunks up image to ease manipulation of segments
    function chunk_image($filename)
    {
        $image_array = array();
        $image_data = file_get_contents($filename);
        $image_data_length = strlen($image_data);

        // skip the two-byte SOI marker (FF D8), then hop from one marker to the next
        $i = 2;
        while($i + 4 <= $image_data_length)
        {
            // what kind of segment are we dealing with
            $segment_type = ord($image_data[$i + 1]);

            // markers within this range (RST0 - RST7) stand alone, with no size or data
            if($segment_type >= 0xD0 && $segment_type <= 0xD7)
            {
                $i += 2;
                continue;
            }

            // how big the segment is, stored after the marker (the size counts its own two bytes)
            $segment_size = unpack('n', substr($image_data, $i + 2, 2));
            $segment_size = array_pop($segment_size);

            // pull the data, housed after the size, using the length
            $segment_data = substr($image_data, $i + 4, $segment_size - 2);

            // store each header
            $image_array[] = array(
                'type' => $segment_type,
                'data' => $segment_data);

            // okay, we can advance the internal pointer to the next marker
            $i += 2 + $segment_size;

            // oh, the last segment was the SOS, and the compressed image data is next
            if($segment_type == 0xDA)
            {
                // pull all of the image data up to the end marker; use the last FF D9
                // in the file, since an embedded EXIF thumbnail carries its own earlier on
                $end_of_image = strrpos($image_data, "\xFF\xD9");
                if($end_of_image === false || $end_of_image < $i)
                    $end_of_image = $image_data_length;
                $raw_image = substr($image_data, $i, $end_of_image - $i);

                // not the best way to pass the image data, but it works
                $image_array[] = array(
                    'type' => 'raw_image',
                    'data' => $raw_image);
                break;
            }
        }

        return $image_array;
    }

If you give this function a valid JPEG filename it will chew through the raw bytes looking for segment markers, then parse those and pull the metadata about them. Each segment starts with a two-byte marker (0xFF plus a type byte) that says what kind of data it holds, everything from generic comments to structured EXIF to Huffman tables, followed by a two-byte size and then the raw data. There are some assumptions here, but I feel like everything is broken up and readable enough for easy debugging.
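When eyeballing the chunked output it helps to see names instead of raw type bytes. Here is a small sketch of a lookup for the most common markers; segment_name is a hypothetical helper of mine for illustration, and the list is far from complete.

```php
<?php
// Hypothetical lookup, not part of the script above: readable names for
// the marker types most likely to show up in a chunked JPEG.
function segment_name($type)
{
    $names = array(
        0xC0 => 'SOF0 (baseline frame header)',
        0xC2 => 'SOF2 (progressive frame header)',
        0xC4 => 'DHT (Huffman tables)',
        0xDA => 'SOS (start of scan)',
        0xDB => 'DQT (quantization tables)',
        0xDD => 'DRI (restart interval)',
        0xFE => 'COM (comment)');

    if(isset($names[$type]))
        return $names[$type];

    // APP0 through APP15 occupy E0 - EF; APP0 is usually JFIF, APP1 usually EXIF or XMP
    if(is_int($type) && $type >= 0xE0 && $type <= 0xEF)
        return 'APP' . ($type - 0xE0);

    // anything else: show the raw byte, or pass through string types like 'raw_image'
    return is_int($type) ? sprintf('unknown (0x%02X)', $type) : (string) $type;
}
```

Looping over an array shaped like the chunk_image() result and printing segment_name($row['type']) alongside strlen($row['data']) gives a quick map of the file.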

Also, this is structured as a function for a reason. While most of the code examples on this blog are written as procedural code, and the production code that I work on is exclusively object-oriented, I felt this had to be a function. It's very handy for running through the targeted image and doing the initial chunk, as well as running through the original, pre-GD-resized image to pull the original EXIF headers. You could also use it to test the result after a run.
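As a rough sketch of that second use, something like the following could fish the EXIF payload out of the chunks of the original photo. The name find_exif_segment is hypothetical, and it assumes the array has the same 'type' / 'data' shape that chunk_image() returns. The check on the "Exif" prefix is there because XMP data also lives in APP1 segments.

```php
<?php
// Hypothetical helper: returns the payload of the first APP1 segment that
// actually holds EXIF data, or null when there is none.
function find_exif_segment(array $image_array)
{
    foreach($image_array as $row)
    {
        // APP1 is type 0xE1; EXIF payloads start with "Exif" plus two null bytes
        if($row['type'] === 0xE1 && substr($row['data'], 0, 6) === "Exif\x00\x00")
            return $row['data'];
    }
    return null;
}

// Usage, assuming chunk_image() from above and a hypothetical filename:
// $original_exif = find_exif_segment(chunk_image('original.jpg'));
```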

Okay, once you have the initial headers it's time to get working on the EXIF. I defined my headers in an array and then just looped through it to create the segment. There are plenty of other headers that you can work with as listed here.

    // entries are listed in ascending tag order, which the TIFF spec expects of an IFD
    $new_exif_array = array(
        array(
            'type' => 270, // image description, could be title or alt
            'data' => 'Image Description'),
        array(
            'type' => 306, // datetime, which could be when it was taken
            'data' => date('Y:m:d H:i:s')),
        array(
            'type' => 315, // your name, probably
            'data' => 'Image author'),
        array(
            'type' => 33432, // generic copyright information
            'data' => 'Image Copyright'));

    // let's build the exif segment based on our desired headers
    $exif = '';
    $exif .= 'Exif';
    $exif .= "\x00\x00";
    $exif .= 'MM'; // motorola (big-endian) byte order
    $exif .= pack('n', 42); // tiff id
    $exif .= pack('N', 8); // an initial offset: the ifd starts right after this 8-byte header
    $exif .= pack('n', count($new_exif_array));

    $segment_length = 2 + count($new_exif_array) * 12 + 4; // how long the ifd will be
    $segment_head = '';
    $segment_body = '';

    foreach($new_exif_array as $row)
    {
        $segment_head .= pack('n', $row['type']);
        $segment_head .= pack('n', 2); // ascii data type

        // ascii values are null-terminated, and the value field is at least four bytes wide
        $data = $row['data'] . "\x00";
        $data = str_pad($data, 4, "\x00");
        $segment_head .= pack('N', strlen($data));

        // if the data is too long we need to append it at the end, not within the head
        if(strlen($data) > 4)
        {
            // offsets are counted from the 'MM', so: 8-byte header + ifd + body so far
            $offset = 8 + $segment_length + strlen($segment_body);
            $segment_head .= pack('N', $offset);
            $segment_body .= $data;
        }
        else
        {
            $segment_head .= $data;
        }
    }

    $exif .= $segment_head;
    $exif .= pack('N', 0); // no further ifds
    $exif .= $segment_body;

Here the assumptions start taking over. I use a Canon Powershot G9 for most of my photos, which saves JPEGs using the Motorola spec, meaning multi-byte numbers are stored big-endian (most significant byte first). That is what the 'MM' marker and the 'n' and 'N' formats in the pack calls are for. If you are using a device that saves with the Intel spec (little-endian, marked 'II') then you'll have to switch the pack formats to 'v' and 'V'. Also, all of the headers I want are ASCII data types. If you are working with other headers, like Orientation (a SHORT) or Resolution (a RATIONAL), you'll have to change the data type code and the way the value is packed.
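To give an idea of what those changes look like, here is a sketch of packing a SHORT and a RATIONAL entry in either byte order. Both function names are hypothetical and this is not how my script does it. It only shows the layout: a 12-byte entry made of tag, type, count, and then either the value itself or an offset to it.

```php
<?php
// Hypothetical helper: a SHORT (type 3) entry. The two-byte value fits inside
// the four-byte value field, left-justified and padded with two null bytes.
function short_entry($tag, $value, $big_endian = true)
{
    $short = $big_endian ? 'n' : 'v';
    $long = $big_endian ? 'N' : 'V';
    return pack($short, $tag) . pack($short, 3) . pack($long, 1)
        . pack($short, $value) . "\x00\x00";
}

// Hypothetical helper: a RATIONAL (type 5) entry. The value is two four-byte
// numbers (numerator, denominator), which never fit inline, so the entry
// stores an offset and the eight bytes go into the body after the IFD.
function rational_entry($tag, $numerator, $denominator, $offset, $big_endian = true)
{
    $short = $big_endian ? 'n' : 'v';
    $long = $big_endian ? 'N' : 'V';
    return array(
        'head' => pack($short, $tag) . pack($short, 5) . pack($long, 1) . pack($long, $offset),
        'body' => pack($long, $numerator) . pack($long, $denominator));
}

// Orientation (tag 274) set to 1, and XResolution (tag 282) set to 180/1
// with its value stored at a made-up offset of 100
$orientation = short_entry(274, 1);
$x_resolution = rational_entry(282, 180, 1, 100);
```

The offset passed to rational_entry would be worked out the same way as for the long ASCII values: 8 plus the length of the IFD plus whatever is already in the body.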

Well, we can pull segments now and create the EXIF segment… It's time to manipulate the image! In my case I wanted to remove two segments, the COM and APP0/JFIF (both are inserted by GD, I think), and add a new one. If you want to remove other segments and/or add new ones, be careful not to touch any of the pieces the decoder needs, like the quantization tables (DQT), the Huffman tables (DHT), or the frame header (SOF). Bad things happen if you touch those.

    // okay, let's pull the current image data
    $image_array = chunk_image('YOUR FILENAME HERE');

    foreach($image_array as $key => $row)
    {
        // we want to remove some header information, COM and APP0
        if($row['type'] === 0xE0 || $row['type'] === 0xFE)
            unset($image_array[$key]);
    }

    // prepend the APP1 (exif) onto the image info
    array_unshift($image_array, array(
        'type' => 0xE1,
        'data' => $exif));

    // okay, now it's time to put it all together, starting with the SOI marker
    $new_image = "\xFF" . "\xD8";
    $compressed_image_data = '';

    foreach($image_array as $row)
    {
        // the compressed image data goes last, so set it aside for now
        if($row['type'] === 'raw_image')
        {
            $compressed_image_data = $row['data'];
            continue;
        }

        // marker, then the size (which counts its own two bytes), then the data
        $new_image .= sprintf("\xFF%c", $row['type']);
        $new_image .= pack('n', strlen($row['data']) + 2);
        $new_image .= $row['data'];
    }

    // finish with the compressed image data and the EOI marker
    $new_image .= $compressed_image_data;
    $new_image .= "\xFF" . "\xD9";

    file_put_contents('YOUR FILENAME HERE', $new_image);

And that does it! If you need any help debugging, I'd recommend steering clear of exif_read_data(), as the data it returns is not 'pure' EXIF headers, and using other tools instead (there are Chrome extensions, a few (massive) PHP libraries, and probably countless others a simple Google search away).
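Another low-tech option is to read the segment back yourself. Here is a minimal sketch of that, assuming Motorola byte order and only decoding ASCII values (anything else is shown as hex); dump_ifd is a hypothetical name, not an existing function.

```php
<?php
// Hypothetical debugging helper: lists the entries of the first IFD in an
// APP1 payload as tag => value. Returns false for Intel byte order, which
// this sketch does not handle.
function dump_ifd($exif)
{
    // skip "Exif" and the two null bytes; all offsets count from here
    $tiff = substr($exif, 6);
    if(substr($tiff, 0, 2) !== 'MM')
        return false;

    $ifd_offset = unpack('N', substr($tiff, 4, 4));
    $ifd_offset = array_pop($ifd_offset);
    $entry_count = unpack('n', substr($tiff, $ifd_offset, 2));
    $entry_count = array_pop($entry_count);

    $fields = array();
    for($i = 0; $i < $entry_count; $i++)
    {
        // each entry is 12 bytes: tag, type, count, then value or offset
        $start = $ifd_offset + 2 + $i * 12;
        $entry = unpack('ntag/ntype/Ncount', substr($tiff, $start, 8));
        $value = substr($tiff, $start + 8, 4);

        if($entry['type'] == 2)
        {
            // ascii longer than four bytes is stored elsewhere, at the given offset
            if($entry['count'] > 4)
            {
                $offset = unpack('N', $value);
                $value = substr($tiff, array_pop($offset), $entry['count']);
            }
            $value = rtrim($value, "\x00");
        }
        else
        {
            $value = bin2hex($value);
        }
        $fields[$entry['tag']] = $value;
    }
    return $fields;
}
```

Running print_r(dump_ifd($exif)) on the segment built earlier should list the four tags with the strings that went in, which makes a bad offset easy to spot.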

So, now that this is all done, there are plenty of other pieces left just a few short hops away. I'd like to have a thumbnail embedded and some additional EXIF fields, as well as maybe some XMP and IPTC segments. None of these items are too difficult to add at this point, just have to reuse some of the code above and maybe abstract a few pieces out a bit. Also I'd like to get it to work with my image resizer, though I need to create a better workflow for that.