API¶

Data¶

carpedm.data.download¶

Download scripts.

This module provides the interface for downloading raw datasets from their source.

Datasets Currently Available for Download¶
ID	Dataset
pmjtc	Pre-Modern Japanese Text Character Shapes Dataset (日本古典籍字形データセット), provided by the Center for Open Data in the Humanities (CODH).

Example

Data may be downloaded externally using the provided script:

$ download_data --data-dir <download/to/this/directory> --data-id pmjtc

Note

If an expected data subdirectory already exists in the specified target data-dir, that data will not be downloaded, even if the subdirectory is empty. This should be fixed in a future version.
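
The limitation in this note could be addressed by also checking whether each expected subdirectory has any contents. The following is a minimal sketch of such a check; `needs_download` and its `expected_subdirs` argument are hypothetical names used for illustration and are not part of carpedm.

```python
import os


def needs_download(data_dir, expected_subdirs):
    """Return the expected subdirectories that are missing or empty.

    Hypothetical helper for illustration; not part of carpedm.
    """
    missing = []
    for name in expected_subdirs:
        path = os.path.join(data_dir, name)
        # Treat a missing directory and an empty directory the same way.
        if not os.path.isdir(path) or not os.listdir(path):
            missing.append(name)
    return missing
```

A download routine could then fetch only the names returned by this function instead of skipping every subdirectory that merely exists.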

Todo

Update get_books_list once list is included in downloadables.
Check subdirectory contents.
Generalize download structure for other datasets.

carpedm.data.download.get_books_list(dataset='pmjtc')[source]¶

Retrieve list of books/images in dataset.

Parameters:	dataset (str) – Identifier for dataset for which to retrieve information.
Returns:	Names of dataset subdirectories and/or files.
Return type:	`list` of `str`

carpedm.data.download.maybe_download(directory, dataset='pmjtc')[source]¶

Download character dataset if BOOKS not in directory.

Parameters:	directory (str) – Directory where dataset is located or should be saved. dataset (str) – Identifier for dataset to download.

carpedm.data.io¶

Input and output.

This module provides functionality for reading and writing data.

Todo

Tests
- DataWriter
- CSVParser

class carpedm.data.io.CSVParser(csv_file, data_dir, bib_id)[source]¶

Utility class for parsing coordinate CSV files.

character(row)[source]¶

Convert CSV row to a Character object.

Returns:	The next character
Return type:	Character

characters()[source]¶

Generates rest of characters in CSV.

Yields:	`carpedm.data.util.Character` – The next character.

parse_characters(charset)[source]¶

Generate metadata for single character images.

Parameters:	charset (CharacterSet) – Character set.

A more efficient implementation of parse_sequences when image_scope='seq' and seq_len=1.

Only characters in the character set are included.

Returns:	Single character image meta data.
Return type:	`list` of `carpedm.data.util.ImageMeta`

parse_lines()[source]¶

Generate metadata for vertical lines of characters.

Characters not in character set or vocabulary will be labeled as unknown when converted to integer IDs.

Returns:	Line image meta data.
Return type:	`list` of `carpedm.data.util.ImageMeta`

parse_pages()[source]¶

Generate metadata for full page images.

Includes every character on page. Characters not in character set or vocabulary will be labeled as unknown when converted to integer IDs.

Returns:	Page image meta data.
Return type:	`list` of `carpedm.data.util.ImageMeta`

parse_sequences(charset, len_min, len_max)[source]¶

Generate metadata for images of character sequences.

Only includes sequences of chars in the desired character set. If len_min == len_max, sequence length is deterministic, else each sequence is of random length from [len_min, len_max].

Parameters:	charset (CharacterSet) – The character set. len_min (int) – Minimum sequence length. len_max (int) – Maximum sequence length.
Returns:	Sequence image meta data.
Return type:	`list` of `carpedm.data.util.ImageMeta`
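
The length rule described for parse_sequences can be sketched as follows. This is an illustration only: `pick_sequence_length` is a hypothetical helper, and uniform sampling over the closed interval is an assumption, since the documentation says only that the length is random within [len_min, len_max].

```python
import random


def pick_sequence_length(len_min, len_max, rng=None):
    """Choose a sequence length in the closed interval [len_min, len_max].

    Hypothetical helper; uniform sampling is an assumption.
    """
    if len_min > len_max:
        raise ValueError("len_min must not exceed len_max")
    if len_min == len_max:
        return len_min  # deterministic length
    rng = rng or random
    return rng.randint(len_min, len_max)  # inclusive on both ends
```

Passing a seeded `random.Random` instance as `rng` makes the sampled lengths reproducible.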

class carpedm.data.io.DataWriter(format_out, images, image_shape, vocab, chunk, character, line, label, bbox, subdirs)[source]¶

Utility for writing data to disk in various formats.

available_formats¶: list – The available formats.

References

Heavy modification of _process_dataset in the input pipeline for the TensorFlow im2txt models.

write(fname_prefix, num_threads, num_shards)[source]¶

Write data to disk.

Parameters:	fname_prefix (str) – Path base for data files. num_threads (int) – Number of threads to run in parallel. num_shards (int) – Total number of shards to write, if any.
Returns:	Total number of examples written.
Return type:	int

carpedm.data.lang¶

Language-specific and unicode utilities.

Todo

Variable UNK token in Vocabulary

class carpedm.data.lang.CharacterSet(charset, name=None)[source]¶

Character set abstract class.

in_charset(unicode)[source]¶

Check if a character is in the defined character set.

Parameters:	unicode (str) – String representation of unicode value.

presets¶

Pre-defined character sets.

Returns:	Character set IDs.
Return type:	`list` of `str`

class carpedm.data.lang.JapaneseUnicodes(charset)[source]¶

Utility for accessing and manipulating Japanese character unicodes.

Inherits from CharacterSet.

Unicode ranges taken from [1] with edits for exceptions.

References

[1] http://www.unicode.org/charts/

presets()[source]¶

Pre-defined character sets.

Returns:	Character set IDs.
Return type:	`list` of `str`

class carpedm.data.lang.Vocabulary(reserved, vocab)[source]¶

Simple vocabulary wrapper.

References

Lightly modified TensorFlow “im2txt” Vocabulary.

char_to_id(char)[source]¶: Returns the integer id of a character string.

get_num_classes()[source]¶: Returns number of classes, includes <UNK>.

get_num_reserved()[source]¶: Returns number of reserved IDs.

id_to_char(char_id)[source]¶: Returns the character string of an integer id.
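
To illustrate how a vocabulary of this kind typically behaves, here is a minimal sketch with the same four method names. `SimpleVocab` is a hypothetical stand-in, not the carpedm class. It assumes that reserved tokens receive the first integer IDs in the order given, that `<UNK>` is among the reserved tokens, and that unseen characters map to the `<UNK>` ID.

```python
class SimpleVocab:
    """Illustrative vocabulary; not the carpedm implementation."""

    def __init__(self, reserved, vocab, unk="<UNK>"):
        # Reserved tokens take the first IDs, in the order given.
        self._id_to_char = list(reserved) + [c for c in vocab if c not in reserved]
        self._char_to_id = {c: i for i, c in enumerate(self._id_to_char)}
        self._unk_id = self._char_to_id[unk]  # assumes unk is reserved
        self._num_reserved = len(reserved)

    def char_to_id(self, char):
        # Characters outside the vocabulary fall back to the unknown ID.
        return self._char_to_id.get(char, self._unk_id)

    def id_to_char(self, char_id):
        return self._id_to_char[char_id]

    def get_num_classes(self):
        return len(self._id_to_char)

    def get_num_reserved(self):
        return self._num_reserved
```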

carpedm.data.lang.char2code(unicode)[source]¶

Returns the ASCII code for a unicode character.

Parameters:	unicode (str) –
Raises:	`TypeError` – string is length two.

carpedm.data.lang.code2char(code)[source]¶: Returns the unicode string for the character.

carpedm.data.lang.code2hex(code)[source]¶

Returns hex integer for a unicode string.

The argument code may be either an ASCII representation (e.g. U+3055, <UNK>) or a unicode character.

Parameters:	code (str) – Code to convert.
Returns:
Return type:	int
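
The conversion between the `U+XXXX` notation and integer code points can be sketched with Python built-ins. The helpers `code_to_int` and `int_to_code` below are hypothetical and only illustrate the idea. They do not handle reserved tokens such as `<UNK>`, whose treatment by code2hex is not described here.

```python
def code_to_int(code):
    """Convert 'U+XXXX' or a single character to its integer code point.

    Hypothetical helper; reserved tokens are deliberately not handled.
    """
    if code.upper().startswith("U+"):
        return int(code[2:], 16)
    if len(code) == 1:
        return ord(code)
    raise ValueError("expected 'U+XXXX' or a single character: %r" % code)


def int_to_code(value):
    """Format an integer code point in 'U+XXXX' notation."""
    return "U+%04X" % value
```

For example, `code_to_int("U+3055")` and `code_to_int("さ")` both give `0x3055`.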

carpedm.data.meta¶

Image metadata management.

This module loads and manages metadata stored as CSV files in the raw data directory.

carpedm.data.meta.DEFAULT_SEED¶: int – The default random seed.

Examples

import carpedm as dm

Load, view, and generate a dataset of single kana characters.

single_kana = dm.data.MetaLoader(data_dir=dm.data.sample, image_scope='char', charset=dm.data.CharacterSet('kana'))
single_kana.view_images(subset='train', shape=(64,64))
single_kana.generate_dataset(out_dir='/tmp/pmjtc_data', subset='train')

Load and view a dataset of sequences of 3 kanji.

kanji_seq = dm.data.MetaLoader(data_dir=dm.data.sample, image_scope='seq', seq_len=3, charset=dm.data.CharacterSet('kanji'))
kanji_seq.view_images(subset='dev', shape=(None, 64))

Load and view a dataset of full pages.

full_page = dm.data.MetaLoader(data_dir=dm.data.sample, image_scope='page', charset=dm.data.CharacterSet('all'))
full_page.view_images(subset='test', shape=None)

Note

Unless stated otherwise, image shape arguments in this module should be a tuple (height, width). Tuple values may be one of the following:

int – specifies the absolute size (in pixels) for that axis.
float – specifies a rescale factor relative to the original image size.
None – the corresponding axis size will be computed such that the aspect ratio is maintained. If both height and width are None, no resize is performed.
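
The shape rules in this note can be sketched as a small resolver that turns a (height, width) specification into pixel sizes. `resolve_shape` is a hypothetical helper written for illustration, and rounding to the nearest pixel is an assumption.

```python
def resolve_shape(shape, orig_height, orig_width):
    """Resolve a (height, width) spec of int, float, or None to pixel sizes.

    Hypothetical helper; rounding to the nearest pixel is an assumption.
    """
    if shape is None:
        return orig_height, orig_width

    def resolve_axis(value, orig):
        if value is None:
            return None                      # filled in below
        if isinstance(value, float):
            return int(round(value * orig))  # rescale factor
        return int(value)                    # absolute size in pixels

    height = resolve_axis(shape[0], orig_height)
    width = resolve_axis(shape[1], orig_width)
    if height is None and width is None:
        return orig_height, orig_width       # no resize
    if height is None:
        # Derive height from width so the aspect ratio is maintained.
        height = int(round(orig_height * width / orig_width))
    elif width is None:
        width = int(round(orig_width * height / orig_height))
    return height, width
```

For a 200 x 100 image, `resolve_shape((None, 64), 200, 100)` gives `(128, 64)`, which preserves the 2:1 aspect ratio.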

Caution

If the new shape is smaller than the original, information will be lost due to interpolation.

Todo

Tests
- generate_dataset
Sort characters by reading order, i.e. character ID.
Rewrite data as CSV following original format
Data generator option instead of writing data.
Output formats and/or generator return types for generate_dataset
- numpy
- hdf5
- pandas DataFrame
Chunked generate_dataset option to include partial characters.
Low-priority:
- Fix bounding box display error in view_images
- specify number of character type in sequence
  
  e.g. 2 Kanji, 1 kana
- Instead of padding, fill specified shape with surrounding

class carpedm.data.meta.MetaLoader(data_dir, test_split='hnsd00000', dev_split=0.1, dev_factor=1, vocab_size=None, min_freq=0, reserved=('<PAD>', '<GO>', '<END>', '<UNK>'), charset=<carpedm.data.lang.JapaneseUnicodes object>, image_scope='char', seq_len=None, seq_maxlen=None, verbose=False, seed=None)[source]¶

Class for loading image metadata.

data_stats(which_sets=('train', 'dev', 'test'), which_stats=('majority', 'frequency', 'unknowns'), save_dir=None, include=(None, None))[source]¶

Print or show data statistics.

Parameters:	which_sets (tuple) – Data subsets to see statistics for. which_stats (tuple) – Statistics to view. Default gives all options. save_dir (str) – If not None, save figures/files to this directory. include (tuple) – Include class IDs from this range.

generate_dataset(out_dir, subset, format_store='tfrecords', shape_store=None, shape_in=None, num_shards=8, num_threads=4, target_id='image/seq/char/id', sparse_labels=False, chunk=False, character=True, line=False, label=True, bbox=False, overwrite=False)[source]¶

Generate data usable by machine learning algorithm.

Parameters:

out_dir (str) – Directory to write the data to if ‘generator’ not in `format_store`.
subset (str) – The subset of data to generate.
format_store (str) – Format to save the data as.
shape_store (tuple or None) – Size to which images are resized for storage (on disk). The default is to not perform any resize. See the note on image shape above for more information.
shape_in (tuple or None) – Size to which images are resized by interpolation or padding before being input to a model. See the note on image shape above for more information.
num_shards (int) – Number of sharded output files.
num_threads (int) – Number of threads to run in parallel.
target_id (str) – Determines the target feature (one of the keys in the dict returned by ImageMeta.generate_features).
sparse_labels (bool) – Provide sparse labels; only used for TFRecords.
chunk (bool) – Instead of using the original image, extract non-overlapping chunks and corresponding features from the original image on a regular grid. Pad the original image to divide by `shape` evenly. Note: Currently only characters that fit entirely in the block will be propagated to appropriate features.
character (bool) – Include character info, e.g. label, bbox.
line (bool) – Include line info (bbox) in features.
label (bool) – Include label IDs in features.
bbox (str or None) – If not None, include bbox in features as unit (e.g. ‘pixel’, ‘ratio’ [of image]).
overwrite (bool) – Overwrite any existing data.

Returns:	Object for accessing batches of data.
Return type:	`carpedm.data.providers.DataProvider`

max_image_size(subset, static_shape=(None, None))[source]¶

Retrieve the maximum image size (in pixels).

Parameters:	subset (str or None) – Data subset from which to get image sizes. If None, return max sizes of all images. static_shape (`tuple` of `int`) – Define static dimensions. Axes that are None will be of variable size.
Returns:	Maximum size (height, width)
Return type:	tuple

view_images(subset, shape=None)[source]¶

View and explore images in a data subset.

Parameters:	subset (str) – The subset to iterate through. One of {‘train’, ‘dev’, ‘test’}. shape (tuple or None) – Shape to which images are resized. Please see this note on image shape for more information.

carpedm.data.meta.num_examples_per_epoch(data_dir, subset)[source]¶

Retrieve number of examples per epoch.

Parameters:	data_dir (str) – Directory where processed dataset is stored. subset (str) – Data subset.
Returns:	Number of examples.
Return type:	int

carpedm.data.ops¶

Data operations.

This module contains several non-module-specific data operations.

Todo

Tests
- to_sequence_example, parse_sequence_example
- sparsify_label
- shard_batch
- same_line
- ixs_in_region
- seq_norm_bbox_values

carpedm.data.ops.in_line(xmin_line, xmax_line, ymin_line, xmin_new, xmax_new, ymax_new)[source]¶

Heuristic for determining whether a character is in a line.

Note

Currently dependent on the order in which characters are added. For example, a character may vertically overlap with a line, but adding it to the line would be out of reading order. This should be fixed in a future version.

Parameters:	xmin_line (`list` of `int`) – Minimum x-coordinate of characters in the line the new character is tested against. xmax_line (`list` of `int`) – Maximum x-coordinate of characters in the line the new character is tested against. ymin_line (int) – Minimum y-coordinate of line the new character is tested against. xmin_new (int) – Minimum x-coordinate of new character. xmax_new (int) – Maximum x-coordinate of new character. ymax_new (int) – Maximum y-coordinate of new character.
Returns:	The new character vertically overlaps with the “average” character in the line.
Return type:	bool

carpedm.data.ops.in_region(obj, region, entire=True)[source]¶

Test if an object is in a region.

Parameters:	obj (tuple or BBox) – Object bounding box (xmin, xmax, ymin, ymax) or point (x, y). region (tuple or BBox) – Region (xmin, xmax, ymin, ymax). entire (bool) – Object is entirely contained in region.
Returns:	Result
Return type:	bool
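
A containment test of this kind can be sketched with plain tuples in the (xmin, xmax, ymin, ymax) order given above. `box_in_region` is a hypothetical helper. It assumes inclusive boundaries and assumes that `entire=False` means any overlap counts; neither is confirmed by this documentation.

```python
def box_in_region(obj, region, entire=True):
    """Test whether a box (xmin, xmax, ymin, ymax) or point (x, y) is in region.

    Hypothetical helper; inclusive bounds and the overlap rule are assumptions.
    """
    rxmin, rxmax, rymin, rymax = region
    if len(obj) == 2:
        x, y = obj
        return rxmin <= x <= rxmax and rymin <= y <= rymax
    xmin, xmax, ymin, ymax = obj
    if entire:
        # Every edge of the box must lie inside the region.
        return (rxmin <= xmin and xmax <= rxmax
                and rymin <= ymin and ymax <= rymax)
    # Partial containment: the two rectangles overlap at all.
    return xmin <= rxmax and xmax >= rxmin and ymin <= rymax and ymax >= rymin
```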

carpedm.data.ops.ixs_in_region(bboxes, y1, y2, x1, x2)[source]¶

Heuristic for determining objects in a region.

Parameters:	bboxes (`list` of `carpedm.data.util.BBox`) – Bounding boxes for object boundaries. y1 (int) – Top (lowest row index) of region. y2 (int) – Bottom (highest row index) of region. x1 (int) – left side (lowest column index) of region. x2 (int) – right side (highest column index) of region.
Returns:	Indices of objects inside region.
Return type:	`list` of `int`

carpedm.data.ops.parse_sequence_example(serialized)[source]¶

Parse a sequence example.

Parameters:	serialized (`tf.Tensor`) – Serialized 0-D tensor of type string.
Returns:	Dictionary of features.
Return type:	dict

carpedm.data.ops.seq_norm_bbox_values(bboxes, height, width)[source]¶

Sequence and normalize bounding box values.

Parameters:

bboxes (list of carpedm.data.util.BBox) – Bounding boxes to process.
width (int) – Width (in pixels) of image bboxes are in.
height (int) – Height (in pixels) of image bboxes are in.

Returns:

tuple containing:

list of float: Normalized minimum x-values

list of float: Normalized minimum y-values

list of float: Normalized maximum x-values

list of float: Normalized maximum y-values

Return type:

tuple
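
The return layout described above (four parallel lists, one per coordinate) can be sketched as follows. `normalize_bboxes` is a hypothetical helper that takes plain (xmin, xmax, ymin, ymax) tuples rather than BBox objects, and it assumes that normalization means dividing x-values by the image width and y-values by the image height.

```python
def normalize_bboxes(bboxes, height, width):
    """Split (xmin, xmax, ymin, ymax) boxes into four normalized lists.

    Hypothetical helper; dividing by image size is an assumption.
    """
    xmins = [b[0] / width for b in bboxes]
    ymins = [b[2] / height for b in bboxes]
    xmaxs = [b[1] / width for b in bboxes]
    ymaxs = [b[3] / height for b in bboxes]
    # Same order as the documented return value.
    return xmins, ymins, xmaxs, ymaxs
```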

carpedm.data.ops.shard_batch(features, labels, batch_size, num_shards)[source]¶

Shard a batch of examples.

Parameters:	features (dict) – Dictionary of features. labels (`tf.Tensor`) – labels batch_size (int) – The batch size. num_shards (int) – Number of shards into which batch is split.
Returns:	Features as a list of dictionaries.
Return type:	`list` of `dict`

carpedm.data.ops.sparsify_label(label, length)[source]¶

Convert a regular Tensor into a SparseTensor.

Parameters:	label (`tf.Tensor`) – The label to convert. length (`tf.Tensor`) – Length of the label
Returns:	tf.SparseTensor

carpedm.data.ops.to_sequence_example(feature_dict)[source]¶

Convert features to TensorFlow SequenceExample.

Parameters:	feature_dict (dict) – Dictionary of features.
Returns:	`tf.train.SequenceExample`

carpedm.data.preproc¶

Preprocessing methods.

This module provides methods for preprocessing images.

Todo

Tests
- convert_to_grayscale
- normalize
- pad_borders
Fix and generalize distort_image

carpedm.data.preproc.convert_to_grayscale(image)[source]¶: Convert RGB image to grayscale.

carpedm.data.preproc.normalize(image)[source]¶: Rescale pixel values (to [-1, 1]).
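
One common way to rescale to [-1, 1] is a linear map from the 8-bit range. The sketch below assumes input values in [0, 255] and operates on a flat list of numbers rather than a tensor. `normalize_pixels` is a hypothetical name, and the actual carpedm function may use a different mapping.

```python
def normalize_pixels(pixels):
    """Map 8-bit pixel values in [0, 255] to floats in [-1, 1].

    Hypothetical helper; the 8-bit input range is an assumption.
    """
    return [p / 127.5 - 1.0 for p in pixels]
```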

carpedm.data.preproc.pad_borders_or_shrink(image, char_bbox, line_bbox, shape, maintain_aspect=True)[source]¶

Pad or resize the image.

If the desired shape is larger than the original, then that axis is padded equally on both sides with the mean pixel value in the image. Otherwise, the image is resized with BILINEAR interpolation such that the aspect ratio is maintained.

Parameters:	image (`tf.Tensor`) – Image tensor [height, width, channels]. char_bbox (`tf.Tensor`) – Character bounding box [4]. line_bbox (`tf.Tensor`) – Line bounding box [4]. shape (`tuple` of `int`) – Output shape. maintain_aspect (bool) – Maintain the aspect ratio.
Returns:	tuple containing: `tf.Tensor`: Resized image. `tf.Tensor`: Adjusted character bounding boxes. `tf.Tensor`: Adjusted line bounding boxes.
Return type:	tuple

carpedm.data.providers¶

Data providers for Task input function.

This module provides a generic interface for providing data usable by machine learning algorithms.

A provider may either (1) receive data from the method that initialized it, or (2) receive a directory path where the data to load is stored.

Todo

Generator
- numpy
- pandas DataFrame

class carpedm.data.providers.DataProvider(target_id)[source]¶

Data provider abstract class.

make_batch(batch_size)[source]¶

Generator method that returns a new batch with each call.

Parameters:	batch_size (int) – Number of examples per batch.
Returns:	tuple containing: dict: Batch features. array_like: Batch targets.
Return type:	tuple

class carpedm.data.providers.TFDataSet(target_id, data_dir, subset, num_examples, pad_shape, sparse_labels)[source]¶

TensorFlow DataSet provider from TFRecords stored on disk.

make_batch(batch_size, single_char=False)[source]¶

Generator method that returns a new batch with each call.

Parameters:	batch_size (int) – Number of examples per batch.
Returns:	tuple containing: dict: Batch features. array_like: Batch targets.
Return type:	tuple

carpedm.data.util¶

Data utilities.

This module provides utility methods/classes used by other data modules.

Todo

Tests
- generate_features
Refactor generate_features
Fix class_mask for overlapping characters.

class carpedm.data.util.BBox(xmin, xmax, ymin, ymax)[source]¶: Bounding box helper class.

class carpedm.data.util.Character(label, image_id, x, y, block_id, char_id, w, h)[source]¶: Helper class for storing a single character.

class carpedm.data.util.ImageMeta(filepath, full_image=False, first_char=None)[source]¶

Class for storing and manipulating image metadata.

add_char(char)[source]¶

Add a character to the image.

Parameters:	char (Character) – The character to add.

char_bboxes¶

Bounding boxes for characters.

Returned bounding boxes are relative to (xmin(), ymin()).

Returns:	The return values.
Return type:	`list` of `carpedm.data.util.BBox`

char_labels¶

Character labels

Returns:	The return value.
Return type:	`list` of `str`

char_mask¶

Generate pseudo-pixel-level character mask.

Pixels within character bounding boxes are assigned to positive class (1), others assigned negative class (0).

Returns:	Character mask of shape (height, width, 1)
Return type:	`numpy.ndarray`
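
The mask construction described above can be sketched without NumPy by filling a nested list. `bbox_mask` is a hypothetical helper that returns a two-dimensional list instead of an array of shape (height, width, 1), and it assumes the maximum coordinates of each box are exclusive.

```python
def bbox_mask(height, width, bboxes):
    """Build a height x width mask with 1 inside any box and 0 elsewhere.

    Hypothetical helper; exclusive max coordinates are an assumption.
    """
    mask = [[0] * width for _ in range(height)]
    for xmin, xmax, ymin, ymax in bboxes:
        # Clip each box to the image bounds before filling.
        for row in range(max(ymin, 0), min(ymax, height)):
            for col in range(max(xmin, 0), min(xmax, width)):
                mask[row][col] = 1
    return mask
```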

class_mask(vocab)[source]¶

Generate a character class image mask.

Note

Where characters overlap, the last character added is arbitrarily the one that will be represented in the mask. This should be fixed in a future version.

Parameters:	vocab (Vocabulary) – The vocabulary for converting to ID.
Returns:	Class mask of shape (height, width, 1)
Return type:	`numpy.ndarray`

combine_with(images)[source]¶

Parameters:	images (list of ImageMeta) –

full_h¶

Height (in pixels) of full raw parent image.

Returns:	The return value.
Return type:	int

full_w¶

Width (in pixels) of full raw parent image.

Returns:	The return value.
Return type:	int

generate_features(image_shape, vocab, chunk, character, line, label, bbox)[source]¶

Parameters:

image_shape (tuple or None) – Shape (height, width) to which images are resized, or the size of each chunk if chunk == True.
vocab (Vocabulary or None) – Vocabulary for converting characters to IDs. Required `if character and label`.
chunk (bool) – Instead of using the original image, return a list of image chunks and corresponding features extracted from the original image on a regular grid. The original image is padded to divide evenly by chunk shape.
character (bool) – Include character info (ID, bbox).
line (bool) – Include line info (bbox) in features.
label (bool) – Include label IDs in features.
bbox (str or None) – If not None, include bbox in features as unit (e.g. ‘pixel’, ‘ratio’ [of image]).

Returns:	Feature dictionaries.
Return type:	`list` of `dict`

height¶

Height (in pixels) in full parent image original scale.

Returns:	The return value.
Return type:	int

line_bboxes¶

Bounding boxes for lines in the image.

Note: Currently only meaningful when using full page image.

Returns:	The return values.
Return type:	`list` of `BBox`

line_mask¶

Generate pseudo-pixel-level line mask.

Pixels within line bounding boxes are assigned to positive class (1), others assigned negative class (0).

Returns:	Line mask of shape (height, width, 1)
Return type:	`numpy.ndarray`

load_image(shape)[source]¶

Load image and resize to shape.

If shape is None or (None, None), original size is maintained.

Parameters:	shape (tuple or None) – Output dimensions (height, width).
Returns:	Resized image.
Return type:	`numpy.ndarray`

new_shape(shape, ratio=False)[source]¶

Resolves (and computes) input shape to a consistent type.

Parameters:	shape (tuple or None) – New shape of image (height, width), with potentially inconsistent types. ratio (bool) – Return new size as ratio of original size.
Returns:	tuple containing: int or float: Absolute or relative height. int or float: Absolute or relative width.
Return type:	tuple

num_chars¶

Number of characters in the image.

Returns:	The return value.
Return type:	int

valid_char(char, same_line=False)[source]¶

Check if char is a valid character to include in image.

Parameters:	char (Character) – The character to validate. same_line (bool) – Consider whether char is in the same line as those already in the image example.
Returns:	True for valid, False otherwise.
Return type:	bool

width¶

Width (in pixels) in full parent image original scale.

Returns:	The return value.
Return type:	int

xmax¶

Image’s maximum x-coordinate (column) in raw parent image.

Returns:	The return value.
Return type:	int

xmin¶

Image’s minimum x-coordinate (column) in raw parent image.

Returns:	The return value.
Return type:	int

ymax¶

Image’s maximum y-coordinate (row) in raw parent image.

Returns:	The return value.
Return type:	int

ymin¶

Image’s minimum y-coordinate (row) in raw parent image.

Returns:	The return value.
Return type:	int

class carpedm.data.util.ImageTFOps[source]¶: Helper class for decoding and resizing images.

carpedm.data.util.image_path(data_dir, bib_id, image_id)[source]¶

Generate path to a specified image.

Parameters:	data_dir (str) – Path to top-level data directory. bib_id (str) – Bibliography ID. image_id (str) – Image ID.

Returns: String

Neural Networks¶

carpedm.nn.conv¶

Convolutional layers and components.

class carpedm.nn.conv.CNN(kernel_size=((3, 3), (3, 3), (3, 3), (3, 3)), num_filters=(64, 96, 128, 160), padding='same', pool_size=((2, 2), (2, 2), (2, 2), (2, 2)), pool_stride=(2, 2, 2, 2), pool_every_n=1, pooling_fn=max_pooling2d, activation_fn=relu, *args, **kwargs)[source]¶

Modular convolutional neural network layer class.

name¶

Unique identifier for the model.

The model name will serve as directory name for model-specific results and as the top-level tf.variable_scope.

Returns:	The model name.
Return type:	str

carpedm.nn.op¶

Operations for transforming network layer or input.

carpedm.nn.rnn¶

Recurrent layers and components.

carpedm.nn.util¶

Utilities for managing and visualizing neural network layers.

carpedm.nn.util.activation_summary(x)[source]¶

Helper to create summaries for activations.

Creates a summary that provides a histogram of activations and a summary that measures the sparsity of activations.

Parameters:	x – Tensor
Returns:	nothing

carpedm.nn.util.name_nice(raw)[source]¶

Convert tensor name to a nice format.

Remove ‘tower_[0-9]/’ from the name in case this is a multi-GPU training session. This helps the clarity of presentation on tensorboard.
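
The renaming can be sketched with a single regular-expression substitution. `strip_tower_prefix` is a hypothetical name. The pattern here uses `[0-9]+` so that tower indices of 10 and above are also covered, which is slightly broader than the pattern quoted above.

```python
import re


def strip_tower_prefix(raw):
    """Remove a 'tower_<n>/' component from a tensor name.

    Hypothetical helper; the broader [0-9]+ pattern is a deliberate choice.
    """
    return re.sub(r"tower_[0-9]+/", "", raw)
```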

Models¶

carpedm.models.generic¶

This module defines base model classes.

class carpedm.models.generic.Model[source]¶

Abstract class for models.

forward_pass(features, data_format, axes_order, is_training)[source]¶

Main model functionality.

Must be implemented by subclass.

Parameters:	features (array_like or dict) – Input features. data_format (str) – Image format expected for computation, ‘channels_last’ (NHWC) or ‘channels_first’ (NCHW). axes_order (list or None) – If not None, is a list defining the axes order to which image input should be transposed in order to match data_format. is_training (bool) – Training if true, else evaluating.
Returns:	The return value, e.g. class logits.
Return type:	array_like or dict

initialize_pretrained(pretrained_dir)[source]¶

Initialize a pre-trained model or sub-model.

Parameters:	pretrained_dir (str) – Path to directory where pretrained model is stored. May be used to extract model/sub-model name. For example: name = pretrained_dir.split('/')[-1].split('_')[0]
Returns:	Map from pre-trained variable to model variable.
Return type:	dict

name¶

Unique identifier for the model.

Used to identify results generated with the model.

Must be implemented by subclass.

Returns:	The model name.
Return type:	str

class carpedm.models.generic.TFModel[source]¶

Abstract class for TensorFlow models.

_forward_pass(features, data_format, axes_order, is_training, reuse)[source]¶

Main model functionality.

Must be implemented by subclass.

forward_pass(features, data_format, axes_order, is_training, new_var_scope=False, reuse=False)[source]¶

Wrapper for making nested variable scopes.

Extends Model.

Parameters:	new_var_scope (bool) – Use a new variable scope. reuse (bool) – Reuse variables with same scope.

name¶

Unique identifier for the model.

The model name will serve as directory name for model-specific results and as the top-level tf.variable_scope.

Returns:	The model name.
Return type:	str

Tasks¶

carpedm.tasks.generic¶

Base task class.

Todo

Get rid of model_fn dependency on input_fn.
LONG TERM: Training methods other than TensorFlow Estimator.

class carpedm.tasks.generic.Task(data_dir, task_dir, test_split='hnsd00000', dev_split=0.1, dev_factor=1, dataset_format='tfrecords', num_shards=8, num_threads=8, shape_store=None, shape_in=None, vocab_size=None, min_frequency=0, seed=None, **kwargs)[source]¶

Abstract class for Tasks.

__init__(data_dir, task_dir, test_split='hnsd00000', dev_split=0.1, dev_factor=1, dataset_format='tfrecords', num_shards=8, num_threads=8, shape_store=None, shape_in=None, vocab_size=None, min_frequency=0, seed=None, **kwargs)[source]¶

Initializer.

Parameters:

data_dir (str) – Directory where raw data is stored.
task_dir (str) – Top-level directory for storing tasks data and results.
test_split (float or str) – Either the ratio of all data to use for testing or specific bibliography ID(s). Use comma-separated IDs for multiple books.
dev_split (float or str) – Either the ratio of training data to use for dev/val or specific bibliography ID(s). Use comma-separated IDs for multiple books.
dev_factor (int) – Size of development set should be divisible by this value. Useful for training on multiple GPUs.
dataset_format (str) – Base storage unit for the dataset.
vocab_size (int) – Maximum vocab size.
min_frequency (int) – Minimum frequency of type to be included in vocab.
shape_store (tuple or None) – Size to which images are resized for storage, if needed, e.g. for TFRecords. The default is to not perform any resize. Please see this note on image shape for more information.
shape_in (tuple or None) – Size to which images are resized by interpolation or padding before being input to a model. Please see this note on image shape for more information.
num_shards (int) – Number of sharded output files.
num_threads (int) – Number of threads to run in parallel.
seed (int or None) – Number for seeding rng.
**kwargs – Unused arguments.

__metaclass__¶: alias of abc.ABCMeta

__weakref__¶: list of weak references to the object (if defined)

bbox¶

When creating a dataset, generate appropriate bounding boxes for the tasks (determined by e.g. self.character, self.line).

Returns:	Use bounding boxes.
Return type:	bool

character¶

When creating a dataset, tell the meta_loader to generate character features, e.g. label, bbox.

Returns:	Use character features.
Return type:	bool

character_set¶

The Japanese characters (e.g. kana, kanji) of interest.

Preset character sets may include the following component sets:

hiragana

katakana

kana

kanji

punct (punctuation)

misc

Returns:	The character set.
Return type:	CharacterSet

chunk¶

When creating a dataset, instead of using the original image, extract non-overlapping chunks of size image_shape and the corresponding features from the original image on a regular grid. The original image is padded to divide evenly by image_shape.

Note: currently only objects that are entirely contained in the block will have their features propagated.

Returns:
Return type:	bool
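
The padding and grid layout described for chunking can be sketched as follows. `chunk_grid` is a hypothetical helper that computes only the padded image size and the top-left corner of each chunk. It does not extract pixels or features, and where the padding is placed is left unspecified.

```python
def chunk_grid(height, width, chunk_shape):
    """Return the padded size and (top, left) corners of non-overlapping chunks.

    Hypothetical helper for illustration; not part of carpedm.
    """
    chunk_h, chunk_w = chunk_shape
    # Round each axis up to the next multiple of the chunk size.
    padded_h = -(-height // chunk_h) * chunk_h
    padded_w = -(-width // chunk_w) * chunk_w
    corners = [(top, left)
               for top in range(0, padded_h, chunk_h)
               for left in range(0, padded_w, chunk_w)]
    return (padded_h, padded_w), corners
```

A 100 x 150 image with 64 x 64 chunks is padded to 128 x 192 and yields six chunks in a 2 x 3 grid.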

image_scope¶

Portion of original image for each example.

Available scopes are ‘char’, ‘seq’, ‘line’, ‘page’.

Returns:	Task image scope
Return type:	str

input_fn(batch_size, subset, num_shards, overwrite=False)[source]¶

Returns (sharded) batches of data.

Parameters:	batch_size (int) – The batch_size subset (str) – The subset to use. One of {train, dev, test}. num_shards (int) – Number of data_shards to produce. overwrite (bool) – Overwrite existing data.
Returns:	tuple containing: list: Features of length num_shards. list: Labels of length num_shards.
Return type:	tuple

label¶

When creating a dataset, generate character labels.

Returns:	Use character labels
Return type:	bool

line¶

When creating a dataset, tell the meta_loader to generate line features, e.g. bbox.

Returns:	Use line features.
Return type:	bool

loss_fn(features, model_output, targets, is_training)[source]¶

Computes an appropriate loss for the tasks.

Must be implemented in subclass.

Parameters:	features (dict) – Additional features for computing loss. model_output (tf.Tensor or dict of tf.Tensor) – Model output used for computing the batch loss, e.g. class logits. targets (tf.Tensor) – Ground truth targets. is_training (bool) – The model is training.
Returns:	Losses of type ‘int32’ and shape [batch_size, 1]
Return type:	tf.Tensor

max_sequence_length¶

Maximum sequence length.

Only used if image_scope == 'seq'.

Returns:
Return type:	int or None

model_fn(model, variable_strategy, num_gpus, num_workers, devices=None)[source]¶

Model function used by TensorFlow Estimator class.

Parameters:	model (pmjtc.models.generic.Model) – The model to run. variable_strategy (str) – Where to locate variable operations, either ‘CPU’ or ‘GPU’. num_gpus (int) – Number of GPUs to use, if available. num_workers (int) – Parameter for distributed training. devices (tuple) – Specific devices to use. If provided, overrides num_gpus.

Returns:

num_classes¶: Total number of output nodes, includes reserved tokens.

regularization(hparams)[source]¶

Parameters:	hparams – Hyperparameters, e.g. weight_decay

Returns:

reserved¶

Reserved tokens for the tasks.

The index of each token in the returned tuple will be used as its integer ID.

Returns:	The reserved characters
Return type:	tuple

results(loss, tower_features, tower_preds, tower_targets, is_training)[source]¶

Accumulates predictions, computes metrics, and determines the tensors to log and/or visualize.

Parameters:	loss (tf.float) – Global loss. tower_features (list of dict) – Tower feature dicts. tower_preds (list) – Tower predictions. tower_targets (list of tf.Tensor) – Tower targets. is_training (bool) – The model is training.
Returns:	tuple containing: dict: The tensors to log. dict: All predictions. dict: Evaluation metrics.
Return type:	tuple

sequence_length¶

If max_sequence_length is None, this gives the deterministic length of a sequence, else the minimum sequence length.

Only used if image_scope == 'seq'.

Returns:
Return type:	int or None

sparse_labels¶

Generate labels as a SparseTensor, e.g. for CTC loss.

Returns:	Use sparse labels.
Return type:	(bool)

target¶

Determines the value against which predictions are compared.

For a list of possible targets, refer to carpedm.data.util.ImageMeta.generate_features()

Returns:	feature key for the target
Return type:	str

task_data_dir¶

Directory where tasks data is stored.

Returns:	str

Utilities¶

carpedm.util.eval¶

Evaluation helpers.

carpedm.util.eval.confusion_matrix_metric(labels, predictions, num_classes)[source]¶

A confusion matrix metric.

Parameters:	labels (tf.Tensor) – Ground truth labels. predictions (tf.Tensor) – Predictions. num_classes (int) – Number of classes.
Returns:	tf.update_op:
Return type:	tf.Tensor

carpedm.util.eval.plot_confusion_matrix(cm, classes, normalize=False, save_as=None, title='Confusion matrix')[source]¶

This function prints and plots the confusion matrix. Normalization can be applied by setting normalize=True.

Slight modification of methods here

carpedm.util.registry¶

Registry for models and tasks.

Define a new model by subclassing models.Model and register it:

@registry.register_model
class MyModel(models.Model):
    ...

Access by snake-cased name: registry.model("my_model").

See all the models registered: registry.list_models().
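
The registration pattern described above can be sketched in a few lines of standalone Python. This is not the carpedm or Tensor2Tensor implementation: the module-level dictionary `_MODELS`, the helper `snake_case_name`, and the duplicate-name check are assumptions made for illustration.

```python
import re

_MODELS = {}  # hypothetical registry storage


def snake_case_name(cls):
    """Convert a CamelCase class name to snake_case."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", cls.__name__).lower()


def register_model(name=None):
    """Register a class under `name`, or under its snake-cased class name."""
    def decorator(cls, registered_name=None):
        key = registered_name or snake_case_name(cls)
        if key in _MODELS:
            raise LookupError("Model %s already registered." % key)
        _MODELS[key] = cls
        return cls

    # Support both @register_model and @register_model("custom_name").
    if callable(name):
        return decorator(name)
    return lambda cls: decorator(cls, name)


def model(name):
    """Retrieve a registered class by name."""
    return _MODELS[name]


def list_models():
    """List registered names in sorted order."""
    return sorted(_MODELS)


@register_model
class MyModel:
    pass
```

With this sketch, `model("my_model")` returns the `MyModel` class, and a custom name can be supplied with `@register_model("custom_name")`.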

References

Lightly modified Tensor2Tensor registry.

carpedm.util.registry.default_name(obj_class)[source]¶

Convert class name to the registry’s default name for the class.

Parameters:	obj_class – the name of a class
Returns:	The registry’s default name for the class.

carpedm.util.registry.default_object_name(obj)[source]¶

Convert object to the registry’s default name for the object class.

Parameters:	obj – an object instance
Returns:	The registry’s default name for the class of the object.

carpedm.util.registry.display_list_by_prefix(names_list, starting_spaces=0)[source]¶: Creates a help string for names_list grouped by prefix.

carpedm.util.registry.help_string()[source]¶: Generate help string with contents of registry.

carpedm.util.registry.model(name)[source]¶: Retrieve a model by name.

carpedm.util.registry.register_model(name=None)[source]¶: Register a model. name defaults to class name snake-cased.

carpedm.util.registry.register_task(name=None)[source]¶: Register a Task. name defaults to cls name snake-cased.

carpedm.util.registry.task(name)[source]¶: Retrieve a task by name.

carpedm.util.train¶

Training utilities.

This module provides utilities for training machine learning models. It uses or makes slight modifications to code from the TensorFlow CIFAR-10 estimator tutorial.

carpedm.util.train.config_optimizer(params)[source]¶

Configure the optimizer used for training.

Sets the learning rate schedule and optimization algorithm.

Parameters:	params (tf.contrib.training.HParams) – Hyperparameters.
Returns:	tf.train.Optimizer