cog_worker.manager¶
Previewing, chunking, and executing analysis.
The Manager class is used to divide an area of analysis into chunks of manageable size, and execute functions on each of these chunks.
When executing functions, the Manager instantiates a cog_worker.worker.Worker and passes it to the function as its first parameter. The Worker keeps track of the scale, projection, and bounds of its piece of the analysis, which it uses to handle the reading and writing of Cloud Optimized GeoTIFFs.
Example
Use the manager to preview an analysis before executing it:
from cog_worker import Manager
from rasterio.plot import show

def my_analysis(worker):
    arr = worker.read('example-cog.tif')
    # calculations ...
    return arr

manager = Manager()
arr, bbox = manager.preview(my_analysis)
show(arr)
Execute the analysis in chunks, saving the results to disk:
manager.chunk_save('output.tif', my_analysis)
- class cog_worker.manager.Manager(bounds: Tuple[float, float, float, float] = (-180, -85, 180, 85), proj: int | str | Proj = 3857, scale: float = 10000, buffer: int = 16)[source]¶
Bases: object
Class for managing scalable analysis of Cloud Optimized GeoTIFFs.
- __init__(bounds: Tuple[float, float, float, float] = (-180, -85, 180, 85), proj: int | str | Proj = 3857, scale: float = 10000, buffer: int = 16)[source]¶
Initialize a Manager with a projection, scale, and bounding box for analysis.
- Parameters:
bounds (BoundingBox) – The region to be analyzed as a (west, south, east, north) tuple.
proj (pyproj.Proj, str, int) – The projection to analyze in. Generally accepts any proj4 string, WKT projection, or EPSG code. See pyproj.Proj for valid values.
scale (float) – The pixel size for analysis in the projection’s units (usually meters or degrees).
buffer (int) – When dividing analysis into chunks, the number of additional pixels to read on all sides to avoid edge effects. The ideal buffer size depends on your analysis (e.g. whether you use convolutions or distance functions).
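For example, a Manager might be configured for a regional analysis (the bounds, scale, and buffer below are illustrative values, not defaults):

from cog_worker import Manager

# Illustrative settings: a regional bounding box analyzed at 5 km resolution
# in Web Mercator, with a 32-pixel buffer for neighborhood operations.
manager = Manager(
    bounds=(-20.0, -35.0, 55.0, 38.0),  # (west, south, east, north)
    proj=3857,                          # EPSG code, proj4 string, or pyproj.Proj
    scale=5000,                         # pixel size in projection units (meters)
    buffer=32,
)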
- chunk_execute(f: Callable[[Worker], ndarray] | Callable, f_args: Iterable | None = None, f_kwargs: Mapping | None = None, chunksize: int = 512) Iterator[Tuple[Any, Tuple[float, float, float, float]]] [source]¶
Return a generator that executes a function on chunks of at most chunksize pixels.
Note
Manager.chunk_execute computes each chunk sequentially, trading time for reduced memory footprint. To run large scale analysis in parallel using dask, see cog_worker.distributed.
Note
You can estimate the memory requirement of executing a function at a given chunksize as (chunksize + 2*buffer)**2 * number_of_bands_or_arrays * bit_depth. For example, a 512-pixel chunk with a 16-pixel buffer and a single 64-bit band works out to (512 + 2*16)**2 * 64 bits, or roughly 2.4 MB per chunk.
- Parameters:
f (cog_worker.types.WorkerFunction) – The function to execute. The function will receive a cog_worker.worker.Worker as its first argument.
f_args (list) – Additional arguments to pass to the function.
f_kwargs (dict) – Additional keyword arguments to pass to the function.
chunksize (int) – Size of the chunks in pixels (excluding buffer).
- Yields:
A tuple containing the return value of the function and the bounding box of the executed analysis in the target projection.
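For example, chunk results can be consumed as they are computed, assuming 'example-cog.tif' is a readable Cloud Optimized GeoTIFF:

from cog_worker import Manager

def my_analysis(worker):
    return worker.read('example-cog.tif')

manager = Manager()

# Each iteration computes one chunk sequentially; arr is the function's
# return value and bbox is the chunk's bounds in the target projection.
running_total = 0.0
for arr, bbox in manager.chunk_execute(my_analysis, chunksize=512):
    running_total += float(arr.sum())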
- chunk_params(chunksize: int = 512, **kwargs)[source]¶
Generate parameters to execute a function in chunks.
Generates dicts of keyword arguments that can be passed to Manager.execute to run a function in chunks of size <chunksize>. This may be useful for distributing tasks to workers to execute in parallel. Each dict will contain the projection, scale, bounding box, and buffer. Attributes will be identical except for proj_bounds, which define the area to analyze.
Note
manager.chunk_execute(f) is equivalent to (manager.execute(f, **params) for params in manager.chunk_params())
- Parameters:
chunksize (int) – Size of the chunks in pixels (excluding buffer).
**kwargs – Optional additional keyword arguments to save to the dict (to eventually pass to Manager.execute), e.g. f, f_args, f_kwargs.
- Yields:
Dicts of keyword arguments that can be passed to cog_worker.manager.Manager.execute().
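As a sketch, the generated parameter dicts could be collected and executed independently (or shipped to separate worker processes); 'example-cog.tif' is a placeholder path:

from cog_worker import Manager

def my_analysis(worker):
    return worker.read('example-cog.tif')

manager = Manager()

# Each dict carries the chunk's projection, scale, bounds, and buffer.
tasks = list(manager.chunk_params(chunksize=512))

# Running them here is sequential; the dicts could equally be distributed
# to remote workers that call Manager.execute themselves.
results = [manager.execute(my_analysis, **params) for params in tasks]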
- chunk_save(dst: str | IO, f: Callable[[Worker], ndarray] | Callable, f_args: Iterable | None = None, f_kwargs: Mapping | None = None, chunksize: int = 512, **kwargs)[source]¶
Execute a function in chunks and write each chunk to disk as it is completed.
The chunk_save method is identical to Manager.chunk_execute, except it writes results to dst instead of yielding them. Manager.chunk_save uses the rasterio GeoTIFF driver.
Note
The function to be executed will receive a cog_worker.worker.Worker as its first argument and should return a 3-dimensional numpy array of chunksize (optionally plus the buffer pixels), e.g.:

# Read a cog in chunks and write those chunks to 'test.tif'
manager.chunk_save('test.tif', lambda worker: worker.read('example-cog-url.tif'))
- Parameters:
dst (str) – The file path to write to.
f (cog_worker.types.WorkerFunction) – The function to execute. The function will receive a cog_worker.worker.Worker as its first argument and must return a 3-dimensional numpy array of chunksize (including or excluding the buffer).
f_args (list) – Additional arguments to pass to the function.
f_kwargs (dict) – Additional keyword arguments to pass to the function.
chunksize (int) – Size of the chunks in pixels (excluding buffer).
**kwargs – Additional keyword arguments to pass to rasterio.open.
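For instance, additional keyword arguments might be forwarded to rasterio.open as GeoTIFF creation options (the compression setting below is an assumption about typical use, not a cog_worker default):

from cog_worker import Manager

def my_analysis(worker):
    return worker.read('example-cog.tif')

manager = Manager()

# Write each completed chunk into 'output.tif'; 'compress' is passed
# through **kwargs to rasterio.open.
manager.chunk_save('output.tif', my_analysis, chunksize=1024, compress='deflate')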
- chunks(chunksize: int = 512) Iterator[Tuple[float, float, float, float]] [source]¶
Generate bounding boxes for chunks of at most <chunksize> pixels in the Manager's scale and projection.
The chunks method divides the Manager’s bounding box into chunks of manageable size. Each chunk will be at most <chunksize> pixels, though the geographic extent of the chunk depends on the Manager’s projection and scale.
- Parameters:
chunksize (int) – Size of the chunks in pixels (excluding buffer).
- Yields:
BoundingBox – The bounding box of the chunk in the Manager's projection.
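For example, the chunk extents can be inspected before running any analysis:

from cog_worker import Manager

manager = Manager()

# Print the bounding box of each chunk in the Manager's projection.
for bbox in manager.chunks(chunksize=512):
    print(bbox)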
- execute(f: Callable[[Worker], ndarray] | Callable, f_args: Iterable | None = None, f_kwargs: Mapping | None = None, clip: bool = True, **kwargs) Tuple[Any, Tuple[float, float, float, float]] [source]¶
Execute a function that takes a cog_worker.worker.Worker as its first parameter.
The execute method is the underlying method for running analysis. By default, it will run the function once for the Manager’s given scale and bounding box.
When executing functions, the Manager instantiates a cog_worker.worker.Worker and passes it to the function as its first parameter. The Worker keeps track of the scale, projection, and bounds of its piece of the analysis, which it uses to handle the reading and writing of Cloud Optimized GeoTIFFs.
- Parameters:
f (cog_worker.types.WorkerFunction) – The function to execute. The function will receive a cog_worker.worker.Worker as its first argument.
f_args (list) – Additional arguments to pass to the function.
f_kwargs (dict) – Additional keyword arguments to pass to the function.
clip (bool) – Whether or not to clip the buffer from the completed analysis.
**kwargs – Additional keyword arguments to overload the Manager's properties (bounds, proj, scale, or buffer).
- Returns:
A tuple containing the return value of the function and the bounding box of the executed analysis in the target projection.
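For example, the Manager's defaults can be overloaded for a single run (the bounds and scale below are illustrative values):

from cog_worker import Manager

def my_analysis(worker):
    return worker.read('example-cog.tif')

manager = Manager()

# Run the analysis once over a smaller region at a coarser scale,
# overriding the Manager's defaults for this call only.
arr, bbox = manager.execute(my_analysis, bounds=(-10.0, 35.0, 5.0, 45.0), scale=20000)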
- preview(f: Callable[[Worker], ndarray] | Callable, f_args: Iterable | None = None, f_kwargs: Mapping | None = None, bounds: Tuple[float, float, float, float] | None = None, max_size: int = 1024, **kwargs) Tuple[Any, Tuple[float, float, float, float]] [source]¶
Preview a function by executing it at a reduced scale.
The preview method automatically reduces the scale of analysis to fit within max_size.
- Parameters:
f (WorkerFunction) – The function to execute. The function will receive a cog_worker.worker.Worker as its first argument.
f_args (list) – Additional arguments to pass to the function.
f_kwargs (dict) – Additional keyword arguments to pass to the function.
bounds (BoundingBox) – The region to analyze (default: self.bounds).
max_size (int) – The maximum size (width or height) in pixels to compute, ignoring any buffer (default: 1024px).
**kwargs – Additional keyword arguments to overload the Manager's properties (proj or buffer).
- Returns:
A tuple containing the return value of the function and the bounding box of the executed analysis in the target projection.
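For example, a sub-region could be previewed at a capped output size (the bounds below are illustrative values):

from cog_worker import Manager
from rasterio.plot import show

def my_analysis(worker):
    return worker.read('example-cog.tif')

manager = Manager()

# Preview only part of the Manager's extent; the scale is coarsened
# automatically so the result fits within 512 pixels.
arr, bbox = manager.preview(my_analysis, bounds=(-10.0, 35.0, 5.0, 45.0), max_size=512)
show(arr)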
- tile(f: Callable[[Worker], ndarray] | Callable, f_args: Iterable | None = None, f_kwargs: Mapping | None = None, z: int = 0, x: int = 0, y: int = 0, tilesize: int = 256, **kwargs) Tuple[Any, Tuple[float, float, float, float]] [source]¶
Execute a function for the scale and bounds of a TMS tile.
The tile method supports non-global and non-mercator tiling schemes via Morecantile. To generate standard web tiles, instantiate the Manager with the default parameters.
- Parameters:
f (cog_worker.types.WorkerFunction) – The function to execute. The function will receive a cog_worker.worker.Worker as its first argument.
f_args (list) – Additional arguments to pass to the function.
f_kwargs (dict) – Additional keyword arguments to pass to the function.
z (int) – The zoom level of the tile.
x (int) – The x index of the tile.
y (int) – The y index of the tile.
tilesize (int) – The size of the tile in pixels (default: 256).
**kwargs – Additional keyword arguments to overload the Manager's properties (buffer).
- Returns:
A tuple containing the return value of the function and the bounding box of the executed analysis in the target projection.
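For example, a standard web tile could be computed for given TMS indices (the z, x, and y values below are illustrative):

from cog_worker import Manager

def my_analysis(worker):
    return worker.read('example-cog.tif')

# The default Manager parameters correspond to standard web-mercator tiles.
manager = Manager()

# Compute the analysis for tile z=3, x=4, y=2 at 256x256 pixels.
arr, bbox = manager.tile(my_analysis, z=3, x=4, y=2, tilesize=256)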