Roger Moussalli, Mudhakar Srivatsa, Sameh Asaad 05 May 2015 Fast and flexible conversion of geohash codes to and from lat./long. coordinates FCCM 2015 c2b2qkded © 2015 IBM Corporation Spatiotemporal analytics Ubiquitous availability of location sensing devices: – GPS, RFID, mobile devices. Overwhelming majority of geospatial data is Internet of Things that Move – geospatial being produced by the data is increasingly spatiotemporal data: – Complex pattern queries can be posed on objects moving in time and space. Opportunities for business intelligence: – Traffic congestions prediction. – Targeted advertising. – Epidemic spread characterization. – Insurance pricing. – Crime pattern analysis. 25+ TBs daily logs 100s of millions GPS-enabled devices sold annually 12+ TBs daily tweet data 1.75 Billion smartphone users worldwide CPUs unable to keep up with online and offline processing of skyrocketing amounts of data. 2 © 2015 IBM Corporation Geohash: hierarchical geocoding Algorithm introduced in 2009: – http://geohash.org (/c2b2qkded) Hierarchically divide space into grids. Geohash codes properties: Support for hierarchical regions. Arbitrary precision: remove characters from the end of the code to reduce size and precision. Near places often (but not always) present similar prefixes (simple proximity estimation). Main uses of geohashes: Geotagging. Efficient database indexing. 3 © 2015 IBM Corporation Geohash: hierarchical geocoding Algorithm introduced in 2009: – http://geohash.org (/c2b2qkded) Hierarchically divide space into grids. Geohash codes properties: Support for hierarchical regions. Arbitrary precision: remove characters from the end of the code to reduce size and precision. Near places often (but not always) present similar prefixes (simple proximity estimation). 1 2 3 4 Main uses of geohashes: Geotagging. Efficient database indexing. 4 © 2015 IBM Corporation Geohash: hierarchical geocoding Algorithm introduced in 2009: – http://geohash.org (/c2b2qkded) Hierarchically divide space into grids. Geohash codes properties: Support for hierarchical regions. Arbitrary precision: remove characters from the end of the code to reduce size and precision. Near places often (but not always) present similar prefixes (simple proximity estimation). 1 21 22 23 24 3 4 Main uses of geohashes: Geotagging. Efficient database indexing. 5 © 2015 IBM Corporation Geohash: hierarchical geocoding Algorithm introduced in 2009: – http://geohash.org (/c2b2qkded) Hierarchically divide space into grids. Geohash codes properties: Support for hierarchical regions. Arbitrary precision: remove characters from the end of the code to reduce size and precision. Near places often (but not always) present similar prefixes (simple proximity estimation). 1 21 22 23 3 4 241 242 243 244 Main uses of geohashes: Geotagging. Efficient database indexing. 6 © 2015 IBM Corporation Geohash: hierarchical geocoding Algorithm introduced in 2009: – http://geohash.org (/c2b2qkded) Hierarchically divide space into grids. Geohash codes properties: – Support for hierarchical regions. – Arbitrary precision: remove characters from the end of the code to reduce size and precision. – Near places often (but not always) present similar prefixes (simple proximity estimation). 1 21 22 23 3 4 241 242 243 244 Main uses of geohashes: ‒ Geotagging. ‒ Efficient database indexing. 7 © 2015 IBM Corporation Problem statement Geohash codes do not replace lat./long. coordinates: – Format of disseminated data. – Dependency of legacy algorithms on lat./long. coordinates. New algorithms are developed to operate in the geohash domain. The (bit-serial) conversion of geohash codes to and from lat./long. is a highly frequent problem, and is computationally demanding (double-precision FP): – MongoDB, Apache Accumulo, IBM Streams and SPSS (among others). 8 © 2015 IBM Corporation Geohash code generation (latitude) Geohash latitude bits: 1 + 90 1 0 - 90 9 © 2015 IBM Corporation Geohash code generation (latitude) Geohash latitude bits: 1, 0 + 90 1 0 -0 10 © 2015 IBM Corporation Geohash code generation (latitude) Geohash latitude bits: 1, 0, 1 + 45 1 0 -0 11 © 2015 IBM Corporation Geohash code generation Geohash latitude bits: 1, 0, 1 + 45 Geohash longitude bits: 1, 1, 1 -0 Geohash code: 111011 + 90 12 + 180 © 2015 IBM Corporation Single step conversion block input interval lat. Latitude (or longitude) to geohash • Input: Latitude (or longitude). Interval (min, mid, max). • Output: Geohash bit. New interval. min mid max lat. to geohash geohash min mid max bit Geohash to latitude (or longitude) • Input: Geohash bit. Latitude (or longitude) value. Interval (min, mid, max). • Output: New interval. Latitude (or longitude) value = new interval mid. 13 geohash bit min mid max geohash to lat. min mid max lat. © 2015 IBM Corporation Single step conversion block lat. Latitude (or longitude) to geohash • Input: Latitude (or longitude). Interval (min, mid, max). • Output: Geohash bit. New interval. min mid max lat. to geohash geohash min mid max bit Geohash to latitude (or longitude) • Input: Geohash bit. Latitude (or longitude) value. Interval (min, mid, max). • Output: New interval. Latitude (or longitude) value = new interval mid. geohash bit lat. min mid max geohash to lat. min mid max lat. 14 © 2015 IBM Corporation Flexible, high-throughput converter lat. long. single step single step single step single step … … Run-time flexibility: – Mode: geo-to-ll, ll-to-geo. – Geohash precision. – Geohash transfer size. Input interface controller single step single step batch info Design-time flexibility: – Max geohash precision. – Num deployed stages (single step blocks). – Supported transfer-side geohash sizes. – Engineering details: • Stream interface width. • Further Pipelining. Input stream Output interface controller Output stream 15 © 2015 IBM Corporation Conversion by lookup 0 1 1 2 00 -180 -135 -90 3 01 0 45 90 4 10 -90 -45 0 5 11 90 135 180 6 000 -180 -157.5 -135 7 001 address Pre-meditated (offline) conversion table. Relax requirements on compute resources (DSPs). Table size = min mid max -180 -90 0 0 90 180 0 22.5 45 conversions for longitude of length 1 bit conversions for longitude of length 2 bits conversions for longitude of length 3 bits … … longitude geohash code+offset … Address in table = longitude longitude geohash 0 Offset = 16 © 2015 IBM Corporation Combining lookup with compute length of latitude (or longitude) geohash code 2N Flexible resource-friendly solution. Lookup tables process at most X geohash bits (each). N-X bits are processed by compute single step blocks: – Multiple passes through the compute stages is allowed. latitude geohash code bits X N-X latitude interval X N min X length N-X X latitude lookup based conversion single step longitude lookup based conversion longitude interval num remaining bits single step … … 17 longitude geohash code bits N single step single step latitude value longitude value N-X back-to-back stages © 2015 IBM Corporation Experimental evaluation Hardware platform: – Pico M505 board. – PCIe gen2 x8. – Xilinx Kintex 7 325T @250MHz. – Vivado 2014.2 . – End-to-end performance: host RAM -> FPGA -> host RAM. Software framework: – Multi-threaded converter from IBM Streams and SPSS. – Single socket 8-core x2 Hyper-Threads Intel Xeon @2.5GHz. – 32GB RAM. Metric: 18 © 2015 IBM Corporation Design space exploration Resource utilization % 100 90 80 70 60 50 40 30 20 10 0 LUTs % Registers % DSPs % Number of single step blocks 19 © 2015 IBM Corporation Design space exploration Resource utilization % 100 90 80 70 60 50 40 30 20 10 0 LUTs % Registers % DSPs % Number of single step blocks 20 © 2015 IBM Corporation End-to-end throughput: PCIe gen2 x8 limitations 64-bit geohashes. Batch of 100M conversions. 250 core 200 Throughput lat./long. to geohash (end-to-end) geohash to lat./long. (end-to-end) 150 (MConversions/s) 100 50 0 4 8 16 32 64 Number of single step blocks 21 © 2015 IBM Corporation Speedup vs multi-threaded software: a socket-to-socket comparison 8-core x2 Hyper-Threads Intel Xeon @2.5GHz. 64-bit geohashes. 100M conversions. Speedup 160 140 120 100 80 60 40 20 0 geohash to long/lat long/lat to geohash 16.1 13 1 2 4 8 16 32 Number of threads 22 © 2015 IBM Corporation Speedup vs multi-threaded software: a socket-to-socket comparison 8-core x2 Hyper-Threads Intel Xeon @2.5GHz. 64-bit geohashes. 100M conversions. speedup starts at batch of 100 conversions Speedup 160 140 120 100 80 60 40 20 0 geohash to long/lat long/lat to geohash 16.1 13 1 2 4 8 16 32 Number of threads 23 © 2015 IBM Corporation Conclusions First hardware implementation of a geohash conversion engine operating at wire speed. Flexibility: – Design time. – Run time. Performance: – >13X vs multi-threaded software. – Limited by the available PCIe bandwidth. Future work: – Accelerating compute-intensive queries in the geohash domain. 24 © 2015 IBM Corporation 25 © 2015 IBM Corporation BACKUP 26 © 2015 IBM Corporation End-to-end throughput geohash-to-lat./long. Throughput (Mconversions/s) 120 100 80 60 fpga_geo_8 fpga_geo_16 fpga_geo_32 fpga_geo_64 fpga_geo_128 40 lat./long.-to-geohash 20 Number of conversions (batch size) Throughput (Mconversions/s) 0 180 160 140 120 100 80 60 40 20 0 fpga_geo_8 fpga_geo_16 fpga_geo_32 fpga_geo_64 fpga_geo_128 Number of conversions (batch size) 27 © 2015 IBM Corporation Speedup vs single-threaded software Geohash-to-lat./long. fpga_geo_128 300 fpga_geo_64 250 fpga_geo_32 200 Speedup fpga_geo_16 150 fpga_geo_8 100 50 0 10 100 1K 10K 100K 1M 10M 100M Number of conversions (batch size) 28 © 2015 IBM Corporation Speedup vs single-threaded software 8 6 Geohash-to-lat./long. 4 2 0 10 1K fpga_geo_128 300 fpga_geo_64 250 fpga_geo_32 200 Speedup 100 fpga_geo_16 150 fpga_geo_8 100 50 0 10 100 1K 10K 100K 1M 10M 100M Number of conversions (batch size) 29 © 2015 IBM Corporation
© Copyright 2025