Multi-view point cloud fusion for LiDAR based cooperative environment detection
B. Jaehn (bjae@hrz.tu-chemnitz.de), P. Lindner, and G. Wanielik
Professorship of Communications Engineering, Chemnitz University of Technology, Chemnitz, Germany
Advances in Radio Science, 13, 209–215, 2015, doi:10.5194/ars-13-209-2015
Abstract
A key component for automated driving is 360° environment detection.
The recognition capabilities of modern sensors are always limited to their
direct field of view. In urban areas a lot of objects occlude important areas
of interest. The information captured by another sensor from another
perspective could resolve such occlusions. Furthermore, the
capabilities to detect and classify various objects in the surroundings can be
improved by taking multiple views into account.
In order to combine the data of two sensors into one coordinate system, a
rigid transformation matrix has to be derived. The accuracy of modern, e.g.
satellite based, relative pose estimation systems is not sufficient to
guarantee a suitable alignment. Therefore, a registration based approach is
used in this work which aligns the captured environment data of two sensors
from different positions. Thus their relative pose estimation obtained by
traditional methods is improved and the data can be fused.
To support this, we present an approach which utilizes the uncertainty
information of modern tracking systems to determine the possible field of
view of the other sensor. Furthermore, it is estimated which parts of the
captured data are directly visible to both sensors, taking occlusion and shadowing
effects into account. Afterwards a registration method, based on the
iterative closest point (ICP) algorithm, is applied to that data in order to
obtain an accurate alignment.
The contribution of the presented approach to the achievable accuracy is shown
with the help of ground truth data from a LiDAR simulation within a 3-D
crossroad model. Results show that a two-dimensional position and heading
estimation is sufficient to initialize a successful 3-D registration process.
Furthermore, it is shown which initial spatial alignment is necessary to
obtain suitable registration results.
Introduction
Humans are able to use indirect signals, e.g. over mirrors or through the windows
of other vehicles, to observe relevant areas at difficult crossroad
situations. To achieve a similar understanding of the surroundings with depth
sensors, one important aspect for future autonomous systems will be the
cooperative exchange of environment information, denoted as car to car (C2C)
communication. Together with the use of real time cloud or local
infrastructure based services, this will improve the recognition and reaction
capabilities towards objects located outside the direct field of view (FoV)
of the own sensors.
For example, the green car of Fig.  could provide
useful information about the blue truck for the red car. The foundation of
such a dense data fusion is the availability of a highly accurate relative
pose estimation, which can be transferred into a homogeneous transformation
matrix. This is hard to derive by traditional satellite based relative
localization methods, but their estimate could be improved using the gathered
environment data of both cars. A method for achieving this is the content of
this paper, which is mainly based on the master thesis by .
For the rest of this paper we will refer to the red car as target and to the
green car as source, and we will use the coordinate definitions of the right
picture from Fig. .
The present paper is organized as
follows: Sect. 2 provides relevant background information which is
necessary to ensure comprehensibility. Afterwards the general outline and the
novel ideas of the presented method are shown in Sect. 3. An evaluation of
the presented approach is then given in Sect. 4. The conclusion in Sect. 5
contains a summary of all findings and its limitations as well as improvements
and future work.
The figure shows two cars which could exchange their environment
information if a sufficiently accurate pose estimation were available (left
picture). If the pose estimation is inaccurate, this yields an alignment
error (center picture). The right picture shows the coordinate system
definitions which were used in this paper.
Background
Depth sensors
Typical depth sensors, like LiDARs and stereo cameras, measure the surface
of the surrounding environment. Using projective geometry, their range
images can be converted into a point cloud which represents the surface shape
of the surroundings. During this conversion the pixel wise neighborhood
information is maintained, which is useful for further processing steps like
normal feature estimation. In the case that two of these sensors measure the
same surface area from different perspectives, this data can be used to align
them against each other if their relative pose estimation is inaccurate. The
process of doing this is called registration or shape matching, which is part
of Sect. .
Normal feature estimation
To gain information about the underlying surface, which the point cloud data
represents, it is very common in 3-D data processing to determine normal
features
Vectors assigned to each point in space representing a
local plane patch of the approximated surface
. We apply here the standard
weighted normal averaging scheme which uses triangle combinations
incorporating the neighbourhood points . In
this work we use a weightning factor wj which is chosen as the reciprocal
product of the distance between the neighbours qi,j of pi according
to Eq. (). We refer to this method as distance weighted
cross product (DWCP).
$$w_j = \frac{1}{\left| q_{i,j} - p_i \right| \cdot \left| q_{i,j+1} - p_i \right|}$$
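The following Python sketch illustrates one possible implementation of the DWCP normal estimation; it assumes that the neighbours of each point are available in a fixed (e.g. pixel grid) order and wraps around at the end of that ordering. The function name and the neighbourhood handling are illustrative assumptions, not part of the original work.

```python
import numpy as np

def dwcp_normal(p, neighbors):
    """Distance weighted cross product (DWCP) normal for a single point.

    p         : (3,) point of interest p_i
    neighbors : (k, 3) ordered neighbourhood points q_{i,1..k} (ordering assumed)
    """
    normal = np.zeros(3)
    k = len(neighbors)
    for j in range(k):
        a = neighbors[j] - p
        b = neighbors[(j + 1) % k] - p          # next neighbour, wrapping around
        w = 1.0 / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        normal += w * np.cross(a, b)            # weighted triangle normal
    n = np.linalg.norm(normal)
    return normal / n if n > 0 else normal
```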
Registration of point clouds
Registration, or shape matching, is the process of aligning two or more surfaces
against each other in such a way that a mathematical metric is minimized.
Although other registration methods exist (compare
), the most popular one is the so called ICP
algorithm introduced by .
The general idea is to first find point correspondences between two
slightly misaligned data sets. This is usually done by searching for the
currently closest point in the other data set. Secondly, a locally optimal
rigid transformation matrix is calculated which minimizes a certain distance
metric in order to align them against each other. Using the result from the
previous iteration this procedure is repeated: new correspondences are
determined and the alignment is optimized again until some stopping criterion
is fulfilled.
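A minimal point-to-point ICP loop in Python could look as follows; it is only a sketch of the general idea described above, not the exact variant used in this work (which is point to plane with additional rejection steps).

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Least squares rigid transform (R, t) mapping src onto dst (Kabsch/SVD)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                    # avoid reflections
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = cd - R @ cs
    return R, t

def icp(source, target, max_iter=50, tol=1e-6):
    """Align 'source' (N, 3) to 'target' (M, 3); returns a 4x4 transform."""
    T = np.eye(4)
    src = source.copy()
    tree = cKDTree(target)
    prev_err = np.inf
    for _ in range(max_iter):
        dist, idx = tree.query(src)             # closest-point correspondences
        R, t = best_rigid_transform(src, target[idx])
        src = src @ R.T + t
        T_step = np.eye(4); T_step[:3, :3] = R; T_step[:3, 3] = t
        T = T_step @ T
        err = dist.mean()
        if abs(prev_err - err) < tol:           # stopping criterion
            break
        prev_err = err
    return T
```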
The original formulation of the algorithm by  delivers bad
results if the incorporated data sets include outliers, e.g.
points which do not have a corresponding partner representing the same
physical object surface within the other data set. This happens especially
if the two surfaces do not overlap completely or if there are holes due to
occlusion effects. As a consequence, a still growing number of different
variations has been proposed. The main goals are to speed up the convergence
rate and to improve the robustness against outliers. Therefore, several
heuristics and modifications are applied and additional features, like
normals, are incorporated. A comparison study of several methods can be found
in .
In this work we utilize a point to plane based ICP method
which applies additional rejection steps . First, all
correspondences where the angle between the corresponding normals exceeds
30° are ignored, and secondly we remove the worst 10 % according to the
Euclidean distance .
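The two rejection heuristics can be sketched in Python as follows; the thresholds follow the values stated above, while the array layout is an assumption of this sketch.

```python
import numpy as np

def reject_correspondences(src_normals, dst_normals, distances,
                           max_angle_deg=30.0, worst_fraction=0.10):
    """Return a boolean mask of correspondences kept after rejection.

    src_normals, dst_normals : (N, 3) unit normals of corresponding points
    distances                : (N,) Euclidean distances of the pairs
    """
    # Reject pairs whose normals differ by more than max_angle_deg.
    cos_angle = np.einsum('ij,ij->i', src_normals, dst_normals).clip(-1.0, 1.0)
    keep = np.degrees(np.arccos(cos_angle)) <= max_angle_deg
    # Additionally reject the worst 10 % of the remaining pairs by distance.
    if keep.any():
        cutoff = np.quantile(distances[keep], 1.0 - worst_fraction)
        keep &= distances <= cutoff
    return keep
```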
Method
The primary goal of this work is to include the captured point cloud data of
a sensor platform, like the green car (compare
Fig. ), into the coordinate system of another,
e.g. the red car. Each of them has its own coordinate system indicated with
the superscript prefix S and T, respectively. Thus, an arbitrary point
${}^{S}\mathbf{p} = (x^S, y^S, z^S)^{T}$ in the source coordinate system can be
transformed into the target coordinate system using $\mathbf{T}_{TS}$. It follows that
the corresponding point ${}^{T}\mathbf{p} = (x^T, y^T, z^T)^{T}$ of ${}^{S}\mathbf{p}$ can
be calculated with the following equation:
$$\begin{pmatrix} {}^{T}\mathbf{p} \\ 1 \end{pmatrix} = \mathbf{T}_{TS} \cdot \begin{pmatrix} {}^{S}\mathbf{p} \\ 1 \end{pmatrix}$$
Unfortunately, the exact transformation matrix $\mathbf{T}_{TS}$ is in general not known and has to be estimated by
$\hat{\mathbf{T}}_{TS}$. In order to achieve an initial guess of this matrix, it is
further assumed that the target platform estimates the relative position and
orientation of the source with a suitable, e.g. satellite based, relative
localization system which delivers a state $\mathbf{x} = (x, y, z, \phi, \theta, \psi)^{T}$ and its uncertainty as a covariance matrix
$\Sigma_{\mathrm{Cov}}$. The accuracy of this transformation can be improved
using a registration algorithm with environment information captured by both
sensor platforms simultaneously.
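A possible way to turn the estimated state x into the initial guess of the homogeneous transformation matrix is sketched below; the Euler angle convention (yaw-pitch-roll about z, y, x) is an assumption, since the paper does not state it explicitly.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def initial_transform(state):
    """Build the initial guess of T_TS from the state (x, y, z, phi, theta, psi).

    Assumes phi, theta, psi are roll, pitch, yaw in radians (convention assumed).
    """
    x, y, z, phi, theta, psi = state
    T = np.eye(4)
    T[:3, :3] = Rotation.from_euler('ZYX', [psi, theta, phi]).as_matrix()
    T[:3, 3] = [x, y, z]
    return T

# Usage: transform a source point into the target coordinate system.
# p_t_hom = initial_transform(state) @ np.append(p_s, 1.0)
```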
To achieve this, the environment data which was captured by both sensor
platforms simultaneously is determined. Therefore the state and the
corresponding uncertainty information, namely the standard deviations from
$\mathrm{diag}(\Sigma_{\mathrm{Cov}}) = (\sigma_x^2, \sigma_y^2, \sigma_z^2, \sigma_\phi^2, \sigma_\theta^2, \sigma_\psi^2)$, are used to estimate the theoretical field of
view (FoV) of the source sensor within the target coordinate system in order
to determine the data within the overlapping FoV. Furthermore, occlusion
effects are resolved to achieve the relevant points for the registration
process within the target point cloud (denoted in the following as
T) and the source point cloud (S).
Once an accurate transformation is determined, it can be applied to the whole
source point cloud. Both data sets are then fused within the target coordinate
system and available for further processing. An overview of the previously
explained steps is given in Fig.  and some steps
are explained in detail in the following.
Overview of the processing steps within the presented approach.
Overlapping FoV estimation
The FoV is the geometrical area which is theoretically visible for a sensor,
without taking occlusion effects into account. For most typical range
sensors, like automotive LiDAR, this can be approximated as a pyramid. All
sensor data has to lie within this field, thus it can be used as a rough
geometrical boundary. Such a pyramid can be defined for the source and the target
sensor. In order to determine the overlapping part, it is necessary to
transform them into the same coordinate system.
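A simple check whether a point lies inside such a pyramid shaped FoV can be written, for instance, in terms of azimuth, elevation and range limits; the sensor-forward x axis and the concrete limits below are placeholder assumptions.

```python
import numpy as np

def in_fov(points, h_fov_deg=72.0, v_fov_deg=10.0, max_range=200.0):
    """Boolean mask of points (N, 3) inside a pyramid FoV centred on the x axis.

    The opening angles and the maximum range are illustrative values.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    azimuth = np.degrees(np.arctan2(y, x))
    elevation = np.degrees(np.arcsin(np.divide(z, r, out=np.zeros_like(z), where=r > 0)))
    return ((x > 0) & (r <= max_range) &
            (np.abs(azimuth) <= h_fov_deg / 2) &
            (np.abs(elevation) <= v_fov_deg / 2))
```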
Uncertainty field of view
Problematic in this context is the accuracy of $\hat{\mathbf{T}}_{TS}$: the exact
origin and orientation of the source FoV are thus not known. To overcome this
problem, the track uncertainty is used to increase the source FoV such that it
includes the error space, which is considered to be normally distributed, up to a
certain confidence interval. How this can be achieved efficiently depends
on the original shape. In this work a pyramid model is used and
therefore the basic idea to derive a proper uncertainty field approximation
is described in the following. The full derivation can be found in
.
As mentioned already, the uncertainty space is assumed to be normally
distributed with an expectation value of 0. Consequently, a
surrounding confidence interval $\gamma \cdot \sigma$ would result in a
six-dimensional elliptical error space. If this were used to transform the
pyramid model into the uncertainty space, the result would be an extremely
complex mathematical shape. Therefore just the diagonal values of
$\Sigma_{\mathrm{Cov}}$ are used to approximate this with a rectangular upper
boundary space. With this simplification it does not matter whether
cross correlation values are taken into account or not. To support the understanding of
the reader, a two-dimensional example is shown in
Fig. .
Within this chart, three different 3σ (γ = 3) confidence intervals of the same two-dimensional correlated normally
distributed sample set are shown. The red one uses the full covariance matrix
whereas the green one neglects the covariance values. The rectangular
confidence interval includes both ellipses and thus acts as an upper bound.
The two pictures show the coordinate system definitions and the
nomenclature of the original field of view (left) and the increased
uncertainty field of view (right).
Using this rectification, the confidence interval can be described just by
the maximum divergence from the state space in positive and negative
direction for each dimension. Consequently, there are $2^6 = 64$ different
maximum error variations. If all of these error variations are applied to the
FoV pyramid (compare Fig. , left
panel), the resulting shape can be approximated by a frustum pyramid as it is
shown in Fig.  (right panel).
This approximation of the uncertainty FoV can be used to clip the points of
the target sensor which can have no corresponding partner within the source
data set.
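The following sketch enumerates the 2^6 = 64 maximum error variations and applies them to the corner points of the FoV pyramid; the hull of all transformed corners then bounds the uncertainty FoV. The function name, the scaling with γ and the reuse of the initial_transform helper from the sketch above are assumptions.

```python
import itertools
import numpy as np

def uncertainty_fov_corners(fov_corners, state, sigmas, gamma=1.75):
    """Apply all 2^6 = 64 maximum error variations to the FoV corner points.

    fov_corners : (K, 3) corner points of the source FoV pyramid
    state       : (6,) estimated (x, y, z, phi, theta, psi)
    sigmas      : (6,) standard deviations of the state estimate
    Returns a (64*K, 3) point set whose hull approximates the uncertainty FoV
    (further approximated by a frustum pyramid in the paper).
    """
    corners = []
    for signs in itertools.product((-1.0, 1.0), repeat=6):   # 64 combinations
        perturbed = state + gamma * np.array(signs) * np.asarray(sigmas)
        T = initial_transform(perturbed)                      # helper from the sketch above
        hom = np.c_[fov_corners, np.ones(len(fov_corners))]
        corners.append((hom @ T.T)[:, :3])
    return np.vstack(corners)
```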
Point uncertainty ranges
In this section it is estimated which parts of both point clouds could be
visible to both sensors, taking occlusion effects into account, e.g. if an
object is located within both fields of view but is occluded by
another object from just one perspective. To figure this out, the uncertainty
of the initial transformation matrix $\hat{\mathbf{T}}_{TS}$ has to be taken into
account again. Therefore the same assumptions, namely the rectification of
the uncertainty space, from the previous section are applied. This time the
uncertainty space is determined for each point of the source point cloud
separately and the resulting space is approximated by a sphere around the
point ${}^{S}\mathbf{p}_i \in S$ with a radius $r_i$ according to
Eq. ().
$$r_i = \max\!\left(r_{\min},\; \max_{\forall j}\left(r_{i,j}\right) + \left\lVert \mathbf{t}_{\mathrm{err}} \right\rVert_\infty\right), \quad j \in \{1, 2, \ldots, |\Sigma_{\mathrm{Rot}}|\}, \quad \mathbf{t}_{\mathrm{err}} \in \Sigma_{\mathrm{Transl}}$$
$$r_{i,j} = \left| \mathbf{p}_{S,i} - \mathrm{Rot}(\mathbf{r}_j) \cdot \mathbf{p}_{S,i} \right|, \quad \mathbf{r}_j \in \Sigma_{\mathrm{Rot}}$$
$$\Sigma_{\mathrm{Transl}} = \{\pm\sigma_x\} \times \{\pm\sigma_y\} \times \{\pm\sigma_z\}, \qquad \Sigma_{\mathrm{Rot}} = \{\pm\sigma_\phi\} \times \{\pm\sigma_\theta\} \times \{\pm\sigma_\psi\}$$
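One possible reading of Eq. () in Python is shown below: for every source point, the maximum displacement over the eight rotational error corners is computed and the maximum translational error is added, with r_min as lower bound. The rotation convention and the scaling by γ are assumptions of this sketch.

```python
import itertools
import numpy as np
from scipy.spatial.transform import Rotation

def uncertainty_radii(points, sigmas, r_min=0.1, gamma=1.0):
    """Per-point uncertainty radius r_i for source points (N, 3).

    sigmas : (6,) standard deviations (x, y, z, phi, theta, psi), angles in rad.
    """
    sx, sy, sz, sphi, stheta, spsi = gamma * np.asarray(sigmas)
    t_err = np.linalg.norm([sx, sy, sz], np.inf)              # max translational error
    radii = np.zeros(len(points))
    for signs in itertools.product((-1.0, 1.0), repeat=3):     # 8 rotational corners
        rot = Rotation.from_euler('ZYX', [signs[2] * spsi,
                                          signs[1] * stheta,
                                          signs[0] * sphi]).as_matrix()
        displacement = np.linalg.norm(points - points @ rot.T, axis=1)
        radii = np.maximum(radii, displacement)
    return np.maximum(r_min, radii + t_err)
```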
Transferred into the target coordinate system using $\hat{\mathbf{T}}_{TS}$, these
spheres represent the area where $(\mathbf{p}_{T,i}, 1)^{T} = \mathbf{T}_{TS} \cdot (\mathbf{p}_{S,i}, 1)^{T}$ could be if $\mathbf{T}_{TS}$ were known exactly. Furthermore, they
represent the area where corresponding points of T, which
were captured from the same object, could be. Thus each source point which has no
target point within its range defined by $r_i$ is clipped. The other way
around, each target point is clipped if it is not within at least one of the
source uncertainty spheres (compare
Fig. ).
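The mutual clipping can be implemented efficiently with a k-d tree over the target cloud, as sketched below; this is one possible realization, not necessarily the one used in the original work.

```python
import numpy as np
from scipy.spatial import cKDTree

def mutual_clip(source_in_target, target, radii):
    """Clip both clouds based on the per-point uncertainty spheres.

    source_in_target : (N, 3) source points already transformed with T_hat_TS
    target           : (M, 3) target points
    radii            : (N,) uncertainty radii r_i of the source points
    Returns boolean masks (keep_source, keep_target).
    """
    tree = cKDTree(target)
    keep_source = np.zeros(len(source_in_target), dtype=bool)
    keep_target = np.zeros(len(target), dtype=bool)
    for i, (p, r) in enumerate(zip(source_in_target, radii)):
        idx = tree.query_ball_point(p, r)       # target points inside the sphere
        if idx:
            keep_source[i] = True               # source point has a potential partner
            keep_target[idx] = True             # these target points are covered
    return keep_source, keep_target
```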
Evaluation
The objective of the following evaluation is to show the applicability of the
presented approach for cooperative environment recognition applications.
Compared to the master thesis , on which this paper is based,
the data which was used for the evaluation was re-evaluated to fit the
limited scope of this paper. For a more exhaustive evaluation please refer to
.
Evaluation strategy
To show the effects of the different steps of the presented approach, a detailed
crossroad 3-D model was built with SketchUp® (a 3-D modeling software
by Trimble Navigation). This was used to simulate range data by two virtual
LiDAR depth sensors, which were moved through the static model such that they
overlap partially and enough non-parallel surface details for registration
were included. The recorded data includes ground truth range data as well as
position and orientation of both sensors. This was then superimposed with range
and pose (position and orientation) noise. These point clouds were afterwards
aligned and the output was compared with the applied pose error vector. This
procedure was then repeated 21 times with different data set combinations,
each with 100 different range and tracking noise samples. Thus the quality of
the registration process can be statistically appraised.
Range data generation
The range data was captured using a simple ray tracing approach. Each laser
beam of the virtual LiDAR has been modelled as a single ray. In contrast to
the simulation applied by , the complete sensor optics
simulation and the 3-dimensional beam structure were not taken into account.
This approach allows a much faster simulation, but it does not reflect the
special properties of LiDAR sensors, e.g. only the first echo is used. However,
this simple sensor model can be applied also to other range sensors, like
stereo and time of flight cameras, hence it is used here. For the evaluation
in this work we used a LiDAR model with 40 layers each with 451 beams
covering 72° horizontally and 10° vertically.
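The ray directions of such a virtual LiDAR can be generated as a simple angular grid, for instance as follows; the sensor-forward x axis and the uniform spacing of layers and beams are assumptions of this sketch.

```python
import numpy as np

def lidar_ray_directions(layers=40, beams=451, h_fov_deg=72.0, v_fov_deg=10.0):
    """Unit direction vectors (layers*beams, 3) for a simple virtual LiDAR.

    The sensor looks along the x axis; rays are spread uniformly over the
    horizontal and vertical opening angles.
    """
    az = np.radians(np.linspace(-h_fov_deg / 2, h_fov_deg / 2, beams))
    el = np.radians(np.linspace(-v_fov_deg / 2, v_fov_deg / 2, layers))
    az, el = np.meshgrid(az, el)
    dirs = np.stack([np.cos(el) * np.cos(az),
                     np.cos(el) * np.sin(az),
                     np.sin(el)], axis=-1)
    return dirs.reshape(-1, 3)
```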
To each source point an uncertainty range is assigned. Within each
sphere at least one target point has to be present, and each target point has
to lie within at least one sphere.
The range image taken directly from the simulation is noiseless and therefore
not very realistic. Therefore, it is superimposed with Gaussian noise.
According to the manual of the Ibeo LUX laser scanner family ,
a typical standard deviation value $\sigma_r$ is 10 cm, which is used here.
To simulate tracking noise, a random error sample from a normally distributed
error space is superimposed on $\mathbf{x}$ according to
Eq. (). The uncertainty standard deviation values
$\sigma_x, \sigma_y, \sigma_z, \sigma_\phi, \sigma_\theta, \sigma_\psi$ were
taken from Table  to simulate different
tracking conditions and quality. As described in
Sect. , this will influence how
the overlapping area of the sensor data is estimated.
$$\mathbf{x} = \mathbf{x} + \mathbf{x}_{\mathrm{Err}}, \qquad \mathbf{x}_{\mathrm{Err}} \sim \mathcal{N}\!\left(\mathbf{0}, \left(\sigma_x, \sigma_y, \sigma_z, \sigma_\phi, \sigma_\theta, \sigma_\psi\right)^{T}\right)$$
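A corresponding noise sampling step could look like this in Python; the TN1 values in the usage comment are taken from the table below, everything else is an assumption of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_tracking_noise(state, sigma_xyz, sigma_angles_deg):
    """Superimpose normally distributed tracking noise on a pose state.

    state            : (6,) ground truth (x, y, z, phi, theta, psi), angles in rad
    sigma_xyz        : translational standard deviation in metres
    sigma_angles_deg : rotational standard deviation in degrees
    """
    sigmas = np.array([sigma_xyz] * 3 + [np.radians(sigma_angles_deg)] * 3)
    return state + rng.normal(0.0, sigmas)

# Example: tracking noise setup TN1 (30 cm, 1 degree).
# noisy_state = add_tracking_noise(true_state, 0.30, 1.0)
```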
Tracking noise setups.

        σφ, σθ, σψ    σx, σy, σz
TN1     1°            30 cm
TN2     2°            60 cm
TN3     3°            120 cm
TN4     4°            240 cm

Analysis of the results
Percentiles from $\mathcal{N}(0, \sigma)$ intervals for
direct comparison with non-normally distributed error spaces.
Within each test cycle, the resulting alignment vector $\mathbf{x}_{\mathrm{Alg}}$ is
subtracted from the previously applied tracking noise vector $\mathbf{x}_{\mathrm{Err}}$,
which results in the remaining error vector $\mathbf{x}_{\mathrm{Rem}}$ (compare
Eq. ). Over all 2100 test cycles in total, the
expectation value of this vector is assumed to be 0. To allow an easy
comparison between the input error space, which is normally distributed, and
the output error space, which is in general not normally distributed, the
equivalent percentiles are used according to
Table . These percentiles are calculated for
each component of $\mathbf{x}_{\mathrm{Rem}}$. If the resulting value of the
percentile is smaller than the equivalent interval from the input error space
normal distribution, the alignment was improved during the registration.
$$\mathbf{x}_{\mathrm{Err}} - \mathbf{x}_{\mathrm{Alg}} = \mathbf{x}_{\mathrm{Rem}}$$
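The percentile based comparison can be computed, for example, as follows; the chosen percentile levels correspond to the 1σ, 2σ and 3σ intervals of a normal distribution and are an assumption of this sketch.

```python
import numpy as np

def remaining_error_percentiles(x_err, x_alg):
    """Percentiles of the absolute remaining error per state component.

    x_err, x_alg : (cycles, 6) applied noise and resulting alignment vectors.
    Returns the percentiles equivalent to the 1/2/3 sigma intervals of N(0, sigma).
    """
    x_rem = np.abs(x_err - x_alg)
    levels = [68.27, 95.45, 99.73]              # equivalents of 1, 2 and 3 sigma
    return {f"{lvl:.2f}%": np.percentile(x_rem, lvl, axis=0) for lvl in levels}
```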
Contribution of clipping & confidence interval
The table shows the results of the remaining error percentiles for
the point to plane metric with different clipping confidence intervals
γ. The light green colour indicates nearly perfect alignment results
whereas dark red stands for unsuitable results above 1° or 20 cm. The
achieved accuracy and the necessary number of iterations are quite similar in
all cases, especially for higher tracking noise levels where the uncertainty
ranges are huge.
To figure out the contribution of the FoV and uncertainty range clipping from
Sects.  and
, the influence of the applied
confidence interval is investigated now. Therefore the confidence
interval parameter γ is modified, and an additional test was done where
both clipping methods were deactivated.
According to Table , the smallest remaining
error is obtained for γ = 1.75. For other values of γ, and without clipping at
all, the error increases. This effect decreases with higher tracking noise due
to the huge uncertainty ranges, where occlusion and shadowing effects cannot
be detected. Obviously the rejection step works very efficiently, such that
bad correspondences are detected reliably. Consequently, this pre-processing
affects the alignment accuracy just slightly but noticeably. Thus we
conclude that the FoV and uncertainty range clipping stabilize the
registration process. Furthermore, it has an influence on the number of
points which have to be aligned, and thus the computational complexity is
decreased drastically. This holds true especially if the overlap is small
compared to the rest.
A smaller confidence interval reduces the range where potential
correspondences can be found. This has positive effects if the error lies
within this interval; in this case none of the potentially good corresponding
partners are clipped. However, if the current
error is located outside of the confidence interval, points are clipped
which could have a good corresponding partner in the other set. Thus the
number of details which can be used for the alignment is reduced, together
with the overall probability that ideal pairings can be found at all.
This limits the alignment abilities drastically if the current initial error
is high.
Thus we conclude that the uncertainty FoV and range clipping are especially
useful for low tracking noise. There the number of points which have to be
aligned is reduced significantly, and potential outliers are detected even
before they can take any bad influence on the registration process. The
confidence interval should not be chosen too small to avoid the clipping of
potentially useful point pairings. For higher tracking noise, the uncertainty
range clipping is less useful because occlusion effects can only be detected
roughly. Therefore, outliers within the overlapping area are not thrown out.
The FoV clipping, on the other hand, is always useful because the number of
points which have to be aligned is decreased at low computational costs and
potential wrong pairings are avoided.
Conclusions
The results show that an alignment of two data sets can be achieved with the
help of a pose tracking system although they were captured from completely
different perspectives. During the previous evaluation, the contribution of
certain system design aspects has been examined, and the findings are
summarized in the following sections.
The primary goal of the FoV clipping was to determine the overlapping area of
the sensor FoVs under the pose estimation uncertainty. This reduces the
number of points which have to be aligned significantly, especially if the
overlapping part is just a small subset of the complete data. Thus it further
reduces the amount of possible outliers which are not detected by the
applied heuristics. Therefore we conclude that the utilization of this
method is always recommendable.
The idea of the uncertainty range clipping was to find the points which
could have a corresponding partner in the other data set, taking occlusion
effects into account. This works fine if the tracking accuracy is already
good, because in this case the uncertainty ranges are small and a lot of
potential outliers are ignored. Additionally, the number of points which have
to be aligned is reduced further, which improves the overall performance.
However, if the tracking noise is high, the contribution of the uncertainty
range clipping is reduced drastically. This is because the uncertainty ranges are
too big to detect occlusion effects, caused by small and medium sized objects
like cars and trees, effectively. Nevertheless, occlusions caused by huge
objects like buildings are detected properly, and thus points which cannot
have a corresponding point in the other data set are ignored. However, it is
heavily situation dependent whether the benefits justify the computational
costs of this method.
According to Table , a rotation and
translation accuracy below 0.5° and below 15 cm is attainable in most
cases. Under good circumstances with a lot of details this falls even below
0.1° and 10 cm. This should be enough for a lot of practical
applications. Using the improved transformation result it is now possible to
fuse the complete data of both sensors.
Limitations of the presented approach
The presented approach is suitable for scenarios where two depth sensors
observe partly the same surface of an arbitrary scene from completely
different viewpoints. Additionally, their relative pose has to be roughly
known, together with a known confidence interval. However, some additional
conditions must be met. First of all, enough details should be included in
the overlapping part such that an unambiguous registration result is
possible. Therefore, it must contain at least three non-parallel surface patches.
This implies further that the point density in this region is sufficient to
estimate proper normal features and point correspondences. Additionally, the
extent of this region has to be significantly larger than the measurement
noise of the sensor data and the tracking error.
Further improvements and future work
So far no method is applied which checks if the above mentioned
requirements are fulfilled. For practical applications this has to be
checked for each sensor data pair separately before the presented approach
can be applied. If that is successfully done, the plausibility of the
registration result should be checked. Currently there is no method applied
which is able to determine the quality of the registration result. A
completely wrong alignment can be detected easily by evaluating if the
alignment result lies within the confidence interval of the applied tracking
mechanism. For a further assessment of the achieved alignment accuracy it is
necessary to develop a suitable metric. This could be done based on the
remaining average distance between the correspondences found in the last ICP
iteration. Furthermore, the similarity of the normal features between the point
pairs could be a helpful measurement.

Edited by: M. Chandra
Reviewed by: two anonymous referees
References

Besl, P. and McKay, N. D.: A method for registration of 3-D shapes, IEEE Transactions on Pattern Analysis and Machine Intelligence, 14, 239–256, doi:10.1109/34.121791, 1992.

Chen, Y. and Medioni, G.: Object modeling by registration of multiple range images, in: Proceedings of the 1991 IEEE International Conference on Robotics and Automation, Vol. 3, 2724–2729, doi:10.1109/ROBOT.1991.132043, 1991.

Gabriel, M.: LiDAR-Signaturberechnung von raeumlich ausgedehnten Zielen im Fahrzeugumfeld, Bachelor thesis, Chemnitz University of Technology, 2010.

Ibeo Automotive Systems GmbH: Operating Manual ibeo LUX® Laser scanner, ibeo Automobile Sensor GmbH, Merkurring 20, 22143 Hamburg, 2.5 Edn., 2008.

Jähn, B.: Fusion of multi-view point cloud data for cooperative object detection, Master thesis, Chemnitz University of Technology, 2014.

Klasing, K., Althoff, D., Wollherr, D., and Buss, M.: Comparison of surface normal estimation methods for range sensing applications, in: Proceedings of the 2009 IEEE International Conference on Robotics and Automation (ICRA '09), 3206–3211, doi:10.1109/ROBOT.2009.5152493, 2009.

Pomerleau, F., Colas, F., Siegwart, R., and Magnenat, S.: Comparing ICP variants on real-world data sets, Auton. Robots, 34, 133–148, doi:10.1007/s10514-013-9327-2, 2013.

Pottmann, H., Leopoldseder, S., and Hofer, M.: Registration without ICP, Comput. Vis. Image Und., 95, 54–71, 2002.

Pulli, K.: Multiview registration for large data sets, in: Proceedings of the Second International Conference on 3-D Digital Imaging and Modeling, 160–168, doi:10.1109/IM.1999.805346, 1999.

Rusinkiewicz, S. and Levoy, M.: Efficient variants of the ICP algorithm, in: Proceedings of the Third International Conference on 3-D Digital Imaging and Modeling, 145–152, doi:10.1109/IM.2001.924423, 2001.