Commit 5caa9e8

bmzhao authored and tensorflower-gardener committed
Bazel's change to legacy_whole_archive behavior is not the cause of TF's linking issues with protobuf. Protobuf's implementation and runtime are correctly being linked into TF here: https://github.com/tensorflow/tensorflow/blob/da5765ebad2e1d3c25d11ee45aceef0b60da499f/tensorflow/core/platform/default/build_config.bzl#L239 and https://github.com/tensorflow/tensorflow/blob/da5765ebad2e1d3c25d11ee45aceef0b60da499f/third_party/protobuf/protobuf.patch#L18, and I've confirmed via nm that protobuf symbols are still present in libtensorflow_framework.so.

After examining the linker flags that Bazel passes to gcc (https://gist.github.com/bmzhao/f51bbdef50e9db9b24acd5b5acc95080), I discovered that the order of the linker flags was what caused the undefined reference. See https://eli.thegreenplace.net/2013/07/09/library-order-in-static-linking/ and https://stackoverflow.com/a/12272890. Basically, linkers discard the objects they've been asked to link if those objects do not export any symbol that the linker is currently tracking as "undefined".

To prove this was the issue, I was able to link successfully after moving the shared object flag (-l:libtensorflow_framework.so.2) to the bottom of the flag order and manually invoking g++.

This change uses cc_import to link against a .so in the "deps" of tf_cc_binary, rather than in the "srcs" of tf_cc_binary. This technique was inspired by the comment here: https://github.com/bazelbuild/bazel/blob/387c610d09b99536f7f5b8ecb883d14ee6063fdd/examples/windows/dll/windows_dll_library.bzl#L47-L48

Successfully built on a vanilla Ubuntu 18.04 VM:

    bmzhao@bmzhao-tf-build-failure-reproing:~/tf-fix/tf$ bazel build -c opt --config=cuda --config=v2 --host_force_python=PY3 //tensorflow/tools/pip_package:build_pip_package
    Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
      bazel-bin/tensorflow/tools/pip_package/build_pip_package
    INFO: Elapsed time: 2067.380s, Critical Path: 828.19s
    INFO: 12942 processes: 51 remote cache hit, 12891 local.
    INFO: Build completed successfully, 14877 total actions

The root cause might instead be bazelbuild/bazel#7687, which is pending further investigation.

PiperOrigin-RevId: 281341817
Change-Id: Ia240eb050d9514ed5ac95b7b5fb7e0e98b7d1e83
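
The flag-order effect described in the commit message can be illustrated with a toy model. The following is only a sketch, not how gcc or ld is implemented: it assumes a single left-to-right pass in which a library is kept only if it defines a symbol that is undefined at that moment. The names `LinkInput`, `link`, `main.o`, and `ProtoSymbol` are hypothetical and chosen for illustration; real linkers have more nuance (archive members, `--as-needed`, and so on).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkInput:
    """One item on the link command line (hypothetical model, not a real linker API)."""
    name: str
    defines: frozenset       # symbols this input exports
    needs: frozenset         # symbols this input references
    is_library: bool = False  # libraries are kept only if they resolve something


def link(inputs):
    """Scan inputs once, left to right; return (kept input names, unresolved symbols)."""
    defined = set()
    undefined = set()
    kept = []
    for item in inputs:
        # A library that resolves nothing currently undefined is discarded.
        if item.is_library and not (item.defines & undefined):
            continue
        kept.append(item.name)
        defined |= item.defines
        undefined |= item.needs
        undefined -= defined
    return kept, undefined


# Hypothetical inputs: the symbol name is made up for this sketch.
framework = LinkInput(
    name="libtensorflow_framework.so.2",
    defines=frozenset({"ProtoSymbol"}),
    needs=frozenset(),
    is_library=True,
)
main_obj = LinkInput(
    name="main.o",
    defines=frozenset({"main"}),
    needs=frozenset({"ProtoSymbol"}),
)

if __name__ == "__main__":
    # Library first: nothing is undefined yet, so the model drops it.
    print(link([framework, main_obj]))
    # Library last: main.o's reference is pending, so the model keeps it.
    print(link([main_obj, framework]))
```

In this model, listing the library before the object that needs it leaves `ProtoSymbol` unresolved, while listing it afterwards resolves it. That mirrors the manual experiment of moving `-l:libtensorflow_framework.so.2` to the end of the g++ command line.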
1 parent b52fe71 commit 5caa9e8

3 files changed: 19 additions and 13 deletions

.bazelrc

Lines changed: 2 additions & 13 deletions
@@ -209,19 +209,8 @@ build --announce_rc
 # Other build flags.
 build --define=grpc_no_ares=true
 
-# See https://github.com/bazelbuild/bazel/issues/7362 for information on what
-# --incompatible_remove_legacy_whole_archive flag does.
-# This flag is set to true in Bazel 1.0 and newer versions. We tried to migrate
-# Tensorflow to the default, however test coverage wasn't enough to catch the
-# errors.
-# There is ongoing work on Bazel team's side to provide support for transitive
-# shared libraries. As part of migrating to transitive shared libraries, we
-# hope to provide a better mechanism for control over symbol exporting, and
-# then tackle this issue again.
-#
-# TODO: Remove this line once TF doesn't depend on Bazel wrapping all library
-# archives in -whole_archive -no_whole_archive.
-build --noincompatible_remove_legacy_whole_archive
+# Prevent regression of https://github.com/bazelbuild/bazel/issues/7362
+build --incompatible_remove_legacy_whole_archive
 
 # Modular TF build options
 build:dynamic_kernels --define=dynamic_loaded_kernels=true

tensorflow/BUILD

Lines changed: 12 additions & 0 deletions
@@ -587,6 +587,18 @@ tf_cc_shared_object(
     ] + tf_additional_binary_deps(),
 )
 
+# This is intended to be the same as tf_binary_additional_srcs:
+# https://github.com/tensorflow/tensorflow/blob/cd67f4f3723f9165aabedd0171aaadc6290636e5/tensorflow/tensorflow.bzl#L396-L425
+# And is usable in the "deps" attribute instead of the "srcs" attribute
+# as a workaround for https://github.com/tensorflow/tensorflow/issues/34117
+cc_import(
+    name = "libtensorflow_framework_import_lib",
+    shared_library = select({
+        "//tensorflow:macos": ":libtensorflow_framework.dylib",
+        "//conditions:default": ":libtensorflow_framework.so",
+    }),
+)
+
 # -------------------------------------------
 # New rules should be added above this target.
 # -------------------------------------------

tensorflow/tensorflow.bzl

Lines changed: 5 additions & 0 deletions
@@ -626,6 +626,11 @@ def tf_cc_binary(
             [
                 clean_dep("//third_party/mkl:intel_binary_blob"),
             ],
+        ) + if_static(
+            extra_deps = [],
+            otherwise = [
+                clean_dep("//tensorflow:libtensorflow_framework_import_lib"),
+            ],
         ),
         data = depset(data + added_data_deps),
         linkopts = linkopts + _rpath_linkopts(name_os),
