- 01 Apr, 2024 1 commit
-
Jing Chen authored
PiperOrigin-RevId: 620968670
-
- 31 Mar, 2024 1 commit
-
Andrei Vagin authored
PiperOrigin-RevId: 620610190
-
- 30 Mar, 2024 1 commit
-
Etienne Perot authored
I am hitting the 15-second deadline for llama2-70b @ fp16. It's a beast of a model with ~130GiB of RAM. PiperOrigin-RevId: 620382594
-
- 29 Mar, 2024 3 commits
-
Kevin Krakauer authored
PiperOrigin-RevId: 620360076
-
Jing Chen authored
PiperOrigin-RevId: 620321751
-
Jing Chen authored
PiperOrigin-RevId: 620107765
-
- 28 Mar, 2024 4 commits
-
Kevin Krakauer authored
These are just little optimizations found while working on other things. PiperOrigin-RevId: 620083599
-
gVisor bot authored
PiperOrigin-RevId: 620018746
-
Lucas Manning authored
Bug scenario:
T1: Creates waiter queue, adds waiter, emits mount promise block event, waits.
T2: Gets waiter queue from vfs.mountPromises with read lock.
T1: Daemon does mount, notifies original waiter, deletes promise.
T2: Emits another mount promise block event, but mount already happened!
T2: Waits forever for a mount that will never come.
PiperOrigin-RevId: 619974202
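A minimal Go sketch of the race-free pattern the fix implies (the `mountTable` type, field names, and functions below are hypothetical illustrations, not gVisor's actual code): the "already mounted?" check and the waiter registration must happen under the same lock, so a mount that completes in between can never be missed.
```go
package mountwait

import "sync"

// mountTable is an illustrative stand-in for the real mount-promise state.
type mountTable struct {
	mu       sync.Mutex
	mounted  map[string]bool
	promises map[string][]chan struct{}
}

func newMountTable() *mountTable {
	return &mountTable{
		mounted:  map[string]bool{},
		promises: map[string][]chan struct{}{},
	}
}

// waitForMount blocks until path is mounted, without the lost-wakeup race:
// the mounted check and the waiter registration happen under one lock.
func (m *mountTable) waitForMount(path string) {
	m.mu.Lock()
	if m.mounted[path] {
		// Re-check under the lock: the mount may already have happened.
		m.mu.Unlock()
		return
	}
	ch := make(chan struct{})
	m.promises[path] = append(m.promises[path], ch)
	m.mu.Unlock()
	<-ch
}

// completeMount marks the mount done, wakes every registered waiter while
// still holding the lock, and deletes the promise.
func (m *mountTable) completeMount(path string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.mounted[path] = true
	for _, ch := range m.promises[path] {
		close(ch)
	}
	delete(m.promises, path)
}
```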
-
Etienne Perot authored
This utility creates a nested structure out of a flat list of fully-qualified test names, and can then execute them using nested `t.Run`s that reflect the hierarchy properly. This is useful for CUDA sample tests, which are organized in a hierarchy. This hierarchy isn't known at compile time, so it cannot be reflected using plain `t.Run`s. PiperOrigin-RevId: 619730658
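A hedged Go sketch of how such a utility could work (the `node` type and its methods are illustrative assumptions, not gVisor's actual implementation): fully-qualified names like "a/b/c" are split into a tree, which is then walked with nested `t.Run` calls.
```go
package nested

import (
	"strings"
	"testing"
)

// node is one level of the test hierarchy.
type node struct {
	children map[string]*node
	run      func(t *testing.T) // non-nil only for leaf tests
}

func newNode() *node { return &node{children: map[string]*node{}} }

// Add inserts a fully-qualified test name such as "suite/group/test" along
// with its test function, creating intermediate levels as needed.
func (n *node) Add(name string, fn func(t *testing.T)) {
	cur := n
	for _, part := range strings.Split(name, "/") {
		child, ok := cur.children[part]
		if !ok {
			child = newNode()
			cur.children[part] = child
		}
		cur = child
	}
	cur.run = fn
}

// Run walks the tree with nested t.Run calls so that test output mirrors the
// hierarchy. (Map iteration order is random; a real implementation would
// likely sort child names first.)
func (n *node) Run(t *testing.T) {
	if n.run != nil {
		n.run(t)
	}
	for name, child := range n.children {
		t.Run(name, child.Run)
	}
}
```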
-
- 27 Mar, 2024 4 commits
-
Jing Chen authored
PiperOrigin-RevId: 619691808
-
Jing Chen authored
PiperOrigin-RevId: 619663125
-
Etienne Perot authored
Callers may request a container from the pool, and must release it back when they are done with it. This is useful for large tests which can `exec` individual test cases inside the same set of reusable containers, to avoid the cost of creating and destroying containers for each test. It also supports reserving the whole pool ("exclusive"), which locks out all other callers from getting any container from the pool. This is useful for tests where running in parallel may induce unexpected errors that running serially would not cause. This allows the test to first try to run in parallel, and then re-run failing tests exclusively to make sure their failure is not due to parallel execution. I plan to use this for CUDA sample tests. PiperOrigin-RevId: 619377512
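A rough Go sketch of the pool semantics described above (the `Pool` and `Container` types and method names are assumptions for illustration, not the actual API): `Get`/`Release` hand out one container at a time, and `Exclusive` waits for every container to be idle and then blocks other callers until released.
```go
package pool

import "sync"

// Container is a placeholder for a reusable test container.
type Container struct{ Name string }

// Pool hands out containers one at a time, or all at once ("exclusive").
type Pool struct {
	mu        sync.Mutex
	cond      *sync.Cond
	idle      []*Container
	total     int  // number of containers the pool owns
	exclusive bool // set while one caller holds the whole pool
}

func New(containers []*Container) *Pool {
	p := &Pool{idle: containers, total: len(containers)}
	p.cond = sync.NewCond(&p.mu)
	return p
}

// Get blocks until a container is idle and no exclusive reservation is held.
func (p *Pool) Get() *Container {
	p.mu.Lock()
	defer p.mu.Unlock()
	for p.exclusive || len(p.idle) == 0 {
		p.cond.Wait()
	}
	c := p.idle[len(p.idle)-1]
	p.idle = p.idle[:len(p.idle)-1]
	return c
}

// Release returns a container to the pool and wakes waiters.
func (p *Pool) Release(c *Container) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.idle = append(p.idle, c)
	p.cond.Broadcast()
}

// Exclusive waits until every container is idle, then locks out all other
// callers until the returned release function is called.
func (p *Pool) Exclusive() (release func()) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for p.exclusive || len(p.idle) != p.total {
		p.cond.Wait()
	}
	p.exclusive = true
	return func() {
		p.mu.Lock()
		defer p.mu.Unlock()
		p.exclusive = false
		p.cond.Broadcast()
	}
}
```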
-
Etienne Perot authored
PiperOrigin-RevId: 619375261
-
- 26 Mar, 2024 4 commits
-
Zeling Feng authored
There is a potential panic if the keep-alive timer fires while the socket is going through cleanup: `e.route` might go away while the timer handler continues to execute, causing panics. The fix is to return early from the handler if the route has already been removed. PiperOrigin-RevId: 619268432
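An illustrative Go sketch of the guard described in the fix (the `endpoint` and `route` types and field names here are invented, not netstack's actual code): the timer handler re-checks, under the endpoint's lock, that the route still exists before using it.
```go
package keepalive

import "sync"

type route struct{ /* details elided */ }

// endpoint is a stand-in for the real TCP endpoint state.
type endpoint struct {
	mu    sync.Mutex
	route *route // cleared (set to nil) during socket cleanup
}

// keepaliveTimerExpired is the timer handler. It may race with cleanup, so it
// must verify the route is still present before touching it.
func (e *endpoint) keepaliveTimerExpired() {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.route == nil {
		// The socket is being cleaned up and the route is gone; returning
		// early avoids dereferencing removed state and panicking.
		return
	}
	e.sendKeepaliveProbe(e.route)
}

func (e *endpoint) sendKeepaliveProbe(r *route) { /* send the probe; elided */ }
```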
-
Jing Chen authored
PiperOrigin-RevId: 619265846
-
gVisor bot authored
This is necessary to ensure errno is not updated while allocating. Allocators are allowed to update errno, even in case of success. Since gVisor uses matchers to check the value of errno, the tests might fail if errno is overridden by an allocation done while building the matcher. Using a custom implementation of new and delete ensures this does not happen. PiperOrigin-RevId: 619238390
-
Jing Chen authored
PiperOrigin-RevId: 619045692
-
- 25 Mar, 2024 3 commits
-
Jing Chen authored
PiperOrigin-RevId: 618998464
-
Etienne Perot authored
This is useful to debug long-running or stuck tests, as with `--test_output=errors` the log is only shown when the test fails or times out. PiperOrigin-RevId: 618908605
-
Etienne Perot authored
PIL requires a string format name, whereas `Format.PNG` is an "enum string" which is not quite the same type. I had not noticed this problem because I had manually tested on a VM where the cached Stable Diffusion XL Docker image predated this change; it has now been tested on a fresh VM with a freshly-built Docker image. This is split into two changes because the image needs to be rebuilt: the first change updates the image, and the second change re-enables the test. PiperOrigin-RevId: 618874414
-
- 23 Mar, 2024 2 commits
-
gVisor bot authored
Allocations are not guaranteed to preserve errno, even in case of success. Because the test matchers test against errno, preserve errno when allocating new matchers. PiperOrigin-RevId: 618437007
-
Etienne Perot authored
In a previous change, the GPU images changed from any-architecture to x86-only, so they are no longer available on ARM. Therefore, rules like:
```
gpu-smoke-images: load-basic_cuda-vector-add load-gpu_cuda-tests
.PHONY: gpu-smoke-images
```
... fail if these images don't exist on ARM. This change makes these image loads ignored instead. PiperOrigin-RevId: 618349678
-
- 22 Mar, 2024 7 commits
-
Steve Silva authored
PiperOrigin-RevId: 618286015
-
Kevin Krakauer authored
In a redis-benchmark PING_INLINE test, this reduces allocations by 32%. PiperOrigin-RevId: 618248114
-
gVisor bot authored
PiperOrigin-RevId: 618246127
-
Jing Chen authored
PiperOrigin-RevId: 618077589
-
Etienne Perot authored
PIL requires a string format name, whereas `Format.PNG` is an "enum string" which is not quite the same type. I had not noticed this problem because I had manually tested on a VM where the cached Stable Diffusion XL Docker image predated this change; it has now been tested on a fresh VM with a freshly-built Docker image. This is split into two changes because the image needs to be rebuilt: the first change updates the image, and the second change re-enables the test. PiperOrigin-RevId: 618048266
-
Jing Chen authored
PiperOrigin-RevId: 618047305
-
Jing Chen authored
PiperOrigin-RevId: 618034948
-
- 21 Mar, 2024 6 commits
-
Kevin Krakauer authored
This was the classic "os.File loves to close important file descriptors" problem.
-
Etienne Perot authored
PiperOrigin-RevId: 617956471
-
Ayush Ranjan authored
When the sandbox process has a large memory footprint (say >25 GiB), it can take more than 5 seconds for the sandbox process to disappear from the process table after receiving SIGKILL. This is more applicable for GPU applications, which can use very large amounts of memory. PiperOrigin-RevId: 617931970
-
Nayana Bidari authored
CPUUsage() returns the CPU usage used in calculating the pod CPU utilization. The cgroup v1 version returns this value in nanoseconds, but the v2 version was returning it in microseconds, which resulted in incorrect CPU usage values when cgroup v2 was used as the default. Fix this by changing the return value of CPUUsage() in cgroup v2 to nanoseconds. PiperOrigin-RevId: 617903260
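For reference, a tiny Go sketch of the unit conversion in question (the function name is hypothetical): cgroup v2's `usage_usec` is reported in microseconds and must be scaled by 1000 to match the nanoseconds that cgroup v1 reports.
```go
package cgroup

// cpuUsageNanosFromV2 converts cgroup v2's usage_usec (microseconds) to
// nanoseconds so that it matches the cgroup v1 CPUUsage() value.
func cpuUsageNanosFromV2(usageUsec uint64) uint64 {
	return usageUsec * 1000 // 1 microsecond = 1000 nanoseconds
}
```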
-
Ayush Ranjan authored
Tested on a T4 GPU with driver version 535.161.07:
```
$ docker run --runtime=runsc --rm -it --gpus all nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubi8
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
```
PiperOrigin-RevId: 617867263
-
Jing Chen authored
PiperOrigin-RevId: 617760440
-
- 20 Mar, 2024 4 commits
-
Kevin Krakauer authored
cl/591090544 introduced a flag that was not added to tcp_benchmark, breaking it. PiperOrigin-RevId: 617622867
-
Jing Chen authored
PiperOrigin-RevId: 617621125
-
Kevin Krakauer authored
In a redis-benchmark PING_INLINE test, this reduces allocations by 84%. PiperOrigin-RevId: 617542009
-
Ayush Ranjan authored
PiperOrigin-RevId: 617535894
-