
Conversation

@Nakroma
Contributor

Nakroma commented Dec 17, 2024

Draft for the student project SYSTEMDS-3548.

Current contributions:

  • Fixes some minor bugs related to the performance tests
  • Parallelizes pandas_to_frame_block column processing (see the figure below for the speedup, tested on my machine)

[Figure: load_pandas benchmark results]

This commit fixes the load_numpy string performance test case. It keeps the CLI usage consistent with the other test cases, but converts the dtype to the correct one internally.
This commit fixes the boolean array conversion breaking for row counts above 64. It also adds a bit more error handling to prevent cases like this in the future.
This commit parallelizes the column processing in the pandas DataFrame to FrameBlock conversion.
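
(A minimal sketch of the parallelization idea, not the actual SystemDS converter code; convert_column is a hypothetical stand-in for the per-column work done in pandas_to_frame_block:)

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def convert_column(idx, series):
    # Stand-in for the per-column Python-to-Java transfer; in SystemDS the
    # column data ends up in a JVM-side FrameBlock via Py4J.
    return idx, series.to_numpy()


def pandas_to_frame_block_parallel(df: pd.DataFrame):
    # Process the columns concurrently instead of in a sequential loop.
    # Threads suffice here because most of the time is spent in socket I/O
    # to the JVM rather than in Python bytecode.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(convert_column,
                             range(df.shape[1]),
                             (df[c] for c in df.columns)))
```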
@codecov

codecov bot commented Dec 17, 2024

Codecov Report

Attention: Patch coverage is 20.00000% with 4 lines in your changes missing coverage. Please review.

Project coverage is 72.33%. Comparing base (d3fcfb1) to head (6b1f68c).
Report is 16 commits behind head on main.

Files with missing lines                                 Patch %   Lines
.../apache/sysds/runtime/util/Py4jConverterUtils.java   20.00%    3 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2154      +/-   ##
============================================
+ Coverage     72.03%   72.33%   +0.30%     
- Complexity    43937    44211     +274     
============================================
  Files          1441     1443       +2     
  Lines        166106   166353     +247     
  Branches      32428    32477      +49     
============================================
+ Hits         119655   120334     +679     
+ Misses        37199    36789     -410     
+ Partials       9252     9230      -22     


Nakroma marked this pull request as ready for review December 18, 2024 14:04
@christinadionysio
Contributor

LGTM! Thank you for your contribution @Nakroma!

@Baunsgaard
Contributor

LGTM as well.

How did you measure the time?
Does it include the startup time of the system?

@Nakroma
Contributor Author

Nakroma commented Dec 18, 2024

@Baunsgaard I used the IO benchmark scripts for the figure provided above:

https://github.com/apache/systemds/blob/main/scripts/perftest/runAllIO.sh
https://github.com/apache/systemds/blob/main/scripts/perftest/python/io/load_pandas.py

@Baunsgaard
Contributor

Great!

Then you can get better numbers:

Modify the script so that the context is started in the 'setup' part instead of the 'run' part, and remember to shut down the system with ctx.close() after you are done measuring.
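
(A minimal sketch of what this suggestion could look like, assuming a timeit-based benchmark; from_pandas and the surrounding structure are illustrative rather than the exact load_pandas.py code:)

```python
import timeit

import pandas as pd
from systemds.context import SystemDSContext

df = pd.DataFrame({"a": range(1_000_000)})

ctx = SystemDSContext()  # startup happens once, outside the measured section
try:
    def run():
        # only the DataFrame -> FrameBlock transfer (and compute) is measured
        ctx.from_pandas(df).compute()

    print(timeit.timeit(run, number=10))
finally:
    ctx.close()  # shut the backend down after the measurement is done
```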

@Nakroma
Contributor Author

Nakroma commented Dec 19, 2024

@Baunsgaard Okay, yeah, that makes sense - pushed a commit for that 👍 I didn't move it into the setup but rather inside the global context, so the .close() time is not included in the timing, and the args.number parameter is still supported.
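
(A rough sketch of the arrangement described above, with illustrative argument handling; the actual script may differ:)

```python
import argparse
import timeit

import pandas as pd
from systemds.context import SystemDSContext

parser = argparse.ArgumentParser()
parser.add_argument("--number", type=int, default=10)  # illustrative flag
args = parser.parse_args()

df = pd.DataFrame({"a": range(1_000_000)})

# The context is opened once around the whole measurement, so neither the JVM
# startup nor ctx.close() ends up inside the timed statement.
with SystemDSContext() as ctx:
    total = timeit.timeit(lambda: ctx.from_pandas(df).compute(),
                          number=args.number)
    print(total / args.number)
```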

@Baunsgaard
Contributor

@Baunsgaard Okay, yeah, that makes sense - pushed a commit for that 👍 I didn't move it into the setup but rather inside the global context, so the .close() time is not included in the timing, and the args.number parameter is still supported.

What are the times then?

@Nakroma
Contributor Author

Nakroma commented Dec 19, 2024

What are the times then?

Seems to be about a difference of 1-2s, at least on my local machine.

[Figure: load_pandas benchmark timings]

@Baunsgaard
Contributor

What are the times then?

Seems to be about a difference of 1-2s, at least on my local machine.

[Figure: load_pandas benchmark timings]

60% speedup on int32 and 100% on int64 is great!
However, it seems to me like something else is taking up time in your results. I would expect a speedup closer to the number of cores in your system.

@Nakroma
Contributor Author

Nakroma commented Dec 19, 2024

60% speedup on int32 and 100% on int64 is great! However, it seems to me like something else is taking up time in your results. I would expect a speedup closer to the number of cores in your system.

So there is some additional constant time; for example, building the FrameBlock a few lines before the .convert calls takes around 400ms.

I looked at the profiling a bit more and it seems like most time is spent on socket communication between Java and Python. My assumption would be that this adds quite a bit of overhead and doesn't parallelize well.
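
(One way such a profile could be taken, as an illustration rather than the exact set-up used here:)

```python
import cProfile
import pstats

import pandas as pd
from systemds.context import SystemDSContext

df = pd.DataFrame({"a": range(1_000_000)})

with SystemDSContext() as ctx:
    profiler = cProfile.Profile()
    profiler.enable()
    ctx.from_pandas(df).compute()  # a single transfer is enough to profile
    profiler.disable()

# Sorting by cumulative time shows whether Py4J socket send/recv dominates.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(15)
```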

@Baunsgaard
Contributor

So there is some additional constant time; for example, building the FrameBlock a few lines before the .convert calls takes around 400ms.

We could put the allocation into the parallel transfer call for each column?

I looked at the profiling a bit more and it seems like most time is spent on socket communication between Java and Python. My assumption would be that this adds quite a bit of overhead and doesn't parallelize well.

I was afraid this was the case. Maybe there is a way to configure which thread pool to use on the Java side to make it better at receiving requests in parallel. However, that is probably just a dream.

This commit moves the assignment of column data to the FrameBlock into the parallel column processing.
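
(A rough sketch of this change, assuming a hypothetical set_column call for the JVM-side assignment; each worker now converts and assigns its own column instead of leaving assignment to a sequential pass:)

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def convert_and_assign(frame_block, idx, series):
    data = series.to_numpy()            # per-column conversion
    frame_block.set_column(idx, data)   # assignment now happens in the worker


def pandas_to_frame_block(frame_block, df: pd.DataFrame):
    # No separate assignment loop is needed after the parallel section.
    with ThreadPoolExecutor() as pool:
        list(pool.map(lambda i: convert_and_assign(frame_block, i, df.iloc[:, i]),
                      range(df.shape[1])))
```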
@Baunsgaard
Contributor

LGTM, I have now merged it.

While merging I played around with parallelizing the Python API and found out that it spawns a thread per connection. There is indeed some per-connection overhead, but it is not the main problem.

To make the transfer more efficient we could:

  1. Reduce the number of calls by fusing many operations into single calls to Java.
  2. Reduce the current serialization bottleneck by slicing the array into many smaller byte arrays when sending it over (a rough sketch follows below).

I see around 20% utilization of my CPU when transferring 10k by 10k integer matrices, so there is room for improvement.
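
(An illustration of idea 2 above, not the SystemDS transfer code: the buffer is cut into fixed-size byte slices that could each be sent to the JVM separately:)

```python
import numpy as np


def byte_chunks(arr: np.ndarray, chunk_bytes: int = 4 * 1024 * 1024):
    # Yield the array's raw bytes in fixed-size slices so each slice could be
    # handed over the connection as its own (smaller) byte array.
    buf = arr.tobytes()
    for start in range(0, len(buf), chunk_bytes):
        yield buf[start:start + chunk_bytes]


mat = np.arange(1_000 * 1_000, dtype=np.int32).reshape(1_000, 1_000)
for i, chunk in enumerate(byte_chunks(mat)):
    # here each chunk would be sent over the Py4J connection separately
    print(i, len(chunk))
```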
