Alenka Frim AlenkaF
@AlenkaF
AlenkaF / .zprofile
Last active December 2, 2023 11:39
Apache Arrow build commands
eval "$(/opt/homebrew/bin/brew shellenv)"
export ARROW_GITHUB_API_TOKEN="<redacted>"
export APACHE_JIRA_TOKEN="<redacted>"
arrow () {
    if [[ $1 = "submodule" ]]; then
        git submodule update --init
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     7    147.8 MiB    147.8 MiB           1   @profile
     8                                         def my_func():
     9                                             # Load Vaex example
    10    173.0 MiB     25.3 MiB           1       df = vaex.example()
    11                                             # Create a virtual column
    12    173.0 MiB      0.0 MiB           1       df.add_virtual_column("r", "sqrt(x**2 + y**2 + z**2)")
    13
    14                                             # Create a __dataframe__ instance
AlenkaF / 01_tensor_extenstion_examample.py
Last active January 26, 2023 08:56
Example of tensor extension with tests in PyArrow
import ast
import json
import math
import numpy as np
import pyarrow as pa
class TensorType(pa.ExtensionType):
    def __init__(self, value_type, shape, order):
        self._value_type = value_type
AlenkaF / PyArrow_install_build.txt
Last active September 27, 2022 11:18
Installing PyArrow without an in-place build and without setting CONDA_DLL_SEARCH_MODIFICATION_ENABLE=1
(pyarrow-dev38) C:\Users\Alenka\repos\arrow\python>pip install -e .
Obtaining file:///C:/Users/Alenka/repos/arrow/python
Installing build dependencies ... done
Checking if build backend supports build_editable ... done
Getting requirements to build editable ... done
Preparing editable metadata (pyproject.toml) ... done
Requirement already satisfied: numpy>=1.16.6 in c:\users\alenka\anaconda3\envs\pyarrow-dev38\lib\site-packages (from pyarrow==10.0.0.dev169+gcbf0ec0d0.d20220927) (1.23.3)
Building wheels for collected packages: pyarrow
Building editable for pyarrow (pyproject.toml) ... done
Created wheel for pyarrow: filename=pyarrow-10.0.0.dev169+gcbf0ec0d0.d20220927-0.editable-cp38-cp38-win_amd64.whl size=26397 sha256=317b71a1b46c66253926260a1fc9c01919ff82fa3326a264956d187276c84b77
AlenkaF / inspect_files.txt
Created September 12, 2022 04:49
Loading PyArrow
(pyarrow-dev) C:\Users\Alenka\repos\arrow\python>cd pyarrow
(pyarrow-dev) C:\Users\Alenka\repos\arrow\python\pyarrow>ls
__init__.pxd _fs.pxd benchmark.py lib.cp39-win_amd64.pyd
__init__.py _fs.pyx builder.pxi lib.pxd
__pycache__ _gcsfs.pyx cffi.py lib.pyx
_compute.cp39-win_amd64.pyd _generated_version.py compat.pxi lib_api.h
_compute.pxd _hdfs.pyx compute.py memory.pxi
_compute.pyx _hdfsio.cp39-win_amd64.pyd config.pxi orc.py
_compute_docstrings.py _hdfsio.pyx conftest.py pandas-shim.pxi
AlenkaF / pyarrow_build_output.txt
Last active September 12, 2022 04:48
The output of the PyArrow build
(pyarrow-dev) C:\Users\Alenka\repos\arrow\python>python setup.py build_ext --inplace
running build_ext
creating C:\Users\Alenka\repos\arrow\python\build
creating C:\Users\Alenka\repos\arrow\python\build\cpp
-- Running CMake for PyArrow C++
cmake -DARROW_BUILD_DIR=build -DCMAKE_BUILD_TYPE=release -DCMAKE_INSTALL_LIBDIR=lib -DCMAKE_INSTALL_PREFIX=C:\Users\Alenka\repos\arrow\python\build\dist -DPYTHON_EXECUTABLE=C:\Users\Alenka\anaconda3\envs\pyarrow-dev\python.exe -DPython3_EXECUTABLE=C:\Users\Alenka\anaconda3\envs\pyarrow-dev\python.exe -DPYARROW_WITH_DATASET=on -DPYARROW_WITH_PARQUET_ENCRYPTION=on -DPYARROW_WITH_HDFS=off -G Ninja C:\Users\Alenka\repos\arrow\python\pyarrow/src
-- The C compiler identification is MSVC 19.16.27048.0
-- The CXX compiler identification is MSVC 19.16.27048.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
AlenkaF / Steps_build_Windows.txt
Last active October 4, 2022 13:09
Steps to build PyArrow on Windows
"C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\Tools\VsDevCmd.bat" -arch=amd64
set CC=cl.exe
set CXX=cl.exe
conda create -y -n pyarrow-dev -c conda-forge ^
--file arrow\ci\conda_env_cpp.txt ^
--file arrow\ci\conda_env_python.txt ^
--file arrow\ci\conda_env_gandiva.txt ^
python=3.9
conda activate pyarrow-dev
AlenkaF / Benchmark_baseline_PR.py
Last active August 30, 2022 13:31
Baseline PR benchmark results, 10 iterations
##########################################
# dataframe-to-table
(qa) (base) alenkafrim@Alenkas-MacBook-Pro benchmarks % conbench dataframe-to-table chi_traffic_2020_Q1 --iterations 5
Time to POST http://localhost:5000/api/login/ 0.0827488899230957
POST http://localhost:5000/api/login/ failed
Time to POST http://localhost:5000/api/benchmarks/ 0.004584789276123047
POST http://localhost:5000/api/benchmarks/ failed
AlenkaF / Benchmark_refactoring_PR.py
Last active August 30, 2022 15:19
Refactoring PR benchmark results, 10 iterations
##########################################
# dataframe-to-table
(qa) (base) alenkafrim@Alenkas-MacBook-Pro benchmarks % conbench dataframe-to-table chi_traffic_2020_Q1 --iterations 5
Time to POST http://localhost:5000/api/login/ 0.05665302276611328
POST http://localhost:5000/api/login/ failed
Time to POST http://localhost:5000/api/benchmarks/ 0.005268096923828125
POST http://localhost:5000/api/benchmarks/ failed
pushd arrow
git submodule init
git submodule update
export PARQUET_TEST_DATA="${PWD}/cpp/submodules/parquet-testing/data"
export ARROW_TEST_DATA="${PWD}/testing/data"
popd
conda create -y -n pyarrow-dev-no-gandiva -c conda-forge \
--file arrow/ci/conda_env_unix.txt \
--file arrow/ci/conda_env_cpp.txt \