Conversation
In this commit the empty interfaces are replaces with Element interfaces, and instead of relying on type inference, we ask the vector to provide a concrete value. This has greatly increased the throughput of the program.
| return &Vector{ | ||
| s: make([]Element, 0, size), | ||
| NewElement: func(value string) Element { | ||
| num := strings.Split(value, sep)[pos] |
There was a problem hiding this comment.
What about a regular expression to capture everything from the start of the line to the first \t. It will be constant time O(1). Because I think strings.Split is O(n)
There was a problem hiding this comment.
That's a great point. I will make changes.
There was a problem hiding this comment.
I came up with this regexp, but it made the operation slower.
str := fmt.Sprintf(`^([^%s]+%s){%d}([^%s]+)%s?`, sep, sep, pos, sep, sep)
re := regexp.MustCompile(str)
return func(size int) *Vector {
return &Vector{
s: make([]Element, 0, size),
NewElement: func(value string) Element {
num := re.FindStringSubmatch(value)[2]
i, err := strconv.Atoi(num)
if err != nil {
panic(errors.Wrap(err, "converting value from string"))
}
}
}
}
}
askiada
left a comment
There was a problem hiding this comment.
Great stuff! It also gave me ideas about the next steps
| } | ||
|
|
||
| // Sort Perform a binary search to find where to put a value in a vector. Ascending order. | ||
| func Sort(v *Vector, line, sep string, pos int) error { |
There was a problem hiding this comment.
I understand the reason of the separator and why we split.
But what if the person just want to sort a file that not a CSV or a TSV. I don't know like a text file with a sentence per line.
There was a problem hiding this comment.
The AllocateStringVector should do it.
| // Vector holds a slice of Elements for sorting. The NewElement is called each | ||
| // time a new item from the file is read or inserted into the slice. The Less | ||
| // function should return true if the first element is lower than the second. | ||
| type Vector struct { |
There was a problem hiding this comment.
I really like this new Vector regarding performance and clarity. It would be even better if the underlying structure does not have to be a slice. This is not top priority, but maybe someone would like to have a linked list/other data structure.
The only thing I have in mind would require another interface for the field s
There was a problem hiding this comment.
I'm with you. Let's keep slimming to the minimum requirements and then we have to refactor. As mentioned I'm not happy about the Element and I think we can come up with a better solution that would satisfy this as well.
|
Thank you! I'm still experiencing with this and I need to use your other commit in these. I will rebase after that because clearly there are some hard-coded stuff that you've mentioned that needs fixing. |
This PR changes the way the Vector is managed. A new TableVector is added that can sort based on a column and a separator.
Todo