Skip to content

Vector datatype support #2634

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

muskan124947
Copy link
Contributor

@muskan124947 muskan124947 commented Mar 18, 2025

The VECTOR data type is designed to store multi-dimensional arrays of numerical data, which are essential for various machine learning and data analysis applications. By introducing the VECTOR type, we aim to enhance the capabilities of SQL Server and Azure SQL Database, enabling users to efficiently store, retrieve, and manipulate vector data.

Vector Design doc -> https://microsoft.sharepoint.com/:w:/t/SQLAI-Projects/Ef632IF6mp1Mqj0OyKSrI1cBWNwxtM29eM31TZ3wUPgf4w?e=Ztgtq2&wdOrigin=LOOP-WEB.PREVIEW.NT&wduihid=ea3cf057-9d25-4ea5-bc4f-518a7ff343a7&web=1&ct=1741863974397

Copy link

codecov bot commented Mar 18, 2025

Codecov Report

Attention: Patch coverage is 22.53521% with 110 lines in your changes missing coverage. Please review.

Project coverage is 51.40%. Comparing base (a0ea23a) to head (7ede77d).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
src/main/java/microsoft/sql/Vector.java 0.00% 43 Missing ⚠️
...oft/sqlserver/jdbc/SQLServerPreparedStatement.java 4.54% 19 Missing and 2 partials ⚠️
...rc/main/java/com/microsoft/sqlserver/jdbc/dtv.java 5.55% 16 Missing and 1 partial ⚠️
...m/microsoft/sqlserver/jdbc/SQLServerResultSet.java 0.00% 11 Missing and 1 partial ⚠️
...in/java/com/microsoft/sqlserver/jdbc/IOBuffer.java 0.00% 7 Missing ⚠️
.../microsoft/sqlserver/jdbc/SQLServerConnection.java 75.00% 5 Missing ⚠️
...n/java/com/microsoft/sqlserver/jdbc/DataTypes.java 83.33% 3 Missing ⚠️
...n/java/com/microsoft/sqlserver/jdbc/Parameter.java 0.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2634      +/-   ##
============================================
- Coverage     51.59%   51.40%   -0.20%     
+ Complexity     3999     3994       -5     
============================================
  Files           147      148       +1     
  Lines         33706    33841     +135     
  Branches       5631     5648      +17     
============================================
+ Hits          17391    17396       +5     
- Misses        13866    13991     +125     
- Partials       2449     2454       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Ananya2 and others added 13 commits March 21, 2025 20:38
…o decode/encode data, along with additional features like dimensionType, and dimensionCount.
…roved constructor parameter positioning for clarity, removed VectorInputStream
…nsionCount for vector, fixed datatype and tdsType mappings along with setting correct typeDefinition in Parameter class.
…ector handling to ensure insertion of vector data.
…a, and measured performance for max-length vectors at varying scales — 100, 1K, 10K, 100K, and 1M records.
…ull insertion correctly, added test case to validate null insertion and test vectorsupport feature enabled. Removed getVector, setVector and updated existing setObject() along with adding helper functions in Vector class.
…CopyForBatchInsert and added support for vector datatype in SQLServerBulkCSVFileRecord along with updating strategy for Vector type in dtv.java
JDBCType.Category.TIMESTAMP, JDBCType.Category.NCHARACTER, JDBCType.Category.GUID)),

VECTOR(SSType.Category.VECTOR, EnumSet.of(JDBCType.Category.BINARY, JDBCType.Category.LONG_BINARY,
JDBCType.Category.CHARACTER, JDBCType.Category.LONG_CHARACTER, JDBCType.Category.BLOB, JDBCType.Category.VECTOR));
Copy link
Contributor Author

@muskan124947 muskan124947 Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to clean getter, setter and update conversions in DataTypes class

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add test cases to validate all data types in this list, including their conversions?

muskan124947 and others added 8 commits April 18, 2025 16:00
… for read, insert, bulkCopy, bulkCopyCSV operation and corrected typeInfo and Parameter.java along with adding test cases for BulkCopy
…ated insertion and retrieval of VECTOR data, and updated stored procedure and TVP definitions to use scale as dimensionType and precision as dimensionCount.
…CopyForBatchInsert along with updating getObject() when data is null
…d functions (UDFs); fixed previously failing test case.
… by returning normalised vector, global and local temp tables
… varbinary combinations -vector and varchar combinations
…ios along with adding test for setObject(getObject())
…peration is performed for mismatched source and destination tables
JDBCType.Category.TIME, JDBCType.Category.TIMESTAMP, JDBCType.Category.CHARACTER,
JDBCType.Category.LONG_CHARACTER, JDBCType.Category.NCHARACTER, JDBCType.Category.LONG_NCHARACTER,
JDBCType.Category.BINARY, JDBCType.Category.LONG_BINARY, JDBCType.Category.BLOB, JDBCType.Category.GUID,
JDBCType.Category.SQL_VARIANT, JDBCType.Category.VECTOR));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add test cases to validate all data types in this list, including their conversions? It would also help to understand how this mapping is utilized in the execution flow.

SSType.Category.CHARACTER, SSType.Category.LONG_CHARACTER, SSType.Category.NCHARACTER,
SSType.Category.LONG_NCHARACTER, SSType.Category.XML, SSType.Category.BINARY,
SSType.Category.LONG_BINARY, SSType.Category.UDT, SSType.Category.TIMESTAMP, SSType.Category.GUID,
SSType.Category.SQL_VARIANT, SSType.Category.VECTOR));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[As above] Can we add test cases to validate all data types in this list, including their conversions? It would also help to understand how this mapping is utilized in the execution flow.

}
}
break;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we write in more concise way:

case VECTOR:
byte[] bValue = (currentObject == null) ? null : (byte[]) currentObject;
writeShort((short) (bValue == null ? -1 : bValue.length));
if (bValue != null) {
writeBytes(bValue);
}
break;

writeShort((short) (8 + (scale * precision))); //maxLength
byte scaleByte = (byte) (scale == 2 ? 0x01 : 0x00);
writeByte((byte) scaleByte); //dimension type
writeShort((short) -1); // actual len
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the writeTVPColumnMetaData method contains duplicate code. If possible, consider moving the common logic to a utility method to improve reusability and maintainability.

}

String vectorData = data[pair.getKey() - 1].trim();
microsoft.sql.Vector.VectorDimensionType vectorDimensionType = microsoft.sql.Vector.getVectorDimensionType(cm.scale);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to import the microsoft.sql.Vector package?

@@ -318,6 +318,21 @@ private void internalAddrow(JDBCType jdbcType, Object val, Object[] rowValues,
internalAddrow(internalJDBCType, val, rowValues, pair);
break;

case VECTOR:
nValueLen = ((microsoft.sql.Vector) val).getActualLength();
int scale = ((microsoft.sql.Vector) val).getScale() == 0x01 ? 2 : 4;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move the hardcoded values 2 and 4 to constant variables with meaningful names?

typeInfo.displaySize = typeInfo.maxLength;
typeInfo.ssType = SSType.VECTOR;
int scaleByte = tdsReader.readUnsignedByte(); // Read the dimension type (scale)
typeInfo.scale = (scaleByte == 0) ? 4 : 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the constant that will be created as suggested in the comment above.

Comment on lines +17 to +20
public VectorDimensionType vectorType;
public int dimensionCount;

public float[] data;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can those variables be made private?

return null;
}
if (bytes.length < 8) {
throw new IllegalArgumentException("Byte array length must be at least 8 bytes.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add the below error message in SQLServerResource.java
{"R_InvalidVectorLength", "Byte array length must be at least "{0}" bytes."}

throw new IllegalArgumentException("Byte array length must be at least 8 bytes.");
}
if (bytes.length % 4 != 0) {
throw new IllegalArgumentException("Byte array length must be a multiple of 4.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add the below error message in SQLServerResource.java
{"R_InvalidVectorLength", "Byte array length must be at least "{0}" bytes."}

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
Status: In progress
Development

Successfully merging this pull request may close these issues.

3 participants